Identification of structural SNPs
The function snpzip identifies the set of alleles that contribute most significantly to phenotypic structure.
This procedure uses Discriminant Analysis of Principal Components (DAPC) to quantify the contribution of individual alleles to between-population structure. Then, defining contribution to DAPC as the measure of distance between alleles, hierarchical clustering is used to identify two groups of alleles: structural SNPs and non-structural SNPs.
snpzip(snps, y, plot = TRUE, xval.plot = FALSE, loading.plot = FALSE, method = c("complete", "single", "average", "centroid", "mcquitty", "median", "ward"), ...)
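snps | a matrix of SNPs.
y | either a factor or a dapc object.
plot | a logical; if TRUE, a scatter plot visualizing the DAPC results is displayed.
xval.plot | a logical; if TRUE, the results of the cross-validation step are displayed.
loading.plot | a logical; if TRUE, a loading plot indicating the SNP selection threshold is displayed.
method | the clustering method to be used. This should be (an unambiguous abbreviation of) one of "complete", "single", "average", "centroid", "mcquitty", "median", or "ward".
... | further arguments.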
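The abbreviation rule for method can be illustrated with base R's hclust, which resolves clustering method names by partial matching. This is a minimal sketch on made-up contribution values; it assumes that snpzip hands method to hclust unchanged, which this page does not state explicitly.

```r
# Toy data: hypothetical allele contributions (not produced by snpzip).
set.seed(42)
contrib <- runif(20)
d <- dist(contrib)

# A full method name and an unambiguous abbreviation give the same tree.
hc_full  <- hclust(d, method = "average")
hc_short <- hclust(d, method = "ave")

resolved  <- hc_short$method                           # full name after matching
same_tree <- identical(hc_full$merge, hc_short$merge)

# "m" could mean "mcquitty" or "median", so hclust rejects it.
ambiguous <- inherits(try(hclust(d, method = "m"), silent = TRUE), "try-error")
```

Spelling the method out in full avoids any dependence on partial matching.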
snpzip provides an objective procedure to distinguish between structural and non-structural SNPs identified by Discriminant Analysis of Principal Components (DAPC, Jombart et al. 2010). snpzip precedes the multivariate analysis with a cross-validation step to ensure that the subsequent DAPC is performed optimally. The contributions of alleles to the DAPC are then submitted to hclust, where they define a distance matrix upon which hierarchical clustering is carried out. To complete the procedure, snpzip uses cutree to automatically subdivide the set of SNPs fed into the analysis into two groups: those which contribute significantly to the phenotypic structure of interest, and those which do not.
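The clustering step described above can be sketched with base R alone. The contribution values below are simulated rather than taken from a DAPC. Labeling the cluster with the larger mean contribution as "structural" is an assumption made for this illustration, not necessarily how snpzip decides.

```r
# Simulated per-allele contributions to a single discriminant axis.
# Ten hypothetical "structural" alleles get large values; the rest are small.
set.seed(1)
contrib <- c(runif(10, min = 0.05, max = 0.08),
             runif(190, min = 0, max = 0.005))
names(contrib) <- paste0("snp", seq_along(contrib))

# Distances between alleles are differences in their contributions.
d <- dist(contrib)

# Hierarchical clustering, then cut the tree into exactly two groups.
hc  <- hclust(d, method = "complete")
grp <- cutree(hc, k = 2)

# Assumption: the group with the larger mean contribution is "structural".
structural_grp <- which.max(tapply(contrib, grp, mean))
selected <- names(contrib)[grp == structural_grp]
selected
```

In this toy data the two sets of contributions are well separated, so the cut recovers the ten high-contribution alleles. Real contributions are rarely this clean, and the result can vary with the clustering method chosen.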
A list with four items if y is a factor, or two items if y is a dapc object:
The first item gives the number of principal components (PCs) of PCA retained in the DAPC.
The second item is an embedded list which first indicates the number of structural and non-structural SNPs identified by snpzip, second provides a list of the structuring alleles, third gives the names of the selected alleles, and fourth details the contributions of these structuring alleles to the DAPC.
The optional third item provides measures of discrimination success both overall and by group.
The optional fourth item contains the dapc object generated if y was a factor.
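Because the element names of the returned list are not given here, positional indexing is one way to reach its parts. The sketch below builds a stand-in list with the layout described above; every value in it is made up, and the exact content of each element (for example, whether the "list of the structuring alleles" holds indices) is an assumption.

```r
# Stand-in for a snpzip result when y is a factor (four items).
# All values are hypothetical and only mimic the documented layout.
outcome <- list(
  30L,                                   # item 1: number of PCs retained
  list(
    c(3L, 997L),                         # counts: structural vs. non-structural
    c(5L, 17L, 42L),                     # structuring alleles (assumed to be indices)
    c("snp5", "snp17", "snp42"),         # names of the selected alleles
    c(0.06, 0.07, 0.05)                  # their contributions to the DAPC
  ),
  list(overall = 0.95,                   # item 3: discrimination success (made up)
       by_group = c(A = 0.90, B = 1.00)),
  "placeholder for the dapc object"      # item 4: present only if y was a factor
)

n_pcs          <- outcome[[1]]
selected_names <- outcome[[2]][[3]]
contributions  <- outcome[[2]][[4]]

# Items 3 and 4 exist only when y was a factor, so check the length first.
has_dapc <- length(outcome) == 4
```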
If plot=TRUE, a scatter plot will provide a visualization of the DAPC results.
If xval.plot=TRUE, the results of the cross-validation step will be displayed as an array of the format generated by xvalDapc, and a scatter plot of the results of cross-validation will be provided.
If loading.plot=TRUE, a loading plot will be generated to show the contributions of alleles to the DAPC, and the SNP selection threshold will be indicated.
If the number of Discriminant Axes (n.da) in the DAPC is greater than 1, loading.plot=TRUE will generate one loading plot for each discriminant axis.
Caitlin Collins caitlin.collins12@imperial.ac.uk
Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics 11:94. doi:10.1186/1471-2156-11-94
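## Not run:
library(adegenet)  # provides glSim, dapc and snpzip

# Simulate 100 individuals with 10,000 SNPs, 10 of which are structural
simpop <- glSim(100, 10000, n.snp.struc = 10, grp.size = c(0.3, 0.7),
                LD = FALSE, alpha = 0.4, k = 4)
snps <- as.matrix(simpop)
phen <- simpop@pop

# y is a factor: cross-validation and DAPC are run inside snpzip
outcome <- snpzip(snps, phen, method = "centroid")
outcome
## End(Not run)

## Not run:
library(adegenet)

simpop <- glSim(100, 10000, n.snp.struc = 10, grp.size = c(0.3, 0.7),
                LD = FALSE, alpha = 0.4, k = 4)
snps <- as.matrix(simpop)
phen <- simpop@pop

# y is a dapc object: run the DAPC first, then pass it as the second argument
dapc1 <- dapc(snps, phen, n.da = 1, n.pca = 30)
features <- snpzip(snps, dapc1, loading.plot = TRUE, method = "average")
features
## End(Not run)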