Compute and optimize a-score for Discriminant Analysis of Principal Components (DAPC)
These functions are under development. Please email the author before using them for published results.
a.score(x, n.sim=10, ...)
optim.a.score(x, n.pca=1:ncol(x$tab), smart=TRUE, n=10, plot=TRUE, n.sim=10, n.da=length(levels(x$grp)), ...)
x |
a |
n.pca |
a vector of |
smart |
a |
n |
an |
plot |
a |
n.sim |
an |
n.da |
an |
... |
further arguments passed to other methods; currently unused. |
The Discriminant Analysis of Principal Components seeks a reduced space inside which observations are best discriminated into pre-defined groups. One way to assess the quality of the discrimination is to look at the re-assignment of individuals to their prior group; successful re-assignment is a sign of strong discrimination.
However, when the original space is very large, ad hoc solutions can be found that discriminate the sampled individuals very well but would perform poorly on new samples. In such cases, DAPC re-assignment would be high even for randomly chosen clusters. The a-score measures this bias. It is computed as (Pt-Pr), where Pt is the re-assignment probability using the true clusters, and Pr is the re-assignment probability for randomly permuted clusters. An a-score close to one is a sign that the DAPC solution is both strongly discriminating and stable, while low values (toward 0 or lower) indicate either weak discrimination or instability of the results.
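The (Pt-Pr) computation can be illustrated without adegenet. The following is a minimal sketch, not the package's implementation: it assumes a PCA via prcomp followed by MASS::lda as a stand-in for DAPC, uses the built-in iris data, and all names (reassign_rate, a_tab, pop_score, mean_score) are hypothetical.

```r
library(MASS)  # lda(); a recommended package shipped with standard R installs

set.seed(1)

# Proportion of observations re-assigned to their own group, per group.
reassign_rate <- function(scores, grp) {
  fit <- lda(scores, grouping = grp)
  pred <- predict(fit, scores)$class
  sapply(split(pred == grp, grp), mean)
}

# PCA step: keep the first n_pca principal components.
n_pca <- 2
pcs <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:n_pca, drop = FALSE]
grp <- iris$Species

# Pt: re-assignment rate using the true groups.
Pt <- reassign_rate(pcs, grp)

# Pr: re-assignment rate using randomly permuted groups,
# one row per simulation and one column per group.
n_sim <- 10
Pr <- t(replicate(n_sim, reassign_rate(pcs, sample(grp))))

# a-score (Pt - Pr) for each simulation and group, then averaged.
a_tab <- matrix(Pt, nrow = n_sim, ncol = length(Pt), byrow = TRUE) - Pr
pop_score <- colMeans(a_tab)   # mean a-score per group
mean_score <- mean(pop_score)  # overall mean a-score
```

Here a_tab plays the role of a simulations-by-groups table of a-scores, and mean_score summarizes it in a single value. Increasing n_pca relative to the number of observations is the situation in which Pr is expected to rise and the a-score to fall.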
The a-score can serve as a criterion for choosing the optimal number of
PCs in the PCA step of DAPC, i.e. the number of PCs maximizing the
a-score. Two procedures are implemented in optim.a.score. The smart
procedure selects evenly distributed numbers of PCs in a pre-defined
range, computes the a-score for each, and then interpolates the results
using splines, predicting an approximate optimal number of PCs. The
other procedure (when smart is FALSE) performs the computations for all
numbers of PCs requested by the user. The 'optimal' number is then the
one giving the highest mean a-score (computed over the groups).
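The idea behind the smart procedure can be sketched in base R. This is an illustration under stated assumptions, not the code used by optim.a.score: toy_mean_a_score is a hypothetical stand-in for running DAPC and computing the mean a-score, and stats::splinefun is one possible choice of spline interpolator.

```r
# Hypothetical mean a-score as a function of the number of retained PCs:
# a toy curve that rises, peaks, then declines. In practice each value
# would come from running DAPC with that many PCs.
toy_mean_a_score <- function(n_pca) 0.6 - ((n_pca - 18) / 40)^2

n_pca_range <- 1:50  # range of numbers of PCs under consideration
n_points <- 10       # how many values in that range are actually evaluated

# Evenly distributed numbers of PCs across the range.
n_pca_tested <- unique(round(seq(min(n_pca_range), max(n_pca_range),
                                 length.out = n_points)))
scores <- toy_mean_a_score(n_pca_tested)

# Interpolate the evaluated points with a cubic spline and
# predict the score for every number of PCs in the range.
interp <- splinefun(n_pca_tested, scores)
pred <- interp(n_pca_range)

# Approximate optimum: the number of PCs with the highest predicted score.
best <- n_pca_range[which.max(pred)]
```

The trade-off is that only n_points evaluations are needed instead of one per value in n_pca_range, at the cost of an approximate rather than exact optimum.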
=== a.score ===
a.score returns a list with the following components:
tab |
a matrix of a-scores with groups in columns and simulations in rows. |
pop.score |
a vector giving the mean a-score for each population. |
mean |
the overall mean a-score. |
=== optim.a.score ===
optim.a.score returns a list with the following components:
pop.score |
a list giving the mean a-score of the populations for each number of retained PCs (each element of the list corresponds to a number of retained PCs). |
mean |
a vector giving the overall mean a-score for each number of retained PCs. |
pred |
(only when |
best |
the optimal number of PCs to be retained. |
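A short usage sketch may help tie the arguments to the returned components. It assumes the adegenet package is installed and uses its microbov example dataset; the values chosen for n.pca, n.da and n.sim are arbitrary and only for illustration.

```r
library(adegenet)

data(microbov)  # example genind dataset shipped with adegenet
set.seed(1)     # the a-score relies on random permutations of the groups

# DAPC retaining a deliberately large number of PCs (arbitrary values).
dapc1 <- dapc(microbov, n.pca = 50, n.da = 10)

# a-score for this DAPC, based on 10 permutations of the groups.
sc <- a.score(dapc1, n.sim = 10)
names(sc)     # components of the returned list
sc$pop.score  # mean a-score for each population
sc$mean       # overall mean a-score

# Search for the number of PCs maximizing the a-score (smart procedure).
opt <- optim.a.score(dapc1, n.sim = 10, plot = FALSE)
opt$best      # suggested number of PCs to retain
```

Because the scores depend on random permutations, results vary between runs unless a seed is set; a larger n.sim reduces this variability at the cost of computation time.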
Thibaut Jombart t.jombart@imperial.ac.uk
Jombart T, Devillard S and Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genetics 11:94. doi:10.1186/1471-2156-11-94
- find.clusters: to identify clusters without prior.
- dapc: the Discriminant Analysis of Principal Components (DAPC)