Computes Yanai's GCD in the context of the variable-subset selection problem
Computes Yanai's Generalized Coefficient of Determination for the similarity of the subspaces spanned by a subset of variables and a subset of the full data set's Principal Components.
gcd.coef(mat, indices, pcindices = NULL)
| Argument | Description |
|---|---|
| mat | the full data set's covariance (or correlation) matrix. |
| indices | a numerical vector, matrix or 3-d array of integers giving the indices of the variables in the subset. If a matrix is specified, each row is taken to represent a different k-variable subset. If a 3-d array is given, it is assumed that the third dimension corresponds to different cardinalities. |
| pcindices | a numerical vector of indices of Principal Components. By default, the first k PCs are chosen, where k is the cardinality of the subset of variables whose criterion value is being computed. If a vector of PCs is specified by the user, those PCs will be used for all cardinalities that were requested. |
Computes Yanai's Generalized Coefficient of Determination for the similarity of the subspaces spanned by a subset of variables (specified by indices) and a subset of the full data set's Principal Components (specified by pcindices).
Input data are expected in the form of a (co)variance or
correlation matrix. If a non-square matrix is given, it is assumed to
be a data matrix, and its correlation matrix is used as input. The
number of variables (k) and the number of PCs (q) need not be equal.
Yanai's GCD is defined as:
$$\mathrm{GCD} = \frac{\operatorname{tr}(P_v P_c)}{\sqrt{k\,q}}$$
where $P_v$ and $P_c$ are the matrices of orthogonal projections onto the subspaces spanned by the k-variable subset and by the q-Principal Component subset, respectively.
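As a minimal sketch of this definition in base R, the helpers below build both projection matrices explicitly from standardized data, so that the PCs are those of the correlation matrix. The names proj_matrix() and gcd_projection() are hypothetical and are not part of subselect. Because the projections are n x n matrices, this direct approach is only practical for small data sets such as the one used in the examples.

```r
# Sketch of GCD = tr(Pv %*% Pc) / sqrt(k * q) from projection matrices.
# proj_matrix() and gcd_projection() are hypothetical helpers, not subselect functions.

# Orthogonal projection onto the column space of A (n x m, full column rank)
proj_matrix <- function(A) A %*% solve(crossprod(A), t(A))

gcd_projection <- function(x, indices, pcindices = seq_along(indices)) {
  z <- scale(x)                                 # centre and standardize each column
  v <- eigen(cor(x), symmetric = TRUE)$vectors  # PC loadings, decreasing eigenvalues
  scores <- z %*% v[, pcindices, drop = FALSE]  # scores of the selected PCs
  pv <- proj_matrix(z[, indices, drop = FALSE]) # projection onto the k-variable subspace
  pc <- proj_matrix(scores)                     # projection onto the q-PC subspace
  sum(diag(pv %*% pc)) / sqrt(length(indices) * length(pcindices))
}

data(iris3)
x <- iris3[, , 1]            # Setosa measurements, 50 x 4
gcd_projection(x, c(1, 3))   # first k = 2 PCs by default
```

The result can be compared with gcd.coef(cor(x), c(1, 3)) from the examples.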
This definition is equivalent to:
$$\mathrm{GCD} = \frac{\sum_{i=1}^{q} r_i^2}{\sqrt{k\,q}}$$
where $r_i$ is the multiple correlation between the i-th selected Principal Component and the k-variable subset, and the sum runs over the q selected PCs.
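The multiple-correlation form can be sketched in the same way. Assuming standardized data and correlation-matrix PCs, the code below regresses the scores of each selected PC on the k variables with lm() and sums the resulting R-squared values. gcd_multcorr() is a hypothetical helper, not a subselect function.

```r
# Sketch of GCD = sum_i r_i^2 / sqrt(k * q), where r_i^2 is the squared
# multiple correlation between the i-th selected PC and the k variables.
# gcd_multcorr() is a hypothetical helper, not a subselect function.
gcd_multcorr <- function(x, indices, pcindices = seq_along(indices)) {
  z <- scale(x)                                 # standardized data
  v <- eigen(cor(x), symmetric = TRUE)$vectors  # correlation-matrix PC loadings
  xk <- z[, indices, drop = FALSE]              # the k-variable subset
  r2 <- sapply(pcindices, function(i) {
    score <- drop(z %*% v[, i])                 # scores of the i-th PC
    summary(lm(score ~ xk))$r.squared           # squared multiple correlation
  })
  sum(r2) / sqrt(length(indices) * length(pcindices))
}

data(iris3)
x <- iris3[, , 1]
gcd_multcorr(x, c(1, 3))                   # k = 2 variables, first 2 PCs
gcd_multcorr(x, c(1, 3), pcindices = 1)    # k = 2 variables, q = 1 PC
```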
These definitions are also equivalent to the expression used in the code, which only requires the covariance (or correlation) matrix of the data under consideration.
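One expression of this kind follows because the i-th PC has variance $\lambda_i$ and covariance $\lambda_i v_i$ with the variables, where $\lambda_i$ and $v_i$ are the i-th eigenvalue and eigenvector of the covariance (or correlation) matrix $S$. Hence $r_i^2 = \lambda_i\, v_{i,K}^\top S_{KK}^{-1} v_{i,K}$, with $v_{i,K}$ the entries of $v_i$ for the k selected variables and $S_{KK}$ the corresponding k x k block of $S$. The sketch below uses this derived formula; it is an assumption that it matches what gcd.coef computes internally, and gcd_from_matrix() is a hypothetical name.

```r
# Sketch: GCD from the covariance (or correlation) matrix alone, using the
# derived identity r_i^2 = lambda_i * t(v_iK) %*% solve(S_KK) %*% v_iK.
# gcd_from_matrix() is hypothetical; it is not claimed to be subselect's code.
gcd_from_matrix <- function(mat, indices, pcindices = seq_along(indices)) {
  e <- eigen(mat, symmetric = TRUE)
  lambda <- e$values[pcindices]                      # eigenvalues of the selected PCs
  vk <- e$vectors[indices, pcindices, drop = FALSE]  # loadings on the k variables (k x q)
  skk_inv_vk <- solve(mat[indices, indices, drop = FALSE], vk)
  r2 <- lambda * colSums(vk * skk_inv_vk)            # one squared multiple correlation per PC
  sum(r2) / sqrt(length(indices) * length(pcindices))
}

data(iris3)
x <- iris3[, , 1]
gcd_from_matrix(cor(x), c(1, 3))
gcd_from_matrix(cor(x), c(1, 3), pcindices = c(1, 3))
```

Only the p x p matrix is needed, so this form avoids the n x n projections of the direct definition.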
The value of the GCD coefficient.
Cadima, J. and Jolliffe, I.T. (2001), "Variable Selection and the Interpretation of Principal Subspaces", Journal of Agricultural, Biological and Environmental Statistics, 6, 62-79.
Ramsay, J.O., ten Berge, J. and Styan, G.P.H. (1984), "Matrix Correlation", Psychometrika, 49, 403-423.
library(subselect)

## An example with a very small data set.
data(iris3)
x <- iris3[, , 1]
gcd.coef(cor(x), c(1, 3))
## [1] 0.7666286
gcd.coef(cor(x), c(1, 3), pcindices = c(1, 3))
## [1] 0.584452
gcd.coef(cor(x), c(1, 3), pcindices = 1)
## [1] 0.6035127

## An example computing the GCDs of four subsets produced when the
## anneal function attempted to optimize the RV criterion (using an
## absurdly small number of iterations). Since anneal is a random
## search, the subsets and GCD values will vary between runs.
data(swiss)
rvresults <- anneal(cor(swiss), 2, nsol = 4, niter = 5, criterion = "Rv")
gcd.coef(cor(swiss), rvresults$subsets)
##               Card.2
## Solution 1 0.4962297
## Solution 2 0.7092591
## Solution 3 0.4748525
## Solution 4 0.4649259