SVDimpute algorithm
This implements the SVDimpute algorithm proposed by Troyanskaya et al. (2001). The idea behind the algorithm is to estimate the missing values as a linear combination of the k most significant eigengenes.
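The same idea written as a formula. The notation is introduced here for illustration and is not taken from the package. Assume observations are in rows and variables in columns. Let v_1, ..., v_k be the k most significant eigengenes, and let x_i be an incomplete variable. The model approximates x_i as

```latex
x_i \;\approx\; \sum_{l=1}^{k} \beta_{il}\, v_l \;=\; V_k \beta_i,
\qquad V_k = \left[\, v_1 \;\cdots\; v_k \,\right],
```

A missing entry x_{ji} is then estimated by the j-th entry of V_k β_i, where β_i holds the fitted regression coefficients.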
svdImpute(Matrix, nPcs = 2, threshold = 0.01, maxSteps = 100, verbose = interactive(), ...)
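Argument | Description
Matrix | Data matrix to be completed, with observations in rows and variables in columns; missing values are denoted as NA.
nPcs | Number of most significant eigengenes (k) used to estimate the missing values.
threshold | The iteration stops if the change in the matrix falls below this threshold.
maxSteps | Maximum number of iteration steps.
verbose | Print some output if TRUE.
... | Reserved for parameters used in future versions of the algorithm.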
Missing values are denoted as NA. It is not recommended to use this function directly; use the pca() wrapper function instead.
As SVD can only be performed on complete matrices, all missing values are initially replaced by 0, which for centred data equals the variable mean. The algorithm then works iteratively until the change in the estimated solution falls below the given threshold (or maxSteps is reached). In each step, the eigengenes of the current estimate are calculated and used to determine a new estimate. Eigengenes are the loadings obtained when PCA is performed treating the variables (genes, for microarray data) as observations.
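The iterative procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. The function name `svd_impute_sketch` and its arguments are hypothetical. The eigengenes are taken as the leading left singular vectors of the current estimate, computed with `svd` and without extra centring. Convergence is measured here as the relative squared change between iterations, which is an assumed criterion.

```r
# Minimal sketch of the iterative SVDimpute loop (hypothetical helper, not
# pcaMethods code). Assumptions: X has observations in rows, variables in
# columns, and NA for missing values; eigengenes are the k leading left
# singular vectors of the current estimate; convergence uses the relative
# squared change between iterations.
svd_impute_sketch <- function(X, k = 2, tol = 0.01, max_steps = 100) {
  miss <- is.na(X)
  Y <- X
  Y[miss] <- 0                         # initial guess; equals the mean only for centred data
  incomplete <- which(colSums(miss) > 0)
  for (step in seq_len(max_steps)) {
    Y_old <- Y
    # k most significant eigengenes of the current estimate
    V <- svd(Y, nu = k, nv = 0)$u
    for (i in incomplete) {
      m <- miss[, i]
      # Regress the observed entries of variable i on the eigengenes,
      # leaving out the rows where variable i is missing
      beta <- lm.fit(V[!m, , drop = FALSE], Y[!m, i])$coefficients
      Y[m, i] <- V[m, , drop = FALSE] %*% beta
    }
    # Stop once the estimate barely changes between iterations
    change <- sum((Y - Y_old)^2) / sum(Y_old^2)
    if (change < tol) break
  }
  Y
}

# Example: a rank-1 matrix with one entry removed
X <- outer(1:6, 1:4)
X_na <- X
X_na[2, 3] <- NA
Y <- svd_impute_sketch(X_na, k = 1, tol = 1e-20, max_steps = 500)
Y[2, 3]   # close to the true value X[2, 3] = 6
```

Only the missing positions are ever overwritten, so the observed entries of the returned matrix are identical to the input.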
An optimal linear combination is found by regressing the incomplete variable against the k most significant eigengenes. If the value at position j is missing, the j-th value of each eigengene is not used when determining the regression coefficients.
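The regression step for a single variable can be sketched in base R. This is an illustrative sketch, not the package's code. The helper name `estimate_missing` is hypothetical. `V` is assumed to hold the k eigengenes as columns, with one row per observation. The rows of `V` at the missing positions of `x` are dropped before fitting and used afterwards only for prediction.

```r
# Hypothetical helper (not part of pcaMethods): estimate the missing
# entries of one variable x by regressing its observed entries on the
# eigengenes stored as the columns of V (one row per observation).
estimate_missing <- function(x, V) {
  miss <- is.na(x)
  # Fit the coefficients on the observed positions only: rows of V
  # where x is missing are left out of the regression
  beta <- lm.fit(V[!miss, , drop = FALSE], x[!miss])$coefficients
  # Predict the missing positions from the same linear combination
  x[miss] <- V[miss, , drop = FALSE] %*% beta
  x
}

# Example: x is an exact combination of two made-up "eigengenes"
V <- cbind(rep(1, 6), 1:6)
x <- as.vector(V %*% c(2, 0.5))   # 2.5 3.0 3.5 4.0 4.5 5.0
x[3] <- NA
x_hat <- estimate_missing(x, V)   # x_hat[3] is recovered (up to rounding) as 3.5
```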
Returns a standard PCA result object, as used by all PCA-based methods of this package. It contains scores, loadings, the data mean and more; see pcaRes for details.
In each iteration, a standard PCA (prcomp) needs to be done for each incomplete variable to obtain the eigengenes. This is usually fast for small data sets, but the computational cost can grow considerably for very large data sets.
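To illustrate how eigengenes relate to a standard PCA, the following base-R sketch uses made-up data and is not package code. It computes them with `prcomp` on the transposed matrix, so that variables are treated as observations. It then compares them with the leading left singular vectors of X. Setting `center = FALSE` is an assumption made here so that the two results match exactly, because `prcomp` centres by default.

```r
# Eigengenes computed two ways for a small complete matrix X with
# observations in rows and variables in columns (random example data).
set.seed(1)
X <- matrix(rnorm(8 * 5), nrow = 8, ncol = 5)
k <- 2

# 1) PCA with the variables treated as observations: transpose X so each
#    variable becomes a row; center = FALSE makes this a plain SVD
eig_prcomp <- prcomp(t(X), center = FALSE)$rotation[, 1:k]

# 2) The k leading left singular vectors of X
eig_svd <- svd(X, nu = k, nv = 0)$u

# The two agree up to the sign of each column
max_diff <- max(abs(abs(eig_prcomp) - abs(eig_svd)))
```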
Wolfram Stacklies
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001 Jun;17(6):520-5.
## Load the pcaMethods package and a sample metabolite dataset with 5% missing values
library(pcaMethods)
data(metaboliteData)
## Perform svdImpute using the 3 largest components
result <- pca(metaboliteData, method = "svdImpute", nPcs = 3, center = TRUE)
## Get the estimated complete observations
cObs <- completeObs(result)
## Now plot the scores
plotPcs(result, type = "scores")