Probabilistic PCA
Implementation of probabilistic PCA (PPCA). PPCA can perform PCA on incomplete data and may be used for missing value estimation. This script was implemented after the Matlab version provided by Jakob Verbeek (see http://lear.inrialpes.fr/~verbeek/) and the draft “EM Algorithms for PCA and Sensible PCA” written by Sam Roweis.
ppca(Matrix, nPcs = 2, seed = NA, threshold = 1e-05, maxIterations = 1000, ...)
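Matrix | The data matrix; it may contain missing values. |
nPcs | Number of principal components to estimate. |
seed | Seed for the random number generator; set it to obtain reproducible results. |
threshold | Convergence threshold. |
maxIterations | The maximum number of allowed iterations. |
... | Reserved for future use. Currently no further parameters are used. |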
Probabilistic PCA combines an EM approach for PCA with a probabilistic model. The EM approach is based on the assumption that the latent variables as well as the noise are normally distributed.
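The probabilistic model can be written out as follows. This is the standard PPCA formulation; the symbols (t for an observation, x for the latent variables, W for the loadings, d and q for the data and latent dimensions) are notation chosen here for illustration and do not appear elsewhere on this page.

```latex
\begin{align}
  t &= W x + \mu + \varepsilon, \\
  x &\sim \mathcal{N}(0, I_q), \qquad
  \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d), \\
  t &\sim \mathcal{N}\!\left(\mu,\; W W^{\top} + \sigma^2 I_d\right).
\end{align}
```

The last line is the marginal distribution of an observation. It follows from the first two because a linear map of a Gaussian plus independent Gaussian noise is again Gaussian.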
In standard PCA, data that are far from the training set but close to the principal subspace may have the same reconstruction error as data close to the training set. PPCA defines a likelihood function such that the likelihood of data far from the training set is much lower, even if they are close to the principal subspace. This can improve the estimation accuracy.
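The difference between reconstruction error and likelihood can be illustrated with a minimal base R sketch. It does not call ppca(). Instead it assumes the closed-form maximum-likelihood PPCA solution for complete data, computed from the sample covariance, and uses simulated toy data. All names (X, near, far, recon_error, log_lik) are hypothetical.

```r
set.seed(1)

# Hypothetical 2-D training data scattered around the line y = x.
n <- 200
z <- rnorm(n)
X <- cbind(z, z) + matrix(rnorm(2 * n, sd = 0.1), ncol = 2)

mu  <- colMeans(X)
eig <- eigen(cov(X), symmetric = TRUE)
q   <- 1                                   # number of retained components
U   <- eig$vectors[, 1:q, drop = FALSE]    # principal subspace

# Closed-form PPCA parameters for complete data (assumption: sample
# covariance used in place of the maximum-likelihood covariance).
sigma2 <- mean(eig$values[-(1:q)])
W <- U %*% diag(sqrt(eig$values[1:q] - sigma2), q)
C <- W %*% t(W) + sigma2 * diag(ncol(X))   # marginal covariance of the model

# Squared distance from the principal subspace (what standard PCA measures).
recon_error <- function(x) {
  d <- x - mu
  sum((d - U %*% crossprod(U, d))^2)
}

# Gaussian log-likelihood under the PPCA marginal N(mu, C).
log_lik <- function(x) {
  d <- x - mu
  log_det <- as.numeric(determinant(C, logarithm = TRUE)$modulus)
  -0.5 * (length(x) * log(2 * pi) + log_det + sum(d * solve(C, d)))
}

# Two points lying exactly in the principal subspace.
near <- mu + 1  * U[, 1]   # close to the training data
far  <- mu + 20 * U[, 1]   # far away from the training data

c(recon_error(near), recon_error(far))   # both numerically zero
c(log_lik(near), log_lik(far))           # far point is much less likely
```

Both points have essentially zero reconstruction error, so standard PCA cannot tell them apart, whereas the log-likelihood of the far point is much lower.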
A method called kEstimate is provided to estimate the optimal number of components via cross-validation. In general, a few components are sufficient for reasonable estimation accuracy. See also the package documentation for further discussion of the kinds of data for which PCA-based missing value estimation is advisable.
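A possible way to call kEstimate is sketched below. The argument values (evalPcs, nruncv, em) are illustrative assumptions rather than recommendations, and the orientation of the data follows the example on this page; consult the kEstimate help page for the authoritative argument list.

```r
library(pcaMethods)

## Sample metabolite dataset with missing values
data(metaboliteData)

## Cross-validate PPCA for 1 to 3 components (assumed settings:
## a single cross-validation run, NRMSEP as the error measure)
esti <- kEstimate(t(metaboliteData), method = "ppca",
                  evalPcs = 1:3, nruncv = 1, em = "nrmsep")

## Number of components with the lowest estimated error
esti$bestNPcs

## Estimated error for each evaluated number of components
esti$eError
```

The value of bestNPcs could then be passed as nPcs to pca(). Cross-validation repeats the estimation many times, so larger ranges of evalPcs take correspondingly longer.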
Complexity: Runtime is linear in the number of data points, the number of data dimensions, and the number of principal components.
Convergence: The threshold indicating convergence was changed from 1e-3 in 1.2.x to 1e-5 in the current version, leading to more stable results. For reproducibility you can set the seed (parameter seed) of the random number generator. If PPCA is used for missing value estimation, results may be checked by running the algorithm several times with a different seed each time; if the estimated values show little variance, the algorithm converged well.
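The check described above might look like the following sketch. The seed values and the use of the standard deviation as the measure of spread are assumptions made for illustration; the names seeds, missing, estimates and spread are hypothetical.

```r
library(pcaMethods)

data(metaboliteData)
dat <- t(metaboliteData)

## Positions of the missing values in the input
missing <- is.na(dat)

## Run PPCA with several seeds (arbitrary choices) and keep only
## the values that were estimated for the missing positions
seeds <- 1:5
estimates <- sapply(seeds, function(s) {
  res <- pca(dat, method = "ppca", nPcs = 3, seed = s)
  completeObs(res)[missing]
})

## One row per missing value, one column per seed:
## spread of each estimate across the runs
spread <- apply(estimates, 1, sd)
summary(spread)
```

A spread that is small relative to the scale of the data suggests that the runs agree and the algorithm converged well; what counts as small depends on the dataset.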
Standard PCA result object used by all PCA-based methods of this package. Contains scores, loadings, data mean and more. See pcaRes for details.
Requires MASS. It is not recommended to use this function directly; use the pca() wrapper function instead.
Wolfram Stacklies
## Load a sample metabolite dataset with 5% missing values (metaboliteData)
library(pcaMethods)
data(metaboliteData)
## Perform probabilistic PCA using the 3 largest components
result <- pca(t(metaboliteData), method="ppca", nPcs=3, seed=123)
## Get the estimated complete observations
cObs <- completeObs(result)
## Plot the scores
plotPcs(result, type = "scores")