Function to undertake a Robust Closed Data Multivariate EDA
The function carries out a robust Principal Components Analysis (PCA) and estimates the Mahalanobis distances for a closed, compositional geochemical dataset, and places the results in an object to be saved and post-processed for display and further manipulation. Robust procedures are used: ‘MCD’, ‘MVE’ or user-supplied weights. For classical procedures see gx.mva.closed, or for non-compositional data and robust procedures see gx.robmva. For results display see gx.rqpca.screeplot, gx.rqpca.loadplot, gx.rqpca.plot, gx.rqpca.print, gx.md.plot and gx.md.print. For Kaiser varimax rotation see gx.rotate.
gx.robmva.closed(xx, proc = "mcd", wts = NULL, main = deparse(substitute(xx)))
xx |
a |
proc |
by default |
wts |
by default |
main |
by default the name of the object |
The data are initially isometrically log-ratio (ilr) transformed and a robust covariance matrix and vector of means estimated, by either the Minimum Covariance Determinant (MCD) or Minimum Volume Ellipsoid (MVE) procedures, or on the basis of a vector of user-supplied weights. The Mahalanobis distances are computed on the basis of the ilr transformed data. The ilr transformed data and robust estimates, including the inverse of the covariance matrix, are then back-transformed to the centred log-ratio (clr) basis and a Principal Components Analysis (PCA) undertaken (see Filzmoser et al., 2009), permitting the results to be interpreted in the original variable space.
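The sequence described above can be sketched in base R. This is an illustrative outline only, not the code used inside gx.robmva.closed: the simulated data, the particular orthonormal ilr basis V, and the use of cov.rob from the MASS package (assumed to be installed) for the MCD estimate are all assumptions made for the example.

```r
set.seed(1)

# Hypothetical closed data: 60 samples, 4 parts, each row summing to 1
n <- 60; D <- 4
raw <- matrix(rlnorm(n * D), n, D)
comp <- raw / rowSums(raw)

# One possible orthonormal ilr basis V (D x (D-1)); each column sums to zero
V <- matrix(0, D, D - 1)
for (i in seq_len(D - 1)) {
  V[seq_len(i), i] <- 1 / i
  V[i + 1, i] <- -1
  V[, i] <- V[, i] * sqrt(i / (i + 1))
}

# ilr transform; log(comp) %*% V equals clr(comp) %*% V because the
# columns of V sum to zero
z <- log(comp) %*% V

# Robust location and covariance in ilr space (MCD)
rob <- MASS::cov.rob(z, method = "mcd")

# Squared Mahalanobis distances from the robust ilr estimates
md <- mahalanobis(z, rob$center, rob$cov)

# Back-transform the robust covariance to the clr basis, then PCA
clr.cov <- V %*% rob$cov %*% t(V)
pca <- eigen(clr.cov, symmetric = TRUE)
```

The clr covariance matrix is singular by construction, so the last eigenvalue is zero to within rounding error and only the first D - 1 components carry information.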
The variances of the robust Principal Component scores are displayed. In a non-robust PCA these decrease with increasing component rank. However, in a robust PCA this may not be the case, and lower-order scores with high variances are often worthy of further inspection.
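To see why robust score variances need not decrease with rank, the following sketch compares classical and robust score variances on simulated, non-compositional data containing a few outliers. The data and the use of cov.rob from the MASS package are assumptions for illustration. The function itself reports the corresponding values for real data.

```r
set.seed(42)

# Hypothetical data: 95 background points plus 5 outliers along the third axis
x <- rbind(matrix(rnorm(95 * 3), 95, 3) %*% diag(c(3, 2, 1)),
           cbind(rnorm(5), rnorm(5), rnorm(5, mean = 15)))

# Classical PCA: score variances never increase with component rank
classical.var <- prcomp(x)$sdev^2

# Robust sketch: eigenvectors from an MCD covariance estimate, with
# scores computed for all observations, outliers included
rob <- MASS::cov.rob(x, method = "mcd")
e <- eigen(rob$cov, symmetric = TRUE)
scores <- sweep(x, 2, rob$center) %*% e$vectors
robust.var <- apply(scores, 2, var)

print(round(classical.var, 2))
print(round(robust.var, 2))
```

Because the robust eigenvectors are determined by the core of the data while the score variances are computed over every observation, a low-ranked component that happens to align with the outliers can show a comparatively large variance.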
If main is undefined the name of the matrix object passed to the function is used to identify the object. This is the recommended procedure as it helps to track the progression of a data analysis. Alternate plot titles are best defined when the saved object is passed to gx.rqpca.plot, gx.rqpca.screeplot or gx.md.plot for display. If no plot title is required set main = " ", or if a user defined plot title is required it may be defined, e.g., main = "Plot Title Text".
The following are returned as an object to be saved for subsequent display, etc.:
main |
by default (recommended) the input data matrix name. |
input |
the data matrix name, |
proc |
the robust procedure used, the value of |
n |
the total number of individuals (observations, cases or samples) in the input data matrix. |
nc |
the number of individuals remaining in the ‘core’ data subset following the robust estimation, i.e. the sum of those individuals with |
p |
the number of variables on which the multivariate operations were based. |
ifilr |
flag for |
matnames |
the row numbers or identifiers and column headings of the input matrix. |
wts |
the vector of weights for the |
mean |
the length |
cov |
the |
cov.inv |
the |
sd |
the length |
snd |
the |
r |
the |
eigenvalues |
the vector of |
econtrib |
the vector of |
eigenvectors |
the |
rload |
the |
rcr |
the |
rqscore |
the |
vcontrib |
a vector of |
pvcontrib |
the vector of |
cpvcontrib |
the vector of |
md |
the vector of |
ppm |
the vector of n robust ilr-based predicted probabilities of population membership, see Garrett (1990). |
epm |
the vector of |
nr |
the number of PCs that have been rotated. At this stage of a data analysis |
Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see ltdl.fix.df.
Any rows in the data matrix with NAs are removed prior to computations. In the instance of a compositional data opening transformation NAs have to be removed prior to undertaking the transformation, see na.omit, where.na and remove.na. When that procedure is followed the opening transformations may be executed on calling the function, see Examples below.
Warnings are generated when the number of individuals (observations, cases or samples) falls below 5*p, and additional warnings when the number of individuals falls below 3*p. At these low ratios of individuals to variables the shape of the p-space hyperellipsoid is difficult to reliably define, and therefore the results may lack stability. These limits, 5*p and 3*p, are generous, the latter especially so; many statisticians would argue that the number of individuals should not fall below 9*p, see Garrett (1993).
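The thresholds above can be checked before running an analysis. The helper below is hypothetical and not part of the package; it simply classifies a dataset's ratio of individuals to variables against the 3*p, 5*p and 9*p limits mentioned in the text.

```r
# Hypothetical helper: compare n individuals with p variables
# against the 3*p, 5*p and 9*p limits
check.np.ratio <- function(n, p) {
  if (n < 3 * p) {
    "below 3*p: results likely unstable"
  } else if (n < 5 * p) {
    "below 5*p: results may lack stability"
  } else if (n < 9 * p) {
    "below 9*p: adequate by the function's limits only"
  } else {
    "at or above 9*p"
  }
}

# Example: 40 samples and 10 elements
check.np.ratio(40, 10)
```

For a matrix xx with no missing values, the call would be check.np.ratio(nrow(xx), ncol(xx)).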
Robert G. Garrett
Filzmoser, P., Hron, K., Reimann, C. and Garrett, R., 2009. Robust factor analysis for compositional data. Computers & Geosciences, 35(9):1854-1861.
Garrett, R.G., 1990. A robust multivariate allocation procedure with applications to geochemical data. In Proc. Colloquium on Statistical Applications in the Earth Sciences (Eds F.P. Agterberg & G.F. Bonham-Carter). Geological Survey of Canada Paper 89-9, pp. 309-318.
Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.
Grunsky, E.C., 2001. A program for computing RQ-mode principal components analysis for S-Plus and R. Computers & Geosciences, 27(2):229-235.
Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.
## Make test data available
data(sind.mat2open)

## Generate gx.robmva.closed object
sind.save <- gx.robmva.closed(sind.mat2open)

## Display Mahalanobis distances
gx.md.plot(sind.save)

## Display default PCA results
gx.rqpca.screeplot(sind.save)
gx.rqpca.loadplot(sind.save)

## Display appropriately annotated results
gx.md.plot(sind.save,
main = "Howarth & Sinding-Larsen\nStream Sediments, Opened Data",
cex.main = 0.8)
gx.rqpca.screeplot(sind.save,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")
gx.rqpca.plot(sind.save,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")
gx.rqpca.plot(sind.save, rowids = TRUE,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")
sind.save$pvcontrib
gx.rqpca.plot(sind.save, v1 = 3, v2 = 4, rowids = TRUE,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")

## Display Kaiser Varimax rotated (nrot = 4) results
sind.save.rot4 <- gx.rotate(sind.save, 4)
gx.rqpca.plot(sind.save.rot4,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")
gx.rqpca.plot(sind.save.rot4, rowids = TRUE,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")
gx.rqpca.plot(sind.save.rot4, v1 = 3, v2 = 4, rowids = TRUE,
main = "Howarth & Sinding-Larsen Stream Sediments\nOpened Data")

## Clean-up
rm(sind.save)
rm(sind.save.rot4)