Function to undertake an Exploratory Multivariate Data Analysis
The function carries out a Principal Components Analysis (PCA) and estimates the Mahalanobis distances for a dataset, and places them in an object to be saved and post-processed for display and further manipulation. Classical procedures are used; for robust procedures see gx.robmva. For results display see gx.rqpca.screeplot, gx.rqpca.loadplot, gx.rqpca.plot, gx.rqpca.print, gx.md.plot and gx.md.print. For Kaiser varimax rotation see gx.rotate. For closed, compositional, data use gx.mva.closed.
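The classical computations named above can be sketched in base R. This is a minimal sketch, assuming a complete, unweighted numeric matrix with more rows than columns. The function name mva_sketch is hypothetical, and taking the PCA on the correlation matrix is an assumption of the sketch, not a statement about gx.mva, which also returns weights, RQ-mode scores and probabilities that are not reproduced here.

```r
# Hypothetical sketch of classical PCA plus Mahalanobis distances.
# Assumes a complete numeric matrix with more rows than columns.
mva_sketch <- function(x) {
  x <- as.matrix(x)
  ctr <- colMeans(x)                  # classical (unweighted) means
  s <- cov(x)                         # classical covariance matrix
  r <- cov2cor(s)                     # correlation matrix
  e <- eigen(r, symmetric = TRUE)     # PCA on the correlation matrix (assumption)
  md <- mahalanobis(x, center = ctr, cov = s)  # squared Mahalanobis distances
  list(n = nrow(x), p = ncol(x), mean = ctr, sd = sqrt(diag(s)), r = r,
       eigenvalues = e$values,
       econtrib = 100 * e$values / sum(e$values),  # % of variance per PC
       eigenvectors = e$vectors, md = md)
}

set.seed(1)
demo <- matrix(rnorm(200), ncol = 4)  # 50 individuals, 4 variables
res <- mva_sketch(demo)
round(res$econtrib, 1)
```

Note that stats::mahalanobis() returns squared distances; with the classical mean and covariance they always sum to (n - 1)p, here 49 x 4 = 196.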
gx.mva(xx, main = deparse(substitute(xx)))
xx | a
main | by default the name of the object
If main is undefined, the name of the matrix object passed to the function is used to identify the object. This is the recommended procedure, as it helps to track the progression of a data analysis. Alternative plot titles are best defined when the saved object is passed to gx.rqpca.plot, gx.rqpca.screeplot or gx.md.plot for display. If no plot title is required set main = " ", or if a user-defined plot title is required it may be defined, e.g., main = "Plot Title Text".
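The default main = deparse(substitute(xx)) relies on standard R argument handling: substitute() recovers the expression the caller supplied for xx, and deparse() turns it into a character string. The toy function title_of below is hypothetical and only mimics that default.

```r
# Hypothetical function mimicking the default main = deparse(substitute(xx)).
title_of <- function(xx, main = deparse(substitute(xx))) {
  main  # the default is evaluated here, so it captures the caller's expression
}

my.data <- matrix(1:6, ncol = 2)
title_of(my.data)                              # "my.data"
title_of(my.data, main = "Plot Title Text")    # user-defined title
```

Because default arguments are evaluated lazily, a function using this idiom should evaluate main before it reassigns xx; after reassignment, substitute(xx) no longer returns the caller's expression.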
The following are returned as an object to be saved for subsequent display, etc.:
main | by default (recommended) the input data matrix name.
input | the data matrix name,
proc | the procedure used, by default
n | the total number of individuals (observations, cases or samples) in the input data matrix.
nc | the number of individuals remaining in the ‘core’ data subset after trimming. At this stage of a data analysis
p | the number of variables on which the multivariate operations were based.
ifilr | flag for
matnames | the row numbers or identifiers and column headings of the input matrix.
wts | the vector of weights for the
mean | the vector of the weighted means for the
cov | the
sd | the vector of weighted standard deviations for the
snd | the
r | the
eigenvalues | the vector of
econtrib | the vector of
eigenvectors | the
rload | the
rcr | the
rqscore | the
vcontrib | a vector of
pvcontrib | the vector of
cpvcontrib | the vector of
md | the vector of
ppm | the vector of
epm | the vector of
nr | the number of PCs that have been rotated. At this stage of a data analysis
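Several of the returned components (wts, mean, cov, sd and r) are weighted summaries. As an illustration only, and assuming equal weights, they can be formed with stats::cov.wt(); with all weights equal the results reduce to the classical colMeans(), cov() and cor() estimates. This is not a description of how gx.mva computes them internally.

```r
set.seed(2)
x <- matrix(rnorm(60), ncol = 3)       # 20 individuals, 3 variables
wts <- rep(1, nrow(x))                 # equal weights, i.e. the classical case
cw <- cov.wt(x, wt = wts / sum(wts), cor = TRUE)
w.mean <- cw$center                    # weighted means
w.cov  <- cw$cov                       # weighted covariance matrix
w.sd   <- sqrt(diag(w.cov))            # weighted standard deviations
w.r    <- cw$cor                       # weighted correlation matrix
```

Giving an individual a weight of zero in wts removes its influence on all four summaries, which is one way a trimmed ‘core’ subset can be represented.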
Any less than detection limit values, represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function; see ltdl.fix.df.
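Before calling gx.mva it can help to count, per variable, the negative values and zeros that may be less than detection limit or blank codes. The helper flag_codes and the small matrix dat are hypothetical; replacing the flagged values is handled separately, see ltdl.fix.df.

```r
# Hypothetical pre-flight check: per-column counts of negative values
# (e.g. "<DL" codes) and zeros that would need attention before gx.mva.
flag_codes <- function(x) {
  x <- as.matrix(x)
  rbind(negative = colSums(x < 0, na.rm = TRUE),
        zero     = colSums(x == 0, na.rm = TRUE))
}

dat <- cbind(Cu = c(12, -5, 30, 8), Zn = c(40, 55, 0, 61))
flag_codes(dat)
```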
Any rows in the data matrix with NAs are removed prior to computations. If a compositional data opening transformation is to be applied, NAs have to be removed prior to undertaking the transformation; see na.omit, where.na and remove.na. When that procedure is followed the opening transformations may be executed on calling the function; see Examples below.
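The order of operations, removing incomplete rows before opening the data, can be sketched in base R. The clr is written out directly (the log of each part minus the row mean of the logs); clr_sketch is a hypothetical helper, not an rgr function, and in practice the package's own transformation functions would be used.

```r
# Hypothetical clr helper: log of each part minus the row's mean log,
# i.e. log(x / geometric mean of the row).
clr_sketch <- function(x) {
  lx <- log(as.matrix(x))
  lx - rowMeans(lx)              # rowMeans recycles down each column
}

raw <- rbind(c(60, 30, 10), c(50, NA, 20), c(20, 50, 30))
complete <- na.omit(raw)         # drops row 2, which contains an NA
opened <- clr_sketch(complete)   # open only the complete rows
```

If the NA row were left in, its row mean of logs would itself be NA and the whole row of clr values would be lost, which is why the NAs are dealt with first.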
Note that executing a clr transformation leads to a singular covariance matrix that cannot be inverted for the estimation of Mahalanobis distances. In that case the values of md, ppm and epm are all set to NULL.
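The singularity follows from the clr definition: every clr-transformed row sums to zero, so the p columns are linearly dependent and the covariance matrix has rank at most p - 1. A short base R demonstration on simulated closed data (simulated for illustration, not the sind data):

```r
set.seed(3)
comp <- matrix(runif(40, 1, 100), ncol = 4)
comp <- comp / rowSums(comp)            # closed data: each row sums to 1
lx <- log(comp)
clr <- lx - rowMeans(lx)                # clr-transformed data, rows sum to 0
s <- cov(clr)
ev <- eigen(s, symmetric = TRUE, only.values = TRUE)$values
ev                                      # the smallest eigenvalue is ~0
# mahalanobis() calls solve() on s, which is expected to fail here;
# tryCatch() returns NULL in that case.
md_try <- tryCatch(mahalanobis(clr, colMeans(clr), s),
                   error = function(e) NULL)
```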
Note that executing an ilr transformation permits the estimation of Mahalanobis distances and associated probabilities through the use of p-1 synthetic variables. However, in that instance the loadings of the p-1 synthetic variables will be plotted by gx.rqpca.plot rather than the loadings for the elements. Therefore, use function gx.mva.closed for compositional, geochemical, data.
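An ilr transformation maps the p parts onto p-1 orthonormal log-ratio coordinates, which removes the linear dependency and gives an invertible covariance matrix. The sketch below builds one such basis from normalized Helmert contrasts; this particular basis is an assumption, and gx.mva may use a different one. Mahalanobis distances computed in ilr coordinates do not depend on which basis is chosen.

```r
set.seed(4)
comp <- matrix(runif(60, 1, 100), ncol = 4)
comp <- comp / rowSums(comp)                 # closed data, 15 rows x 4 parts
p <- ncol(comp)
V <- contr.helmert(p)                        # p x (p-1), columns sum to 0
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")    # normalize to orthonormal columns
# Because each column of V sums to 0, log(comp) %*% V equals clr %*% V.
ilr <- log(comp) %*% V                       # n x (p-1) ilr coordinates
md <- mahalanobis(ilr, colMeans(ilr), cov(ilr))  # squared distances
```

The price, as noted above, is that any loadings refer to the p-1 synthetic coordinates rather than to the original elements.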
Warnings are generated when the number of individuals (observations, cases or samples) falls below 5p, and additional warnings when the number of individuals falls below 3p. At these low ratios of individuals to variables the shape of the p-space hyperellipsoid is difficult to define reliably, and therefore the results may lack stability. These limits, 5p and 3p, are generous, the latter especially so; many statisticians would argue that the number of individuals should not fall below 9p, see Garrett (1993).
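The ratio checks can be sketched as a small helper. check_ratio is hypothetical and its warning messages are invented for illustration; the messages issued by gx.mva may differ.

```r
# Hypothetical helper: one warning below 5p, a second one below 3p.
check_ratio <- function(n, p) {
  if (n < 5 * p) warning("n = ", n, " is less than 5p = ", 5 * p)
  if (n < 3 * p) warning("n = ", n, " is less than 3p = ", 3 * p)
  n / p                                  # ratio of individuals to variables
}

check_ratio(100, 10)   # no warnings; a ratio of 10 also clears the 9p guideline
check_ratio(40, 10)    # one warning
check_ratio(20, 10)    # two warnings
```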
Robert G. Garrett
Garrett, R.G., 1990. A robust multivariate allocation procedure with applications to geochemical data. In Proc. Colloquium on Statistical Applications in the Earth Sciences (Eds F.P. Agterberg & G.F. Bonham-Carter). Geological Survey of Canada Paper 89-9, pp. 309-318.
Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.
Grunsky, E.C., 2001. A program for computing RQ-mode principal components analysis for S-Plus and R. Computers & Geosciences, 27(2):229-235.
Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.
## Make test data available
data(sind.mat2open)
## Generate gx.mva object, for demonstration purposes only
## These are compositional data - gx.mva.closed should be used
sind.save <- gx.mva(sind.mat2open)
gx.rqpca.screeplot(sind.save)
gx.rqpca.loadplot(sind.save)
gx.rqpca.plot(sind.save)
## Display saved object with alternate main titles
gx.rqpca.loadplot(sind.save, main = "Howarth & Sinding-Larsen\nStream Sediments, clr Transformed Data", cex.main = 0.8)
gx.rqpca.plot(sind.save, main = "Howarth & Sinding-Larsen\nStream Sediments, clr Transformed Data", cex.main = 0.8)
## Display Mahalanobis distances in a Chi-square plot
gx.md.plot(sind.save)
## Display saved object with alternate main titles
gx.md.plot(sind.save, main = "Howarth & Sinding-Larsen\nStream Sediments, ilr Transformed Data", cex.main = 0.8)
## Clean-up
rm(sind.save)