Function for Multivariate Graphical Adaptive Interactive Trimming with Compositional Data
Function to undertake the GAIT (Graphical Adaptive Interactive Trimming) procedure for multivariate distributions through Chi-square plots of Mahalanobis distances (MDs) as described in Garrett (1988), but for closed compositional, geochemical, data. To carry out GAIT the function is called repeatedly with the weights from the previous iteration being used as a starting point. Either a percentage based MVT or a MCD robust start may be used as the first iteration.
gx.md.gait.closed(xx, wts = NULL, trim = -1, mvtstart = FALSE, mcdstart = FALSE, main = deparse(substitute(xx)), ifadd = c(0.98, 0.95, 0.9), cexf = 0.6, cex = 0.8, ...)
xx |
the |
wts |
the vector of weights for the |
trim |
the desired trim: trim < 0 - no trim, the default; trim >0 & <1 - the fraction, 0 to 1 proportion, of individuals to be trimmed; trim >= 1 - the number of individuals with the highest MDs from the previous iteration to trim. |
mvtstart |
set |
mcdstart |
set |
main |
an alternative plot title to the default input data matrix name, see Details below. |
ifadd |
if probability based fences are to be displayed on the Chi-square plots enter the probabilities here, see Details below. For no fences set |
cexf |
the scale expansion factor for the Ch-square fence annotation, by default |
cex |
the scale expansion factor for the symbols and text annotation within the ‘frame’ of the Chi-square plot, by default |
... |
further arguments to be passed to methods concerning the generated plots. For example, if some colour other than black is required for the plotting characters, specify |
The variables of the input data matrix must all be expressed in the same units. An isometric log-ratio (ilr) is undertaken and the transformed data used for the GAIT process. At the completion of the process the final ilr estimates, including the inverse of the covariance matrix, are transformed to the centred log-ratio (clr) basis. The vector of means and the inverse of the covariance matrix on a clr basis are required by function gx.mvalloc.closed
, that is undertaken on a clr basis.
If main
is undefined the name of the matrix object passed to the function is used as the plot title. This is the recommended procedure as it helps to track the progression of the GAIT. Alternate plot titles can be defined if the final saved object is passed to gx.md.plot
. If no plot title is required set main = " "
, or if a user defined plot title is required it may be defined, e.g., main = "Plot Title Text"
.
By default three fences are placed on the Chi-square plots at probabilities of membership of the current ‘core’ data subset, or total data if appropriate, with ifadd = c(0.98, 0.95, 0.9)
. Alternate probabilities may be defined as best for the display. If no fences are required set ifadd = NULL
.
The Mahalanobis distance, Chi-square, plot x-axis label is set appropriately to indicated the type of robust start or trim using the value of proc
.
The following are returned as an object to be saved for the next iteration or final use:
main |
by default (recommended) the input data matrix name. |
input |
the data matrix name, |
matnames |
the row numbers and column headings of the input matrix. |
proc |
the procedure followed for this iteration, used for subsequent Chi-sqaure plot x-axis labelling. |
wts |
the vector of weights for the |
n |
the total number of individuals (observations, cases or samples) in the input data matrix. |
ptrim |
the percentage, as a fraction, of samples called to be trimmed in this iteration, otherwise |
mean |
the |
cov |
the |
cov.inv |
the |
sd |
the |
md |
the vector of Mahalanobis distances for all the |
ppm |
the vector of predicted probabilities of membership for all the |
Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data matrix, must be removed prior to executing this function, see ltdl.fix.df
.
Any rows in the data matrix with NA
s are removed in the ilr transformation.
Warnings are generated when the number of individuals (observations, cases or samples) falls below 5p
, and additional warnings when the number of individuals falls below 3p
. At these low ratios of individuals to variables the shape of thep
-space hyperellipsoid is difficult to reliably define, and therefore the results may lack stability. These limits 5p
and 3p
are generous, the latter especially so; many statisticians would argue that the number of individuals should not fall below 9p
, see Garrett (1993).
Robert G. Garrett
Garrett, R.G., 1988. IDEAS - An interactive computer graphics tool to assist the exploration geochemist. In Current Research Part F, Geological Survey of Canada Paper 88-1F, pp. 1-13.
Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.
Garrett, R.G., 1989. The Chi-square plot - a tool for multivariate outlier recognition. In Proc. 12th International Geochemical Exploration Symposium, Geochemical Exploration 1987 (Ed. S. Jenness). Journal of Geochemical Exploration, 32(1/3):319-341.
## Make test data available data(sind.mat2open) ## To multivariate trim as in IDEAS, see JGE (1989) 32(1-3):319-341, ## but recognizing that the data are of a closed compositional form ## and using a mcd start, execute: gx.md.gait.closed(sind.mat2open,ifadd = 0.95) sind.gait.1 <- gx.md.gait.closed(sind.mat2open, mcdstart = TRUE, ifadd = NULL) sind.gait.2 <- gx.md.gait.closed(sind.mat2open, wts = sind.gait.1$wts, mvtstart = TRUE, trim = 3, ifadd = 0.9) sind.gait.3 <- gx.md.gait.closed(sind.mat2open, wts = sind.gait.2$wts, trim = 1, ifadd = 0.9) ## Display saved object with alternate main titles and list outliers gx.md.plot(sind.gait.3, cex.main = 0.8, ifadd = 0.9, main = "Howarth & Sinding-Larsen\nStream Sediments") gx.md.print(sind.gait.3, pcut = 0.2) ## Clean-up rm(sind.gait.1) rm(sind.gait.2) rm(sind.gait.3)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.