Robust Linear Discriminant Analysis by Projection Pursuit
Performs robust linear discriminant analysis by the projection-pursuit approach
proposed by Pires and Branco (2010) and returns the results as an object of
class LdaPP (i.e. the function serves as the constructor of that class).
LdaPP(x, ...)

## S3 method for class 'formula'
LdaPP(formula, data, subset, na.action, ...)

## Default S3 method:
LdaPP(x, grouping, prior = proportions, tol = 1.0e-4,
      method = c("huber", "mad", "sest", "class"),
      optim = FALSE, trace = FALSE, ...)
formula | a formula of the form
data | an optional data frame (or similar: see
subset | an optional vector used to select rows (observations) of the data matrix
na.action | a function which indicates what should happen when the data contain missing values (NAs)
x | a matrix or data frame containing the explanatory variables (training set).
grouping | grouping variable: a factor specifying the class for each observation.
prior | prior probabilities; they default to the class proportions of the training set.
tol | tolerance
method | the pair of univariate location and scale estimators (T,S) used in the projection index: one of "huber", "mad", "sest" or "class".
optim | whether to refine the first approximation using the Nelder and Mead simplex method (function optim). Default is optim=FALSE.
trace | whether to print intermediate results. Default is trace=FALSE.
... | arguments passed to or from other methods.
Currently the algorithm is implemented only for binary classification; in what follows it is assumed that only two groups are present.
The PP algorithm searches for low-dimensional projections of higher-dimensional
data in which a projection index is maximized. As in Fisher's original proposal,
the squared standardized distance between the observations in the two groups is maximized.
Instead of the sample univariate mean and standard deviation (T,S),
robust alternatives are used. These are selected through the argument method,
which can be one of:
method="huber": the pair (T,S) are the robust M-estimates of location and scale
method="mad": (T,S) are the Median and the Median Absolute Deviation
method="sest": the pair (T,S) are the robust S-estimates of location and scale
method="class": (T,S) are the mean and the standard deviation (the classical, non-robust choice).
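As a rough illustration of the index described above, the following sketch computes a squared standardized distance between the two projected groups for a given direction. The helper name pp_index, its arguments, and in particular the way the two group scales are pooled (a group-size weighted average of the squared scales) are assumptions made for this example and are not taken from the package. Only base R estimators are shown: median/mad, as for method="mad", and mean/sd, as for method="class".

```r
## Hypothetical helper (not part of the package): projection index for a direction `a`.
## `loc` and `scl` are univariate location and scale estimators.
pp_index <- function(a, x, grp, loc = median, scl = mad) {
  x   <- as.matrix(x)
  grp <- factor(grp)
  stopifnot(nlevels(grp) == 2, length(a) == ncol(x))
  a  <- a / sqrt(sum(a^2))                 # normalise to a unit vector
  z  <- drop(x %*% a)                      # univariate projection of all observations
  z1 <- z[grp == levels(grp)[1]]
  z2 <- z[grp == levels(grp)[2]]
  n1 <- length(z1); n2 <- length(z2)
  ## Assumption: pool the squared scales, weighted by group size
  s2 <- (n1 * scl(z1)^2 + n2 * scl(z2)^2) / (n1 + n2)
  (loc(z1) - loc(z2))^2 / s2
}

## Toy data: two groups separated along the first coordinate
x   <- rbind(cbind(c(0, 1, 2, 3, 4),      c(0, 1, 0, 1, 0)),
             cbind(c(10, 11, 12, 13, 14), c(0, 1, 0, 1, 0)))
grp <- rep(c("A", "B"), each = 5)

idx_robust  <- pp_index(c(1, 0), x, grp, loc = median, scl = mad)  # as for method="mad"
idx_classic <- pp_index(c(1, 0), x, grp, loc = mean,   scl = sd)   # as for method="class"
```

Because the direction is normalised inside the helper, the index depends only on the direction of a and not on its length.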
The first approximation A1 to the solution is obtained by investigating
a finite number of candidate directions, the unit vectors defined
by all pairs of points such that one belongs to the first group
and the other to the second group. The solution found is stored in the slots raw.ldf and raw.ldfconst.
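The search over candidate directions can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the names pp_index and first_approx are hypothetical, the median and MAD stand in for the pair (T,S), and the pooling of the two scales is an assumed form.

```r
## Hypothetical index (assumed pooling of the squared scales by group size)
pp_index <- function(a, x1, x2, loc = median, scl = mad) {
  z1 <- drop(x1 %*% a); z2 <- drop(x2 %*% a)
  n1 <- length(z1); n2 <- length(z2)
  (loc(z1) - loc(z2))^2 / ((n1 * scl(z1)^2 + n2 * scl(z2)^2) / (n1 + n2))
}

## Hypothetical sketch of the first approximation: try every unit vector
## defined by one point from group 1 and one point from group 2.
first_approx <- function(x1, x2, loc = median, scl = mad) {
  best <- list(index = -Inf, direction = NULL)
  for (i in seq_len(nrow(x1))) {
    for (j in seq_len(nrow(x2))) {
      d   <- x1[i, ] - x2[j, ]
      len <- sqrt(sum(d^2))
      if (len == 0) next                   # identical points define no direction
      a   <- d / len                       # unit vector through the pair
      val <- pp_index(a, x1, x2, loc, scl)
      if (is.finite(val) && val > best$index)
        best <- list(index = val, direction = a)
    }
  }
  best
}

set.seed(1)
x1 <- matrix(rnorm(40), ncol = 2)                            # group 1 around (0, 0)
x2 <- matrix(rnorm(40), ncol = 2) + rep(c(4, 0), each = 20)  # group 2 around (4, 0)
a1 <- first_approx(x1, x2)
```

With n1 and n2 observations in the two groups, this loop evaluates the index for n1 * n2 candidate directions, so its cost grows with the product of the group sizes.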
The second approximation A2 (optional) is obtained by
a numerical optimization algorithm using A1 as the initial solution.
The Nelder and Mead method implemented in the function optim is applied.
Whether this refinement is used is controlled by the argument optim.
If optim=TRUE, the result of the optimization is stored in the slots ldf and ldfconst.
Otherwise these slots are set equal to raw.ldf and raw.ldfconst.
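A refinement step of this kind might look like the sketch below, which passes a starting direction to stats::optim with method = "Nelder-Mead". It is an illustration only: pp_index is a hypothetical helper with an assumed pooling of the scales, the starting vector a_start merely stands in for the first approximation A1, and how the package itself sets up the optimization is not shown here.

```r
## Hypothetical index (assumed pooling); normalises `a` so that only the
## direction matters while the optimiser moves freely in the parameter space.
pp_index <- function(a, x1, x2, loc = median, scl = mad) {
  a  <- a / sqrt(sum(a^2))
  z1 <- drop(x1 %*% a); z2 <- drop(x2 %*% a)
  n1 <- length(z1); n2 <- length(z2)
  (loc(z1) - loc(z2))^2 / ((n1 * scl(z1)^2 + n2 * scl(z2)^2) / (n1 + n2))
}

set.seed(1)
x1 <- matrix(rnorm(40), ncol = 2)                            # group 1 around (0, 0)
x2 <- matrix(rnorm(40), ncol = 2) + rep(c(4, 0), each = 20)  # group 2 around (4, 0)

a_start   <- c(1, 1) / sqrt(2)            # stands in for the first approximation A1
start_val <- pp_index(a_start, x1, x2)

## fnscale = -1 makes optim maximise the index instead of minimising it
res <- optim(par = a_start, fn = pp_index, x1 = x1, x2 = x2,
             method = "Nelder-Mead", control = list(fnscale = -1))

a2 <- res$par / sqrt(sum(res$par^2))      # refined unit direction
```

Since the simplex method starts from a_start and keeps its best vertex, the refined index res$value is not smaller than start_val.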
Returns an S4 object of class LdaPP-class
Still an experimental version! Only binary classification is supported.
Valentin Todorov valentin.todorov@chello.at and Ana Pires apires@math.ist.utl.pt
Pires, A. M. and Branco, J. A. (2010). Projection-pursuit approach to robust linear discriminant analysis. Journal of Multivariate Analysis, Academic Press, Inc., 101, 2464–2485.
##
## Function to plot a LDA separation line
##
lda.line <- function(lda, ...)
{
    ab <- lda@ldf[1,] - lda@ldf[2,]
    cc <- lda@ldfconst[1] - lda@ldfconst[2]
    abline(a=-cc/ab[2], b=-ab[1]/ab[2], ...)
}

data(pottery)
x <- pottery[,c("MG", "CA")]
grp <- pottery$origin
col <- c(3,4)
gcol <- ifelse(grp == "Attic", col[1], col[2])
gpch <- ifelse(grp == "Attic", 16, 1)

##
## Reproduce Fig. 2. from Pires and Branco (2010)
##
plot(CA~MG, data=pottery, col=gcol, pch=gpch)

ppc <- LdaPP(x, grp, method="class", optim=TRUE)
lda.line(ppc, col=1, lwd=2, lty=1)

pph <- LdaPP(x, grp, method="huber", optim=TRUE)
lda.line(pph, col=3, lty=3)

pps <- LdaPP(x, grp, method="sest", optim=TRUE)
lda.line(pps, col=4, lty=4)

ppm <- LdaPP(x, grp, method="mad", optim=TRUE)
lda.line(ppm, col=5, lty=5)

rlda <- Linda(x, grp, method="mcd")
lda.line(rlda, col=6, lty=1)

fsa <- Linda(x, grp, method="fsa")
lda.line(fsa, col=8, lty=6)

## Use the formula interface:
##
LdaPP(origin~MG+CA, data=pottery)       ## use the same two predictors
LdaPP(origin~., data=pottery)           ## use all predictor variables

##
## Predict method
data(pottery)
fit <- LdaPP(origin~., data = pottery)
predict(fit)