Partial Least Squares and Sparse Partial Least Squares Discriminant Analysis
plsda is used to fit standard PLS models for classification, while splsda performs sparse PLS that embeds feature selection and regularization for the same purpose.
plsda(x, ...)

## S3 method for class 'plsda'
predict(object, newdata = NULL, ncomp = NULL, type = "class", ...)

## Default S3 method:
plsda(x, y, ncomp = 2, probMethod = "softmax", prior = NULL, ...)
x | a matrix or data frame of predictors
... | arguments to pass to …
object | an object produced by …
newdata | a matrix or data frame of predictors
ncomp | the number of components to include in the model. Predictions can be made for models with values less than …
type | either … (the default, "class", returns predicted classes; "prob", used in the examples, returns class probabilities)
y | a factor or indicator matrix for the discrete outcome. If a matrix, the entries must be either 0 or 1 and rows must sum to one
probMethod | either "softmax" or "Bayes" (see Details)
prior | a vector of prior probabilities for the classes (only used for …)
If a factor is supplied, the appropriate indicator matrix is created.
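To make the factor-to-indicator conversion concrete, here is a minimal base-R sketch. The helper name factor_to_indicator is hypothetical and not part of caret; it only illustrates the 0/1 matrix (one column per class, each row summing to one) that y must have when supplied as a matrix.

```r
# Hypothetical helper (not a caret function): build a 0/1 indicator matrix
# with one column per factor level; each row contains a single 1.
factor_to_indicator <- function(y) {
  y <- as.factor(y)
  # Row i of the identity matrix selected by the level code of y[i]
  ind <- diag(nlevels(y))[as.integer(y), , drop = FALSE]
  dimnames(ind) <- list(NULL, levels(y))
  ind
}

y <- factor(c("Active", "Inactive", "Inactive", "Active"))
ind <- factor_to_indicator(y)
ind
rowSums(ind)   # every row sums to one
```

In base R, model.matrix(~ y - 1) produces an equivalent matrix, with column names prefixed by y.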
Two prediction methods can be used.
The softmax function transforms the model predictions to "probability-like" values (i.e., values on [0, 1] that sum to 1). The class with the largest class probability is the predicted class.
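A minimal base-R sketch of the softmax step, assuming a matrix of raw model predictions with one column per class. The values are made up for illustration, and this is a generic softmax rather than caret's internal code, which may differ in details.

```r
# Softmax: map a vector of raw scores to values in [0, 1] that sum to 1.
# Subtracting the max first avoids overflow in exp() without changing the result.
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Hypothetical raw predictions for three samples and two classes
raw <- rbind(c(Active = 0.8,  Inactive = 0.2),
             c(Active = 0.1,  Inactive = 0.9),
             c(Active = 0.55, Inactive = 0.45))

probs <- t(apply(raw, 1, softmax))   # one row of probabilities per sample
# The predicted class is the column with the largest probability
pred_class <- colnames(raw)[apply(probs, 1, which.max)]
pred_class   # "Active" "Inactive" "Active"
```

Because softmax is monotone within a row, the predicted class is the same as taking which.max of the raw predictions; the transformation matters only when probabilities are requested.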
Alternatively, Bayes' rule can be applied to the model predictions to form posterior probabilities. Here, the model predictions for the training set are used along with the training set outcomes to create conditional distributions for each class. When new samples are predicted, the raw model predictions are run through these conditional distributions to produce a posterior probability for each class (along with the prior). This process is repeated for each of the ncomp possible PLS models (i.e., the models with 1, 2, up to ncomp components). The NaiveBayes function is used with usekernel = TRUE for the posterior probability calculations.
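The Bayes step can be approximated with the base-R sketch below. It is not caret's code and does not call NaiveBayes; it swaps in stats::density() kernel estimates to show the idea: per-class, per-column kernel densities of the training-set predictions are multiplied together (the naive independence assumption), weighted by the prior, and normalized into posteriors. The function name kernel_bayes_posterior and the simulated data are hypothetical.

```r
# Sketch only: a kernel naive-Bayes posterior built with stats::density().
# NOT caret's implementation and does not call NaiveBayes; names are hypothetical.
# pred_train: n x K matrix of training-set model predictions (one column per class)
# y_train:    factor of training-set outcomes
# pred_new:   m x K matrix of predictions for new samples
# prior:      class priors; defaults to the training-set class proportions
kernel_bayes_posterior <- function(pred_train, y_train, pred_new,
                                   prior = as.vector(table(y_train)) / length(y_train)) {
  pred_train <- as.matrix(pred_train)
  pred_new <- as.matrix(pred_new)
  classes <- levels(y_train)
  post <- matrix(0, nrow(pred_new), length(classes),
                 dimnames = list(NULL, classes))
  for (k in seq_along(classes)) {
    in_k <- pred_train[y_train == classes[k], , drop = FALSE]
    lik <- rep(1, nrow(pred_new))
    for (j in seq_len(ncol(pred_train))) {
      # Class-conditional kernel density of column j, evaluated at the new values;
      # points outside the estimated range get density 0 (floored below)
      d <- density(in_k[, j])
      f <- approxfun(d$x, d$y, yleft = 0, yright = 0)
      # "Naive" step: treat the prediction columns as independent within a class
      lik <- lik * pmax(f(pred_new[, j]), 1e-12)
    }
    post[, k] <- prior[k] * lik
  }
  post / rowSums(post)  # normalize so each row of posteriors sums to 1
}

# Usage with simulated predictions for two well-separated classes
set.seed(1)
y <- factor(rep(c("A", "B"), each = 50))
pred <- cbind(A = c(rnorm(50, 1, 0.2), rnorm(50, 0, 0.2)),
              B = c(rnorm(50, 0, 0.2), rnorm(50, 1, 0.2)))
new <- rbind(c(1, 0), c(0, 1))
post <- kernel_bayes_posterior(pred, y, new)
round(post, 3)
```

Unlike softmax, this approach calibrates the probabilities against how the training-set predictions are actually distributed within each class, at the cost of estimating a density per class and column.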
For plsda, an object of class "plsda" and "mvr". For splsda, an object of class "splsda".
The predict methods produce either a vector, matrix or three-dimensional array, depending on the values of type and ncomp. For example, specifying more than one value of ncomp with type = "class" will produce a three-dimensional array, but the default specification would produce a factor vector.
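For intuition about where these shapes come from, here is a base-R sketch that does not call caret. It assumes raw predictions laid out as observations × classes × number of components (the layout returned by predict for pls "mvr" fits) and shows that one component count yields a factor or a matrix, while several yield a three-dimensional array. All names and values are illustrative.

```r
set.seed(2)
n <- 4
classes <- c("Active", "Inactive")
ncomp <- 3

# Simulated raw predictions: observations x classes x number of components
raw <- array(runif(n * length(classes) * ncomp),
             dim = c(n, length(classes), ncomp),
             dimnames = list(NULL, classes, paste0("ncomp", 1:ncomp)))

softmax <- function(z) { e <- exp(z - max(z)); e / sum(e) }

# Apply softmax over the class dimension for every (observation, ncomp) pair;
# apply() puts the class dimension first, so aperm() restores n x classes x ncomp
probs <- aperm(apply(raw, c(1, 3), softmax), c(2, 1, 3))

# A single model size gives a factor of predicted classes (length n)
cls3 <- factor(classes[apply(probs[, , 3], 1, which.max)], levels = classes)

length(cls3)          # 4
dim(probs[, , 3])     # 4 2
dim(probs)            # 4 2 3
```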
## Not run:
library(caret)   # plsda, mdrr data, nearZeroVar, preProcess, confusionMatrix; also attaches lattice (histogram)

data(mdrr)
set.seed(1)
inTrain <- sample(seq(along = mdrrClass), 450)

nzv <- nearZeroVar(mdrrDescr)
filteredDescr <- mdrrDescr[, -nzv]

training <- filteredDescr[inTrain,]
test <- filteredDescr[-inTrain,]
trainMDRR <- mdrrClass[inTrain]
testMDRR <- mdrrClass[-inTrain]

preProcValues <- preProcess(training)

trainDescr <- predict(preProcValues, training)
testDescr <- predict(preProcValues, test)

useBayes <- plsda(trainDescr, trainMDRR, ncomp = 5,
                  probMethod = "Bayes")
useSoftmax <- plsda(trainDescr, trainMDRR, ncomp = 5)

confusionMatrix(predict(useBayes, testDescr), testMDRR)
confusionMatrix(predict(useSoftmax, testDescr), testMDRR)

histogram(~predict(useBayes, testDescr, type = "prob")[,"Active",] | testMDRR,
          xlab = "Active Prob", xlim = c(-.1,1.1))
histogram(~predict(useSoftmax, testDescr, type = "prob")[,"Active",] | testMDRR,
          xlab = "Active Prob", xlim = c(-.1,1.1))

## different sized objects are returned
length(predict(useBayes, testDescr))
dim(predict(useBayes, testDescr, ncomp = 1:3))
dim(predict(useBayes, testDescr, type = "prob"))
dim(predict(useBayes, testDescr, type = "prob", ncomp = 1:3))

## Using spls:
## (As of 11/09, the spls package now has a similar function with
## the same name. To avoid conflicts, use caret:::splsda to
## get this version)

splsFit <- caret:::splsda(trainDescr, trainMDRR,
                          K = 5, eta = .9,
                          probMethod = "Bayes")

confusionMatrix(caret:::predict.splsda(splsFit, testDescr), testMDRR)

## End(Not run)