classDist

Compute and predict the distances to class centroids


Description

This function computes the class centroids and covariance matrix for a training set for determining Mahalanobis distances of samples to each class centroid.

Usage

classDist(x, ...)

## Default S3 method:
classDist(x, y, groups = 5, pca = FALSE, keep = NULL, ...)

## S3 method for class 'classDist'
predict(object, newdata, trans = log, ...)

Arguments

x

a matrix or data frame of predictor variables

...

optional arguments to pass (not currently used)

y

a numeric or factor vector of class labels

groups

an integer for the number of bins for splitting a numeric outcome

pca

a logical: should principal components analysis be applied to the dataset prior to splitting the data by class?

keep

an integer for the number of PCA components that should be used to predict new samples (NULL uses all within a tolerance of sqrt(.Machine$double.eps))

object

an object of class classDist

newdata

a matrix or data frame. If vars was previously specified, these columns should be in newdata

trans

an optional function that can be applied to each class distance. trans = NULL will not apply a function

Details

For factor outcomes, the data are split into groups for each class and the mean and covariance matrix are calculated for each group. These are then used to compute Mahalanobis distances to the class centers (using predict.classDist). The function will check for non-singular matrices.
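The per-class computation described above can be sketched in base R. This is an illustration only, not caret's internal code: the names train_idx, class_stats and sq_dist are hypothetical, and the sketch assumes every class covariance matrix is invertible. Note that stats::mahalanobis returns the squared distance.

```r
# Illustrative sketch only; not caret's implementation.
set.seed(1)
train_idx <- sample(seq_len(nrow(iris)), 100)
x_train <- iris[train_idx, 1:4]
y_train <- iris$Species[train_idx]
x_new   <- iris[-train_idx, 1:4]

# Split the training predictors by class, then store each class centroid
# and the inverse of its covariance matrix.
class_stats <- lapply(split(x_train, y_train), function(d) {
  list(center = colMeans(d), cov_inv = solve(cov(d)))
})

# stats::mahalanobis() returns the *squared* distance; inverted = TRUE
# says that cov_inv is already an inverse covariance matrix.
sq_dist <- sapply(class_stats, function(s) {
  mahalanobis(x_new, center = s$center, cov = s$cov_inv, inverted = TRUE)
})
colnames(sq_dist) <- paste0("dist.", names(class_stats))

# Apply a log transform, as the default trans = log would.
log_dist <- log(sq_dist)
head(log_dist)
```

Each row of the result is one new sample and each column is its (log) distance to one class centroid, so the smallest value in a row points at the nearest class.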

For numeric outcomes, the data are split into roughly equal sized bins based on groups. Percentiles are used to split the data.
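A minimal sketch of percentile-based binning for a numeric outcome, using only base R. It assumes the bin edges are evenly spaced percentiles of y; the variable names are hypothetical and the exact edge handling inside classDist may differ.

```r
# Illustrative sketch only; caret's exact binning rules may differ.
set.seed(2)
y <- rnorm(100)   # a made-up numeric outcome
groups <- 5

# Evenly spaced percentiles of y serve as bin edges.
breaks <- quantile(y, probs = seq(0, 1, length.out = groups + 1))

# unique() guards against duplicated edges when y has many ties;
# include.lowest = TRUE keeps the minimum value in the first bin.
y_binned <- cut(y, breaks = unique(breaks), include.lowest = TRUE)

table(y_binned)   # bin sizes should be roughly equal
```

Once y has been converted to a factor like this, the per-class centroid and covariance calculations apply to the bins in the same way as to ordinary class labels.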

Value

for classDist, an object of class classDist with elements:

values

a list with elements for each class. Each element contains a mean vector for the class centroid and the inverse of the class covariance matrix

classes

a character vector of class labels

pca

the results of prcomp when pca = TRUE

call

the function call

p

the number of variables

n

a vector of sample sizes per class

For predict.classDist, a matrix with columns for each class. The column names are the class names with the prefix dist.. In the case of numeric y, the class labels are the percentiles. For example, if groups = 9, the variable names would be dist.11.11, dist.22.22, etc.
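The percentile-style names in the groups = 9 example can be reproduced with a little arithmetic. The snippet below is a hedged illustration of where labels such as dist.11.11 come from, assuming each label is the upper percentile of its bin rounded to two decimals; it does not call caret, and pct and col_names are hypothetical names.

```r
# Illustrative arithmetic only; assumes labels are upper bin percentiles.
groups <- 9

# Upper percentile of each of the 9 bins, rounded to two decimals.
pct <- round(100 * seq_len(groups) / groups, 2)

# Prefix with "dist." to mimic the column-naming convention.
col_names <- paste0("dist.", pct)
col_names[1:2]
```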

Author(s)

Max Kuhn

References

Forina et al. CAIMAN brothers: A family of powerful classification and class modeling techniques. Chemometrics and Intelligent Laboratory Systems (2009) vol. 96 (2) pp. 239-245

Examples

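library(caret)    # classDist() and its predict() method
library(lattice)  # splom()

set.seed(1)       # make the random split reproducible
trainSet <- sample(1:150, 100)

# Fit class centroids and covariance matrices on the training rows
distData <- classDist(iris[trainSet, 1:4],
                      iris$Species[trainSet])

# Distances (log-transformed by default) for the held-out rows
newDist <- predict(distData,
                   iris[-trainSet, 1:4])

splom(newDist, groups = iris$Species[-trainSet])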

caret

Classification and Regression Training

v6.0-86
GPL (>= 2)
Authors
Max Kuhn [aut, cre], Jed Wing [ctb], Steve Weston [ctb], Andre Williams [ctb], Chris Keefer [ctb], Allan Engelhardt [ctb], Tony Cooper [ctb], Zachary Mayer [ctb], Brenton Kenkel [ctb], R Core Team [ctb], Michael Benesty [ctb], Reynald Lescarbeau [ctb], Andrew Ziem [ctb], Luca Scrucca [ctb], Yuan Tang [ctb], Can Candan [ctb], Tyler Hunt [ctb]