stepclass

Stepwise variable selection for classification


Description

Forward/backward variable selection for classification, using any specified classification function and selecting variables by a classification performance measure estimated with ucpm.

Usage

stepclass(x, ...)

## Default S3 method:
stepclass(x, grouping, method, improvement = 0.05, maxvar = Inf, 
    start.vars = NULL, direction = c("both", "forward", "backward"), 
    criterion = "CR",  fold = 10, cv.groups = NULL, output = TRUE, 
    min1var = TRUE, ...)
## S3 method for class 'formula'
stepclass(formula, data, method, ...)

Arguments

x

matrix or data frame containing the explanatory variables (required if formula is not given).

formula

A formula of the form groups ~ x1 + x2 + .... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators. Interaction terms are not supported.

data

data matrix (rows=cases, columns=variables)

grouping

class indicator vector (a factor)

method

character, name of classification function (e.g. “lda”).

improvement

minimum improvement in the performance measure required to include or exclude a variable (<=1)

maxvar

maximum number of variables in model

start.vars

set variables to start with (indices or names). Default is no variables if ‘direction’ is “forward” or “both”, and all variables if ‘direction’ is “backward”.

direction

“forward”, “backward” or “both” (default)

criterion

performance measure taken from ucpm.

fold

number of folds for cross-validation; ignored if ‘cv.groups’ is specified.

cv.groups

vector of group indicators for cross-validation. By default assigned automatically.

output

indicator (logical) for text output during computation (slows down computation!)

min1var

logical, whether to include at least one variable in the model, even if the prior itself already is a reasonable model.

...

further parameters passed to classification function (‘method’), e.g. priors etc.
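
To illustrate the ‘cv.groups’ argument, the following sketch builds a fold assignment by hand so that repeated runs share the same cross-validation split. It assumes that ‘cv.groups’ takes one integer fold label per row of x; the seed and the names folds and sc_fixed are illustrative only.

```r
# Assumption: cv.groups holds one fold label (1..k) per observation.
set.seed(42)                       # arbitrary seed, for reproducibility only
n_folds <- 5
folds <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

table(folds)                       # 150 rows over 5 folds: 30 per fold

# The stepclass call needs klaR and MASS, so it is guarded here.
if (requireNamespace("klaR", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)                    # makes lda() visible to stepclass
  sc_fixed <- klaR::stepclass(iris[, 1:4], iris[, 5], "lda",
                              cv.groups = folds, output = FALSE)
  sc_fixed$model
}
```

Fixing the folds this way makes two runs with different settings (for example different ‘improvement’ values) comparable, since differences are then not due to a different random split.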

Details

The classification “method” (e.g. ‘lda’) must have its own ‘predict’ method (like ‘predict.lda’ for ‘lda’) that returns either a matrix of posterior probabilities or a list with an element ‘posterior’ containing that matrix. It must also be able to deal with matrices, as in method(x, grouping, ...).
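
As a quick check of this requirement, the sketch below fits ‘lda’ from MASS through the matrix-style interface and inspects what its ‘predict’ method returns. The object names fit and pred are illustrative.

```r
library(MASS)

# Matrix/data-frame interface: method(x, grouping, ...)
fit  <- lda(iris[, 1:4], iris[, 5])
pred <- predict(fit, iris[, 1:4])

is.list(pred)          # predict.lda returns a list ...
names(pred)            # ... with elements "class", "posterior" and "x"
dim(pred$posterior)    # one row per case, one column per class
head(pred$posterior)
```

A classifier whose ‘predict’ method returns only class labels would not meet this requirement without a small wrapper that supplies a ‘posterior’ matrix.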

Then a stepwise variable selection is performed. The initial model is defined by the provided starting variables. In every step, new models are generated by including each single variable that is not in the model and by excluding each single variable that is in the model. The performance measures of these models are estimated by cross-validation, and if the maximum value of the chosen criterion exceeds the value so far plus ‘improvement’, the corresponding variable is included or excluded. The procedure stops if the new best value is not good enough or if the specified maximum number of variables is reached.

If ‘direction’ is “forward”, the model is only extended (by including further variables); if ‘direction’ is “backward”, the model is only reduced (by excluding variables from the model).
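
The forward part of this procedure can be sketched in a few lines of R. This is a simplified illustration, not the code used by stepclass: it only adds variables, uses the cross-validated correctness rate as the criterion, and starts from a rate of 0 rather than from a prior-based model. The helpers cv_correct_rate and forward_select are hypothetical names introduced here.

```r
library(MASS)

# Hypothetical helper: cross-validated correctness rate for a set of variables.
cv_correct_rate <- function(vars, x, grouping, folds) {
  hits <- 0
  for (k in unique(folds)) {
    test <- folds == k
    fit  <- lda(x[!test, vars, drop = FALSE], grouping[!test])
    pred <- predict(fit, x[test, vars, drop = FALSE])$class
    hits <- hits + sum(pred == grouping[test])
  }
  hits / length(grouping)
}

# Hypothetical helper: forward-only greedy selection with a simplified stopping rule.
forward_select <- function(x, grouping, folds, improvement = 0.05, maxvar = Inf) {
  selected <- character(0)
  best     <- 0                        # simplification: no prior-based starting value
  while (length(selected) < maxvar) {
    candidates <- setdiff(colnames(x), selected)
    if (length(candidates) == 0) break
    # Try each variable that is not yet in the model.
    rates <- sapply(candidates, function(v)
      cv_correct_rate(c(selected, v), x, grouping, folds))
    # Stop if the best candidate does not beat the current value plus 'improvement'.
    if (max(rates) <= best + improvement) break
    selected <- c(selected, names(which.max(rates)))
    best     <- max(rates)
  }
  list(variables = selected, rate = best)
}

set.seed(1)                            # arbitrary seed for the fold assignment
folds  <- sample(rep(1:10, length.out = nrow(iris)))
result <- forward_select(iris[, 1:4], iris[, 5], folds)
result
```

With the default ‘improvement’ of 0.05, a second variable is only added if it raises the estimated rate by more than five percentage points, which is why such a search often stops after very few variables.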

Value

An object of class ‘stepclass’ containing the following components:

call

the (matched) function call.

method

name of classification function used (e.g. “lda”).

start.variables

vector of starting variables.

process

data frame showing selection process (included/excluded variables and performance measure).

model

the final model: data frame with 2 columns; indices and names of variables.

perfomance.measure

value of the criterion used by ucpm

formula

formula of the form ‘response ~ list + of + selected + variables’

Author(s)

Christian Röver, roever@statistik.tu-dortmund.de, Irina Czogiel

See Also

step, stepAIC, and greedy.wilks for stepwise variable selection according to Wilks' lambda

Examples

data(iris)
library(MASS)
library(klaR)
iris.d <- iris[,1:4]  # the data    
iris.c <- iris[,5]    # the classes 
sc_obj <- stepclass(iris.d, iris.c, "lda", start.vars = "Sepal.Width")
sc_obj
plot(sc_obj)

## or using formulas:
sc_obj <- stepclass(Species ~ ., data = iris, method = "qda", 
    start.vars = "Sepal.Width", criterion = "AS")  # as above, but with qda and criterion "AS"
sc_obj
## the selected formula can now be used to refit the classifier, e.g.
## qda(sc_obj$formula, data = iris)
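
The refit suggested in the comment can be written out as follows. This is a sketch under the assumption that the formula component returned by the formula method uses the original response name (here Species); the names sc_sel and final_fit are illustrative, and the block is guarded because it needs klaR.

```r
if (requireNamespace("klaR", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)                      # makes qda() visible to stepclass
  sc_sel <- klaR::stepclass(Species ~ ., data = iris, method = "qda",
                            criterion = "AS", output = FALSE)

  # Refit the classifier on the selected variables only.
  final_fit <- qda(sc_sel$formula, data = iris)

  # Resubstitution confusion table (optimistic; for illustration only).
  print(table(predicted = predict(final_fit, iris)$class,
              actual    = iris$Species))
}
```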

klaR

Classification and Visualization

v0.6-15
GPL-2 | GPL-3
Authors
Christian Roever, Nils Raabe, Karsten Luebke, Uwe Ligges, Gero Szepannek, Marc Zentgraf, David Meyer
Initial release
2020-02-18
