
rfcv

Random Forest Cross-Validation for feature selection


Description

This function shows the cross-validated prediction performance of models with a sequentially reduced number of predictors (ranked by variable importance), estimated via a nested cross-validation procedure.

Usage

rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5,
     mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)

Arguments

trainx

matrix or data frame containing columns of predictor variables

trainy

vector of response, must have length equal to the number of rows in trainx

cv.fold

number of folds in the cross-validation

scale

if "log", reduce a fixed proportion (step) of variables at each step, otherwise reduce step variables at a time

step

if scale="log", the fraction of variables to remove at each step, else remove this many variables at a time

mtry

a function of number of remaining predictor variables to use as the mtry parameter in the randomForest call

recursive

whether variable importance is (re-)assessed at each step of variable reduction

...

other arguments passed on to randomForest
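
To make the interaction of scale, step and mtry concrete, the following base-R sketch lists the predictor counts that a log-scale reduction would visit, together with the default mtry at each size. The helper names reduction_schedule and default_mtry are hypothetical. The sketch assumes that step acts as the multiplier applied to the current number of predictors, which coincides with the fraction removed at the default step=0.5; for other values, check the package source rather than relying on this illustration.

```r
# Hypothetical helper, not part of randomForest.
# Assumption: sizes are p * step^k (rounded), duplicates are dropped, and the
# sequence always ends with a single-predictor model.
reduction_schedule <- function(p, step = 0.5) {
  stopifnot(p >= 1, step > 0, step < 1)
  k <- floor(log(p, base = 1 / step))
  n_var <- round(p * step^(seq_len(max(k, 1)) - 1))
  n_var <- unique(n_var)
  if (!1 %in% n_var) n_var <- c(n_var, 1)
  n_var
}

# Default mtry rule, copied from the Usage section
default_mtry <- function(p) max(1, floor(sqrt(p)))

sizes <- reduction_schedule(100)
schedule <- data.frame(n.var = sizes, mtry = sapply(sizes, default_mtry))
print(schedule)
# n.var: 100 50 25 12 6 3 1
# mtry:   10  7  5  3 2 1 1
```

Because mtry is supplied as a function of the number of remaining predictors, it shrinks automatically as variables are dropped, so it never exceeds the number of columns available at a given step.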

Value

A list with the following components:

list(n.var=n.var, error.cv=error.cv, predicted=cv.pred)

n.var

vector of number of variables used at each step

error.cv

corresponding vector of error rates or MSEs at each step

predicted

list with one component per element of n.var, each containing the predicted values from the cross-validation
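
As an illustration of how the three components fit together, the sketch below selects the step with the lowest cross-validated error and tabulates its predictions against the observed classes. The helper best_n_var is hypothetical and not part of the package. The rfcv call is guarded so that the block still runs when randomForest is not installed.

```r
# Hypothetical helper: pick the step with the smallest cross-validated error
# from an rfcv()-style result (a list with n.var, error.cv and predicted).
best_n_var <- function(result) {
  i <- which.min(result$error.cv)   # first minimum; ties go to the earlier step
  list(n.var     = result$n.var[i],
       error.cv  = unname(result$error.cv[i]),
       predicted = result$predicted[[i]])
}

if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(647)
  result <- rfcv(iris[1:4], iris$Species, cv.fold = 3)
  best <- best_n_var(result)
  # Cross-validated confusion table at the selected number of predictors
  print(table(observed = iris$Species, predicted = best$predicted))
}
```

Since n.var is listed from the full set downwards, ties in error.cv resolve in favour of the larger model here; reverse the vectors first if the smaller model is preferred.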

Author(s)

Andy Liaw

References

Svetnik, V., Liaw, A., Tong, C. and Wang, T., “Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules”, MCS 2004, Roli, F. and Windeatt, T. (Eds.) pp. 334-343.

See Also

randomForest, importance

Examples

set.seed(647)
myiris <- cbind(iris[1:4], matrix(runif(96 * nrow(iris)), nrow(iris), 96))
result <- rfcv(myiris, iris$Species, cv.fold=3)
with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))

## The following can take a while to run, so if you really want to try
## it, copy and paste the code into R.

## Not run: 
result <- replicate(5, rfcv(myiris, iris$Species), simplify=FALSE)
error.cv <- sapply(result, "[[", "error.cv")
matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
        lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
        xlab="Number of variables", ylab="CV Error")

## End(Not run)
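
The arguments mtry, recursive and ... can be combined in a single call. The following sketch is one possible variation on the example above: mtry_third and noisy_iris are names made up for illustration, and the choice of one third of the predictors and 200 trees is arbitrary rather than a recommendation.

```r
# Alternative mtry rule (hypothetical): one third of the remaining
# predictors, never fewer than one.
mtry_third <- function(p) max(1, floor(p / 3))

if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(647)
  # Four real predictors plus 20 columns of uniform noise
  noisy_iris <- cbind(iris[1:4], matrix(runif(20 * nrow(iris)), nrow(iris), 20))
  result <- rfcv(noisy_iris, iris$Species, cv.fold = 3,
                 mtry = mtry_third,
                 recursive = TRUE,   # re-assess importance after each reduction
                 ntree = 200)        # forwarded to randomForest via ...
  print(result$error.cv)
}
```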

randomForest

Breiman and Cutler's Random Forests for Classification and Regression

v4.6-14
GPL (>= 2)
Authors
Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener.
Initial release
2018-03-22
