Random Forest Cross-Validation for feature selection
This function shows the cross-validated prediction performance of models with a sequentially reduced number of predictors (ranked by variable importance), using a nested cross-validation procedure.
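The key point of the nested procedure is that predictors are ranked using only the training part of each fold, so the held-out predictions are not biased by the selection step. The base-R sketch below illustrates that loop structure under loudly stated assumptions. It is NOT the rfcv() implementation: lm() stands in for a random forest, predictors are ranked by absolute correlation with the response instead of random forest variable importance, folds are assigned at random without stratification, and the ranking is computed once per fold (in the spirit of recursive=FALSE). The helper name nested_cv_sketch is hypothetical.

```r
# Hypothetical helper, NOT rfcv(): lm() is a stand-in model, and predictors are
# ranked by |correlation with y| instead of random forest importance.
nested_cv_sketch <- function(x, y, n.var, cv.fold = 5) {
  n <- nrow(x)
  # Randomly (unstratified) assign each observation to one of cv.fold folds
  fold <- sample(rep(seq_len(cv.fold), length.out = n))
  # One vector of held-out predictions per subset size
  pred <- lapply(n.var, function(k) rep(NA_real_, n))
  for (i in seq_len(cv.fold)) {
    train <- fold != i
    # Rank predictors using the training part of this fold only
    score <- abs(cor(x[train, , drop = FALSE], y[train]))
    ranked <- order(score, decreasing = TRUE)
    for (j in seq_along(n.var)) {
      # Keep the n.var[j] top-ranked predictors and refit on the training part
      keep <- ranked[seq_len(n.var[j])]
      d_train <- data.frame(y = y[train], x[train, keep, drop = FALSE])
      d_test <- data.frame(x[!train, keep, drop = FALSE])
      fit <- lm(y ~ ., data = d_train)
      # Predict the held-out fold with this subset size
      pred[[j]][!train] <- predict(fit, newdata = d_test)
    }
  }
  # Cross-validated mean squared error for each subset size
  sapply(pred, function(p) mean((y - p)^2))
}

# Usage: 2 informative predictors among 10 (simulated data)
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("V", 1:10)))
y <- 2 * x[, 1] - x[, 2] + rnorm(n)
err <- nested_cv_sketch(x, y, n.var = c(10, 5, 2, 1))
print(err)
```

With this simulated data, dropping from two predictors to one should noticeably raise the cross-validated error, because the second informative predictor is then discarded.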
rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
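trainx | matrix or data frame containing columns of predictor variables
trainy | vector of response; must have length equal to the number of rows in trainx
cv.fold | number of folds in the cross-validation
scale | if "log", reduce a fixed proportion (step) of the variables at each step; otherwise, reduce step variables at a time
step | if scale="log", the proportion used to shrink the number of variables at each step (the default 0.5 halves it); otherwise, the number of variables to remove at each step
mtry | a function of the number of remaining predictor variables, used as the mtry parameter in the randomForest call
recursive | whether variable importance is (re-)assessed at each step of variable reduction
... | other arguments passed on to randomForest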
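To make the interaction of scale, step and mtry concrete, the base-R sketch below computes a sequence of subset sizes and the corresponding default mtry values. It assumes a geometric schedule (round(p * step^i), de-duplicated and ending at 1) for scale="log" and an arithmetic one otherwise; this is an illustration under that assumption, not code taken from the randomForest package. The helper names var_schedule and default_mtry are hypothetical; default_mtry simply copies the default mtry function shown in the usage line.

```r
# Hypothetical helper (not part of randomForest): sketch of how the number of
# retained predictors might shrink from p to 1, ASSUMING a geometric schedule
# for scale = "log" and removal of `step` predictors at a time otherwise.
var_schedule <- function(p, scale = "log", step = 0.5) {
  if (scale == "log") {
    # Number of multiplicative steps before the count would drop below 1
    k <- floor(log(p, base = 1 / step))
    n.var <- round(p * step^(seq_len(k) - 1))
    # Rounding can repeat a count; keep each count once
    n.var <- unique(n.var)
  } else {
    # Remove `step` predictors at a time
    n.var <- seq(from = p, to = 1, by = -step)
  }
  # Always finish with a single-predictor model
  if (!(1 %in% n.var)) n.var <- c(n.var, 1)
  n.var
}

# Default mtry rule from the usage line, applied to each subset size
default_mtry <- function(p) max(1, floor(sqrt(p)))

sched <- var_schedule(100)
print(sched)                        # 100 50 25 12 6 3 1
print(sapply(sched, default_mtry))  # 10 7 5 3 2 1 1
```

Note that R's round() rounds halves to even, which is why 12.5 becomes 12 here.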
A list with the following components:
list(n.var=n.var, error.cv=error.cv, predicted=cv.pred)
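n.var | vector of the number of variables used at each step
error.cv | corresponding vector of cross-validated error rates (classification) or mean squared errors (regression) at each step
predicted | list with one component per element of n.var, each containing the cross-validated predicted values obtained with that number of variables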
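The relationship between predicted and error.cv can be illustrated with a small base-R sketch: for a factor response the error is the fraction of misclassified observations, and for a numeric response it is the mean squared error. The helper name cv_error and the toy data are hypothetical and are used only for illustration.

```r
# Hypothetical helper (not part of randomForest): given the true response and
# a list of cross-validated predictions (one element per subset size), return
# the misclassification rate for a factor response or the MSE otherwise.
cv_error <- function(truth, predicted) {
  if (is.factor(truth)) {
    # Compare labels as character to avoid factor-level mismatches
    sapply(predicted, function(pred) mean(as.character(pred) != as.character(truth)))
  } else {
    sapply(predicted, function(pred) mean((truth - pred)^2))
  }
}

# Toy classification example: predictions for two subset sizes
truth <- factor(c("a", "b", "a", "b"))
predicted <- list(`4` = factor(c("a", "b", "a", "a")),
                  `2` = factor(c("b", "b", "a", "a")))
print(cv_error(truth, predicted))  # 4: 0.25, 2: 0.50
```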
Andy Liaw
Svetnik, V., Liaw, A., Tong, C. and Wang, T., “Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules”, MCS 2004, Roli, F. and Windeatt, T. (Eds.) pp. 334-343.
library(randomForest)
set.seed(647)
## 4 iris predictors plus 96 columns of uniform noise
myiris <- cbind(iris[1:4], matrix(runif(96 * nrow(iris)), nrow(iris), 96))
result <- rfcv(myiris, iris$Species, cv.fold=3)
with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))

## The following can take a while to run, so if you really want to try
## it, copy and paste the code into R.

## Not run:
## Repeat the whole cross-validation 5 times and overlay the error curves
result <- replicate(5, rfcv(myiris, iris$Species), simplify=FALSE)
error.cv <- sapply(result, "[[", "error.cv")
matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
        lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
        xlab="Number of variables", ylab="CV Error")
## End(Not run)