validate.rpart

Dxy and Mean Squared Error by Cross-validating a Tree Sequence


Description

Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. For survival trees, Dxy is negated so that larger is better. There are print and plot methods for objects created by validate.rpart.
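
The two accuracy measures can be illustrated in base R. This is a minimal sketch, not the code validate.rpart uses: the helper names brier and somers_dxy are hypothetical, the pairwise loop is a simple O(n^2) illustration, and it assumes an uncensored response (so it does not cover the survival case).

```r
# Mean squared error of the predictions. With a 0/1 response and
# predicted probabilities this is the Brier score.
brier <- function(pred, y) mean((pred - y)^2)

# Somers' Dxy for an uncensored response: over all pairs whose observed
# responses differ, (concordant - discordant) / number of such pairs.
# Pairs tied on the prediction contribute zero.
somers_dxy <- function(pred, y) {
  dy <- outer(y, y, "-")        # pairwise differences in observed response
  dp <- outer(pred, pred, "-")  # pairwise differences in prediction
  usable <- dy != 0
  sum(sign(dy[usable]) * sign(dp[usable])) / sum(usable)
}

# Small made-up illustration
y    <- c(0, 0, 1, 1)
pred <- c(0.1, 0.4, 0.35, 0.8)
brier(pred, y)
somers_dxy(pred, y)
```

Dxy ranges from -1 to 1, with 0 corresponding to predictions that carry no rank information about the response.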

Usage

# f <- rpart(formula=y ~ x1 + x2 + ..., model=TRUE)
## S3 method for class 'rpart'
validate(fit, method, B, bw, rule, type, sls, aics,
    force, estimates, pr=TRUE,
    k, rand, xval=10, FUN, ...)
## S3 method for class 'validate.rpart'
print(x, ...)
## S3 method for class 'validate.rpart'
plot(x, what=c("mse","dxy"), legendloc=locator, ...)

Arguments

fit

an object created by rpart. You must have specified the model=TRUE argument to rpart.

method,B,bw,rule,type,sls,aics,force,estimates

are there only for consistency with the generic validate function; these are ignored

x

the result of validate.rpart

k

a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments or from the rpart cptable object in the original fit object. You may also specify a scalar or vector.
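
As a rough illustration of where a default sequence of k could come from, the sketch below reads the cost/complexity values from the cptable of an rpart fit and prunes the tree at each one. It uses only rpart itself. The simulated data and the variable names (trees, leaves) are assumptions made for the example, and it is not claimed to match the internal steps of validate.rpart.

```r
library(rpart)

set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 * (x1 + x2 + rnorm(n) > 1)

f <- rpart(y ~ x1 + x2, model = TRUE)

# Cost/complexity values stored with the fit, one per row of the cptable
k <- f$cptable[, "CP"]

# One pruned tree per value of k, then count terminal nodes in each
trees  <- lapply(k, function(cp) prune(f, cp = cp))
leaves <- sapply(trees, function(t) sum(t$frame$var == "<leaf>"))
data.frame(k = k, leaves = leaves)
```

A scalar or vector supplied by the user would simply take the place of k here.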

rand

a random sample (usually omitted)

xval

number of cross-validation folds, i.e., the number of groups into which the data are split (default 10)
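
For readers unfamiliar with xval-fold cross-validation, the following base R sketch shows one common way to divide n observations into xval groups and to form the training and test sets for a single group. The balanced assignment used here is an assumption for illustration; validate.rpart may assign observations to groups differently.

```r
set.seed(1)
n    <- 100
xval <- 10

# Assign each observation to one of xval groups of near-equal size
fold <- sample(rep(seq_len(xval), length.out = n))
table(fold)

# For group j, the tree sequence is grown on the other groups
# and its predictions are evaluated on group j
j <- 1
train_idx <- which(fold != j)
test_idx  <- which(fold == j)
length(train_idx)
length(test_idx)
```

Repeating the last step for j = 1, ..., xval gives every observation exactly one out-of-sample prediction.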

FUN

the name of a function which produces a sequence of trees, such as prune.

...

additional arguments to FUN (ignored by print,plot).

pr

set to FALSE to prevent intermediate results for each k from being printed

what

a vector of things to plot. By default, 2 plots will be done, one for mse and one for Dxy.

legendloc

a function that is evaluated with a single argument equal to 1 to generate a list with components x, y specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc specifies the relative fraction of the plot at which to center the legend.

Value

a list of class "validate.rpart" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, xval. size is the number of nodes, dxy refers to Somers' D, mse refers to mean squared error of prediction, app means apparent accuracy on training samples, val means validated accuracy on test samples, binary is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). size will not be present if the user specifies k.

Side Effects

prints if pr=TRUE

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

See Also

Examples

## Not run: 
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y  <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rms)     # provides validate()
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))

## End(Not run)

rms

Regression Modeling Strategies

v6.2-0
GPL (>= 2)
Authors
Frank E Harrell Jr <fh@fharrell.com>
Initial release
2021-03-17
