rms: validate.rpart – R documentation

validate.rpart

Dxy and Mean Squared Error by Cross-validating a Tree Sequence

Description

Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. For survival trees, Dxy is negated so that larger is better. There are print and plot methods for objects created by validate.rpart.
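
As a rough illustration of the two accuracy measures named above (not code taken from rms), the sketch below computes the Brier score and Somers' Dxy for a binary response in base R. The helper names `brier` and `somers_dxy` are hypothetical, and Dxy is derived from the concordance index C as Dxy = 2(C - 0.5), which covers the binary case only; survival responses need a censoring-aware calculation.

```r
# Brier score: mean squared difference between predicted probability and 0/1 outcome
brier <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary y (hypothetical helper, binary case only).
# C is the probability that a randomly chosen y=1 case has a higher
# prediction than a randomly chosen y=0 case; Dxy = 2 * (C - 0.5).
somers_dxy <- function(p, y) {
  r    <- rank(p)                # midranks, so tied predictions count as 1/2
  n1   <- sum(y == 1)
  n0   <- sum(y == 0)
  cidx <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  2 * (cidx - 0.5)
}

y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)

brier(p, y)       # 0.158125
somers_dxy(p, y)  # 0.5 (3 of the 4 case/control pairs are concordant)
```

Dxy ranges from -1 to 1, with 0 indicating predictions no better than chance, whereas a smaller Brier score is better.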

Usage

# f <- rpart(formula=y ~ x1 + x2 + ..., model=TRUE)
## S3 method for class 'rpart'
validate(fit, method, B, bw, rule, type, sls, aics,
    force, estimates, pr=TRUE,
    k, rand, xval=10, FUN, ...)
## S3 method for class 'validate.rpart'
print(x, ...)
## S3 method for class 'validate.rpart'
plot(x, what=c("mse","dxy"), legendloc=locator, ...)

Arguments

`fit`	an object created by `rpart`. You must have specified the `model=TRUE` argument to `rpart`.
`method,B,bw,rule,type,sls,aics,force,estimates`	are there only for consistency with the generic `validate` function; these are ignored
`x`	the result of `validate.rpart`
`k`	a sequence of cost/complexity values. By default these are obtained from calling `FUN` with no optional arguments or from the `rpart` `cptable` object in the original fit object. You may also specify a scalar or vector.
`rand`	a random sample (usually omitted)
`xval`	number of cross-validation folds (default 10)
`FUN`	the name of a function which produces a sequence of trees, such as `prune`.
`...`	additional arguments to `FUN` (ignored by `print` and `plot`).
`pr`	set to `FALSE` to prevent intermediate results for each `k` from being printed
`what`	a vector of things to plot. By default, 2 plots will be done, one for `mse` and one for `dxy`.
`legendloc`	a function that is evaluated with a single argument equal to `1` to generate a list with components `x, y` specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, `legendloc` specifies the relative fraction of the plot at which to center the legend.
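
To make the roles of `k`, `xval`, and `FUN` more concrete, here is a minimal sketch of xval-fold cross-validation over a cost-complexity sequence, written with rpart alone. It is an assumption-laden illustration, not the implementation used by `validate.rpart`: the fold assignment, the averaging of per-fold errors, and the variable names `fold` and `mse` are all choices made for this example, and only the mean squared error is tracked.

```r
library(rpart)

set.seed(1)
n   <- 200
d   <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)

xval <- 5
fit  <- rpart(y ~ x1 + x2, data = d, model = TRUE)  # regression tree on a 0/1 response
k    <- fit$cptable[, "CP"]                         # cost-complexity values from the fit

# Randomly assign each observation to one of xval folds (illustrative choice)
fold <- sample(rep(seq_len(xval), length.out = n))

# For each k: prune the full-data tree for the apparent error, then
# refit on each training fold, prune, and score on the held-out fold
mse <- t(sapply(k, function(cp) {
  app <- mean((predict(prune(fit, cp = cp)) - d$y)^2)
  val <- mean(sapply(seq_len(xval), function(j) {
    train <- d[fold != j, ]
    test  <- d[fold == j, ]
    g     <- prune(rpart(y ~ x1 + x2, data = train), cp = cp)
    mean((predict(g, newdata = test) - test$y)^2)
  }))
  c(k = cp, mse.app = app, mse.val = val)
}))
mse
```

Because the response is 0/1, each mean squared error here is a Brier score. The apparent error typically shrinks as `k` decreases and the tree grows, while the held-out error need not.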

Value

A list of class "validate.rpart" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, and xval. size is the number of nodes; it is not present if the user specifies k. dxy refers to Somers' Dxy and mse to the mean squared error of prediction. The suffix app denotes apparent accuracy on the training samples, and val denotes validated accuracy on the test samples. binary is a logical value indicating whether the response variable was binary (a logical or 0/1 variable is treated as binary).

Side Effects

Prints intermediate results for each k if pr=TRUE.

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Examples

## Not run: 
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y  <- 1*(x1+x2+rnorm(n) > 1)
table(y)
require(rpart)
require(rms)   # provides validate() and its method for rpart fits
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))

## End(Not run)

rms

Regression Modeling Strategies

v6.2-0

GPL (>= 2)

Authors

Frank E Harrell Jr <fh@fharrell.com>

Initial release

2021-03-17