Dxy and Mean Squared Error by Cross-validating a Tree Sequence
Uses xval-fold cross-validation of a sequence of trees to derive
estimates of the mean squared error and Somers' Dxy rank correlation
between predicted and observed responses. In the case of a binary response
variable, the mean squared error is the Brier accuracy score. For
survival trees, Dxy is negated so that larger is better.
There are print and plot methods for objects created by validate.rpart.
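As a minimal illustration of the two accuracy measures named above, the following base R sketch computes the Brier score and Somers' Dxy for a binary response. The helper names brier and dxy_binary are hypothetical and belong to no package. Dxy is obtained here from the rank-based concordance probability C as 2*(C - 0.5), the usual definition for a binary outcome; this may differ in detail from what validate computes internally.

```r
# Hypothetical helpers, for illustration only.

# Brier score: mean squared difference between the predicted
# probability and the observed 0/1 outcome.
brier <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary y, via the concordance probability C
# (Wilcoxon rank-sum formulation; rank() assigns midranks to ties).
dxy_binary <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  C  <- (mean(r[y == 1]) - (n1 + 1) / 2) / n0
  2 * (C - 0.5)
}

p <- c(0.10, 0.40, 0.35, 0.80)   # made-up predicted probabilities
y <- c(0, 0, 1, 1)               # made-up observed outcomes

brier(p, y)        # 0.158125
dxy_binary(p, y)   # 0.5
```

A Dxy of 0 corresponds to random predictions (C = 0.5) and 1 to perfect discrimination, whereas a smaller Brier score is better.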
# f <- rpart(formula=y ~ x1 + x2 + ...) # or rpart

## S3 method for class 'rpart'
validate(fit, method, B, bw, rule, type, sls, aics, force, estimates,
         pr=TRUE, k, rand, xval=10, FUN, ...)

## S3 method for class 'validate.rpart'
print(x, ...)

## S3 method for class 'validate.rpart'
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
fit | an object created by rpart
method,B,bw,rule,type,sls,aics,force,estimates | are there only for consistency with the generic validate function
x | the result of validate.rpart
k | a sequence of cost/complexity values. By default these are obtained from calling
rand | a random sample (usually omitted)
xval | number of cross-validation splits
FUN | the name of a function which produces a sequence of trees, such
... | additional arguments to
pr | set to
what | a vector of things to plot. By default, 2 plots will be done, one for mse and one for dxy
legendloc | a function that is evaluated with a single argument equal to
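The roles of k and xval can be illustrated with a rough sketch that uses rpart alone. It is not the implementation behind validate: the random fold assignment, the use of prune with cp=k to generate the tree sequence, and all variable names are assumptions made for illustration.

```r
library(rpart)

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)   # numeric 0/1 response

# Cost/complexity values taken from the full fit's cptable (assumed default for k)
full <- rpart(y ~ x1 + x2 + x3, data = d)
k <- full$cptable[, "CP"]

xval <- 10
fold <- sample(rep(1:xval, length.out = n))   # random fold labels 1..xval

# Held-out predictions: one column per cost/complexity value
pred <- matrix(NA_real_, nrow = n, ncol = length(k))
for (i in 1:xval) {
  test <- fold == i
  fit  <- rpart(y ~ x1 + x2 + x3, data = d[!test, ])
  for (j in seq_along(k)) {
    # Prune the training-fold tree at k[j], then predict the held-out rows
    pred[test, j] <- predict(prune(fit, cp = k[j]), newdata = d[test, ])
  }
}

# Cross-validated mean squared error (Brier score, since y is 0/1) for each k
mse.val <- colMeans((pred - d$y)^2)
cbind(k = k, mse.val = mse.val)
```

Each observation is predicted only by a tree that was grown without it, so the resulting error estimates are not inflated by fitting and evaluating on the same data.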
a list of class "validate.rpart" with components named k, size, dxy.app,
dxy.val, mse.app, mse.val, binary, xval. size is the number of nodes,
dxy refers to Somers' Dxy, mse refers to mean squared error of prediction,
app means apparent accuracy on training samples, val means validated
accuracy on test samples, binary is a logical variable indicating whether
or not the response variable was binary (a logical or 0/1 variable is
binary). size will not be present if the user specifies k.
prints if pr=TRUE
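The components of the returned list can be inspected directly. The sketch below assumes that the rpart and rms packages are installed, that rms is the package supplying this validate method, and that pr=FALSE suppresses the printing mentioned above. The simulated data are the same as in the example on this page.

```r
library(rpart)
library(rms)   # assumed to provide validate() for rpart fits

set.seed(1)
n  <- 100
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- 1 * (x1 + x2 + rnorm(n) > 1)

f <- rpart(y ~ x1 + x2 + x3, model = TRUE)
v <- validate(f, pr = FALSE)

names(v)                 # component names of the validate.rpart object
v$binary                 # whether the response was treated as binary
v$mse.val - v$mse.app    # validated minus apparent MSE for each value of k
```

A large positive gap between mse.val and mse.app suggests that the apparent accuracy is optimistic for that tree size.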
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
## Not run: 
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)
require(rpart)
require(rms)    # supplies the validate generic and its rpart method
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))

## End(Not run)