validate.cph

Validation of a Fitted Cox or Parametric Survival Model's Indexes of Fit


Description

This is the version of the validate function specific to models fitted with cph or psm. Also included is a small function dxy.cens that retrieves Dxy and its standard error from the survival package's survConcordance.fit function. This allows for incredibly fast computation of Dxy or the c-index even for hundreds of thousands of observations. dxy.cens negates Dxy if log relative hazard is being predicted. If y is a left-censored Surv object, times are negated and a right-censored object is created, then Dxy is negated.

Usage

# fit <- cph(formula=Surv(ftime,event) ~ terms, x=TRUE, y=TRUE, ...)
## S3 method for class 'cph'
validate(fit, method="boot", B=40, bw=FALSE, rule="aic",
type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
pr=FALSE, dxy=TRUE, u, tol=1e-9, ...)

## S3 method for class 'psm'
validate(fit, method="boot",B=40,
        bw=FALSE, rule="aic", type="residual", sls=.05, aics=0,
        force=NULL, estimates=TRUE, pr=FALSE,
        dxy=TRUE, tol=1e-12, rel.tolerance=1e-5, maxiter=15, ...)

dxy.cens(x, y, type=c('time','hazard'))

Arguments

fit

a fit derived by cph. The options x=TRUE and y=TRUE must have been specified. If the model contains any stratification factors and dxy=TRUE, the options surv=TRUE and time.inc=u must also have been given, where u is the same value of u given to validate.

method

see validate

B

number of repetitions. For method="crossvalidation", this is the number of groups of omitted observations.

rel.tolerance,maxiter,bw

bw: TRUE to do fast step-down using the fastbw function, for both the overall model and for each repetition. fastbw keeps parameters together that represent the same factor. rel.tolerance and maxiter appear only in the validate.psm usage.

rule

Applies if bw=TRUE. "aic" to use Akaike's information criterion as a stopping rule (i.e., a factor is deleted if the chi-square falls below twice its degrees of freedom), or "p" to use P-values.

type

"residual" or "individual" - stopping rule is for individual factors or for the residual chi-square for all variables deleted. For dxy.cens, specify type="hazard" if x is on the hazard or cumulative hazard (or their logs) scale, causing negation of the correlation index.

sls

significance level for a factor to be kept in a model, or for judging the residual chi-square.

aics

cutoff on AIC when rule="aic".

force

see fastbw

estimates

see print.fastbw

pr

TRUE to print results of each repetition

tol,...

see validate or predab.resample

dxy

set to TRUE to validate Somers' Dxy using dxy.cens, which is fast until n > 500,000. Uses the survival package's survConcordance.fit service function for survConcordance.

u

must be specified if the model has any stratification factors and dxy=TRUE. In that case, strata are not included in X beta and the survival curves may cross. Predictions at time t=u are correlated with observed survival times. Does not apply to validate.psm.

x

a numeric vector

y

a Surv object that may be uncensored or right-censored
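The rule="aic" criterion described under rule above can be illustrated with a few lines of base R. This is a minimal sketch of the arithmetic only, not the fastbw implementation, which removes one factor at a time and recomputes the statistics. The factor names and chi-square values are hypothetical.

```r
# Hypothetical chi-square statistics and degrees of freedom for three factors
chisq <- c(age = 12.4, sex = 1.3, race = 3.1)
df    <- c(age = 1,    sex = 1,   race = 3)
aics  <- 0   # default cutoff shown in the usage

# AIC on the chi-square scale: chi-square minus twice its degrees of freedom
aic <- chisq - 2 * df

# A factor is a candidate for deletion when its AIC falls below the cutoff,
# i.e. when its chi-square is below twice its degrees of freedom for aics = 0
delete <- aic < aics
print(delete)
```

With these made-up numbers only age has a chi-square exceeding twice its degrees of freedom, so sex and race are the deletion candidates.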

Details

Statistics validated include the Nagelkerke R^2, Dxy, slope shrinkage, the discrimination index D [(model L.R. chi-square - 1)/L], the unreliability index U = (difference in -2 log likelihood between uncalibrated X beta and X beta with overall slope calibrated to test sample) / L, and the overall quality index Q = D - U. g is the g-index on the log relative hazard (linear predictor) scale. L is -2 log likelihood with beta=0. The "corrected" slope can be thought of as a shrinkage factor that takes overfitting into account. See predab.resample for the list of resampling methods.
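The three indexes defined in words above can be written compactly as follows. The notation is an assumption introduced here for illustration: \chi^2_{LR} is the model likelihood ratio chi-square, \ell(\cdot) is -2 log likelihood evaluated on the test sample, \hat\gamma is the calibrated overall slope, and L is -2 log likelihood with \beta = 0.

```latex
\[
D = \frac{\chi^2_{LR} - 1}{L}, \qquad
U = \frac{\ell(X\hat\beta) - \ell(\hat\gamma\, X\hat\beta)}{L}, \qquad
Q = D - U
\]
```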

Value

matrix with rows corresponding to Dxy, Slope, D, U, and Q, and columns for the original index, resample estimates, indexes applied to whole or omitted sample using model derived from resample, average optimism, corrected index, and number of successful resamples.
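How the columns relate to one another can be sketched with plain arithmetic. The numbers below are hypothetical and are not output from a real validate run; the variable names mirror the column descriptions above.

```r
# Hypothetical Dxy values, for illustration only
index.orig <- 0.40                       # index for the model fitted to the full sample
training   <- c(0.43, 0.41, 0.44, 0.42)  # index within each resample
test       <- c(0.38, 0.39, 0.37, 0.40)  # resample model applied to whole or omitted sample

# Average optimism: mean over resamples of (training - test)
optimism <- mean(training - test)

# Corrected index: original index minus average optimism
index.corrected <- index.orig - optimism

# Number of successful resamples
n <- length(training)

print(c(optimism = optimism, index.corrected = index.corrected, n = n))
```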

The values corresponding to the row Dxy are equal to 2 * (C - 0.5) where C is the C-index or concordance probability. If the user is correlating the linear predictor (predicted log hazard) with survival time, Dxy is automatically negated.
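The relation Dxy = 2 * (C - 0.5) and the sign flip for hazard-scale predictions can be checked on a toy data set. The function c_index below is a hypothetical brute-force helper written for this sketch; it is not how dxy.cens computes concordance. It ignores tied survival times, gives no standard error, and is far too slow for large n.

```r
# Brute-force concordance probability for right-censored data.
# x: predictions where larger values mean longer predicted survival.
# A pair (i, j) is usable when subject i has an observed event before time[j].
c_index <- function(x, time, event) {
  concordant <- 0
  usable <- 0
  n <- length(x)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (event[i] == 1 && time[i] < time[j]) {
        usable <- usable + 1
        if (x[i] < x[j]) {
          concordant <- concordant + 1
        } else if (x[i] == x[j]) {
          concordant <- concordant + 0.5   # tied predictions count one half
        }
      }
    }
  }
  concordant / usable
}

# Toy data (made up): follow-up times, event indicators, predicted survival times
time  <- c(2, 4, 5, 7, 9)
event <- c(1, 1, 0, 1, 0)
x     <- c(1.0, 3.0, 2.5, 4.0, 6.0)

C   <- c_index(x, time, event)
dxy <- 2 * (C - 0.5)

# On the hazard scale larger predictions mean shorter survival,
# so the raw rank correlation changes sign and is negated to compensate
x_hazard   <- -x
dxy_hazard <- -(2 * (c_index(x_hazard, time, event) - 0.5))

print(c(C = C, Dxy = dxy, Dxy.hazard = dxy_hazard))
```

Both versions report the same Dxy, which is the purpose of the automatic negation described above.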

Side Effects

prints a summary, and optionally statistics for each re-fit (if pr=TRUE)

Author(s)

Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com

See Also

Examples

library(rms)   # cph and validate; also makes Surv, label, and units available
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)

f <- cph(S ~ age*sex, x=TRUE, y=TRUE)
# Validate full model fit
validate(f, B=10)               # normally B=150

# Validate a model with stratification.  Dxy is the only
# discrimination measure for such models, but Dxy requires
# one to choose a single time at which to predict S(t|X)
f <- cph(S ~ rcs(age)*strat(sex), 
         x=TRUE, y=TRUE, surv=TRUE, time.inc=2)
validate(f, u=2, B=10)   # normally B=150
# Note u=time.inc

rms

Regression Modeling Strategies

v6.2-0
GPL (>= 2)
Authors
Frank E Harrell Jr <fh@fharrell.com>
Initial release
2021-03-17