Cox Proportional Hazards Model and Extensions
Modification of Therneau's coxph
function to fit the Cox model and
its extension, the Andersen-Gill model. The latter allows for interval
time-dependent covariables, time-dependent strata, and repeated events.
The Survival
method for an object created by cph
returns an S
function for computing estimates of the survival function.
The Quantile
method for cph
returns an S function for computing
quantiles of survival time (median, by default).
The Mean
method returns a function for computing the mean survival
time. This function issues a warning if the last follow-up time is uncensored,
unless a restricted mean is explicitly requested.
cph(formula = formula(data), data=environment(formula), weights, subset, na.action=na.delete, method=c("efron","breslow","exact","model.frame","model.matrix"), singular.ok=FALSE, robust=FALSE, model=FALSE, x=FALSE, y=FALSE, se.fit=FALSE, linear.predictors=TRUE, residuals=TRUE, nonames=FALSE, eps=1e-4, init, iter.max=10, tol=1e-9, surv=FALSE, time.inc, type=NULL, vartype=NULL, debug=FALSE, ...) ## S3 method for class 'cph' Survival(object, ...) # Evaluate result as g(times, lp, stratum=1, type=c("step","polygon")) ## S3 method for class 'cph' Quantile(object, ...) # Evaluate like h(q, lp, stratum=1, type=c("step","polygon")) ## S3 method for class 'cph' Mean(object, method=c("exact","approximate"), type=c("step","polygon"), n=75, tmax, ...) # E.g. m(lp, stratum=1, type=c("step","polygon"), tmax, \dots)
formula |
an S formula object with a |
object |
an object created by |
data |
name of an S data frame containing all needed variables. Omit this to use a data frame already in the S “search list”. |
weights |
case weights |
subset |
an expression defining a subset of the observations to use in the fit. The default
is to use all observations. Specify for example |
na.action |
specifies an S function to handle missing data. The default is the function |
method |
for For |
singular.ok |
If |
robust |
if |
model |
default is |
x |
default is |
y |
default is |
se.fit |
default is |
linear.predictors |
set to |
residuals |
set to |
nonames |
set to |
eps |
convergence criterion - change in log likelihood. |
init |
vector of initial parameter estimates. Defaults to all zeros.
Special residuals can be obtained by setting some elements of |
iter.max |
maximum number of iterations to allow. Set to |
tol |
tolerance for declaring singularity for matrix inversion (available only when survival5 or later package is in effect) |
surv |
set to |
time.inc |
time increment used in deriving |
type |
(for For |
vartype |
see |
debug |
set to |
... |
other arguments passed to |
times |
a scalar or vector of times at which to evaluate the survival estimates |
lp |
a scalar or vector of linear predictors (including the centering constant) at which to evaluate the survival estimates |
stratum |
a scalar stratum number or name (e.g., |
q |
a scalar quantile or a vector of quantiles to compute |
n |
the number of points at which to evaluate the mean survival time, for
|
tmax |
For |
If there is any strata by covariable interaction in the model such that
the mean X beta varies greatly over strata, method="approximate"
may
not yield very accurate estimates of the mean in Mean.cph
.
For method="approximate"
if you ask for an estimate of the mean for
a linear predictor value that was outside the range of linear predictors
stored with the fit, the mean for that observation will be NA
.
For Survival
, Quantile
, or Mean
, an S function is returned. Otherwise,
in addition to what is listed below, formula/design information and
the components
maxtime, time.inc, units, model, x, y, se.fit
are stored, the last 5
depending on the settings of options by the same names.
The vectors or matrix stored if y=TRUE
or x=TRUE
have rows deleted according to subset
and
to missing data, and have names or row names that come from the
data frame used as input data.
n |
table with one row per stratum containing number of censored and uncensored observations |
coef |
vector of regression coefficients |
stats |
vector containing the named elements |
var |
variance/covariance matrix of coefficients |
linear.predictors |
values of predicted X beta for observations used in fit, normalized to have overall mean zero, then having any offsets added |
resid |
martingale residuals |
loglik |
log likelihood at initial and final parameter values |
score |
value of score statistic at initial values of parameters |
times |
lists of times (if |
surv |
lists of underlying survival probability estimates |
std.err |
lists of standard errors of estimate log-log survival |
surv.summary |
a 3 dimensional array if |
center |
centering constant, equal to overall mean of X beta. |
Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com
coxph
, survival-internal
,
Surv
, residuals.cph
,
cox.zph
, survfit.cph
,
survest.cph
, survfit.coxph
,
survplot
, datadist
,
rms
, rms.trans
, anova.rms
,
summary.rms
, Predict
,
fastbw
, validate
, calibrate
,
plot.Predict
, ggplot.Predict
,
specs.rms
, lrm
, which.influence
,
na.delete
,
na.detail.response
, print.cph
,
latex.cph
, vif
, ie.setup
,
GiniMd
, dxy.cens
,
survConcordance
# Simulate data from a population model in which the log hazard # function is linear in age and there is no age x sex interaction n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- "Age" sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) dt <- -log(runif(n))/h label(dt) <- 'Follow-up Time' e <- ifelse(dt <= cens,1,0) dt <- pmin(dt, cens) units(dt) <- "Year" dd <- datadist(age, sex) options(datadist='dd') S <- Surv(dt,e) f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE) cox.zph(f, "rank") # tests of PH anova(f) ggplot(Predict(f, age, sex)) # plot age effect, 2 curves for 2 sexes survplot(f, sex) # time on x-axis, curves for x2 res <- resid(f, "scaledsch") time <- as.numeric(dimnames(res)[[1]]) z <- loess(res[,4] ~ time, span=0.50) # residuals for sex plot(time, fitted(z)) lines(supsmu(time, res[,4]),lty=2) plot(cox.zph(f,"identity")) #Easier approach for last few lines # latex(f) f <- cph(S ~ age + strat(sex), surv=TRUE) g <- Survival(f) # g is a function g(seq(.1,1,by=.1), stratum="sex=Male", type="poly") #could use stratum=2 med <- Quantile(f) plot(Predict(f, age, fun=function(x) med(lp=x))) #plot median survival # Fit a model that is quadratic in age, interacting with sex as strata # Compare standard errors of linear predictor values with those from # coxph # Use more stringent convergence criteria to match with coxph f <- cph(S ~ pol(age,2)*strat(sex), x=TRUE, eps=1e-9, iter.max=20) coef(f) se <- predict(f, se.fit=TRUE)$se.fit require(lattice) xyplot(se ~ age | sex, main='From cph') a <- c(30,50,70) comb <- data.frame(age=rep(a, each=2), sex=rep(levels(sex), 3)) p <- predict(f, comb, se.fit=TRUE) comb$yhat <- p$linear.predictors comb$se <- p$se.fit z <- qnorm(.975) comb$lower <- p$linear.predictors - z*p$se.fit comb$upper <- p$linear.predictors + z*p$se.fit comb age2 <- age^2 f2 <- coxph(S ~ (age + age2)*strata(sex)) coef(f2) se <- predict(f2, se.fit=TRUE)$se.fit xyplot(se ~ age | sex, main='From coxph') comb <- data.frame(age=rep(a, each=2), age2=rep(a, each=2)^2, sex=rep(levels(sex), 3)) p <- predict(f2, newdata=comb, se.fit=TRUE) comb$yhat <- p$fit comb$se <- p$se.fit comb$lower <- p$fit - z*p$se.fit comb$upper <- p$fit + z*p$se.fit comb # g <- cph(Surv(hospital.charges) ~ age, surv=TRUE) # Cox model very useful for analyzing highly skewed data, censored or not # m <- Mean(g) # m(0) # Predicted mean charge for reference age #Fit a time-dependent covariable representing the instantaneous effect #of an intervening non-fatal event rm(age) set.seed(121) dframe <- data.frame(failure.time=1:10, event=rep(0:1,5), ie.time=c(NA,1.5,2.5,NA,3,4,NA,5,5,5), age=sample(40:80,10,rep=TRUE)) z <- ie.setup(dframe$failure.time, dframe$event, dframe$ie.time) S <- z$S ie.status <- z$ie.status attach(dframe[z$subs,]) # replicates all variables f <- cph(S ~ age + ie.status, x=TRUE, y=TRUE) #Must use x=TRUE,y=TRUE to get survival curves with time-dep. covariables #Get estimated survival curve for a 50-year old who has an intervening #non-fatal event at 5 days new <- data.frame(S=Surv(c(0,5), c(5,999), c(FALSE,FALSE)), age=rep(50,2), ie.status=c(0,1)) g <- survfit(f, new) plot(c(0,g$time), c(1,g$surv[,2]), type='s', xlab='Days', ylab='Survival Prob.') # Not certain about what columns represent in g$surv for survival5 # but appears to be for different ie.status #or: #g <- survest(f, new) #plot(g$time, g$surv, type='s', xlab='Days', ylab='Survival Prob.') #Compare with estimates when there is no intervening event new2 <- data.frame(S=Surv(c(0,5), c(5, 999), c(FALSE,FALSE)), age=rep(50,2), ie.status=c(0,0)) g2 <- survfit(f, new2) lines(c(0,g2$time), c(1,g2$surv[,2]), type='s', lty=2) #or: #g2 <- survest(f, new2) #lines(g2$time, g2$surv, type='s', lty=2) detach("dframe[z$subs, ]") options(datadist=NULL)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.