Split a survival data set at specified times
Given a survival data set and a set of specified cut times, split each record into multiple subrecords at each cut time. The new data set will be in ‘counting process’ format, with a start time, stop time, and event status for each record.
survSplit(formula, data, subset, na.action=na.pass, cut, start="tstart", id, zero=0, episode, end="tstop", event="event")
formula | a model formula
data | a data frame
subset, na.action | rows of the data to be retained
cut | the vector of timepoints to cut at
start | character string with the name of a start time variable (will be created if needed)
id | character string with the name of a new id variable to create (optional). This can be useful if the data set does not already contain an identifier.
zero | If the start variable does not already exist, this is the time at which the original records start.
episode | character string with the name of a new episode variable (optional)
end | character string with the name of the event time variable
event | character string with the name of the censoring indicator
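As a rough illustration of the id and zero arguments, the sketch below splits a small made-up data set at a single cut time. The data frame toy and the id name "subject" are hypothetical and chosen only for this example.

```r
library(survival)

# Hypothetical two-subject data set with no identifier and no start time
toy <- data.frame(time = c(5, 12), status = c(1, 1), x = c(0.3, 0.7))

# Split at time 10; every original record is taken to start at zero = 0,
# and a new "subject" column records which input row each piece came from
toy2 <- survSplit(Surv(time, status) ~ x, data = toy, cut = 10,
                  id = "subject", zero = 0)
toy2
```

The first subject's follow-up ends before the cut, so it stays a single row; the second is split in two, and both pieces share the same subject value.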
Each interval in the original data is cut at the given points; if an original row were (15, 60] with a cut vector of (10, 30, 40), the resulting data set would have intervals of (15, 30], (30, 40] and (40, 60].
Each row in the final data set will lie completely within one of the cut intervals. The episode variable shows which interval each output row falls in, where 1 = less than the first cutpoint, 2 = between the first and the second, etc. For the example above the values would be 2, 3, and 4.
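The (15, 60] example can be reproduced with a one-row data set. This is a minimal sketch: the data frame d and its covariate x are made up for illustration, and the event at time 60 is an assumption.

```r
library(survival)

# Hypothetical single record: at risk over (15, 60], with an event at 60
d <- data.frame(tstart = 15, tstop = 60, event = 1, x = 0.5)

# Cut at 10, 30 and 40; the cut at 10 lies outside the interval and has no effect
d2 <- survSplit(Surv(tstart, tstop, event) ~ ., data = d,
                cut = c(10, 30, 40), episode = "episode")

# One row per sub-interval, with the episode number alongside
d2[, c("tstart", "tstop", "event", "episode")]
```

Only the last sub-interval carries the original event; the earlier pieces are marked as censored.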
The routine is called with a formula as the first argument. The right hand side of the formula can be used to delimit variables that should be retained; normally one will use ~ . as a shorthand to retain them all. The routine will try to retain variable names, e.g. Surv(adam, joe, fred) ~ . will result in a data set with those same variable names for the start, end, and event options rather than the defaults. Any user-specified values for these options will be used if they are present, of course. However, the routine is not sophisticated; it only does this substitution for simple names. A call of Surv(time, stat==2), for instance, will not retain "stat" as the name of the event variable.
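The naming behaviour can be checked on the veteran data that ships with the survival package. This is a sketch only: the choice of covariates and the name "died" are arbitrary.

```r
library(survival)

# Simple names in Surv() are carried over to the output
a <- survSplit(Surv(time, status) ~ karno + age, data = veteran, cut = 60)
names(a)

# An expression such as status == 1 is not a simple name,
# so the event column falls back to the default name "event"
b <- survSplit(Surv(time, status == 1) ~ karno + age, data = veteran, cut = 60)
names(b)

# An explicitly supplied option is used as given
cc <- survSplit(Surv(time, status) ~ karno + age, data = veteran, cut = 60,
                event = "died")
names(cc)
```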
Rows of data with a missing time or status are copied across unchanged, unless the na.action argument is changed from its default value of na.pass. But in the latter case any row with a missing value for any variable will be removed, which is rarely what is desired.
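The effect of na.action can be seen by counting output rows on a small made-up data set. The data frame m below is hypothetical; na.omit stands in for any na.action other than the default.

```r
library(survival)

# Hypothetical data: row 2 has a missing time, row 3 a missing covariate
m <- data.frame(time = c(100, NA, 50), status = c(1, 1, 0), x = c(1, 2, NA))

# Default na.pass: all three input rows appear in the output
keep <- survSplit(Surv(time, status) ~ x, data = m, cut = 60)

# na.omit: rows 2 and 3 are dropped before splitting
drop <- survSplit(Surv(time, status) ~ x, data = m, cut = 60,
                  na.action = na.omit)

c(na.pass = nrow(keep), na.omit = nrow(drop))
```

Row 3 is lost under na.omit even though its time and status are complete, which is the situation the paragraph above warns about.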
New, longer, data frame.
fit1 <- coxph(Surv(time, status) ~ karno + age + trt, veteran)
plot(cox.zph(fit1)[1])
# a cox.zph plot of the data suggests that the effect of Karnofsky score
#  begins to diminish by 60 days and has faded away by 120 days.
# Fit a model with separate coefficients for the three intervals.
#
vet2 <- survSplit(Surv(time, status) ~ ., veteran,
                  cut = c(60, 120), episode = "timegroup")
fit2 <- coxph(Surv(tstart, time, status) ~ karno * strata(timegroup) +
                  age + trt, data = vet2)
c(overall = coef(fit1)[1],
  t0_60   = coef(fit2)[1],
  t60_120 = sum(coef(fit2)[c(1, 4)]),
  t120    = sum(coef(fit2)[c(1, 5)]))