Fit Lee-Carter-type models for rates to arbitrarily shaped observations of rates in a Lexis diagram.
The Lee-Carter model is originally defined as a model for rates
observed in A-sets (age by period) of a Lexis diagram, as
log(rate(x,t)) = a(x) + b(x)k(t), using one parameter per age(x) and
period(t). This function uses natural splines for a(), b() and k(),
placing knots for each effect such that the number of events is the
same between knots. Also fits the continuous time counterparts of all
models supported by the lca.rh
function from the
ilc
package (see details).
LCa.fit( data, A, P, D, Y, model = "APa", # or one of "ACa", "APaC", "APCa" or "APaCa" a.ref, # age reference for the interactions pi.ref = a.ref, # age reference for the period interaction ci.ref = a.ref, # age reference for the cohort interaction p.ref, # period reference for the interaction c.ref, # cohort reference for the interactions npar = c(a = 6, # no. knots for main age-effect p = 6, # no. knots for period-effect c = 6, # no. knots for cohort-effect pi = 6, # no. knots for age in the period interaction ci = 6), # no. knots for age in the cohort interaction VC = TRUE, # numerical calculation of the Hessian? alpha = 0.05, # 1 minus confidence level eps = 1e-6, # convergence criterion maxit = 100, # max. no iterations quiet = TRUE ) # cut the crap ## S3 method for class 'LCa' print( x, ... ) ## S3 method for class 'LCa' summary( object, show.est=FALSE, ... ) ## S3 method for class 'LCa' plot( x, ... ) ## S3 method for class 'LCa' predict( object, newdata, alpha = 0.05, level = 1-alpha, sim = ( "vcov" %in% names(object) ), ... )
data |
A data frame. Must have columns |
A |
Vector of ages (midpoint of observation). |
P |
Vector of period (midpoint of observation). |
D |
Vector of no. of events. |
Y |
Vector of person-time. Demographers would say "exposure", bewildering epidemiologists. |
a.ref |
Reference age for the age-interaction term(s) |
pi.ref |
Same, but specifically for the interaction with period. |
ci.ref |
Same, but specifically for the interaction with cohort. |
p.ref |
Reference period for the time-interaction term |
c.ref |
Reference period for the time-interaction term |
model |
Character, either |
npar |
A (possibly named) vector or list, with either the number of knots or
the actual vectors of knots for each term. If unnamed, components are
taken to be in the order (a,b,t), if the model is "APaCa" in the order
(a,p,c,pi,ci). If a vector, the three integers indicate the number of
knots for each term; these will be placed so that there is an equal
number of events ( |
VC |
Logical. Should the variance-covariance matrix of the parameters be computed by numerical differentiation? See details. |
alpha |
1 minus the confidence level used when calculating
confidence intervals for estimates in |
eps |
Convergence criterion for the deviance, we use the the relative difference between deviance from the two models fitted. |
maxit |
Maximal number of iterations. |
quiet |
Shall I shut up or talk extensively to you about iteration progression etc.? |
object |
An |
show.est |
Logical. Should the estimates be printed? |
x |
An |
newdata |
Prediction data frame, must have columns |
level |
Confidence level. |
sim |
Logical or numeric. If |
... |
Additional parameters passed on to the method. |
The Lee-Carter model is non-linear in age and time so does not fit
in the classical glm-Poisson framework. But for fixed b(x)
it
is a glm, and also for fixed a(x)
, k(t)
. The function
alternately fits the two versions until the same fit is produced (same
deviance).
The multiplicative age by period term could equally well have been a multiplicative age by cohort or even both. Thus the most extensive model has 5 continuous functions:
log( lambda(a,p)) = f(a) + b_p(a)k_p(p) + b_c(a)k_c(p-a)
Each of these is fitted by a natural spline, with knots placed at the
quantiles of the events along the age (a), calendar time (p) respective
cohort (p-a) scales. Alternatively the knots can be specified explicitly
in the argument npar
as a named list, where
a
refers to f(a),
p
refers to k_p(p),
c
refers to k_c(p-a),
pi
(p
eriod codeinteraction) refers to b_p(a)
and
ci
(c
ohort i
nteraction) refers to b_c(p-a).
The naming convention for the models is a capital P
and/or
C
if the effect is in the model followed by a lower case
a
if there is an interaction with age. Thus there are 5 different
models that can be fitted: APa
, ACa
, APaC
APCa
and APaCa
.
The standard errors of the parameters from the two separate model fits
in the iterations are however wrong; they are conditional on a subset
of the parameters having a fixed value. However, analytic calculation
of the Hessian is a bit of a nightmare, so this is done numerically
using the hessian
function from the numDeriv
package if
VC=TRUE
.
The coefficients and the variance-covariance matrix of these are used
in predict.LCa
for a parametric bootstrap (that is, a
simulation from a multivariate normal with mean equal to the parameter
estimates and variance as the estimated variance-covariance) to get
confidence intervals for the predictions if sim
is TRUE
— which it is by default if they are part of the object.
The plot
for LCa
objects merely produces between 3 and 5
panels showing each of the terms in the model. These are mainly for
preliminary inspection; real reporting of the effects should use
proper relative scaling of the effects.
LCa.fit
returns an object of class LCa
(smooth
effects L
ee-Ca
rter model); it is a list with the
following components:
model |
Character, either |
ax |
3-column matrix of age-effects, c.i. from the age-time model. Row names are the unique occurring ages in the dataset. Estimates are rates. |
pi |
3-column matrix of age-period interaction effects, c.i. from the age
model. Row names are the actually occurring ages in the
dataset. Estimates are multipliers of the log-RRs in |
kp |
3-column matrix of period-effects, with c.i.s from the
age-time model. Row names are the actually occurring times in the
dataset. Estimates are rate-ratios centered at 1 at |
ci |
3-column matrix of age-cohort interaction effects, c.i. from the age
model. Row names are the actually occurring ages in the
dataset. Estimates are multipliers of the log-RRs in |
kc |
3-column matrix of cohort-effects, with c.i.s from the age-time
model. Row names are the actually occurring times in the
dataset. Estimates are rate-ratios centered at 1 at |
mod.at |
|
mod.b |
|
coef |
All coefficients from both models, in the order |
vcov |
Variance-covariance matrix of coefficients from both
models, in the same order as in the |
knots |
List of vectors of knots used in for the age, period and cohort effects. |
refs |
List of reference points used for the age, period and cohort terms in the interactions. |
deviance |
Deviance of the model |
df.residual |
Residual degrees of freedom |
iter |
Number of iterations used to reach convergence. |
plot.LCa
plots the estimated effects in separate panels,
using a log-scale for the baseline rates (ax
) and the time-RR
(kt
). For the APaCa
model 5 panels are plotted.
summary.LCa
returns (invisibly) a matrix with the parameters
from the models and a column of the conditional se.s and additionally
of the se.s derived from the numerically computed Hessian (if
LCa.fit
were called with VC=TRUE
.)
predict.LCa
returns a matrix with one row per row in
newdata
. If LCa.fit
were called with VC=TRUE
there will be 3 columns, namely prediction (1st column) and c.i.s
based on a simulation of parameters from a multivariate normal with
mean coef
and variance vcov
using the median and
alpha
/2 quantiles from the sim
simulations. If
LCa.fit
were called with VC=FALSE
there will be 6
columns, namely estimates and c.i.s from age-time model
(mod.at
), and from the age-interaction model (mod.b
),
both using conditional variances, and therefore likely with too narrow
confidence limits.
Bendix Carstensen, http://bendixcarstensen.com
This function was conceived while teaching a course on APC models at the Max Planck Institute of Demographic Research (MPIDR, https://www.demogr.mpg.de/en/) in Rostock in May 2016 (http://bendixcarstensen.com/APC/MPIDR-2016/), and finished during a week long research stay there, kindly sponsored by the MPIDR.
library( Epi ) # Load the testis cancer data by Lexis triangles data( testisDK ) tc <- subset( testisDK, A>14 & A<60 ) head( tc ) # We want to see rates per 100,000 PY tc$Y <- tc$Y / 10^5 # Fit the Lee-Carter model with age-period interaction (default) LCa.tc <- LCa.fit( tc, model="ACa", a.ref=30, p.ref=1980, quiet=FALSE, eps=10e-4, maxit=50 ) LCa.tc summary( LCa.tc ) # Inspect what we got names( LCa.tc ) # show the estimated effects par( mfrow=c(1,3) ) plot( LCa.tc ) # Prediction data frame for ages 15 to 60 for two time points: nd <- data.frame( A=15:60 ) # LCa predictions p70 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1970), sim=1000 ) p90 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1990), sim=1000 ) # Inspect the curves from the parametric bootstrap (simulation): par( mfrow=c(1,1) ) head( cbind(p70,p90) ) matplot( nd$A, cbind(p70,p90), type="l", lwd=c(6,3,3), lty=c(1,3,3), col=rep( 2:3, each=3 ), log="y", ylab="Testis cancer incidence per 100,000 PY in 1970 resp. 1990", xlab="Age" )
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.