Segmented relationships in regression models
Fits regression models with segmented relationships between the response and one or more explanatory variables. Break-point estimates are provided.
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, ...) ## Default S3 method: segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) ## S3 method for class 'lm' segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) ## S3 method for class 'glm' segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...) ## S3 method for class 'Arima' segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), model = TRUE, keep.class=FALSE, ...)
obj |
standard ‘linear’ model of class "lm", "glm" or "Arima". Since version 0.5.0-0 any regression fit may be supplied (see 'Details'). |
seg.Z |
the segmented variables(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as |
psi |
starting values for the breakpoints to be estimated. If there is a single segmented variable specified in |
npsi |
A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument |
fixed.psi |
An optional named list meaning the breakpoints to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in |
control |
a list of parameters for controlling the fitting process.
See the documentation for |
model |
logical value indicating if the model.frame should be returned. |
keep.class |
logical value indicating if the final fit returned by |
... |
optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via |
Given a linear regression model (usually of class "lm" or "glm"), segmented tries to estimate
a new regression model having broken-line relationships with the variables specified in seg.Z
.
A segmented (or broken-line) relationship is defined by the slope
parameters and the break-points where the linear relation changes. The number of breakpoints
of each segmented relationship is fixed via the psi
argument, where initial
values for the break-points must be specified. The model
is estimated simultaneously yielding point estimates and relevant approximate
standard errors of all the model parameters, including the break-points.
Since version 0.2-9.0 segmented
implements the bootstrap restarting algorithm described in Wood (2001).
The bootstrap restarting is expected to escape the local optima of the objective function when the
segmented relationship is flat and the log likelihood can have multiple local optima.
Since version 0.5-0.0 the default method segmented.default
has been added to estimate segmented relationships in
general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile).
The objective function to be minimized is the (minus) value extracted by the logLik
function or it may be passed on via
the fn.obj
argument in seg.control
. See example below. While the default method is expected to work with any regression
fit (where the usual coef()
, update()
, and logLik()
returns appropriate results), it is not recommended for
"lm" or "glm" fits (as segmented.default
is slower than the specific methods segmented.lm
and segmented.glm
), although
final results are the same. However the object returned by segmented.default
is not of class "segmented", as currently
the segmented methods are not guaranteed to work for ‘generic’ (i.e., besides "lm" and "glm") regression fits. The user
could try each "segmented" method on the returned object by calling it explicitly (e.g. via plot.segmented()
or confint.segmented()
wherein the regression coefficients and relevant covariance matrix have to be specified, see .coef
and .vcov
in plot.segmented()
, confint.segmented()
, slope()
).
The returned object depends on the last
component returned by seg.control
.
If last=TRUE, the default, segmented returns an object of class "segmented" which inherits
from the class "lm" or "glm" depending on the class of obj
. Otherwise a list is
returned, where the last component is the fitted model at the final iteration,
see seg.control
.
An object of class "segmented" is a list containing the components of the
original object obj
with additionally the followings:
psi |
estimated break-points and relevant (approximate) standard errors |
it |
number of iterations employed |
epsilon |
difference in the objective function when the algorithm stops |
model |
the model frame |
psi.history |
a list or a vector including the breakpoint estimates at each step |
seed |
the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed |
.. |
Other components are not of direct interest of the user |
It is well-known that the log-likelihood function for the
break-point may be not concave, especially
for poor clear-cut kink-relationships. In these circumstances the initial guess
for the break-point, i.e. the psi
argument, must be provided with care. For instance visual
inspection of a, possibly smoothed, scatter-plot is usually a good way to obtain some idea on breakpoint location.
However bootstrap restarting, implemented since version 0.2-9.0, is relatively more robust to starting values specified
in psi
. Alternatively an automatic procedure may be implemented by specifying psi=NA
and
fix.npsi=FALSE
in seg.control
: experience suggests to increase the number of iterations
via it.max
in seg.control()
. This automatic procedure, however, is expected to overestimate
the number of breakpoints.
The algorithm will start if the it.max
argument returned by seg.control
is greater than zero. If it.max=0
segmented
will estimate a new linear model with
break-point(s) fixed at the values reported in psi
.
In the returned fit object, ‘U.’ is put before the name of the segmented variable to mean the difference-in-slopes coefficient.
Methods specific to the class "segmented"
are
Others are inherited from the class "lm"
or "glm"
depending on the
class of obj
.
Vito M. R. Muggeo, vito.muggeo@unipa.it
Muggeo, V.M.R. (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22, 3055–3071.
Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25.
set.seed(12) xx<-1:100 zz<-runif(100) yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2) dati<-data.frame(x=xx,y=yy,z=zz) out.lm<-lm(y~x,data=dati) #the simplest example: the starting model includes just 1 covariate #.. and 1 breakpoint has to be estimated for that o<-segmented(out.lm) #1 breakpoint for x #1 segmented variable not in the starting model and 1 breakpoint for that: #... you need to specify the variable via seg.Z, but no starting value for psi o<-segmented(out.lm,seg.Z=~z) #2 segmented variables, 1 breakpoint each (again no need to specify npsi or psi) o<-segmented(out.lm,seg.Z=~z+x) #1 segmented variable, 2 breakpoints: you have to specify starting values (vector) for psi: o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE)) #or by specifying just the *number* of breakpoints #o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE)) slope(o) #the slopes of the segmented relationship #2 segmented variables: starting values requested via a named list out.lm<-lm(y~z,data=dati) o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3)) #or by specifying just the *number* of breakpoints #o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1)) #the default method leads to the same results (but it is slower) #o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3)) #o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3), # control=seg.control(fn.obj="sum(x$residuals^2)")) #automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles) # Hint: increases number of iterations. Notice: bootstrap restart is not allowed! #o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3), # control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE)) #assess the progress of the breakpoint estimates throughout the iterations ## Not run: par(mfrow=c(1,2)) draw.history(o, "x") draw.history(o, "z") ## End(Not run) #try to increase the number of iterations and re-assess the #convergence diagnostics # An example using the Arima method: ## Not run: n<-50 idt <-1:n #the time index mu<-50-idt +1.5*pmax(idt-30,0) set.seed(6969) y<-mu+arima.sim(list(ar=.5),n)*3.5 o<-arima(y, c(1,0,0), xreg=idt) os1<-segmented(o, ~idt, control=seg.control(display=TRUE)) #note using the .coef argument is mandatory! slope(os1, .coef=os1$coef) plot(y) plot(os1, add=TRUE, .coef=os1$coef, col=2) ## End(Not run) ######Three examples using the default method: # 1. Cox regression with a segmented relationship ## Not run: library(survival) data(stanford2) o<-coxph(Surv(time, status)~age, data=stanford2) os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect summary(os) #actually it means summary.coxph(os) plot(os) #it does not work plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise line # 2. Linear mixed model via the nlme package dati$g<-gl(10,10) #the cluster 'id' variable library(nlme) o<-lme(y~x+z, random=~1|g, data=dati) os<-segmented.default(o, ~x+z, npsi=list(x=2, z=1)) #summarizing results (note the '.coef' argument) slope(os, .coef=fixef(os)) plot.segmented(os, "x", .coef=fixef(os), conf.level=.95) confint.segmented(os, "x", .coef=fixef(os)) # 3. segmented quantile regression via the quantreg package library(quantreg) data(Mammals) y<-with(Mammals, log(speed)) x<-with(Mammals, log(weight)) o<-rq(y~x, tau=.9) os<-segmented.default(o, ~x) #it does NOT work. It cannot computed the vcov matrix.. #Let's define the vcov.rq function.. (I don't know if it is the best option..) vcov.rq<-function(x,...) { V<-summary(x,cov=TRUE,se="nid",...)$cov rownames(V)<-colnames(V)<-names(x$coef) V} os<-segmented.default(o, ~x) #now it does work plot.segmented(os, res=TRUE, col=2, conf.level=.95) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.