segmented: segmented – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

Segmented relationships in regression models

Description

Fits regression models with segmented relationships between the response and one or more explanatory variables. Break-point estimates are provided.

Usage

segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(), 
    model = TRUE, ...)

## Default S3 method:
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)

## S3 method for class 'lm'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)

## S3 method for class 'glm'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)

## S3 method for class 'Arima'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
    model = TRUE, keep.class=FALSE, ...)

Arguments

`obj`	standard ‘linear’ model of class "lm", "glm" or "Arima". Since version 0.5.0-0 any regression fit may be supplied (see 'Details').
`seg.Z`	the segmented variables(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as `seg.Z=~x` or `seg.Z=~x1+x2`. It can be missing when `obj` includes only one covariate which is taken as segmented variable. Currently, formulas involving functions, such as `seg.Z=~log(x1)`, or selection operators, such as `seg.Z=~d[,"x1"]` or `seg.Z=~d$x1`, are not allowed. Also, variable names formed by `U` or `V` only (with or without numbers ) are not permitted.
`psi`	starting values for the breakpoints to be estimated. If there is a single segmented variable specified in `seg.Z`, `psi` is a numeric vector, and it can be missing when 1 breakpoint has to be estimated (and the median of the segmented variable is used as a starting value). If `seg.Z` includes several covariates, `psi` has be specified as a named list of vectors whose names have to match the variables in the `seg.Z` argument. Each vector of such list includes starting values for the break-point(s) for the corresponding variable in `seg.Z`. A `NA` value means that '`K`' quantiles (or equally spaced values) are used as starting values; `K` is fixed via the `seg.control` auxiliary function.
`npsi`	A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument `quant` in `seg.control`. `npsi` can be missing and `npsi=1` is assumed for all variables specified in `seg.Z`. If `psi` is provided, `npsi` is ignored.
`fixed.psi`	An optional named list meaning the breakpoints to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in `seg.Z`. If there is a single variable in `seg.Z`, a simple nmeric vector can be specified.
`control`	a list of parameters for controlling the fitting process. See the documentation for `seg.control` for details.
`model`	logical value indicating if the model.frame should be returned.
`keep.class`	logical value indicating if the final fit returned by `segmented.default` should keep the class '`segmented`' (along with the class of the original fit `obj`). Ignored by the segmented methods.
`...`	optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via `lm` or `glm` for instance), such as `weights` or `offet`, have to be included in the starting model `obj`

Details

Given a linear regression model (usually of class "lm" or "glm"), segmented tries to estimate a new regression model having broken-line relationships with the variables specified in seg.Z. A segmented (or broken-line) relationship is defined by the slope parameters and the break-points where the linear relation changes. The number of breakpoints of each segmented relationship is fixed via the psi argument, where initial values for the break-points must be specified. The model is estimated simultaneously yielding point estimates and relevant approximate standard errors of all the model parameters, including the break-points.

Since version 0.2-9.0 segmented implements the bootstrap restarting algorithm described in Wood (2001). The bootstrap restarting is expected to escape the local optima of the objective function when the segmented relationship is flat and the log likelihood can have multiple local optima.

Since version 0.5-0.0 the default method segmented.default has been added to estimate segmented relationships in general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile). The objective function to be minimized is the (minus) value extracted by the logLik function or it may be passed on via the fn.obj argument in seg.control. See example below. While the default method is expected to work with any regression fit (where the usual coef(), update(), and logLik() returns appropriate results), it is not recommended for "lm" or "glm" fits (as segmented.default is slower than the specific methods segmented.lm and segmented.glm), although final results are the same. However the object returned by segmented.default is not of class "segmented", as currently the segmented methods are not guaranteed to work for ‘generic’ (i.e., besides "lm" and "glm") regression fits. The user could try each "segmented" method on the returned object by calling it explicitly (e.g. via plot.segmented() or confint.segmented() wherein the regression coefficients and relevant covariance matrix have to be specified, see .coef and .vcov in plot.segmented(), confint.segmented(), slope()).

Value

The returned object depends on the last component returned by seg.control. If last=TRUE, the default, segmented returns an object of class "segmented" which inherits from the class "lm" or "glm" depending on the class of obj. Otherwise a list is returned, where the last component is the fitted model at the final iteration, see seg.control.

An object of class "segmented" is a list containing the components of the original object obj with additionally the followings:

`psi`	estimated break-points and relevant (approximate) standard errors
`it`	number of iterations employed
`epsilon`	difference in the objective function when the algorithm stops
`model`	the model frame
`psi.history`	a list or a vector including the breakpoint estimates at each step
`seed`	the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed
`..`	Other components are not of direct interest of the user

Warning

It is well-known that the log-likelihood function for the break-point may be not concave, especially for poor clear-cut kink-relationships. In these circumstances the initial guess for the break-point, i.e. the psi argument, must be provided with care. For instance visual inspection of a, possibly smoothed, scatter-plot is usually a good way to obtain some idea on breakpoint location. However bootstrap restarting, implemented since version 0.2-9.0, is relatively more robust to starting values specified in psi. Alternatively an automatic procedure may be implemented by specifying psi=NA and fix.npsi=FALSE in seg.control: experience suggests to increase the number of iterations via it.max in seg.control(). This automatic procedure, however, is expected to overestimate the number of breakpoints.

Note

The algorithm will start if the it.max argument returned by seg.control is greater than zero. If it.max=0 segmented will estimate a new linear model with break-point(s) fixed at the values reported in psi.
In the returned fit object, ‘U.’ is put before the name of the segmented variable to mean the difference-in-slopes coefficient.
Methods specific to the class "segmented" are
- print.segmented
- summary.segmented
- print.summary.segmented
- plot.segmented
- lines.segmented
- confint.segmented
- vcov.segmented
- predict.segmented
- points.segmented
- coef.segmented
Others are inherited from the class "lm" or "glm" depending on the class of obj.

Author(s)

Vito M. R. Muggeo, vito.muggeo@unipa.it

References

Muggeo, V.M.R. (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22, 3055–3071.

Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25.

Examples

set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)

#the simplest example: the starting model includes just 1 covariate 
#.. and 1 breakpoint has to be estimated for that
o<-segmented(out.lm) #1 breakpoint for x

#1 segmented variable not in the starting model and 1 breakpoint for that:
#... you need to specify the variable via seg.Z, but no starting value for psi
o<-segmented(out.lm,seg.Z=~z)

#2 segmented variables, 1 breakpoint each (again no need to specify npsi or psi)
o<-segmented(out.lm,seg.Z=~z+x)


#1 segmented variable, 2 breakpoints: you have to specify starting values (vector) for psi:
o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE))

#or by specifying just the *number* of breakpoints
#o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE)) 

slope(o) #the slopes of the segmented relationship


#2 segmented variables: starting values requested via a named list
out.lm<-lm(y~z,data=dati)
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#or by specifying just the *number* of breakpoints
#o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1))



#the default method leads to the same results (but it is slower)
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3), 
#    control=seg.control(fn.obj="sum(x$residuals^2)"))


#automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles)
# Hint: increases number of iterations. Notice: bootstrap restart is not allowed!
#o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3), 
#    control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE))

#assess the progress of the breakpoint estimates throughout the iterations
## Not run: 
par(mfrow=c(1,2))
draw.history(o, "x")
draw.history(o, "z")

## End(Not run)
#try to increase the number of iterations and re-assess the 
#convergence diagnostics 

# An example using the Arima method:
## Not run: 
n<-50
idt <-1:n #the time index

mu<-50-idt +1.5*pmax(idt-30,0)
set.seed(6969)
y<-mu+arima.sim(list(ar=.5),n)*3.5

o<-arima(y, c(1,0,0), xreg=idt)
os1<-segmented(o, ~idt, control=seg.control(display=TRUE))

#note using the .coef argument is mandatory!
slope(os1, .coef=os1$coef)
plot(y)
plot(os1, add=TRUE, .coef=os1$coef, col=2)


## End(Not run)


######Three examples using the default method:

# 1. Cox regression with a segmented relationship  
## Not run: 
library(survival)
data(stanford2)

o<-coxph(Surv(time, status)~age, data=stanford2)
os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect
summary(os) #actually it means summary.coxph(os)
plot(os) #it does not work
plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise line


# 2. Linear mixed model via the nlme package   
dati$g<-gl(10,10) #the cluster 'id' variable
library(nlme)
o<-lme(y~x+z, random=~1|g, data=dati)
os<-segmented.default(o, ~x+z, npsi=list(x=2, z=1))

#summarizing results (note the '.coef' argument)
slope(os, .coef=fixef(os))
plot.segmented(os, "x", .coef=fixef(os), conf.level=.95)
confint.segmented(os, "x", .coef=fixef(os))


# 3. segmented quantile regression via the quantreg  package   
library(quantreg)
data(Mammals)
y<-with(Mammals, log(speed))
x<-with(Mammals, log(weight))
o<-rq(y~x, tau=.9)
os<-segmented.default(o, ~x) #it does NOT work. It cannot computed the vcov matrix..

#Let's define the vcov.rq function.. (I don't know if it is the best option..)
vcov.rq<-function(x,...) {
  V<-summary(x,cov=TRUE,se="nid",...)$cov
  rownames(V)<-colnames(V)<-names(x$coef)
V}

os<-segmented.default(o, ~x) #now it does work
 plot.segmented(os, res=TRUE, col=2, conf.level=.95)

## End(Not run)

segmented

Regression Models with Break-Points / Change-Points Estimation

v1.3-4

GPL

Authors

Vito M. R. Muggeo [aut, cre] (<https://orcid.org/0000-0002-3386-4054>)

Initial release

2021-04-28

segmented

Description

Usage

Arguments

Details

Value

Warning

Note

Author(s)

References

See Also

Examples

segmented

We don't support your browser anymore