Multivariate Adaptive Regression Splines
Build a regression model using the techniques in Friedman's papers "Multivariate Adaptive Regression Splines" and "Fast MARS".
See the package vignette “Notes on the earth package”.
## S3 method for class 'formula'
earth(formula = stop("no 'formula' argument"), data = NULL,
    weights = NULL, wp = NULL, subset = NULL, na.action = na.fail,
    pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
    keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL,
    nfold = 0, ncross = 1, stratify = TRUE,
    varmod.method = "none", varmod.exponent = 1, varmod.conv = 1,
    varmod.clamp = .1, varmod.minspan = -3,
    Scale.y = NULL, ...)

## Default S3 method:
earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"),
    weights = NULL, wp = NULL, subset = NULL, na.action = na.fail,
    pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
    keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL,
    nfold = 0, ncross = 1, stratify = TRUE,
    varmod.method = "none", varmod.exponent = 1, varmod.conv = 1,
    varmod.clamp = .1, varmod.minspan = -3,
    Scale.y = NULL, ...)

## S3 method for class 'fit'
earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"),
    weights = NULL, wp = NULL, subset = NULL, na.action = na.fail,
    offset = NULL,
    pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
    keepxy = FALSE, trace = 0, glm = NULL, degree = 1,
    penalty = if(degree > 1) 3 else 2,
    nk = min(200, max(20, 2 * ncol(x))) + 1,
    thresh = 0.001, minspan = 0, endspan = 0,
    newvar.penalty = 0, fast.k = 20, fast.beta = 1,
    linpreds = FALSE, allowed = NULL,
    nprune = NULL, Object = NULL,
    Scale.y = NULL, Adjust.endspan = 2, Auto.linpreds = TRUE,
    Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE,
    Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e-10, ...)
To start off, look at the arguments formula, data, x, y, nk, degree, and trace.
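The following minimal sketch assumes the earth package is installed and uses R's built-in trees data. It fits a model through the formula interface and through the x/y interface (the two calls should give essentially the same model), and shows where degree, nk, and trace go. The object names such as fit.formula are illustrative only.

```r
library(earth)

# Formula interface: predict Volume from all other columns of trees
fit.formula <- earth(Volume ~ ., data = trees, degree = 1, trace = 1)

# x/y interface: x holds the predictors, y holds the response
fit.xy <- earth(x = trees[, c("Girth", "Height")], y = trees$Volume,
                degree = 1)

# degree = 2 allows pairwise interaction terms;
# nk caps the number of terms the forward pass may create
fit.int <- earth(Volume ~ ., data = trees, degree = 2, nk = 11)

summary(fit.formula)
```

Because nk limits the forward pass and pruning only removes terms, the pruned fit.int has at most 11 terms.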
If the response is binary or a factor, consider using the glm
argument.
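As a sketch of the glm argument, assuming the earth package and its bundled etitanic data (Titanic passengers with a 0/1 survived column), the call below fits a logistic MARS model by passing a binomial family through to glm:

```r
library(earth)
data(etitanic)  # shipped with earth; survived is coded 0/1

# glm = list(family = binomial) fits a GLM on the MARS basis functions
fit.glm <- earth(survived ~ ., data = etitanic, degree = 2,
                 glm = list(family = binomial))

# type = "response" returns predicted probabilities rather than log-odds
prob <- predict(fit.glm, type = "response")
head(prob)
```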
For cross validation, use the nfold
argument.
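A minimal cross-validation sketch, assuming the earth package is installed. Fold assignment is random, so the seed is fixed. The cv.rsq.tab component holds the out-of-fold R-squared statistics. See earth.object for its exact layout.

```r
library(earth)
set.seed(2024)  # folds are assigned at random

# nfold = 5 runs 5-fold cross validation alongside the usual fit
fit.cv <- earth(Volume ~ ., data = trees, nfold = 5)

print(fit.cv$cv.rsq.tab)  # out-of-fold R-squared statistics
summary(fit.cv)           # includes cross-validated summary statistics
```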
For prediction intervals, use the varmod.method
argument.
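The sketch below shows a prediction-interval workflow, assuming the earth package is installed. It uses simulated data whose noise grows with x, which is illustrative and not from this page. The variance model is estimated from cross-validation residuals, so nfold is supplied along with varmod.method. A larger ncross (the variance vignette uses more) gives a steadier variance model at the cost of run time.

```r
library(earth)
set.seed(1)

# Simulated heteroscedastic data: noise standard deviation grows with x
n <- 300
d <- data.frame(x = runif(n))
d$y <- sin(4 * d$x) + rnorm(n, sd = 0.05 + 0.3 * d$x)

# Fit the main model plus a linear model ("lm") of the residual spread
fit.var <- earth(y ~ x, data = d, nfold = 10, ncross = 3,
                 varmod.method = "lm")

# interval = "pint" requests prediction intervals (columns fit, lwr, upr)
new <- data.frame(x = c(0.1, 0.5, 0.9))
pint <- predict(fit.var, newdata = new, interval = "pint")
print(pint)
```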
Most users will find that the above arguments are all they need,
plus in some cases keepxy and nprune.
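The sketch below shows keepxy and nprune, assuming the earth package is installed. With keepxy = TRUE the training data are stored in the returned model. This lets update re-run the fit without relying on the original data still being in scope. nprune caps the size of the pruned model, intercept included.

```r
library(earth)

# Keep the training data inside the model object for later update() calls
fit.full <- earth(Volume ~ ., data = trees, keepxy = TRUE)

# At most 3 terms (intercept included) survive pruning
fit.small <- earth(Volume ~ ., data = trees, nprune = 3)

# Re-fit with a tighter cap, reusing the data stored by keepxy
fit.small2 <- update(fit.full, nprune = 2)

length(fit.full$selected.terms)
length(fit.small$selected.terms)
```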
Unless you are a knowledgeable user, it's best not to subvert the
standard algorithm by toying with tuning parameters such as thresh,
penalty, and endspan.
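It can help to know what the defaults actually are before deciding to leave them alone. The two helpers below, default.nk and default.penalty, are hypothetical and not part of earth. They restate the default expressions for nk and penalty from the usage section in plain R.

```r
# Hypothetical helpers mirroring the defaults in the usage section
default.nk <- function(ncol.x) min(200, max(20, 2 * ncol.x)) + 1
default.penalty <- function(degree) if (degree > 1) 3 else 2

default.nk(2)       # 21: small problems still get 20 + 1 terms
default.nk(50)      # 101
default.nk(150)     # 201: capped at 200 + 1
default.penalty(1)  # 2 for additive models
default.penalty(2)  # 3 once interactions are allowed
```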
formula | Model formula.
data | Data frame for formula.
x | Matrix or data frame containing the independent variables.
y | Vector containing the response variable, or, in the case of multiple responses, a matrix or data frame whose columns are the values for each response.
subset | Index vector specifying which cases to use, i.e., which rows in x (or data) to use. Default is NULL.
weights | Case weights. Default is NULL, meaning no case weights. If specified, …
wp | Response weights. Default is NULL, meaning no response weights. If specified, …
na.action | NA action. Default is na.fail.
offset | Offset term passed from the formula in earth.formula. Default is NULL.
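keepxy | Default is FALSE.
trace | Trace earth's execution. Default is 0 (no tracing).
glm | NULL (default) or a list of arguments to pass on to glm.
degree | Maximum degree of interaction (Friedman's mi). Default is 1, meaning an additive model with no interaction terms.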
penalty | Generalized Cross Validation (GCV) penalty per knot. Default is if(degree > 1) 3 else 2.
nk | Maximum number of model terms before pruning, i.e., the maximum number of terms created by the forward pass. Includes the intercept. Default is min(200, max(20, 2 * ncol(x))) + 1.
thresh | Forward stepping threshold. Default is 0.001.
minspan | Minimum number of observations between knots. (This increases resistance to runs of correlated noise in the input data.)
endspan | Minimum number of observations before the first and after the final knot.
newvar.penalty | Penalty for adding a new variable in the forward pass (Friedman's gamma, equation 74 in the MARS paper). Default is 0.
fast.k | Maximum number of parent terms considered at each step of the forward pass. (This speeds up the forward pass. See the Fast MARS paper, section 3.0.)
fast.beta | Fast MARS ageing coefficient, as described in the Fast MARS paper, section 3.1. Default is 1.
linpreds | Index vector specifying which predictors should enter linearly, as in lm. Default is FALSE.
allowed | Function specifying which predictors can interact and how. Default is NULL, meaning all standard MARS terms are allowed.
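pmethod | Pruning method. One of: "backward", "none", "exhaustive", "forward", "seqrep", "cv". Default is "backward".
nprune | Maximum number of terms (including intercept) in the pruned model. Default is NULL, meaning all terms created by the forward pass (but typically not all terms will remain after pruning). Use this to enforce an upper bound on the model size (that is less than nk).
nfold | Number of cross-validation folds. Default is 0, meaning no cross validation.
ncross | Only applies if nfold > 1. Number of cross validations, each using nfold folds. Default is 1.
stratify | Only applies if nfold > 1. Default is TRUE, meaning the cross-validation folds are stratified.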
varmod.method | Construct a variance model. For details, see the vignette “Variance models in earth”. Default is "none", meaning no variance model.
varmod.exponent | Power transform applied to the rhs before regressing the absolute residuals with the specified varmod.method. Default is 1.
varmod.conv | Convergence criterion for the Iteratively Reweighted Least Squares used when creating the variance model.
varmod.clamp | The estimated standard deviation of the main model errors is forced to be at least a small positive value, which we call …
varmod.minspan | Only applies when …
Object | Earth object to be updated, for use by update.earth.
Scale.y | Default is NULL.
Adjust.endspan | In interaction terms, endspan is multiplied by this value. Default is 2.
Auto.linpreds | Default is TRUE.
Force.weights | Default is FALSE.
Use.beta.cache | Default is TRUE.
Force.xtx.prune | Default is FALSE.
Get.leverages | Default is NROW(x) < 1e5.
Exhaustive.tol | Default is 1e-10.
... | Dots are passed on to earth.fit.
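The penalty argument acts through the GCV criterion. The helper below, gcv, is hypothetical and not an earth function. It assumes the effective number of parameters is nterms + penalty * (nterms - 1) / 2, where (nterms - 1) / 2 approximates the number of knots. If that assumption matches earth's internals, gcv(mod$rss, length(mod$selected.terms), 2, nrow(trees)) should be close to mod$gcv for a degree-1 model fit to trees.

```r
# Hypothetical GCV helper, assuming penalty is charged per knot and a
# model with nterms terms has about (nterms - 1) / 2 knots
gcv <- function(rss, nterms, penalty, n) {
  nparams <- nterms + penalty * (nterms - 1) / 2  # effective parameters
  rss / (n * (1 - nparams / n)^2)
}

# nparams = 3 + 2 * 1 = 5, so GCV = 10 / (20 * 0.75^2) = 10 / 11.25
gcv(rss = 10, nterms = 3, penalty = 2, n = 20)
```

A larger penalty inflates the effective parameter count, so each extra knot must reduce the RSS by more to lower the GCV.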
An S3 model of class "earth". See earth.object for a complete description.
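A short sketch of a few components of the returned object, assuming the earth package is installed (earth.object documents the full list). For a model fit without glm, the fitted values should equal the basis matrix bx times the coefficients.

```r
library(earth)
mod <- earth(Volume ~ ., data = trees)

mod$selected.terms  # forward-pass terms kept by pruning
mod$coefficients    # one row per selected term
head(mod$bx)        # basis matrix: one column per selected term

# Reconstruct the fitted values from the basis matrix
fitted.check <- mod$bx %*% mod$coefficients
```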
Stephen Milborrow, derived from mda::mars by Trevor Hastie and Robert Tibshirani.
The approach used for GLMs was motivated by work done by Jane Elith and John Leathwick (a representative paper is given below).
The evimp function uses ideas from Max Kuhn's caret package https://CRAN.R-project.org/package=caret.
Parts of Thomas Lumley's leaps package have been incorporated into earth, so earth can directly access Alan Miller's Fortran functions without going through hidden functions in the leaps package.
The Wikipedia article is recommended for an elementary introduction.
The primary references are the Friedman papers, but
readers may find the MARS section in Hastie, Tibshirani,
and Friedman a more accessible introduction.
Faraway takes a hands-on approach, using the ozone data to compare mda::mars with other techniques.
(If you use Faraway's examples with earth instead of mars, use $bx instead of $x, and check out the book's errata.)
Friedman and Silverman is recommended background reading for the MARS paper.
Earth's pruning pass uses code from the leaps package, which is based on techniques in Miller.
Faraway (2005) Extending the Linear Model with R. http://www.maths.bath.ac.uk/~jjf23
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion). Annals of Statistics 19/1, 1–141. http://projecteuclid.org/euclid.aos/1176347963 doi: 10.1214/aos/1176347963
Friedman (1993) Fast MARS. Stanford University Department of Statistics, Technical Report 110. https://statistics.stanford.edu/research/fast-mars
Friedman and Silverman (1989) Flexible Parsimonious Smoothing and Additive Modeling. Technometrics, Vol. 31, No. 1. https://www.tandfonline.com/doi/abs/10.1080/00401706.1989.10488470
Hastie, Tibshirani, and Friedman (2009) The Elements of Statistical Learning (2nd ed.). http://web.stanford.edu/~hastie/pub.htm
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand's freshwater diadromous fish. Freshwater Biology, 50, 2034–2052. http://web.stanford.edu/~hastie/pub.htm, http://www.botany.unimelb.edu.au/envisci/about/staff/elith.html
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression. https://wp.csiro.au/alanmiller/index.html
Wikipedia article on MARS. https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
Start with summary.earth, plot.earth, evimp, and plotmo.
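The calls below are a minimal sketch of those starting points, assuming the earth package is installed (attaching earth also attaches plotmo). When R runs non-interactively, the two plots go to the default graphics file.

```r
library(earth)
mod <- earth(Volume ~ ., data = trees, degree = 2)

summary(mod, style = "pmax")  # model equation written with pmax()
ev <- evimp(mod)              # estimated variable importance
print(ev)
plot(mod)    # plot.earth: model selection and residual plots
plotmo(mod)  # response as each predictor varies, others held fixed
```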
Please see the main package vignette “Notes on the earth package”. The vignette can also be downloaded from http://www.milbo.org/doc/earth-notes.pdf.
The vignette “Variance models in earth” is also included with the package.
It describes how to generate prediction intervals for earth models.
library(earth)  # also attaches plotmo
earth.mod <- earth(Volume ~ ., data = trees)
plotmo(earth.mod)
summary(earth.mod, digits = 2, style = "pmax")