Polymars: multivariate adaptive polynomial spline regression
An adaptive regression procedure using piecewise linear splines to model the response.
polymars(responses, predictors, maxsize, gcv = 4, additive = FALSE, startmodel, weights, no.interact, knots, knot.space = 3, ts.resp, ts.pred, ts.weights, classify, factors, tolerance, verbose = FALSE)
responses: vector of responses, or a matrix for multiple response regression. In the case of a matrix, each column corresponds to a response and each row to an observation. Missing values are not allowed.
predictors: matrix of predictor variables for the regression. Each column corresponds to a predictor and each row to an observation, in the same order as in the responses argument. Missing values are not allowed.
maxsize: the maximum number of basis functions that the model is allowed to grow to in the stepwise addition procedure. Default is min(6 * n^(1/3), n/4, 100), where n is the number of observations.
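For illustration, the default cap can be computed directly; n here simply stands for the number of observations:

n <- 500
min(6 * n^(1/3), n / 4, 100)   # roughly 47.6 for n = 500; presumably truncated to an integer internally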
gcv: parameter used to find the overall best model from a sequence of fitted models. The residual sum of squares of a model is penalized by dividing by the square of 1 - (gcv * model size)/cases, so larger values of gcv tend to select smaller models.
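A minimal sketch of this penalized criterion, under the assumed form given above (the internal bookkeeping of model size may differ):

# assumed form: penalized RSS = RSS / (1 - gcv * model_size / n)^2
gcv_score <- function(rss, model_size, n, gcv = 4) {
  rss / (1 - gcv * model_size / n)^2
}
gcv_score(rss = 120, model_size = 8, n = 200)   # hypothetical numbers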
additive: should the fitted model be additive in the predictors?
startmodel: the first model that is to be fit by polymars. It can be an object of class polymars, or a user-specified set of basis functions given in a format similar to the model component of the returned object (see the description of the returned value below).
weights: optional vector of observation weights; if supplied, the algorithm minimizes the sum of the weights multiplied by the squared residuals. The length of weights must equal the number of observations, and all weights must be nonnegative.
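A small sketch of the weighted criterion; the call at the end is illustrative only (y, x and w are assumed to exist and be conformable):

weighted_rss <- function(y, fitted, w) sum(w * (y - fitted)^2)   # quantity being minimized
# fit.w <- polymars(y, x, weights = w)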
no.interact: an optional matrix used if certain predictor interactions are not allowed in the model. It is given as a matrix of predictor indices, one pair per row; the two predictors in a row are not allowed to appear together in an interaction basis function.
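A hypothetical example, assuming the pair-per-row layout described above (y and x are assumed to exist):

no.int <- rbind(c(1, 3),
                c(2, 3))   # predictors 1 and 3, and 2 and 3, may not interact
# fit <- polymars(y, x, no.interact = no.int)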
knots: defines how the function is to find potential knots for the spline basis functions. This can be set to the maximum number of knots you would like to be considered for each predictor. Usually, to avoid the design matrix becoming singular, the actual number of knots produced is constrained to at most every third order statistic in any predictor; this constraint can be adjusted using the knot.space argument. When specifying knots as a single number or a matrix and there are categorical variables, these are specified separately using the factors argument.
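A short sketch of the single-number form, mirroring the setting used in the Examples at the end of this page:

data(state)
# consider up to 15 candidate knots for each continuous predictor
fit.knots <- polymars(state.x77[, 2], state.x77[, -2], knots = 15)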
knot.space: an integer giving the minimum number of order statistics that must separate two knots. Knots should not be too close together, to ensure numerical stability.
ts.resp: test set responses for model selection. Should have the same number of columns as the training set responses. A test set can be used for model selection; depending on the value of classify, either the model with the smallest test set residual sum of squares or the one with the smallest test set classification error is selected. Overrides model selection by gcv.
ts.pred: test set predictors. Should have the same number of columns as the training set predictors.
ts.weights: test set observation weights, a vector of length equal to the number of test set cases. All weights must be nonnegative.
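A sketch of test-set-based model selection; the train/test split below is purely illustrative:

data(state)
set.seed(1)
train <- sample(nrow(state.x77), 35)             # illustrative 35/15 split
fit.ts <- polymars(state.x77[train, 2], state.x77[train, -2],
                   ts.resp = state.x77[-train, 2],
                   ts.pred = state.x77[-train, -2])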
classify: when the response is discrete (categorical), polymars can be used for classification. In particular, when classify = TRUE, the categorical response is expanded into a matrix of 0/1 indicator columns, one per class, a multi-response model is fit to these columns, and each observation is classified to the class with the largest fitted value.
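An illustration of the indicator expansion described above; polymars performs an equivalent expansion internally when classify = TRUE, so this is only to show the idea:

y <- factor(c("a", "b", "a", "c"))
model.matrix(~ y - 1)   # one 0/1 indicator column per class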
factors: used to indicate that certain variables in the predictor set are categorical variables. Specified as a vector containing the appropriate predictor indices (column numbers of the categorical variables in the predictors matrix). Factors can also be set when the knots argument is specified per predictor (see knots).
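A small simulated sketch; the data and variable names are illustrative only:

set.seed(2)
x <- cbind(rnorm(100), rnorm(100), sample(1:3, 100, replace = TRUE))
y <- x[, 1] + (x[, 3] == 2) + rnorm(100)
# column 3 holds integer-coded levels, so it is flagged as categorical
fit.fac <- polymars(y, x, factors = 3)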
tolerance: for each possible candidate to be added or deleted, the residual sum of squares of the model with/without this candidate must be calculated. The inversion of the "X-transpose by X" matrix, X being the design matrix, is done by an updating procedure (cf. C. R. Rao, Linear Statistical Inference and Its Applications, 2nd edition, page 33). In this update the size of the bottom right-hand entry of the matrix is critical; if it falls below the tolerance, the update is considered numerically unstable and that candidate is not considered at this step.
verbose: when set to TRUE, the function prints information about each addition and deletion step while fitting. Default is FALSE.
An object of the class polymars. The returned object contains information about the fitting steps and the model selected.

The first data frame contains one row for each step of the fitting procedure. Its columns record: a 1 for an addition step or a 0 for a deletion step; the size of the model at that step; the residual sum of squares (RSS); and the generalized cross validation value (GCV), test set residual sum of squares, or test set misclassification, whichever was used for model selection.

The second data frame, model, contains one row for each basis function of the selected model. Each basis function has up to two components. The pred1 column contains the index of the first predictor of the basis function. Column knot1 holds a possible knot in this predictor; if this column is NA, the first component is linear. If any of the basis functions of the model is categorical then there will be a level1 column. Column pred2 is the possible second predictor involved (if it is NA the basis function depends on only one predictor), and column knot2 contains the possible knot for predictor pred2 (NA when that component is linear). If any predictor in pred2 is categorical then there will be a level2 column. This format is similar to that of the startmodel argument, with an additional first row corresponding to the intercept, except that startmodel does not use a separate column to specify the levels of a categorical variable. The column coefs (more than one column in the case of multiple response regression) contains the coefficients.

The returned object also contains the fitted values and residuals of the data used in fitting the model.
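A short sketch of inspecting the returned object. The model component is named in the text above; the name of the step-by-step data frame (fitting below) is an assumption:

data(state)
fit <- polymars(state.x77[, 2], state.x77[, -2], gcv = 2)
fit$model                            # one row per basis function: pred1, knot1, pred2, knot2, coefs
fit$fitting                          # assumed name of the step-by-step data frame described above
plot(fitted(fit), residuals(fit))    # fitted values and residuals, as in the Examples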
The algorithm employed by polymars is different from the MARS(tm) algorithm of Friedman (1991), though it has many similarities. (The name polymars has been used for this algorithm well before MARS was trademarked.) Some of the main differences are:
polymars requires linear terms of a predictor to be in the model before nonlinear terms using the same predictor can be added;
polymars requires a univariate basis function to be in the model before a tensor-product basis function involving that univariate basis function can be in the model;
during stepwise deletion the same hierarchy is maintained;
polymars can be fit to multiple outcomes simultaneously, and with categorical outcomes it can be used for multiple classification; and
polyclass uses the same modeling strategy as polymars, but uses a logistic (polychotomous) likelihood.
MARS is a registered trademark of Jeril, Inc and is used here with permission. Commercial licenses and versions of PolyMARS may be obtained from Salford Systems at http://www.salford-systems.com
Martin O'Connor.
Charles Kooperberg, Smarajit Bose, and Charles J. Stone (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117–127.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). The Annals of Statistics, 19, 1–141.
Charles J. Stone, Mark Hansen, Charles Kooperberg, and Young K. Truong (1997). The use of polynomial splines and their tensor products in extended linear modeling (with discussion). The Annals of Statistics, 25, 1371–1470.
data(state)
state.pm <- polymars(state.region, state.x77, knots = 15, classify = TRUE)
state.pm2 <- polymars(state.x77[, 2], state.x77[, -2], gcv = 2)
plot(fitted(state.pm2), residuals(state.pm2))