Indirect Bagging
Function to perform the indirect bagging and subagging.
## S3 method for class 'data.frame'
inbagg(formula, data, pFUN=NULL, cFUN=list(model = NULL, predict = NULL, training.set = NULL), nbagg = 25, ns = 0.5, replace = FALSE, ...)
formula |
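formula. A formula of the form y~w1+w2+w3~x1+x2+x3, specifying the response, the intermediate and the explanatory variables. |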
data |
data frame of explanatory, intermediate and response variables. |
pFUN |
list of lists describing the models for the intermediate variables; details are given below. |
cFUN |
either a fixed classifying function or a list specifying the classifying model; details are given below. |
nbagg |
number of bootstrap samples. |
ns |
proportion of sample to be drawn from the learning sample. By default, subagging with 50% is performed, i.e. draw 0.5*n out of n without replacement. |
replace |
logical. Draw with or without replacement. |
... |
additional arguments (e.g. |
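How nbagg, ns and replace interact can be illustrated with a minimal base-R sketch of drawing one subagging sample with the defaults ns = 0.5 and replace = FALSE. This is not the package's internal code; the variable names bindx and oobindx are hypothetical and chosen only for illustration.

```r
# Minimal sketch, not ipred's implementation: draw one subagging sample.
# bindx and oobindx are hypothetical names used only for illustration.
set.seed(1)
n <- 200          # number of observations in the learning sample
ns <- 0.5         # proportion drawn (default of inbagg)
replace <- FALSE  # subagging: draw without replacement (default)

# bag sample: ns * n indices drawn from 1..n
bindx <- sample(seq_len(n), size = floor(ns * n), replace = replace)
# out-of-bag sample: every observation that was not drawn
oobindx <- setdiff(seq_len(n), bindx)

c(bag = length(bindx), oob = length(oobindx))  # bag = 100, oob = 100
```

Repeating this nbagg times (25 by default) yields the collection of samples the ensemble is built on. With replace = TRUE and ns = 1 the same code draws an ordinary bootstrap sample, in which case some indices repeat and the out-of-bag sample varies in size.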
A given data set is subdivided into three types of variables: explanatory, intermediate and response variables.
Here, each specified intermediate variable is modelled separately following pFUN, a list of lists with elements specifying an arbitrary number of models for the intermediate variables and an optional element training.set = c("oob", "bag", "all"). The element training.set determines whether the predictive models for the intermediate variables are calculated on the out-of-bag sample ("oob", the default), on the bag sample ("bag") or on all available observations ("all"). The elements of pFUN specifying the models for the intermediate variables are lists as described in inclass. Note that if no formula is given in these elements, the functional relationship of formula is used.
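The three training.set options amount to choosing which rows the intermediate model is fitted on. The following is a hypothetical sketch, not ipred's code: it assumes one intermediate variable w1 modelled by lm() on x1 and x2, uses simulated data, and introduces an illustrative helper training_rows() that does not exist in the package.

```r
# Hypothetical sketch of the three training.set options for one
# intermediate model (w1 ~ x1 + x2 fitted with lm). Not ipred's code.
set.seed(2)
n <- 200
DATA <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DATA$w1 <- 1 + 2 * DATA$x1 - DATA$x2 + rnorm(n, sd = 0.1)

bindx <- sample(seq_len(n), size = n / 2)   # bag sample (subagging)

# hypothetical helper: rows used to fit the intermediate model
training_rows <- function(training.set, bindx, n) {
  switch(training.set,
         oob = setdiff(seq_len(n), bindx),  # default: out-of-bag sample
         bag = bindx,                       # the bag sample itself
         all = seq_len(n),                  # all available observations
         stop("unknown training.set: ", training.set))
}

rows <- training_rows("oob", bindx, n)
w1_model <- lm(w1 ~ x1 + x2, data = DATA[rows, ])

# predicted intermediate values for the bag sample; with "oob" these
# observations were not used to fit w1_model
w1_hat <- predict(w1_model, newdata = DATA[bindx, ])
```

With "oob", the predictions for the bag sample come from a model that never saw those observations, which is one reason it is a sensible default.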
The response variable is modelled following cFUN. This can either be a fixed classifying function as described in Peters et al. (2003) or a list which specifies the modelling technique to be applied. The list contains the arguments model (the model to be fitted), predict (optional, how to predict), formula (optional, of the form y~w1+w2+w3+x1+x2, determining the variables the classifying function is based on) and the optional argument training.set = c("fitted.bag", "original", "fitted.subset"), which specifies whether the classifying function is trained on the predicted observations of the bag sample ("fitted.bag"), on the original observations ("original") or on the predicted observations not included in a defined subset ("fitted.subset"). By default, the formula specified in formula determines the variables the classifying function is based on.
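The difference between training.set = "original" and "fitted.bag" can be sketched as follows. This is a hypothetical illustration with simulated data, not ipred's code: it assumes a single intermediate w1 predicted by lm() from the out-of-bag sample and an rpart tree as classifier; "fitted.subset" is not shown.

```r
library(rpart)  # recommended package shipped with R

# Hypothetical sketch contrasting training.set = "original" and
# "fitted.bag" for the classifying function. Not ipred's code.
set.seed(3)
n <- 200
DATA <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DATA$w1 <- DATA$x1 - DATA$x2 + rnorm(n, sd = 0.3)
DATA$y <- factor(ifelse(DATA$w1 > 0, "pos", "neg"))

bindx <- sample(seq_len(n), size = n / 2)   # bag sample
oobindx <- setdiff(seq_len(n), bindx)       # out-of-bag sample

# intermediate model fitted on the out-of-bag sample
w1_model <- lm(w1 ~ x1 + x2, data = DATA[oobindx, ])
bag <- DATA[bindx, ]

# "original": classifier trained on the observed intermediate values
fit_original <- rpart(y ~ w1, data = bag)

# "fitted.bag": observed w1 in the bag sample replaced by predictions
bag_fitted <- bag
bag_fitted$w1 <- predict(w1_model, newdata = bag)
fit_fitted <- rpart(y ~ w1, data = bag_fitted)
```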
Note that the default cFUN = list(model = NULL, training.set = "fitted.bag") uses the function rpart and the predict function predict(object, newdata, type = "class").
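With the default classifier, predicting a new observation is a two-stage step: first the intermediate variables are predicted from the explanatory variables, then the rpart tree is queried with predict(object, newdata, type = "class"). The sketch below assumes that new observations carry only explanatory variables and that a single intermediate w1 is modelled by lm(); the data are simulated, the names are hypothetical, and it is not the package's predict method.

```r
library(rpart)

# Hypothetical two-stage prediction with the default classifier:
# predict the intermediate w1 from x1, x2, then classify with rpart.
set.seed(4)
n <- 200
DATA <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DATA$w1 <- DATA$x1 - DATA$x2 + rnorm(n, sd = 0.3)
DATA$y <- factor(ifelse(DATA$w1 > 0, "pos", "neg"))

# stage 1: intermediate model; the classifier is trained on fitted values
w1_model <- lm(w1 ~ x1 + x2, data = DATA)
train <- DATA
train$w1 <- fitted(w1_model)
cfun <- rpart(y ~ w1, data = train)

# new observations carry only explanatory variables
newdata <- data.frame(x1 = c(-2, 2), x2 = c(2, -2))
newdata$w1 <- predict(w1_model, newdata = newdata)

# stage 2: class prediction as in the default predict call
pred <- predict(cfun, newdata = newdata, type = "class")
pred
```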
An object of class "inbagg", that is, a list with elements
mtrees |
a list of length nbagg, with one element per bootstrap sample. |
y |
vector of response values. |
W |
data frame of intermediate variables. |
X |
data frame of explanatory variables. |
David J. Hand, Hua Gui Li and Niall M. Adams (2001), Supervised classification with structured class definitions. Computational Statistics & Data Analysis 36, 209–225.
Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003), Diagnosis of glaucoma by indirect classifiers. Methods of Information in Medicine 1, 99–103.
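library("ipred")  # inbagg() and mypredict.lm()
library("MASS")   # mvrnorm()
library("rpart")

## simulated data: response y, intermediate variables w1-w3,
## explanatory variables x1-x3 (200 observations each)
y <- as.factor(sample(1:2, 200, replace = TRUE))
W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y, W, X)

## w1 is modelled by lm() on x1 and x2; the second element gives no
## formula, so the relationship in the main formula is used
pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
             list(model = rpart))

inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)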