Decision Tree Imputation
Imputation based on CART models or Random Forests.
impute_cart(
  dat,
  formula,
  add_residual = c("none", "observed", "normal"),
  cp,
  na_action = na.rpart,
  ...
)

impute_rf(
  dat,
  formula,
  add_residual = c("none", "observed", "normal"),
  na_action = na.omit,
  ...
)
dat |
formula |
add_residual |
cp | The complexity parameter used to
na_action |
... | further arguments passed to
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand side of the formula object lists the variable or variables to be imputed. Variables on the right-hand side are used as predictors in the CART or random forest model.
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
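As a rough illustration of the formula interface, the sketch below imputes one variable with and without a grouping variable, and then two variables at once. The built-in iris data and the positions of the artificial missing values are assumptions made for this example only.

```r
library(simputation)

# Example data (an assumption for illustration): iris with one missing
# Sepal.Length value in each species
dat <- iris
dat[c(1, 51, 101), "Sepal.Length"] <- NA

# No grouping: a single tree is fitted on all rows
imp_all <- impute_cart(dat, Sepal.Length ~ Sepal.Width + Petal.Length)

# Grouping by Species: a separate tree is fitted and applied per species
imp_grp <- impute_cart(dat, Sepal.Length ~ Sepal.Width + Petal.Length | Species)

# Several variables on the left-hand side are each imputed
# from the same right-hand-side predictors
dat2 <- dat
dat2[c(2, 52), "Sepal.Width"] <- NA
imp_two <- impute_cart(dat2, Sepal.Length + Sepal.Width ~ Petal.Length + Petal.Width)
```

The grouped call only makes sense when every group keeps enough complete rows to fit a tree; here each species retains 49 of its 50 observations.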
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
CART imputation by impute_cart can be used for numerical, categorical, or mixed data. Missing values are estimated using a Classification and Regression Tree as specified by Breiman, Friedman, Stone and Olshen (1984). Because such trees can fall back on surrogate splits when a predictor value is missing, prediction is fairly robust against missingness in predictors.
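A minimal sketch of CART imputation for a categorical variable when some predictors are themselves incomplete. The iris data and the pattern of missing values are assumptions chosen for illustration; the petal measurements are left complete so that the tree always has usable predictors.

```r
library(simputation)

# Example data (an assumption for illustration): missing values in the
# target variable and in two of the four predictors
dat <- iris
dat[1:4, "Sepal.Length"] <- NA
dat[3:7, "Sepal.Width"] <- NA
dat[4:10, "Species"] <- NA

# Impute the categorical variable Species from all other columns.
# Rows 4 to 7 lack some predictor values but can still be imputed.
imp <- impute_cart(dat, Species ~ .)

# Number of missing Species values before and after imputation
sum(is.na(dat$Species))
sum(is.na(imp$Species))
```

Only the variable named on the left-hand side is imputed; the gaps in Sepal.Length and Sepal.Width remain and would need their own imputation call.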
Random Forest imputation with impute_rf can be used for numerical, categorical, or mixed data. Missing values are estimated using a Random Forest model as specified by Breiman (2001).
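A comparable sketch for impute_rf, imputing a numerical and a categorical variable in one call. It assumes the randomForest package is installed; the iris data, the seed, and the missing-value pattern are illustrative assumptions. The predictors are kept complete here because the default na_action = na.omit drops incomplete rows from the training data.

```r
library(simputation)

# Random forests are stochastic; fix the seed so results can be reproduced
set.seed(1)

# Example data (an assumption for illustration): missing values only in
# the variables to be imputed, predictors complete
dat <- iris
dat[1:4, "Sepal.Length"] <- NA
dat[4:10, "Species"] <- NA

# Impute a numerical and a categorical variable from the same predictors
imp <- impute_rf(dat, Sepal.Length + Species ~ Sepal.Width + Petal.Length + Petal.Width)

# Check that no missing values remain in the imputed variables
colSums(is.na(imp[c("Sepal.Length", "Species")]))
```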
Breiman, L., Friedman, J., Stone, C.J. and Olshen, R.A., 1984. Classification and Regression Trees. CRC Press.
Breiman, L., 2001. Random forests. Machine Learning, 45(1), pp.5-32.
Other imputation: impute_hotdeck, impute_lm(), impute()