Multivariate, model-based imputation
Models that simultaneously optimize the imputation of multiple variables. Methods include imputation based on EM-estimation of multivariate normal parameters, imputation based on iterative Random Forest estimates, and stochastic imputation based on bootstrapped EM-estimation of multivariate normal parameters.
impute_em(dat, formula, verbose = 0, ...)
impute_mf(dat, formula, ...)
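dat | The data.frame to be imputed.
formula | Imputation model description (see the formula rules below).
verbose | Controls the amount of output printed during estimation (default 0).
... | Options passed to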
Formulas are of the form
[IMPUTED_VARIABLES] ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
When IMPUTED_VARIABLES is empty, every variable in MODEL_SPECIFICATION will be imputed. When IMPUTED_VARIABLES is specified, all variables in IMPUTED_VARIABLES and MODEL_SPECIFICATION are part of the model, but only the IMPUTED_VARIABLES are imputed in the output.
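To make these rules concrete, here is a minimal base-R sketch that splits such a formula into its three variable sets. The helper `parse_imputation_formula()` is hypothetical and is not the package's own parser; it assumes plain variable names only (no `.` shorthand and no transformations such as `log(x)`).

```r
# Hypothetical helper (not part of any package): split a formula of the form
#   [IMPUTED_VARIABLES] ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
# into the variables to impute, the variables in the model, and the grouping variables.
parse_imputation_formula <- function(f) {
  stopifnot(inherits(f, "formula"))
  two_sided <- length(f) == 3          # one-sided formulas (~ x + y) have length 2
  lhs <- if (two_sided) f[[2]] else NULL
  rhs <- if (two_sided) f[[3]] else f[[2]]

  # `|` binds more loosely than `+`, so `x1 + x2 | g` parses as `|`(x1 + x2, g)
  groups <- character(0)
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    groups <- all.vars(rhs[[3]])
    rhs <- rhs[[2]]
  }

  model <- all.vars(rhs)
  # An empty left-hand side means: impute every variable in the model specification
  imputed <- if (is.null(lhs)) model else all.vars(lhs)
  list(
    imputed = imputed,                 # variables filled in the output
    model   = union(imputed, model),   # all variables that enter the joint model
    groups  = groups                   # variables used to split-impute-combine
  )
}
```

For example, `parse_imputation_formula(y1 + y2 ~ x1 + x2 | g)` returns `imputed = c("y1", "y2")`, `model = c("y1", "y2", "x1", "x2")` and `groups = "g"`, while `~ a + b` marks both `a` and `b` for imputation.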
GROUPING_VARIABLES specify which categorical variables are used to split-impute-combine the data. Grouping using dplyr::group_by is also supported. If groups are defined both in the formula and with dplyr::group_by, the data is grouped by the union of the grouping variables. Any missing value in one of the grouping variables results in an error.
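The split-impute-combine strategy can be sketched in base R as follows. Both `impute_by_group()` and `impute_mean()` are hypothetical helpers: group-wise mean imputation stands in for whichever multivariate method is applied within each group, and dplyr grouping is not handled here.

```r
# Hypothetical helper: apply `impute_fun` separately within each combination of
# the grouping variables, then reassemble the rows in their original order.
impute_by_group <- function(dat, groups, impute_fun) {
  if (length(groups) == 0) return(impute_fun(dat))
  # Group membership is undefined for rows with a missing grouping value: refuse them
  if (anyNA(dat[groups])) {
    bad <- groups[vapply(dat[groups], anyNA, logical(1))]
    stop("missing values in grouping variable(s): ", paste(bad, collapse = ", "))
  }
  key <- interaction(dat[groups], drop = TRUE)  # one level per observed group combination
  pieces <- split(dat, key)                     # split
  pieces <- lapply(pieces, impute_fun)          # impute each group on its own
  unsplit(pieces, key)                          # combine, restoring the original row order
}

# Hypothetical stand-in imputation: replace NAs in numeric columns by the column mean
impute_mean <- function(d) {
  for (v in names(d)) {
    if (is.numeric(d[[v]])) d[[v]][is.na(d[[v]])] <- mean(d[[v]], na.rm = TRUE)
  }
  d
}
```

With `data.frame(g = c("a", "a", "b", "b"), x = c(1, NA, 10, NA))`, grouping by `"g"` fills the gaps with 1 and 10 respectively, whereas ungrouped mean imputation would use 5.5 for both.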
EM-based imputation with impute_em only works for numerical variables. These variables are assumed to follow a multivariate normal distribution whose mean vector and covariance matrix are estimated with the EM algorithm of Dempster, Laird and Rubin (1977). The imputations are the expected values of the missing values, conditional on the observed values and the estimated parameters.
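A minimal base-R sketch of the approach, assuming a numeric matrix in which no column is entirely missing. EM alternates between computing expected sufficient statistics under the current mean vector mu and covariance Sigma (E-step) and re-estimating mu and Sigma from them (M-step). After convergence, the missing part m of each row is imputed by its conditional mean given the observed part o, namely mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o). The function `em_mvnorm_impute()`, its arguments and its convergence rule are illustrative assumptions, not the code behind impute_em.

```r
# Illustrative sketch (hypothetical function): EM for a multivariate normal with
# missing entries, followed by conditional-mean imputation.
em_mvnorm_impute <- function(X, max_iter = 500, tol = 1e-8) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  miss <- is.na(X)

  # E-step: conditional means of missing entries given observed ones, plus the
  # summed conditional covariances that the M-step needs.
  estep <- function(mu, S) {
    Xhat <- X
    C <- matrix(0, p, p)
    for (i in which(rowSums(miss) > 0)) {
      m <- miss[i, ]
      o <- !m
      if (any(o)) {
        B <- S[m, o, drop = FALSE] %*% solve(S[o, o, drop = FALSE])
        Xhat[i, m] <- mu[m] + B %*% (X[i, o] - mu[o])
        C[m, m] <- C[m, m] + S[m, m, drop = FALSE] - B %*% S[o, m, drop = FALSE]
      } else {  # fully missing row: use the unconditional moments
        Xhat[i, ] <- mu
        C <- C + S
      }
    }
    list(Xhat = Xhat, C = C)
  }

  # Starting values: observed means and the covariance of the mean-filled data
  mu <- colMeans(X, na.rm = TRUE)
  Xf <- X
  Xf[miss] <- mu[col(X)[miss]]
  S <- cov(Xf) * (n - 1) / n

  for (iter in seq_len(max_iter)) {
    e <- estep(mu, S)
    # M-step: maximum-likelihood mean and covariance from the expected statistics
    mu_new <- colMeans(e$Xhat)
    D <- sweep(e$Xhat, 2, mu_new)
    S_new <- (crossprod(D) + e$C) / n
    done <- max(abs(mu_new - mu), abs(S_new - S)) < tol
    mu <- mu_new
    S <- S_new
    if (done) break
  }
  # Final imputations: conditional expectations under the converged parameters
  list(imputed = estep(mu, S)$Xhat, mu = mu, sigma = S, iterations = iter)
}
```

Observed entries are returned unchanged; only the missing cells are replaced by their conditional expectations, which is why these imputations are deterministic rather than stochastic.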
Multivariate Random Forest imputation with impute_mf works for numerical, categorical or mixed data types. It is based on the algorithm of Stekhoven and Buehlmann (2012). Missing values are first filled in with a rough guess, after which a predictive random forest is trained and used to re-impute the missing values. This is iterated until convergence.
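The iteration can be sketched in base R under one explicit substitution: linear regression (`lm`) replaces the random forest, so this sketch handles numeric data only and is not missForest itself. The helper `iterative_impute()` is hypothetical and assumes at least two numeric columns. Following Stekhoven and Buehlmann, variables are visited in order of increasing missingness and the change between iterations is measured as sum((new - old)^2) / sum(new^2); unlike missForest, which stops as soon as that change increases, this sketch stops once it falls below a fixed tolerance.

```r
# Hypothetical sketch of the missForest-style loop, with lm() swapped in for
# random forests; numeric data frames only.
iterative_impute <- function(dat, max_iter = 10, tol = 1e-6) {
  miss <- is.na(dat)
  # Rough initial guess: fill each column with its observed mean
  for (v in names(dat)) dat[[v]][miss[, v]] <- mean(dat[[v]], na.rm = TRUE)
  # Visit incomplete variables in order of increasing number of missing values
  counts <- sort(colSums(miss))
  vars <- names(counts)[counts > 0]
  for (iter in seq_len(max_iter)) {
    old <- dat
    for (v in vars) {
      # Train on rows where v was observed, re-impute rows where it was missing
      fit <- lm(reformulate(setdiff(names(dat), v), response = v),
                data = dat[!miss[, v], , drop = FALSE])
      dat[[v]][miss[, v]] <- predict(fit, newdata = dat[miss[, v], , drop = FALSE])
    }
    # Relative change between successive imputed data sets
    new_m <- as.matrix(dat)
    change <- sum((new_m - as.matrix(old))^2) / sum(new_m^2)
    if (change < tol) break
  }
  dat
}
```

Swapping `lm` for a random forest learner is what gives the original method its ability to capture nonlinear relations and to handle categorical variables.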
Dempster, A.P., Laird, N.M. and Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), pp.1-38.
Stekhoven, D.J. and Buehlmann, P., 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), pp.112-118.