Impute by variable derivation
Impute missing values by a constant, by copying another variable, or by computing transformations from other variables.
impute_proxy(dat, formula, add_residual = c("none", "observed", "normal"), ...)
impute_const(dat, formula, add_residual = c("none", "observed", "normal"), ...)
dat |
formula |
add_residual |
... | Currently unused
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand-side of the formula object lists the variable or variables to be imputed.
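To make the three parts of such a formula concrete, the following base R sketch takes one apart. The helper name split_formula is hypothetical and not part of the package; it only illustrates how R parses the `~` and `|` operators, and is not how the package itself processes formulas.

```r
# Hypothetical helper (not a package function): split an imputation
# formula into imputed variables, model specification and grouping variables.
split_formula <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  rhs <- f[[3]]
  # `|` binds tighter than `~`, so a grouped formula has `|` as the
  # top-level call on its right-hand side.
  has_groups <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  list(
    imputed = all.vars(f[[2]]),
    model   = if (has_groups) rhs[[2]] else rhs,
    groups  = if (has_groups) all.vars(rhs[[3]]) else character(0)
  )
}

parts <- split_formula(Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) | Species)
parts$imputed          # variable(s) to be imputed
deparse(parts$model)   # the model specification as text
parts$groups           # grouping variable(s), if any
```

A formula without a `|` part, such as Sepal.Width ~ 7, yields an empty groups vector in this sketch.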
For impute_const, the MODEL_SPECIFICATION is a single value and GROUPING_VARIABLES are ignored.
For impute_proxy, the MODEL_SPECIFICATION is a variable or expression in terms of variables in the dataset that must result in either a single number or a vector of length nrow(dat).
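The rule for impute_proxy can be sketched in base R as follows. This is an illustration of the described behaviour under simple assumptions (one target variable, no grouping, no residuals), not the package's implementation. The name proxy_sketch and its arguments are hypothetical.

```r
# Hypothetical base R sketch of proxy imputation for a single variable.
proxy_sketch <- function(dat, target, expr) {
  value <- eval(expr, dat)              # evaluate the specification in the data
  n <- nrow(dat)
  if (length(value) == 1) {
    value <- rep(value, n)              # a single number is used for every row
  }
  stopifnot(length(value) == n)         # otherwise one value per row is required
  missing <- is.na(dat[[target]])
  dat[[target]][missing] <- value[missing]  # only missing entries are replaced
  dat
}

irisNA <- iris
irisNA[3:7, 2] <- NA

# copy from another variable (a vector of length nrow(dat))
a <- proxy_sketch(irisNA, "Sepal.Width", quote(Sepal.Length))
# impute the overall mean (a single number)
b <- proxy_sketch(irisNA, "Sepal.Width", quote(mean(Sepal.Width, na.rm = TRUE)))
head(a, 8)
```

Observed values are left untouched in both calls; only rows 3 to 7 of Sepal.Width change.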
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.
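The split-and-impute behaviour described above can be approximated in base R. The sketch below is a hedged illustration only: group_proxy_sketch is a hypothetical name, it handles a single target variable, and it does not cover the dplyr::group_by case.

```r
# Hypothetical base R sketch of group-wise proxy imputation.
group_proxy_sketch <- function(dat, target, expr, groups) {
  # mirror the documented rule: missing grouping values are an error
  if (any(is.na(dat[groups]))) {
    stop("missing values in grouping variables")
  }
  # row indices per group; several grouping variables are combined
  idx <- split(seq_len(nrow(dat)), dat[groups], drop = TRUE)
  for (rows in idx) {
    block <- dat[rows, , drop = FALSE]
    value <- eval(expr, block)                  # evaluated within the group only
    if (length(value) == 1) value <- rep(value, length(rows))
    stopifnot(length(value) == length(rows))
    fill <- is.na(block[[target]])
    dat[[target]][rows[fill]] <- value[fill]
  }
  dat
}

irisNA <- iris
irisNA[1:3, 1] <- NA

# group mean imputation: each species gets its own mean
a <- group_proxy_sketch(irisNA, "Sepal.Length",
                        quote(mean(Sepal.Length, na.rm = TRUE)), "Species")
head(a)
```

Because rows 1 to 3 all belong to setosa, they receive the mean of the observed setosa values rather than the overall mean.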
irisNA <- iris
irisNA[1:3,1] <- irisNA[3:7,2] <- NA

# impute a constant
a <- impute_const(irisNA, Sepal.Width ~ 7)
head(a)
a <- impute_proxy(irisNA, Sepal.Width ~ 7)
head(a)

# copy a value from another variable (where available)
a <- impute_proxy(irisNA, Sepal.Width ~ Sepal.Length)
head(a)

# group mean imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE) | Species)
head(a)

# random hot deck imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE),
                  add_residual = "observed")

# ratio imputation (but use impute_lm for that)
a <- impute_proxy(irisNA,
  Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE)/mean(Sepal.Width, na.rm=TRUE) * Sepal.Width)