Impute Numeric Data Using the Median
step_impute_median creates a specification of a recipe step that will replace missing values of numeric variables with the training set median of those variables.
step_impute_median(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  medians = NULL,
  skip = FALSE,
  id = rand_id("impute_median")
)

step_medianimpute(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  medians = NULL,
  skip = FALSE,
  id = rand_id("impute_median")
)

## S3 method for class 'step_impute_median'
tidy(x, ...)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which variables are
affected by the step. See |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
medians |
A named numeric vector of medians. This is |
skip |
A logical. Should the step be skipped when the
recipe is baked by |
id |
A character string that is unique to this step to identify it. |
x |
A |
step_impute_median estimates the variable medians from the data used in the training argument of prep.recipe. bake.recipe then applies the new values to new data sets using these medians.
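The two stages described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frames `train` and `new_data` and their column names are hypothetical, and the sketch assumes missing values are ignored when the medians are computed.

```r
# Hypothetical toy data; the column names are illustrative only.
train <- data.frame(income = c(10, 20, NA, 40), debt = c(1, NA, 3, 5))
new_data <- data.frame(income = c(NA, 15), debt = c(2, NA))

# "prep" stage: estimate one median per column from the training set only,
# ignoring missing values (an assumption of this sketch).
medians <- vapply(train, median, numeric(1), na.rm = TRUE)

# "bake" stage: fill missing entries in new data with the stored
# training-set medians; observed values are left untouched.
imputed <- new_data
for (col in names(medians)) {
  is_missing <- is.na(imputed[[col]])
  imputed[[col]][is_missing] <- medians[[col]]
}

print(medians)
print(imputed)
```

Note that the medians come from `train` alone, so the values used to fill `new_data` do not depend on what `new_data` contains.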
As of recipes 0.1.16, this function name changed from step_medianimpute() to step_impute_median().
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and model (the median value).
library(recipes)
library(modeldata)
data("credit_data")

## missing data per column
vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))

## split into training and test rows
set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]
missing_examples <- c(14, 394, 565)

rec <- recipe(Price ~ ., data = credit_tr)

impute_rec <- rec %>%
  step_impute_median(Income, Assets, Debt)

## estimate the medians on the training set, then apply them to the test set
imp_models <- prep(impute_rec, training = credit_tr)

imputed_te <- bake(imp_models, new_data = credit_te, everything())

## compare the selected rows before and after imputation
credit_te[missing_examples,]
imputed_te[missing_examples, names(credit_te)]

## tidy the step before and after it has been prepped
tidy(impute_rec, number = 1)
tidy(imp_models, number = 1)