Create Missing Data Column Indicators
step_indicate_na creates a specification of a recipe step that will create and append additional binary columns to the dataset to indicate which observations are missing.
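The idea behind the step can be sketched in base R without the recipes package. This is only an illustration of the operation described above, not the package's implementation: the helper add_na_indicators, the toy data frame, and the exact way the prefix is joined to the variable name (with an underscore) are assumptions made for the example.

```r
# Toy data with missing values (hypothetical, for illustration only)
df <- data.frame(
  income = c(100, NA, 250),
  assets = c(NA, NA, 3000)
)

# Hypothetical helper: append one integer 0/1 column per selected variable.
# A value of 1 marks an observation that is missing in the original column.
add_na_indicators <- function(data, vars, prefix = "na_ind") {
  for (v in vars) {
    # Assumed naming scheme: prefix and variable name joined by "_"
    data[[paste0(prefix, "_", v)]] <- as.integer(is.na(data[[v]]))
  }
  data
}

out <- add_na_indicators(df, c("income", "assets"))
names(out)
```

The original columns are left untouched, so the indicators can be used alongside a later imputation of the same variables.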
step_indicate_na(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  columns = NULL,
  prefix = "na_ind",
  skip = FALSE,
  id = rand_id("indicate_na")
)

## S3 method for class 'step_indicate_na'
tidy(x, ...)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which variables are
affected by the step. See |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new NA indicator columns created from the original variables will be used as predictors in a model. |
trained |
A logical for whether the selectors in |
columns |
A character vector of variable names that will be populated (eventually) by the selectors supplied in the ... argument. |
prefix |
A character string that will be the prefix to the resulting new variables. Defaults to "na_ind". |
skip |
A logical. Should the step be skipped when the
recipe is baked by |
id |
A character string that is unique to this step to identify it. |
x |
A |
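As a rough sketch of how the prefix argument might be used, the following applies the step to a small made-up data frame with a non-default prefix. The data frame toy and the prefix "missing" are assumptions for illustration; the example only checks which column names come back rather than relying on a particular naming format.

```r
library(recipes)

# Small made-up data set with missing values in one predictor
toy <- data.frame(
  y  = c(1, 2, 3, 4),
  x1 = c(10, NA, 30, 40)
)

# Request an indicator for x1, using "missing" instead of the default "na_ind"
rec <- recipe(y ~ ., data = toy) %>%
  step_indicate_na(x1, prefix = "missing")

prepped <- prep(rec, training = toy)
baked <- bake(prepped, new_data = toy)

# List the columns that were appended by the step
new_cols <- setdiff(names(baked), names(toy))
new_cols
```

Comparing names(baked) with names(toy) is a simple way to confirm which indicator columns were appended and how they were named.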
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and id (the identifier of the step).
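A minimal sketch of calling the tidy method on this step, assuming a small made-up data frame named toy. The number argument selects the step by its position in the recipe; the example prints the result before and after prep() rather than assuming its exact contents.

```r
library(recipes)

# Small made-up data set with one missing value
toy <- data.frame(
  y  = 1:4,
  x1 = c(10, NA, 30, 40)
)

rec <- recipe(y ~ ., data = toy) %>%
  step_indicate_na(x1)

# Before prep(): the selectors as they were supplied
untrained_info <- tidy(rec, number = 1)
untrained_info

# After prep(): the selectors resolved against the training data
prepped <- prep(rec, training = toy)
trained_info <- tidy(prepped, number = 1)
trained_info
```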
library(recipes)
library(modeldata)
data("credit_data")

## missing data per column
purrr::map_dbl(credit_data, function(x) mean(is.na(x)))

## split the rows into training and test sets
set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)

credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]

rec <- recipe(Price ~ ., data = credit_tr)

## add NA indicator columns for three predictors
impute_rec <- rec %>%
  step_indicate_na(Income, Assets, Debt)

imp_models <- prep(impute_rec, training = credit_tr)

## apply the prepared recipe to the test set
imputed_te <- bake(imp_models, new_data = credit_te, everything())