Sample rows using dplyr
step_sample creates a specification of a recipe step
that will sample rows using dplyr::sample_n() or
dplyr::sample_frac().
step_sample(
recipe,
...,
role = NA,
trained = FALSE,
size = NULL,
replace = FALSE,
skip = TRUE,
id = rand_id("sample")
)
## S3 method for class 'step_sample'
tidy(x, ...)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
Argument ignored; included for consistency with other step
specification functions. For the tidy method, these are not currently used. |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
size |
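An integer or fraction. If the value is within (0, 1),
dplyr::sample_frac() is applied to the data; an integer value of 1 or
greater uses dplyr::sample_n(). The default of NULL uses dplyr::sample_n()
with the size of the training set (or smaller for smaller new data). |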
replace |
Sample with or without replacement? |
skip |
A logical. Should the step be skipped when the
recipe is baked by bake.recipe()? |
id |
A character string that is unique to this step to identify it. |
x |
A step_sample object. |
An updated version of recipe with the new step
added to the sequence of existing steps (if any). For the
tidy method, a tibble with columns size, replace,
and id.
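As a brief illustration of the tidy method described above, the sketch below adds the step to a recipe and inspects it before training. The id value "sample_10" and the object name sample_info are chosen here only for the example.

```r
library(recipes)

# Add a sampling step with an explicit id so it is easy to spot in the output
rec <- recipe(~ ., data = mtcars) %>%
  step_sample(size = 10, id = "sample_10")

# tidy() on the first (and only) step returns one row describing it
sample_info <- tidy(rec, number = 1)
sample_info
```

The returned tibble should carry the size, replace, and id values that were passed to step_sample().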
This step can entirely remove observations (rows of data), which can have
unintended and/or problematic consequences when applying the step to new
data later via bake.recipe(). Consider whether skip = TRUE or
skip = FALSE is more appropriate in any given use case. In most instances
that affect the rows of the data being predicted, this step probably should
not be applied at all; instead, execute operations like this outside and
before starting a preprocessing recipe().
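One way to follow that advice is to sample the training rows first and build the recipe on the result. The sketch below is an assumption-laden example rather than a prescribed workflow: it uses dplyr::slice_sample() (a newer dplyr function, swapped in for sample_n()), a hypothetical object name train_small, and step_normalize() merely as a stand-in for whatever preprocessing is actually needed.

```r
library(recipes)
library(dplyr)

set.seed(123)

# Down-sample the training rows up front, outside the recipe
train_small <- mtcars %>% slice_sample(n = 20)

# The recipe itself contains no row-removing steps
rec <- recipe(mpg ~ ., data = train_small) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

# New data passes through bake() with every row intact
baked <- bake(rec, new_data = mtcars)
nrow(baked)
```

Because no step in the recipe drops rows, there is no skip setting to reason about when predicting on new data.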
library(recipes)
library(dplyr)

# Uses `sample_n`
recipe( ~ ., data = mtcars) %>%
  step_sample(size = 1) %>%
  prep(training = mtcars) %>%
  bake(new_data = NULL) %>%
  nrow()

# Uses `sample_frac`
recipe( ~ ., data = mtcars) %>%
  step_sample(size = 0.9999) %>%
  prep(training = mtcars) %>%
  bake(new_data = NULL) %>%
  nrow()

# Uses `sample_n` and returns _at maximum_ 20 samples.
smaller_cars <-
  recipe( ~ ., data = mtcars) %>%
  step_sample() %>%
  prep(training = mtcars %>% slice(1:20))

bake(smaller_cars, new_data = NULL) %>% nrow()
bake(smaller_cars, new_data = mtcars %>% slice(21:32)) %>% nrow()