Create a new preprocessing blueprint
These are the base classes for creating new preprocessing blueprints. All
blueprints inherit from the one created by new_blueprint(), and the default
method-specific blueprints inherit from the other three here.
If you want to create your own preprocessing blueprint for a specific method,
you will generally subclass one of the method-specific blueprints here. If
you want to create a completely new preprocessing blueprint for a totally new
preprocessing method (i.e. not the formula, xy, or recipe method), then
you should subclass new_blueprint().
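As a rough illustration of the second case, the sketch below builds a blueprint for a brand-new method directly with new_blueprint(), following the signature documented on this page (with mold and forge arguments). The class name "my_blueprint", the extra element center, and the stub function bodies are hypothetical placeholders, not part of hardhat.

```r
library(hardhat)

# Hypothetical stubs: only the argument names follow the documented contract.
# A real blueprint would return the named lists described on this page.
my_mold <- list(
  clean   = function(blueprint, data) stop("sketch only"),
  process = function(blueprint, data) stop("sketch only")
)
my_forge <- list(
  clean   = function(blueprint, new_data, outcomes) stop("sketch only"),
  process = function(blueprint, new_data, outcomes, extras) stop("sketch only")
)

bp <- new_blueprint(
  mold = my_mold,
  forge = my_forge,
  intercept = TRUE,
  center = TRUE,              # hypothetical extra element, passed through `...`
  subclass = "my_blueprint"   # hypothetical subclass name
)

class(bp)
bp$center
```

Extra name-value pairs are stored on the blueprint alongside the standard arguments, so the mold and forge functions can read them back later.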
new_formula_blueprint(
  mold,
  forge,
  intercept = FALSE,
  allow_novel_levels = FALSE,
  ptypes = NULL,
  formula = NULL,
  indicators = "traditional",
  composition = "tibble",
  ...,
  subclass = character()
)

new_recipe_blueprint(
  mold,
  forge,
  intercept = FALSE,
  allow_novel_levels = FALSE,
  fresh = TRUE,
  composition = "tibble",
  ptypes = NULL,
  recipe = NULL,
  ...,
  subclass = character()
)

new_xy_blueprint(
  mold,
  forge,
  intercept = FALSE,
  allow_novel_levels = FALSE,
  composition = "tibble",
  ptypes = NULL,
  ...,
  subclass = character()
)

new_blueprint(
  mold,
  forge,
  intercept = FALSE,
  allow_novel_levels = FALSE,
  composition = "tibble",
  ptypes = NULL,
  ...,
  subclass = character()
)
mold | A named list with two elements, clean and process.
forge | A named list with two elements, clean and process.
intercept | A logical. Should an intercept be included in the processed data? This information is used by the
allow_novel_levels | A logical. Should novel factor levels be allowed at prediction time? This information is used by the
ptypes | Either
formula | Either
indicators | A single character string. Controls how factors are expanded into dummy variable indicator columns. One of:
composition | Either "tibble", "matrix", or "dgCMatrix" for the format of the processed predictors. If "matrix" or "dgCMatrix" is chosen, all of the predictors must be numeric after the preprocessing method has been applied; otherwise an error is thrown.
... | Name-value pairs for additional elements of blueprints that subclass this blueprint.
subclass | A character vector. The subclasses of this blueprint.
fresh | Should already trained operations be re-trained when
recipe | Either
A preprocessing blueprint, which is a list containing the inputs used as arguments to the function, along with a class specific to the type of blueprint being created.
blueprint$mold should be a named list with two elements, both of which
are functions:

  clean: A function that performs initial cleaning of the user's input
  data to be used in the model.
    Arguments:
      If this is an xy blueprint, blueprint, x and y.
      Otherwise, blueprint and data.
    Output: A named list of three elements for an xy blueprint, or two
    elements otherwise:
      blueprint: The blueprint, returned and potentially updated.
      If using an xy blueprint:
        x: The cleaned predictor data.
        y: The cleaned outcome data.
      If not using an xy blueprint:
        data: The cleaned data.

  process: A function that performs the actual preprocessing of the data.
    Arguments:
      If this is an xy blueprint, blueprint, x and y.
      Otherwise, blueprint and data.
    Output: A named list of 5 elements:
      blueprint: The blueprint, returned and potentially updated.
      predictors: A tibble of predictors.
      outcomes: A tibble of outcomes.
      ptypes: A named list with 2 elements, predictors and outcomes,
      where both elements are 0-row tibbles.
      extras: Varies based on the blueprint. If the blueprint has no
      extra information, NULL. Otherwise a named list of the
      extra elements returned by the blueprint.

Both blueprint$mold$clean() and blueprint$mold$process() will be called,
in order, from mold().
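The mold contract above can be sketched as a pair of plain R functions for a non-xy blueprint. This is a minimal illustration, not hardhat's own implementation: the function names, the outcome element on the blueprint, and the plain list standing in for a real blueprint are all assumptions made for the example.

```r
library(tibble)

# Hypothetical clean(): coerce to a tibble and drop incomplete rows.
my_mold_clean <- function(blueprint, data) {
  data <- as_tibble(data)
  data <- data[stats::complete.cases(data), ]
  list(blueprint = blueprint, data = data)
}

# Hypothetical process(): split the outcome column from the predictors and
# record 0-row prototypes of both.
my_mold_process <- function(blueprint, data) {
  outcome <- blueprint$outcome  # hypothetical element naming the outcome column
  predictors <- data[setdiff(names(data), outcome)]
  outcomes <- data[outcome]
  list(
    blueprint = blueprint,
    predictors = predictors,
    outcomes = outcomes,
    ptypes = list(predictors = predictors[0, ], outcomes = outcomes[0, ]),
    extras = NULL
  )
}

# Stand-in for a real blueprint, used only to exercise the two functions
# in the same order that mold() would call them.
bp <- list(outcome = "mpg")
cleaned <- my_mold_clean(bp, mtcars)
result <- my_mold_process(cleaned$blueprint, cleaned$data)
names(result)
```

Returning the blueprint from both functions is what lets each step update it before the next one runs.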
blueprint$forge should be a named list with two elements, both of which
are functions:

  clean: A function that performs initial cleaning of new_data.
    Arguments:
      blueprint, new_data, and outcomes.
    Output: A named list of the following elements:
      blueprint: The blueprint, returned and potentially updated.
      predictors: A tibble containing the cleaned predictors.
      outcomes: A tibble containing the cleaned outcomes.
      extras: A named list of any extras obtained while cleaning. These
      are passed on to the process() function for further use.

  process: A function that performs the actual preprocessing of the data
  using the known information in the blueprint.
    Arguments:
      blueprint, new_data, outcomes, extras.
    Output: A named list of the following elements:
      blueprint: The blueprint, returned and potentially updated.
      predictors: A tibble of the predictors.
      outcomes: A tibble of the outcomes, or NULL.
      extras: Varies based on the blueprint. If the blueprint has no
      extra information, NULL. Otherwise a named list of the
      extra elements returned by the blueprint.

Both blueprint$forge$clean() and blueprint$forge$process() will be called,
in order, from forge().
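A matching sketch for the forge side is shown below. It rests on two assumptions that this page does not spell out: that the outcomes argument of clean() is the logical flag given to forge(), and that process() receives the cleaned predictors and outcomes from clean() as its new_data and outcomes arguments. The function names and the plain list standing in for a blueprint are hypothetical.

```r
library(tibble)

# Hypothetical clean(): keep only the columns recorded in the blueprint's ptypes.
# Assumes `outcomes` is a logical flag saying whether outcomes are wanted.
my_forge_clean <- function(blueprint, new_data, outcomes) {
  new_data <- as_tibble(new_data)
  predictors <- new_data[names(blueprint$ptypes$predictors)]
  outcomes_data <- NULL
  if (outcomes) {
    outcomes_data <- new_data[names(blueprint$ptypes$outcomes)]
  }
  list(
    blueprint = blueprint,
    predictors = predictors,
    outcomes = outcomes_data,
    extras = NULL
  )
}

# Hypothetical process(): add an intercept column when the blueprint asks for one.
# Assumes `new_data` holds the cleaned predictors returned by clean().
my_forge_process <- function(blueprint, new_data, outcomes, extras) {
  predictors <- new_data
  if (isTRUE(blueprint$intercept)) {
    predictors <- add_column(predictors, `(Intercept)` = 1L, .before = 1)
  }
  list(
    blueprint = blueprint,
    predictors = predictors,
    outcomes = outcomes,
    extras = extras
  )
}

# Stand-in for a trained blueprint, used only to exercise the two functions.
bp <- list(
  intercept = TRUE,
  ptypes = list(
    predictors = as_tibble(mtcars[0, c("cyl", "disp")]),
    outcomes = as_tibble(mtcars[0, "mpg", drop = FALSE])
  )
)

cleaned <- my_forge_clean(bp, mtcars, outcomes = TRUE)
result <- my_forge_process(
  cleaned$blueprint, cleaned$predictors, cleaned$outcomes, cleaned$extras
)
names(result$predictors)
```

Reading the column names from the stored ptypes keeps prediction-time data aligned with what was seen when the blueprint was molded.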