Default XY blueprint
This page holds the details for the XY preprocessing blueprint. This is the blueprint used by default from mold() if x and y are provided separately (i.e. the XY interface is used).
default_xy_blueprint(
  intercept = FALSE,
  allow_novel_levels = FALSE,
  composition = "tibble"
)

## S3 method for class 'data.frame'
mold(x, y, ..., blueprint = NULL)

## S3 method for class 'matrix'
mold(x, y, ..., blueprint = NULL)
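As a minimal sketch of the matrix method listed above, assuming the hardhat package is installed, a numeric matrix can be passed as x directly. The mtcars columns are only illustrative. The predictors come back as a tibble, and the bare numeric vector y is stored under ".outcome".

```r
library(hardhat)

# A numeric matrix of predictors dispatches to the matrix method of mold()
x_mat <- as.matrix(mtcars[1:10, c("cyl", "disp")])
rownames(x_mat) <- NULL  # the car-name row names are not needed here
y_vec <- mtcars$mpg[1:10]

processed <- mold(x_mat, y_vec)

# The matrix is converted to a tibble that keeps its column names
tibble::is_tibble(processed$predictors)
names(processed$predictors)

# The bare vector y is standardized into a single ".outcome" column
names(processed$outcomes)
```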
intercept: A logical. Should an intercept be included in the processed data? This setting is applied when the data is processed by mold() and forge().
allow_novel_levels: A logical. Should novel factor levels be allowed at prediction time? This setting is applied by forge() when new data is processed.
composition: Either "tibble", "matrix", or "dgCMatrix" for the format of the processed predictors. If "matrix" or "dgCMatrix" is chosen, all of the predictors must be numeric after the preprocessing method has been applied; otherwise an error is thrown.
x: A data frame or matrix containing the predictors.
y: A data frame, matrix, or vector containing the outcomes.
...: Not used.
blueprint: A preprocessing blueprint. If left as NULL, default_xy_blueprint() is used.
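The composition rule can be sketched as follows, assuming hardhat is installed; the iris columns are arbitrary choices. All-numeric predictors can be returned as a base R matrix, while a factor predictor (which the XY blueprint does not expand into dummy variables) makes mold() throw the documented error.

```r
library(hardhat)

x_num <- iris[1:10, c("Sepal.Length", "Sepal.Width")]
y <- iris$Species[1:10]

# All-numeric predictors can be returned as a base R matrix
bp_mat <- default_xy_blueprint(composition = "matrix")
processed <- mold(x_num, y, blueprint = bp_mat)
is.matrix(processed$predictors)

# A factor predictor cannot be stored in a numeric matrix, so mold() errors
x_fct <- iris[1:10, c("Sepal.Length", "Species")]
res <- try(mold(x_fct, y, blueprint = bp_mat), silent = TRUE)
inherits(res, "try-error")
```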
As documented in standardize(), if y is a vector, then the returned outcomes tibble has 1 column with a standardized name of ".outcome".
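To see the standardize() behavior in isolation, here is a short sketch that assumes hardhat is installed. A bare vector becomes a one-column tibble named ".outcome", while a data frame of outcomes keeps its own column names.

```r
library(hardhat)

# A bare factor vector becomes a one-column tibble named ".outcome"
out_vec <- standardize(iris$Species[1:5])
names(out_vec)

# A data frame of outcomes keeps its original column names
out_df <- standardize(iris[1:5, "Species", drop = FALSE])
names(out_df)
```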
The one special thing about the XY method's forge() function is the behavior of outcomes = TRUE when a vector y value was provided to the original call to mold(). In that case, mold() converts y into a tibble with a default column name of .outcome, and this is the column that forge() looks for in new_data to preprocess. See the examples below for a demonstration of this.
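The same mechanism applies to a numeric outcome. In this sketch (assuming hardhat is installed, with mtcars and the mpg outcome chosen only for illustration), y is a bare numeric vector, so the test data needs a ".outcome" column before forge() can process outcomes.

```r
library(hardhat)

train <- mtcars[1:20, ]
test <- mtcars[21:32, ]

# y is a bare numeric vector, so mold() stores it under ".outcome"
processed <- mold(train[, c("cyl", "disp")], train$mpg)
names(processed$outcomes)

# forge(outcomes = TRUE) therefore looks for a ".outcome" column in new_data
test$.outcome <- test$mpg
forged <- forge(test, processed$blueprint, outcomes = TRUE)
names(forged$outcomes)
```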
default_xy_blueprint() returns an XY blueprint.
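A quick way to inspect the returned object is sketched below, assuming the class names used by current hardhat releases ("default_xy_blueprint" and "xy_blueprint"). The settings passed to the constructor are stored as fields of the blueprint.

```r
library(hardhat)

bp <- default_xy_blueprint(intercept = TRUE)

# The blueprint is an S3 object whose class marks it as an XY blueprint
class(bp)
inherits(bp, "xy_blueprint")

# The settings passed to default_xy_blueprint() are stored on the object
bp$intercept
bp$composition
```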
When mold() is used with the default XY blueprint:
- It converts x to a tibble.
- It adds an intercept column to x if intercept = TRUE.
- It runs standardize() on y.
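Each of these three steps can be observed in the object mold() returns. This sketch assumes hardhat is installed and that the intercept column is named "(Intercept)", as in current hardhat releases.

```r
library(hardhat)

x <- iris[1:10, c("Sepal.Length", "Sepal.Width")]
y <- iris$Species[1:10]

processed <- mold(x, y, blueprint = default_xy_blueprint(intercept = TRUE))

# Step 1: x is converted to a tibble
tibble::is_tibble(processed$predictors)

# Step 2: an intercept column of 1s is added because intercept = TRUE
names(processed$predictors)
all(processed$predictors[["(Intercept)"]] == 1)

# Step 3: y is run through standardize(), giving a ".outcome" column
names(processed$outcomes)
```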
When forge() is used with the default XY blueprint:
# ---------------------------------------------------------------------------
# Setup

library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

train_x <- train[, "Sepal.Length", drop = FALSE]
train_y <- train[, "Species", drop = FALSE]

test_x <- test[, "Sepal.Length", drop = FALSE]
test_y <- test[, "Species", drop = FALSE]

# ---------------------------------------------------------------------------
# XY Example

# First, call mold() with the training data
processed <- mold(train_x, train_y)

# Then, call forge() with the blueprint and the test data
# to have it preprocess the test data in the same way
forge(test_x, processed$blueprint)

# ---------------------------------------------------------------------------
# Intercept

processed <- mold(train_x, train_y, blueprint = default_xy_blueprint(intercept = TRUE))

forge(test_x, processed$blueprint)

# ---------------------------------------------------------------------------
# XY Method and forge(outcomes = TRUE)

# You can request that the new outcome columns are preprocessed as well, but
# they have to be present in `new_data`!

processed <- mold(train_x, train_y)

# Can't do this!
try(forge(test_x, processed$blueprint, outcomes = TRUE))

# Need to use the full test set, including `y`
forge(test, processed$blueprint, outcomes = TRUE)

# With the XY method, if the Y value used in `mold()` is a vector,
# then a column name of `.outcome` is automatically generated.
# This name is what forge() looks for in `new_data`.

# Y is a vector!
y_vec <- train_y$Species

processed_vec <- mold(train_x, y_vec)

# This throws an informative error that tells you
# to include an `".outcome"` column in `new_data`.
try(forge(iris, processed_vec$blueprint, outcomes = TRUE))

test2 <- test
test2$.outcome <- test2$Species
test2$Species <- NULL

# This works, and returns a tibble in the $outcomes slot
forge(test2, processed_vec$blueprint, outcomes = TRUE)

# ---------------------------------------------------------------------------
# Matrix output for predictors

# You can change the `composition` of the predictor data set
bp <- default_xy_blueprint(composition = "dgCMatrix")
processed <- mold(train_x, train_y, blueprint = bp)
class(processed$predictors)