Discretize Numeric Variables
discretize converts a numeric vector into a factor with
bins having approximately the same number of data points (based
on a training set).
discretize(x, ...)

## Default S3 method:
discretize(x, ...)

## S3 method for class 'numeric'
discretize(
  x,
  cuts = 4,
  labels = NULL,
  prefix = "bin",
  keep_na = TRUE,
  infs = TRUE,
  min_unique = 10,
  ...
)

## S3 method for class 'discretize'
predict(object, new_data, ...)
x | A numeric vector.
... | Options to pass to
cuts | An integer defining how many cuts to make of the data.
labels | A character vector defining the factor levels that will be in the new factor (from smallest to largest). This should have length
prefix | A single parameter value to be used as a prefix for the factor levels (e.g.
keep_na | A logical for whether a factor level should be created to identify missing values in
infs | A logical indicating whether the smallest and largest cut point should be infinite.
min_unique | An integer defining a sample size line of dignity for the binning. If (the number of unique values)
object | An object of class
new_data | A new numeric object to be binned.
discretize estimates the cut points from
x using percentiles. For example, if cuts = 4, the
function estimates the quartiles of x and uses these as
the cut points. If cuts = 2, the bins are defined as
being above or below the median of x.
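The percentile-based binning can be approximated in base R. The following is a minimal sketch, not the package's implementation: it assumes the breaks are the quantiles at cuts + 1 equally spaced probabilities (which is consistent with cuts = 2 splitting at the median), and the helper name sketch_breaks and the simulated data are hypothetical.

```r
# Hypothetical helper: estimate percentile-based break points from training data.
# Assumption: `cuts` bins are delimited by cuts + 1 equally spaced quantiles.
sketch_breaks <- function(x, cuts = 4, infs = TRUE) {
  probs <- seq(0, 1, length.out = cuts + 1)
  breaks <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  breaks <- unique(breaks)  # tied quantiles would otherwise create empty bins
  if (infs) {
    # Open-ended outer bins so unseen extreme values still fall into a bin.
    breaks[1] <- -Inf
    breaks[length(breaks)] <- Inf
  }
  breaks
}

set.seed(1)
x_train <- rnorm(201)  # simulated stand-in for a training variable

# With cuts = 2 the only interior break is the training-set median.
b2 <- sketch_breaks(x_train, cuts = 2)
isTRUE(all.equal(b2[2], stats::median(x_train)))

# Bin new values with the breaks learned from the training data.
table(cut(c(-3, 0, 3), breaks = b2, include.lowest = TRUE))
```

Estimating the breaks once and reusing them for new data mirrors the split between discretize and its predict method.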
The predict method can then be used to turn numeric
vectors into factor vectors.
If keep_na = TRUE, a suffix of "_missing" is used as a
factor level (see the examples below).
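The idea of a dedicated level for missing values can be illustrated with base R. This is only a sketch under stated assumptions: the level names "bin1", "bin2", and "bin_missing" and the break at 5 are chosen for illustration and are not taken from the package output.

```r
# Illustrative data with one missing value.
x <- c(1, NA, 9)

# Ordinary binning leaves the missing value as NA.
binned <- cut(x, breaks = c(-Inf, 5, Inf), labels = c("bin1", "bin2"))

# Recode NA to an explicit level so that missingness is kept as a category.
binned_chr <- as.character(binned)
binned_chr[is.na(x)] <- "bin_missing"
binned_with_missing <- factor(
  binned_chr,
  levels = c("bin_missing", "bin1", "bin2")
)

table(binned_with_missing)
```

Keeping missing values as their own level lets downstream models treat missingness as information instead of dropping the rows.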
If infs = FALSE and a new value is greater than the
largest value of x, a missing value will result.
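The effect of finite versus infinite outer cut points can be reproduced with base R's cut(). This is a sketch with made-up training values, not the package's code; the variable names are hypothetical.

```r
# Small hypothetical training sample and its median-based breaks.
x_train <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
finite_breaks <- stats::quantile(x_train, probs = c(0, 0.5, 1), names = FALSE)

# Same interior break, but with open-ended outer bins.
open_breaks <- c(-Inf, finite_breaks[2], Inf)

new_values <- c(5, 25)  # 25 exceeds max(x_train)

binned_finite <- cut(new_values, breaks = finite_breaks, include.lowest = TRUE)
binned_open <- cut(new_values, breaks = open_breaks, include.lowest = TRUE)

is.na(binned_finite)  # the out-of-range value is not assigned to any bin
is.na(binned_open)    # infinite outer breaks cover every finite value
```

Infinite outer breaks are the safer choice when new data may fall outside the training range.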
discretize returns an object of class
discretize and predict.discretize returns a factor
vector.
library(recipes)
library(modeldata)
data(biomass)
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]
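# With cuts = 2, the bins fall above or below the training-set median
median(biomass_tr$carbon)
discretize(biomass_tr$carbon, cuts = 2)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE)
discretize(biomass_tr$carbon, cuts = 2, infs = FALSE, keep_na = FALSE)
discretize(biomass_tr$carbon, cuts = 2, prefix = "maybe a bad idea to bin")

# Estimate the cut points on the training set, then bin with predict()
carbon_binned <- discretize(biomass_tr$carbon)
table(predict(carbon_binned, biomass_tr$carbon))

# Without infinite outer cut points, new values outside the training
# range cannot be assigned to a bin
carbon_no_infs <- discretize(biomass_tr$carbon, infs = FALSE)
predict(carbon_no_infs, c(50, 100))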
rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)
rec <- rec %>% step_discretize(carbon, hydrogen)
rec <- prep(rec, biomass_tr)
binned_te <- bake(rec, biomass_te)
table(binned_te$carbon)