General Interface for K-Nearest Neighbor Models
nearest_neighbor() is a way to generate a specification of a model before fitting and allows the model to be created using different packages in R. The main arguments for the model are:
neighbors: The number of neighbors considered at each prediction.
weight_func: The type of kernel function that weights the distances between samples.
dist_power: The parameter used when calculating the Minkowski distance. This corresponds to the Manhattan distance with dist_power = 1 and the Euclidean distance with dist_power = 2.
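To make the role of dist_power concrete, here is a minimal base R sketch of the Minkowski distance between two points. The helper name minkowski_dist is hypothetical and is not part of parsnip or kknn; the engine computes distances internally and may differ in details such as scaling.

```r
# Minkowski distance between two numeric vectors of equal length.
# `minkowski_dist` is a hypothetical helper for illustration only.
minkowski_dist <- function(x, y, dist_power = 2) {
  stopifnot(length(x) == length(y), dist_power > 0)
  sum(abs(x - y)^dist_power)^(1 / dist_power)
}

a <- c(0, 0)
b <- c(3, 4)

# dist_power = 1 gives the Manhattan distance: |3| + |4|
manhattan <- minkowski_dist(a, b, dist_power = 1)

# dist_power = 2 gives the Euclidean distance: sqrt(3^2 + 4^2)
euclidean <- minkowski_dist(a, b, dist_power = 2)
```

For the two points above, the Manhattan distance is 7 and the Euclidean distance is 5.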
These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
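As a rough sketch of that workflow, the following creates a specification and then modifies it with update() rather than rebuilding it. The object names and argument values are illustrative assumptions, and the parsnip package is assumed to be installed.

```r
library(parsnip)

# Create a specification with some main arguments set
knn_spec <- nearest_neighbor(mode = "regression", neighbors = 5, dist_power = 2)
knn_spec <- set_engine(knn_spec, "kknn")

# Change the number of neighbors and switch to the Manhattan distance
# without recreating the specification from scratch
knn_updated <- update(knn_spec, neighbors = 11, dist_power = 1)
```

Arguments that are not named in the update() call, such as weight_func here, are left as they were in the original specification.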
nearest_neighbor( mode = "unknown", neighbors = NULL, weight_func = NULL, dist_power = NULL )
mode | A single character string for the type of model. Possible values for this model are |
neighbors | A single integer for the number of neighbors to consider (often called |
weight_func | A single character for the type of kernel function used to weight distances between samples. Valid choices are: |
dist_power | A single number for the parameter used in calculating Minkowski distance. |
The model can be created using the fit() function with the following engines:
R: "kknn" (the default)
Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below:
nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("regression") %>%
  translate()
## K-Nearest Neighbor Model Specification (regression)
##
## Computational engine: kknn
##
## Model fit template:
## kknn::train.kknn(formula = missing_arg(), data = missing_arg(),
##     ks = min_rows(5, data, 5))
nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification") %>%
  translate()
## K-Nearest Neighbor Model Specification (classification)
##
## Computational engine: kknn
##
## Model fit template:
## kknn::train.kknn(formula = missing_arg(), data = missing_arg(),
##     ks = min_rows(5, data, 5))
For kknn, the underlying modeling function used is a restricted version of train.kknn() and not kknn(). It is set up in this way so that parsnip can utilize the underlying predict.train.kknn method to predict on new data. This also means that only a single value of that function's kernel argument (a.k.a. weight_func here) can be supplied.
For this engine, tuning over neighbors is very efficient since the same model object can be used to make predictions over multiple values of neighbors.
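One way to see this in practice is parsnip's multi_predict(), which takes a fitted model and returns predictions for several values of neighbors at once. The sketch below is illustrative only: it assumes the parsnip and kknn packages are installed, and the data set, formula, and neighbor values are arbitrary choices.

```r
library(parsnip)

# Fit the model once on a built-in data set (requires the kknn package)
knn_fit <- nearest_neighbor(mode = "regression", neighbors = 10) %>%
  set_engine("kknn") %>%
  fit(mpg ~ disp + wt, data = mtcars)

# Reuse the same fitted object to predict at several values of `neighbors`
preds <- multi_predict(
  knn_fit,
  new_data = mtcars[1:3, ],
  neighbors = c(1, 5, 10)
)
```

The result has one row per row of new_data, with a list column named .pred holding the predictions for each requested value of neighbors.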
The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.
parsnip | kknn |
neighbors | ks |
weight_func | kernel (optimal) |
dist_power | distance (2) |
show_engines("nearest_neighbor")
nearest_neighbor(neighbors = 11)