parsnip: nearest_neighbor – R documentation


nearest_neighbor

General Interface for K-Nearest Neighbor Models

Description

nearest_neighbor() is a way to generate a specification of a model before fitting and allows the model to be created using different packages in R. The main arguments for the model are:

neighbors: The number of neighbors considered at each prediction.
weight_func: The type of kernel function that weights the distances between samples.
dist_power: The parameter used when calculating the Minkowski distance. This corresponds to the Manhattan distance with dist_power = 1 and the Euclidean distance with dist_power = 2.
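To make the role of dist_power concrete, here is a minimal base-R sketch of the Minkowski distance. The helper name `minkowski` and the two example points are illustrative assumptions, not part of parsnip or kknn.

```r
# Minkowski distance between two numeric vectors of equal length.
# `p` plays the role of dist_power; this helper is hypothetical.
minkowski <- function(x, y, p) {
  stopifnot(length(x) == length(y), p > 0)
  sum(abs(x - y)^p)^(1 / p)
}

a <- c(0, 0)
b <- c(3, 4)

manhattan <- minkowski(a, b, p = 1)  # |3| + |4| = 7
euclidean <- minkowski(a, b, p = 2)  # sqrt(9 + 16) = 5
```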

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
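As a rough illustration of modifying a specification in place, the sketch below builds a specification and then changes two main arguments with update(). The object names and argument values are arbitrary choices for the example, and it assumes the parsnip and rlang packages are installed.

```r
library(parsnip)

# Initial specification with 5 neighbors
knn_spec <- nearest_neighbor(mode = "regression", neighbors = 5)
knn_spec <- set_engine(knn_spec, "kknn")

# Change main arguments without recreating the object from scratch
knn_spec_updated <- update(knn_spec, neighbors = 11, weight_func = "triangular")

# Main arguments are stored as quosures; evaluate them to inspect the values
new_k <- rlang::eval_tidy(knn_spec_updated$args$neighbors)
new_kernel <- rlang::eval_tidy(knn_spec_updated$args$weight_func)
```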

Usage

nearest_neighbor(
  mode = "unknown",
  neighbors = NULL,
  weight_func = NULL,
  dist_power = NULL
)

Arguments

`mode`	A single character string for the type of model. Possible values for this model are `"unknown"`, `"regression"`, or `"classification"`.
`neighbors`	A single integer for the number of neighbors to consider (often called `k`). For kknn, a value of 5 is used if `neighbors` is not specified.
`weight_func`	A single character string for the type of kernel function used to weight distances between samples. Valid choices are: `"rectangular"`, `"triangular"`, `"epanechnikov"`, `"biweight"`, `"triweight"`, `"cos"`, `"inv"`, `"gaussian"`, `"rank"`, or `"optimal"`.
`dist_power`	A single number for the parameter used in calculating Minkowski distance.

Details

The model can be fit with the fit() function using the following engines:

R: "kknn" (the default)
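A minimal sketch of fitting with the kknn engine follows. The choice of the built-in mtcars data, the formula, and the object names are assumptions made for illustration, and the kknn package must be installed.

```r
library(parsnip)

# Specify a 5-nearest-neighbor regression model using the kknn engine
knn_spec <- nearest_neighbor(mode = "regression", neighbors = 5) %>%
  set_engine("kknn")

# Fit on numeric predictors from the built-in mtcars data
knn_fit <- fit(knn_spec, mpg ~ wt + hp, data = mtcars)

# Predict for a few rows; the result is a tibble with a .pred column
preds <- predict(knn_fit, new_data = head(mtcars, 3))
```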

Engine Details

Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below:

kknn

library(parsnip)
nearest_neighbor() %>% 
  set_engine("kknn") %>% 
  set_mode("regression") %>% 
  translate()

## K-Nearest Neighbor Model Specification (regression)
## 
## Computational engine: kknn 
## 
## Model fit template:
## kknn::train.kknn(formula = missing_arg(), data = missing_arg(), 
##     ks = min_rows(5, data, 5))

nearest_neighbor() %>% 
  set_engine("kknn") %>% 
  set_mode("classification") %>% 
  translate()

## K-Nearest Neighbor Model Specification (classification)
## 
## Computational engine: kknn 
## 
## Model fit template:
## kknn::train.kknn(formula = missing_arg(), data = missing_arg(), 
##     ks = min_rows(5, data, 5))

For kknn, the underlying modeling function is a restricted version of train.kknn() rather than kknn(). It is set up this way so that parsnip can use the underlying predict.train.kknn method to predict on new data. This also means that only a single value of that function’s kernel argument (a.k.a. weight_func here) can be supplied.

For this engine, tuning over neighbors is very efficient since the same model object can be used to make predictions over multiple values of neighbors.
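One way to see this in practice is parsnip's multi_predict(), which is assumed here to accept a vector of neighbors for kknn fits and return predictions for each value from a single fitted object. The data, formula, and the values of neighbors are illustrative.

```r
library(parsnip)

# Fit once with a single value of neighbors
knn_fit <- nearest_neighbor(mode = "regression", neighbors = 5) %>%
  set_engine("kknn") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# Reuse the same fit to predict at several values of neighbors
preds_by_k <- multi_predict(
  knn_fit,
  new_data = head(mtcars, 3),
  neighbors = c(3, 5, 7)
)

# .pred is a list column: one tibble per new row, one line per value of neighbors
first_row <- preds_by_k$.pred[[1]]
```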

Parameter translations

The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.

parsnip	kknn
neighbors	ks
weight_func	kernel (optimal)
dist_power	distance (2)
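The mapping in the table can be checked by supplying all three main arguments and calling translate(). This is a sketch: the argument values are arbitrary, and the way the translated call is inspected (through the `method$fit$args` element) is an assumption about the structure of the returned object.

```r
library(parsnip)

# Set every main argument so each one appears in the translated call
knn_translated <- nearest_neighbor(
  mode = "regression",
  neighbors = 7,
  weight_func = "gaussian",
  dist_power = 1
) %>%
  set_engine("kknn") %>%
  translate()

# Names of the arguments that will be passed to the kknn fit function
engine_arg_names <- names(knn_translated$method$fit$args)
```

Printing `knn_translated` shows the full fit template, in the same form as the templates under Engine Details.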

Examples

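library(parsnip)
# List the engines available for this model type
show_engines("nearest_neighbor")

# Specify a model with 11 neighbors (the mode stays "unknown")
nearest_neighbor(neighbors = 11)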

parsnip

A Common API to Modeling and Analysis Functions

v0.1.5

GPL-2

Authors

Max Kuhn [aut, cre], Davis Vaughan [aut], RStudio [cph]

Initial release