
contr_one_hot

Contrast function for one-hot encodings


Description

This contrast function produces a model matrix with indicator columns for each level of each factor.

Usage

contr_one_hot(n, contrasts = TRUE, sparse = FALSE)

Arguments

n

A vector of character factor levels or the number of unique levels.

contrasts

This argument is for backwards compatibility and only the default of TRUE is supported.

sparse

This argument is for backwards compatibility and only the default of FALSE is supported.

Details

By default, model.matrix() generates binary indicator variables for factor predictors. When the formula does not remove the intercept, an incomplete set of indicators is created: no indicator is made for the first level of the factor.

For example, species and island both have three levels but model.matrix() creates two indicator variables for each:

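library(dplyr)      # provides the %>% pipe
library(modeldata)  # provides the penguins data
library(parsnip)    # provides contr_one_hot(), used further below
data(penguins)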

levels(penguins$species)
## [1] "Adelie"    "Chinstrap" "Gentoo"
levels(penguins$island)
## [1] "Biscoe"    "Dream"     "Torgersen"
model.matrix(~ species + island, data = penguins) %>% 
  colnames()
## [1] "(Intercept)"      "speciesChinstrap" "speciesGentoo"    "islandDream"     
## [5] "islandTorgersen"
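
The missing first level follows from R's default treatment contrasts, which treat the first level as the reference. As a small illustration using only base R (the level names are copied from the output above), the contrast matrix for a three-level factor has a row of zeros for the first level and only two columns:

```r
# Level names copied from the penguins example above.
lvls <- c("Adelie", "Chinstrap", "Gentoo")

# Current contrast settings; in a fresh session the unordered
# default is "contr.treatment".
getOption("contrasts")

# Treatment contrasts: one row per level, one column per non-reference level.
trt <- contr.treatment(lvls)
trt
```

Each column of this matrix becomes one indicator column in the model matrix, which is why "Adelie" never appears when an intercept is present.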

For a formula with no intercept, the first factor is expanded to indicators for all of its levels, but every other factor is still expanded to indicators for all but its first level (as above):

model.matrix(~ 0 + species + island, data = penguins) %>% 
  colnames()
## [1] "speciesAdelie"    "speciesChinstrap" "speciesGentoo"    "islandDream"     
## [5] "islandTorgersen"

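For inference, this hybrid encoding can be problematic because the factors are encoded differently, so their coefficients do not share the same interpretation.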

To generate all indicators, use this contrast:

# Switch out the contrast method
old_contr <- options("contrasts")$contrasts
new_contr <- old_contr
new_contr["unordered"] <- "contr_one_hot"
options(contrasts = new_contr)

model.matrix(~ species + island, data = penguins) %>% 
  colnames()
## [1] "(Intercept)"      "speciesAdelie"    "speciesChinstrap" "speciesGentoo"   
## [5] "islandBiscoe"     "islandDream"      "islandTorgersen"
options(contrasts = old_contr)

With this contrast in place, removing the intercept does not change the factor encodings.
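
The same effect can be checked in base R without any extra packages. The sketch below uses a different technique from contr_one_hot(): it passes full identity contrast matrices through the contrasts.arg argument of model.matrix(). The data frame toy and its columns f and g are made up for illustration.

```r
# Toy data; the column names and values are made up for illustration.
toy <- data.frame(
  f = factor(c("a", "b", "c", "a")),
  g = factor(c("x", "y", "x", "y"))
)

# contrasts(..., contrasts = FALSE) returns an identity matrix over the
# levels, which requests one indicator column per level.
full <- lapply(toy, contrasts, contrasts = FALSE)

with_int    <- colnames(model.matrix(~ f + g,     data = toy, contrasts.arg = full))
without_int <- colnames(model.matrix(~ 0 + f + g, data = toy, contrasts.arg = full))

# The only column that differs between the two encodings is the intercept.
setdiff(with_int, without_int)
```

Because every factor already has a full set of indicators, dropping the intercept removes only the "(Intercept)" column and leaves the factor columns as they were.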

Value

An n-by-n diagonal matrix, where n is the number of factor levels.
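
As a rough picture of the returned shape, here is a minimal base R sketch. The function one_hot_sketch() is a hypothetical stand-in written for this example, not parsnip's source code, and it assumes that the rows and columns are labeled by the levels.

```r
# Hypothetical stand-in for illustration only; not parsnip's implementation.
one_hot_sketch <- function(n) {
  # Accept either a character vector of levels or a single count.
  lvls <- if (is.character(n)) n else as.character(seq_len(n))
  out <- diag(length(lvls))          # ones on the diagonal, zeros elsewhere
  dimnames(out) <- list(lvls, lvls)  # assumed labeling: rows and columns by level
  out
}

m <- one_hot_sketch(c("Adelie", "Chinstrap", "Gentoo"))
m
```

Each row corresponds to a factor level and each column to the indicator for that level, so no level is dropped as a reference.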


parsnip

A Common API to Modeling and Analysis Functions

v0.1.5
GPL-2
Authors
Max Kuhn [aut, cre], Davis Vaughan [aut], RStudio [cph]
Initial release
