rsample: permutations – R documentation


permutations

Permutation sampling

Description

A permutation sample is the same size as the original data set and is made by permuting (shuffling) one or more columns. This results in analysis samples where some columns are in their original order and the selected columns are permuted to a random order. Unlike other sampling functions in rsample, there is no assessment set, and calling assessment() on a permutation split will throw an error.
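A minimal sketch of what a single permutation split contains, assuming rsample is installed and using the built-in `mtcars` data (the seed value is arbitrary). It checks that the analysis set has the original number of rows, that the selected column holds the same values in a possibly different order, that an unselected column is unchanged, and that `assessment()` fails.

```r
library(rsample)

set.seed(1)  # arbitrary seed, only for reproducibility
perm <- permutations(mtcars, permute = mpg, times = 1)
shuffled <- analysis(perm$splits[[1]])

# Same number of rows as the original data
nrow(shuffled) == nrow(mtcars)
# The permuted column holds the same values, possibly reordered
identical(sort(shuffled$mpg), sort(mtcars$mpg))
# A column that was not selected keeps its original order
identical(shuffled$cyl, mtcars$cyl)

# Permutation splits have no assessment set, so this call errors
res <- try(assessment(perm$splits[[1]]), silent = TRUE)
inherits(res, "try-error")
```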

Usage

permutations(data, permute = NULL, times = 25, apparent = FALSE, ...)

Arguments

`data`	A data frame.
`permute`	One or more columns to shuffle. This argument supports `tidyselect` selectors. Multiple expressions can be combined with `c()`. Variable names can be used as if they were positions in the data frame, so expressions like `x:y` can be used to select a range of variables. See the tidyselect `language` help page for more details.
`times`	The number of permutation samples.
`apparent`	A logical. Should an extra resample be added where the analysis set is the original data set?
`...`	Not currently used.
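The `permute` argument accepts the selector forms described above. A short sketch on `mtcars`, assuming rsample is installed: bare names combined with `c()`, a range of adjacent columns, and a tidyselect helper. For the range case it checks that a column outside the range is untouched while a column inside it keeps its values.

```r
library(rsample)

set.seed(2)  # arbitrary seed
# Bare column names combined with c()
p_names <- permutations(mtcars, permute = c(mpg, hp), times = 2)
# A range of adjacent columns: mpg, cyl and disp
p_range <- permutations(mtcars, permute = mpg:disp, times = 2)
# A tidyselect helper: columns whose names start with "c" (cyl and carb)
p_helper <- permutations(mtcars, permute = starts_with("c"), times = 2)

d <- analysis(p_range$splits[[1]])
# hp lies outside mpg:disp, so it is left in its original order
identical(d$hp, mtcars$hp)
# disp lies inside the range: same values, possibly reordered
identical(sort(d$disp), sort(mtcars$disp))
```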

Details

The `apparent` argument adds one extra "resample" whose analysis set is identical to the original data set. Permutation-based resampling is especially helpful for computing a statistic under the null hypothesis (e.g., a t-statistic). This forms the basis of a permutation test, which computes a test statistic under all possible permutations of the data; because `permutations()` draws `times` random shuffles, it approximates that full set with a random sample of permutations.
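One way to turn this into a permutation test, sketched under the assumption that the statistic of interest is a two-sample t-statistic comparing `hp` between the two levels of `vs` in `mtcars`. The apparent resample supplies the observed statistic, the permuted resamples approximate its null distribution, and the p-value is estimated as the share of permuted statistics at least as extreme as the observed one. The `times = 1000` value and the two-sided comparison are illustrative choices, not package defaults.

```r
library(rsample)

set.seed(3)  # arbitrary seed
# Shuffling hp breaks any association between hp and vs
perms <- permutations(mtcars, permute = hp, times = 1000, apparent = TRUE)

# t-statistic for hp between the two vs groups on one analysis set
t_stat <- function(split) {
  unname(t.test(hp ~ vs, data = analysis(split))$statistic)
}
stats <- vapply(perms$splits, t_stat, numeric(1))

is_apparent <- perms$id == "Apparent"
observed <- stats[is_apparent]    # statistic on the unshuffled data
null_dist <- stats[!is_apparent]  # statistics under the null hypothesis

# Two-sided Monte Carlo estimate of the p-value
p_value <- mean(abs(null_dist) >= abs(observed))
p_value
```

Using the apparent resample keeps the observed and permuted statistics in one pipeline; computing `t.test(hp ~ vs, data = mtcars)` directly would give the same observed value.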

Value

A tibble with classes permutations, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and a column called id that has a character string with the resample identifier.
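A quick way to inspect the returned object, assuming rsample is installed. With `apparent = TRUE` the result has one row per permutation plus one extra row for the apparent resample; the exact format of the `id` strings is not asserted here.

```r
library(rsample)

set.seed(4)  # arbitrary seed
res <- permutations(mtcars, permute = mpg, times = 3, apparent = TRUE)

class(res)  # should include "permutations", "rset" and "tbl_df"
names(res)  # should contain the "splits" and "id" columns
nrow(res)   # times + 1, because apparent = TRUE
res$id      # character resample identifiers
```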

Examples

library(rsample)

permutations(mtcars, mpg, times = 2)
permutations(mtcars, mpg, times = 2, apparent = TRUE)

library(purrr)
# Shuffle every column whose name starts with "c" (cyl and carb)
resample1 <- permutations(mtcars, starts_with("c"), times = 1)
resample1$splits[[1]] %>% analysis()

# t-statistics for hp between vs groups; the apparent resample gives the observed value
resample2 <- permutations(mtcars, hp, times = 10, apparent = TRUE)
map_dbl(resample2$splits, function(x) {
  t.test(hp ~ vs, data = analysis(x))$statistic
})

rsample

General Resampling Infrastructure

v0.1.0

MIT + file LICENSE

Authors

Julia Silge [aut, cre] (<https://orcid.org/0000-0002-3671-836X>), Fanny Chow [aut], Max Kuhn [aut], Hadley Wickham [aut], RStudio [cph]

Initial release