Bootstrap Sampling
A bootstrap sample is a sample of the same size as the original data set, drawn with replacement. As a result, the analysis set contains multiple replicates of some of the original rows. The assessment set consists of the rows of the original data that were not included in the bootstrap sample; it is often referred to as the "out-of-bag" (OOB) sample.
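The definition above can be illustrated with a minimal base R sketch. It uses only `sample.int()` and `setdiff()` on the built-in `mtcars` data and is not the package's internal implementation; the variable names (`in_bag`, `out_of_bag`) are illustrative only.

```r
# Base R sketch of a single bootstrap resample (illustrative, not rsample internals).
set.seed(1)
n <- nrow(mtcars)

# Draw n row indices with replacement: these rows form the analysis set.
in_bag <- sample.int(n, size = n, replace = TRUE)

# Rows that were never drawn form the assessment ("out-of-bag") set.
out_of_bag <- setdiff(seq_len(n), in_bag)

analysis_set <- mtcars[in_bag, ]
assessment_set <- mtcars[out_of_bag, ]

nrow(analysis_set)           # same size as the original data
anyDuplicated(in_bag) > 0    # TRUE when some rows were drawn more than once
nrow(assessment_set)         # number of out-of-bag rows
```

Because the indices are drawn with replacement, the size of the out-of-bag set changes from one resample to the next.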
bootstraps(
  data,
  times = 25,
  strata = NULL,
  breaks = 4,
  pool = 0.1,
  apparent = FALSE,
  ...
)
data | A data frame.
times | The number of bootstrap samples.
strata | A variable that is used to conduct stratified sampling. When not NULL, each bootstrap sample is created within the stratification variable.
breaks | A single number giving the number of bins desired to stratify a numeric stratification variable.
pool | A proportion of data used to determine if a particular group is too small and should be pooled into another group. We do not recommend decreasing this argument below its default of 0.1 because of the dangers of stratifying groups that are too small.
apparent | A logical. Should an extra resample be added where the analysis and holdout subset are the entire data set? This is required for some estimators.
... | Not currently used.
The argument apparent enables the option of an additional "resample" where the analysis and assessment data sets are the same as the original data set. This can be required for some types of analysis of the bootstrap results.
The strata argument is based on a similar argument in the random forest package, where the bootstrap samples are conducted within the stratification variable. This can help ensure that the proportions of the stratification variable in the bootstrap sample match those in the original data set. (Strata below 10% of the total are pooled together by default.)
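The idea of resampling within a stratification variable can be sketched in base R as follows. This is an illustration under simplifying assumptions (no binning of numeric strata and no pooling of small groups), not the package's actual code; `idx_by_stratum` and `in_bag` are hypothetical names.

```r
# Base R sketch of a stratified bootstrap (illustrative only).
set.seed(2)

# Group the row indices by the stratification variable (here, cyl).
idx_by_stratum <- split(seq_len(nrow(mtcars)), mtcars$cyl)

# Resample with replacement inside each stratum, keeping the stratum's size.
in_bag <- unlist(lapply(idx_by_stratum, function(idx) {
  idx[sample.int(length(idx), replace = TRUE)]
}), use.names = FALSE)

table(mtcars$cyl)            # counts per stratum in the original data
table(mtcars$cyl[in_bag])    # counts per stratum in the bootstrap sample
```

In this sketch each stratum contributes exactly as many rows as it has in the original data, so the two tables agree.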
A tibble with classes bootstraps, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and a column called id that has a character string with the resample identifier.
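As a brief sketch of how the returned object might be used, the snippet below pulls the first split out of the splits column and extracts its two data sets with the rsample accessors `analysis()` and `assessment()`. It assumes the rsample package is installed; the object names (`boots`, `first_split`) are illustrative.

```r
library(rsample)

set.seed(3)
boots <- bootstraps(mtcars, times = 2)

# Each element of the splits column is one data split object.
first_split <- boots$splits[[1]]

in_bag <- analysis(first_split)         # rows drawn with replacement
out_of_bag <- assessment(first_split)   # rows never drawn (OOB)

nrow(in_bag)       # equals nrow(mtcars)
nrow(out_of_bag)   # varies from resample to resample
boots$id           # character resample identifiers
```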
library(rsample)

# Basic usage, without and with the extra "apparent" resample
bootstraps(mtcars, times = 2)
bootstraps(mtcars, times = 2, apparent = TRUE)

library(purrr)
library(modeldata)
data(wa_churn)

# No stratification: the churn rate varies across resamples
set.seed(13)
resample1 <- bootstraps(wa_churn, times = 3)
map_dbl(resample1$splits, function(x) {
  dat <- as.data.frame(x)$churn
  mean(dat == "Yes")
})

# Stratify on the outcome
set.seed(13)
resample2 <- bootstraps(wa_churn, strata = churn, times = 3)
map_dbl(resample2$splits, function(x) {
  dat <- as.data.frame(x)$churn
  mean(dat == "Yes")
})

# Stratify on a numeric variable binned into 6 groups
set.seed(13)
resample3 <- bootstraps(wa_churn, strata = tenure, breaks = 6, times = 3)
map_dbl(resample3$splits, function(x) {
  dat <- as.data.frame(x)$churn
  mean(dat == "Yes")
})