int_pctl

Bootstrap confidence intervals


Description

Calculate bootstrap confidence intervals using various methods.

Usage

int_pctl(.data, statistics, alpha = 0.05)

int_t(.data, statistics, alpha = 0.05)

int_bca(.data, statistics, alpha = 0.05, .fn, ...)

Arguments

.data

A data frame containing the bootstrap resamples created using bootstraps(). For t- and BCa-intervals, the apparent argument should be set to TRUE. Even if the apparent argument is set to TRUE for the percentile method, the apparent data is never used in calculating the percentile confidence interval.

statistics

An unquoted column name or dplyr selector that identifies a single column in the data set containing the individual bootstrap estimates. This can be a list column of tidy tibbles (each containing the columns term and estimate) or a simple numeric column. For t-intervals, the tidy tibbles must also contain a standard error column (usually called std.err). See the examples below.

alpha

Level of significance. The intervals have nominal coverage 1 - alpha, so the default of 0.05 gives 95% intervals.

.fn

A function that calculates the statistic of interest. The function should take an rsplit as its first argument, and it must also accept ... in its signature.

...

Arguments to pass to .fn.
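
The reason a BCa interval needs the statistic function again can be illustrated with a hand-rolled, base-R sketch. It follows the textbook BCa construction (a bias correction from the bootstrap distribution plus an acceleration term from leave-one-out estimates) and is not taken from the rsample source. The name stat_fn is a hypothetical stand-in for .fn, and the data are simulated.

```r
set.seed(2)
x <- rexp(40)                      # simulated, right-skewed sample
alpha <- 0.05
stat_fn <- function(d) mean(d)     # hypothetical stand-in for .fn

theta_hat <- stat_fn(x)            # estimate on the original ("apparent") data
est <- replicate(2000, stat_fn(sample(x, replace = TRUE)))

# Bias correction: how far the bootstrap distribution sits from theta_hat
z0 <- qnorm(mean(est < theta_hat))

# Acceleration: requires re-running the statistic on leave-one-out data,
# which is why the original function has to be supplied
loo <- vapply(seq_along(x), function(i) stat_fn(x[-i]), numeric(1))
d <- mean(loo) - loo
a <- sum(d^3) / (6 * sum(d^2)^1.5)

# Adjusted quantile levels, then read the interval off the bootstrap estimates
z <- qnorm(c(alpha / 2, 1 - alpha / 2))
adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
bca <- quantile(est, probs = adj, names = FALSE)
bca
```

The leave-one-out step evaluates the statistic once per observation, which is consistent with the method being described as computationally taxing.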

Details

Percentile intervals are the standard method of obtaining confidence intervals but require thousands of resamples to be accurate. T-intervals may need fewer resamples but require a corresponding variance estimate. Bias-corrected and accelerated intervals require the original function that was used to create the statistics of interest and are computationally taxing.
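
As a rough illustration of the first two methods, the base-R sketch below computes a percentile interval and a studentized (bootstrap-t) interval by hand for a sample mean. It uses the textbook definitions rather than the package internals, and the data and all object names are made up for the example.

```r
set.seed(1)
x <- rexp(40)                      # simulated sample
alpha <- 0.05
probs <- c(alpha / 2, 1 - alpha / 2)

# Estimate and standard error on the original ("apparent") data
theta_hat <- mean(x)
se_hat <- sd(x) / sqrt(length(x))

# Each resample yields an estimate and its own standard error
boot <- replicate(2000, {
  xb <- sample(x, replace = TRUE)
  c(estimate = mean(xb), std.err = sd(xb) / sqrt(length(xb)))
})
est <- boot["estimate", ]
se <- boot["std.err", ]

# Percentile interval: quantiles of the bootstrap estimates only
pctl <- quantile(est, probs = probs, names = FALSE)

# Bootstrap-t interval: studentize each resample against the apparent estimate,
# then invert using the apparent standard error
z <- (est - theta_hat) / se
z_q <- quantile(z, probs = probs, names = FALSE)
t_int <- c(theta_hat - z_q[2] * se_hat, theta_hat - z_q[1] * se_hat)

rbind(percentile = pctl, student_t = t_int)
```

The percentile interval never touches theta_hat or any standard error, while the t-interval needs both, which matches the requirements listed for the apparent argument and the std.err column.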

Value

Each function returns a tibble with columns .lower, .estimate, .upper, .alpha, .method, and term. .method is the type of interval (e.g., "percentile", "student-t", or "BCa"). term is the name of the estimate. Note that the .estimate returned from int_pctl() is the mean of the estimates from the bootstrap resamples, not the estimate from the apparent model.
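
A small sketch of the note about .estimate, assuming rsample, dplyr, purrr, and tibble are installed. The helper mean_mpg is hypothetical and exists only for this illustration; the point is that the value reported by int_pctl() is close to, but generally not equal to, the statistic computed on the original data.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: returns a one-row tidy tibble with term and estimate
mean_mpg <- function(split, ...) {
  tibble(term = "mean_mpg", estimate = mean(analysis(split)$mpg))
}

set.seed(123)
rs <- bootstraps(mtcars, times = 1000, apparent = TRUE) %>%
  mutate(stats = map(splits, mean_mpg))

ci <- int_pctl(rs, stats)

# Statistic computed directly on the original data, for comparison
apparent_est <- mean(mtcars$mpg)

ci
apparent_est
```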

References

Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843

Examples

library(rsample)
library(broom)
library(dplyr)
library(purrr)
library(tibble)

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(car_rs, results)
int_t(car_rs, results)
int_bca(car_rs, results, .fn = lm_est)

# putting results into a tidy format
rank_corr <- function(split) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$sqft, dat$price, method = "spearman"),
    # don't know the analytical std.err so no t-intervals
    std.err = NA_real_
  )
}

set.seed(69325)
data(Sacramento, package = "modeldata")
bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(correlations = map(splits, rank_corr)) %>%
  int_pctl(correlations)

rsample

General Resampling Infrastructure

v0.1.0
MIT + file LICENSE
Authors
Julia Silge [aut, cre] (<https://orcid.org/0000-0002-3671-836X>), Fanny Chow [aut], Max Kuhn [aut], Hadley Wickham [aut], RStudio [cph]
Initial release