K-fold cross-validation
The kfold method performs exact K-fold cross-validation. First the data are
randomly partitioned into K subsets of equal size (or as close to equal as
possible), or the user can specify the folds argument to determine the
partitioning. Then the model is refit K times, each time leaving out one of
the K subsets. If K is equal to the total number of observations in the data
then K-fold cross-validation is equivalent to exact leave-one-out
cross-validation (to which loo is an efficient approximation).
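The partition-and-refit procedure described above can be sketched in base R. This is only an illustration under stated assumptions: lm() stands in for the Stan model so that the code runs quickly, and the plug-in normal log density is a simplified stand-in for the posterior predictive density that a Bayesian refit would provide. The names folds, lpd, elpd_sketch and loo_folds are hypothetical and are not part of rstanarm.

```r
# Minimal base-R sketch of K-fold cross-validation.
# Assumption: lm() replaces the Stan model purely for illustration.
set.seed(123)
N <- nrow(mtcars)
K <- 10

# Randomly assign each observation to one of K folds of near-equal size
folds <- sample(rep(seq_len(K), length.out = N))
table(folds)

# Refit K times, each time leaving out one fold and scoring the held-out rows
lpd <- numeric(N)
for (k in seq_len(K)) {
  held_out <- folds == k
  fit_k <- lm(mpg ~ wt, data = mtcars[!held_out, ])
  mu_k <- predict(fit_k, newdata = mtcars[held_out, ])
  sigma_k <- summary(fit_k)$sigma
  # Plug-in log density of the held-out outcomes (not a posterior average)
  lpd[held_out] <- dnorm(mtcars$mpg[held_out], mean = mu_k, sd = sigma_k,
                         log = TRUE)
}
elpd_sketch <- sum(lpd)

# With K equal to N every fold holds exactly one observation,
# which corresponds to leave-one-out cross-validation
loo_folds <- sample(seq_len(N))
```

Each observation is scored exactly once, by a model that never saw it during fitting.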
## S3 method for class 'stanreg'
kfold(
  x,
  K = 10,
  ...,
  folds = NULL,
  save_fits = FALSE,
  cores = getOption("mc.cores", 1)
)
x |
A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects. |
K |
For |
... |
Currently ignored. |
folds |
For |
save_fits |
For |
cores |
The number of cores to use for parallelization. Instead of fitting
separate Markov chains for the same model on different cores, by default
|
An object with classes 'kfold' and 'loo' that has a similar structure as the
objects returned by the loo and waic methods and is compatible with the
loo_compare function for comparing models.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: http://arxiv.org/abs/1507.04544/
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication. doi:10.1214/17-BA1091.
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars, refresh = 0)

# 10-fold cross-validation
# (if possible also specify the 'cores' argument to use multiple cores)
(kfold1 <- kfold(fit1, K = 10))
kfold2 <- kfold(fit2, K = 10)
kfold3 <- kfold(fit3, K = 10)
loo_compare(kfold1, kfold2, kfold3)

# stratifying by a grouping variable
# (note: might get some divergence warnings with this model but
# this is just intended as a quick example of how to code this)
fit4 <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars, refresh = 0)
table(mtcars$cyl)
folds_cyl <- loo::kfold_split_stratified(K = 3, x = mtcars$cyl)
table(cyl = mtcars$cyl, fold = folds_cyl)
kfold4 <- kfold(fit4, folds = folds_cyl, cores = 2)
print(kfold4)

# Example code demonstrating the different ways to specify the number
# of cores and how the cores are used
#
# options(mc.cores = NULL)
#
# # spread the K models over N_CORES cores (method 1)
# kfold(fit, K, cores = N_CORES)
#
# # spread the K models over N_CORES cores (method 2)
# options(mc.cores = N_CORES)
# kfold(fit, K)
#
# # fit K models sequentially using N_CORES cores for the Markov chains each time
# options(mc.cores = N_CORES)
# kfold(fit, K, cores = 1)
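As a complement to the stratified example, the following base-R sketch builds a fold assignment by hand. It assumes that the folds argument takes one integer fold id (from 1 to K) per observation, which is the form of the vector returned by loo::kfold_split_stratified in the example code. The names cyl_levels and folds_by_cyl are hypothetical, and whether holding out whole groups is sensible depends on the model being evaluated.

```r
# Sketch: one fold per cylinder group, so each refit leaves out a whole group.
# Assumption: 'folds' is an integer vector of length N with values in 1..K.
cyl_levels <- sort(unique(mtcars$cyl))
folds_by_cyl <- match(mtcars$cyl, cyl_levels)  # 4 -> 1, 6 -> 2, 8 -> 3

# Each fold contains exactly one value of cyl
table(cyl = mtcars$cyl, fold = folds_by_cyl)

# Not run (requires rstanarm and the fit2 model from the examples):
# kfold_by_cyl <- kfold(fit2, folds = folds_by_cyl)
```

In contrast to stratified folds, which spread each group across all folds, this assignment measures how well the model predicts a group it has not seen.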