Estimation of prediction variance with bootstrap correction
This function is similar to get_predVar
except that is uses a bootstrap procedure to correct for bias in the evaluation of the prediction variance.
get_cPredVar(pred_object, newdata, nsim, seed, type = "residual", variances=NULL, nb_cores = NULL, fit_env = NULL, sim_object=pred_object)
pred_object |
an object of class |
newdata |
passed to |
nsim |
passed to |
seed |
passed to |
type |
passed to |
variances |
NULL or list; |
nb_cores |
integer: number of cores to use for parallel computation of bootstrap. The default is |
fit_env |
For parallel computations: an environment containing objects to be passed to the cores. They should have the same name in |
sim_object |
an object of class |
The result provided by get_cPredVar
is similar to the CMSEP (Conditional Mean Standard Error of Prediction) introduced by Booth and Hobert (1998; “B&H”). This paper is known for pointing the importance of using conditional variances when they differ from unconditional ones. This is hard to miss in spatial models, where the relevant prediction variance typically depends on the variance of random effects conditional on the data. Thus, the alternative function get_predVar
already accounts for this and returns a prediction variance that depends on a joint covariance of fixed-effect estimates and of random effects given the data.
B&H also used a conditional bootstrap procedure to correct for some bias. get_cPredVar
implements a similar procedure, in contrast to get_predVar
. Their conditional bootstrap procedure is not applicable for autocorrelated random effects, and parametric bootstrapping of the residuals of the fitted model (as implied by the default value of argument type
) is used instead here. Apart from this difference, the returned value includes exactly the same terms as those discussed by B&H: their “naive estimate” ν_i and its bootstrap correction b_i, their correction β for uncertainty in fixed-effect coefficients, and their correction σ^2 for uncertainty in dispersion parameters.
This use of the bootstrap does not account for uncertainty in correlation parameters “outer-optimized” by fitme
or corrHLfit
, because the correlation parameters are fixed when the model is refitted on the bootstrap replicates. Even if it the correlation parameters where refitted, the full computation would not be sufficient to account for uncertainty in them. To account for uncertainty in correlation parameters, one should rather perform a parametric bootstrap of the full model (typically using spaMM_boot(., type="residual")
), which may take much more time.
The “naive estimate” ν_i is not generally an estimate of anything uniquely defined by the model parameters: for correlated random effects, it depends on the “root” of the correlation matrix of the random effects, which is not unique. Thus ν_i is not unique, and may differ for example for equivalent fits by sparse-precision methods vs. other methods. Nevertheless, attr(cpredvar,"info")$naive
does recover published values in the Examples below, as they involve no correlation matrix.
A vector of prediction variances, with an attribute info
which is an environment containing variables:
SEs |
the standard errors of the estimates (which are those of the bootstrap replicates) |
bias |
the bias term |
maive |
B&H's “naive” ν_i |
Booth, J.G., Hobert, J.P. (1998) Standard errors of prediction in generalized linear mixed models. J. Am. Stat. Assoc. 93: 262-272.
## Not run: if(requireNamespace("rsae", quietly = TRUE)) { # LMM example from Booth & Hobert 1998 JASA data("landsat", package = "rsae") fitCorn <- fitme(HACorn ~ PixelsCorn + PixelsSoybeans + (1|CountyName),data=landsat[-33,]) newXandZ <- unique(data.frame(PixelsCorn=landsat$MeanPixelsCorn, PixelsSoybeans=landsat$MeanPixelsSoybeans, CountyName=landsat$CountyName)) (cpredvar <- get_cPredVar(fitCorn, newdata=newXandZ, nsim=200L, seed=123)) # serial computation (cpredvar <- get_cPredVar(fitCorn, newdata=newXandZ, nsim=200L, seed=123, nb_cores=parallel::detectCores()-1L, fit_env=list2env(list(newXandZ=newXandZ)))) } # GLMM example from Booth & Hobert 1998 JASA npos <- c(11,16,14,2,6,1,1,4,10,22,7,1,0,0,1,6) ntot <- c(36,20,19,16,17,11,5,6,37,32,19,17,12,10,9,7) treatment <- c(rep(1,8),rep(0,8)) clinic <-c(seq(8),seq(8)) clinics <- data.frame(npos=npos,nneg=ntot-npos,treatment=treatment,clinic=clinic) # fitClinics <- HLfit(cbind(npos,nneg)~treatment+(1|clinic),family=binomial(),data=clinics) # (get_cPredVar(fitClinics, newdata=clinics[1:8,], nsim=200L, seed=123)) # serial computation (get_cPredVar(fitClinics, newdata=clinics[1:8,], nsim=200L, seed=123, nb_cores=parallel::detectCores()-1, fit_env=list2env(list(clinics=clinics)))) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.