Outlier and Influential Case Diagnostics for 'rma.mv' Objects
The functions compute various outlier and influential case diagnostics (some of which indicate the influence of deleting one case/study at a time on the model fit and the fitted/residual values) for objects of class "rma.mv".
## S3 method for class 'rma.mv'
cooks.distance(model, progbar=FALSE, cluster, reestimate=TRUE, parallel="no", ncpus=1, cl=NULL, ...)

## S3 method for class 'rma.mv'
dfbetas(model, progbar=FALSE, cluster, reestimate=TRUE, parallel="no", ncpus=1, cl=NULL, ...)

## S3 method for class 'rma.mv'
hatvalues(model, type="diagonal", ...)
model | an object of class "rma.mv".
progbar | logical indicating whether a progress bar should be shown (the default is FALSE).
cluster | optional vector specifying a clustering variable to use for computing the Cook's distances. If not specified, Cook's distances are computed for all individual observed outcomes.
reestimate | logical indicating whether variance/correlation components should be re-estimated after deletion of the ith study/cluster (the default is TRUE).
parallel | character string indicating whether parallel processing should be used (the default is "no").
ncpus | integer specifying the number of processes to use in the parallel processing.
cl | optional snow cluster to use if parallel="snow".
type | character string indicating whether to return only the diagonal of the hat matrix ("diagonal") or the entire hat matrix ("matrix").
... | other arguments.
Cook's distance for the ith study/cluster can be interpreted as the Mahalanobis distance between the entire set of predicted values once with the ith study/cluster included and once with the ith study/cluster excluded from the model fitting.
The DFBETAS value(s) essentially indicate(s) how many standard deviations the estimated coefficient(s) change(s) after excluding the ith study/cluster from the model fitting.
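The two measures described above can be written compactly as follows. This is only a sketch in assumed notation: $\hat{\beta}$ denotes the coefficient estimates from the full model, $\hat{\beta}_{(-i)}$ those obtained after excluding the ith study/cluster, and the distance is expressed in terms of the coefficients (which is equivalent to a Mahalanobis distance between the two sets of predicted values when the model matrix has full column rank). Taking the standard error in the DFBETAS denominator from the leave-one-out fit follows the convention of Belsley et al. (1980) and is an assumption here, not a statement about the exact implementation.

```latex
\[
D_i = \left(\hat{\beta} - \hat{\beta}_{(-i)}\right)^{\top}
      \widehat{\mathrm{Var}}\!\left[\hat{\beta}\right]^{-1}
      \left(\hat{\beta} - \hat{\beta}_{(-i)}\right),
\qquad
\mathrm{DFBETAS}_{ij} =
      \frac{\hat{\beta}_{j} - \hat{\beta}_{j(-i)}}
           {\mathrm{SE}\!\left[\hat{\beta}_{j(-i)}\right]}
\]
```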
The cooks.distance function returns a vector. The dfbetas function returns a data frame. The hatvalues function returns either a vector with the diagonal elements of the hat matrix or the entire hat matrix.
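The return types described above can be checked directly. The following is a minimal sketch that reuses the Konstantopoulos (2011) model from the examples on this page; the object names (cd, dfb, hv, H) are illustrative, and type="matrix" is assumed to be the value that requests the full hat matrix.

```r
library(metafor)

### same data and model as in the examples on this page
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)

### one Cook's distance per district (named numeric vector)
cd <- cooks.distance(res, cluster=dat$district)

### one row of DFBETAS values per district, one column per coefficient
dfb <- dfbetas(res, cluster=dat$district)

### diagonal of the hat matrix (vector) and the full hat matrix
hv <- hatvalues(res)
H  <- hatvalues(res, type="matrix")

str(cd)
str(dfb)
dim(H)
```

Clustering by district keeps the number of model refits small, which is why it is used here instead of the default of one refit per observed outcome.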
Right now, leave-one-out diagnostics are calculated by refitting the model k times (where k is the number of studies/clusters). Depending on how large k is, it may take a few moments to finish the calculations. For complex models fitted with rma.mv, this can become computationally expensive.
On machines with multiple cores, one can usually speed things up by delegating the model fitting to separate worker processes, that is, by setting parallel="snow" or parallel="multicore" and ncpus to some value larger than 1. Parallel processing makes use of the parallel package, using the makePSOCKcluster and parLapply functions when parallel="snow" or using mclapply when parallel="multicore" (the latter only works on Unix/Linux-alikes). With parallel::detectCores(), one can check on the number of available cores on the local machine.
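As a rough illustration of the parallel options, the sketch below requests two worker processes when at least two cores are detected and otherwise falls back to serial processing. The helper variables (ncores, ncpus, para) are illustrative names, and the cap at two workers is an arbitrary choice for the example.

```r
library(metafor)

dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)

### number of cores on the local machine (can be NA on some platforms)
ncores <- parallel::detectCores()

### use at most two workers; fall back to serial processing otherwise
ncpus <- if (is.na(ncores)) 1 else min(2, ncores)
para  <- if (ncpus > 1) "snow" else "no"

### Cook's distances per district, refitting the model in the worker processes
cd <- cooks.distance(res, cluster=dat$district, parallel=para, ncpus=ncpus)
cd
```

For a model this small, the overhead of starting the workers may outweigh any gain; the benefit should show up mainly for large k or complex models.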
Alternatively (or in addition to using parallel processing), one can also set reestimate=FALSE, in which case any variance/correlation components in the model are not re-estimated after deleting the ith study/cluster from the dataset. Doing so only yields an approximation to the Cook's distances and DFBETAS values that ignores the influence of the ith study/cluster on the variance/correlation components, but is considerably faster (and often yields similar results).
It may not be possible to fit the model after deletion of the ith study/cluster from the dataset. This will result in NA values for that study/cluster.
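To get a feel for how close the reestimate=FALSE approximation is in a given application, and to spot studies/clusters for which the refit failed, one could compare the two sets of values side by side. This is a sketch using the example model from this page; cd_full, cd_fast, and failed are illustrative names.

```r
library(metafor)

dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)

### Cook's distances with and without re-estimating the variance components
cd_full <- cooks.distance(res, cluster=dat$district)
cd_fast <- cooks.distance(res, cluster=dat$district, reestimate=FALSE)

### side-by-side comparison and correlation between the two versions
round(cbind(reestimated=cd_full, approximation=cd_fast), 4)
cor(cd_full, cd_fast, use="complete.obs")

### districts (if any) for which the model could not be refitted
failed <- names(cd_full)[is.na(cd_full)]
failed
```

If the two columns rank the clusters in much the same order, the faster approximation may be adequate for screening, with the re-estimated values reserved for the cases that stand out.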
Wolfgang Viechtbauer wvb@metafor-project.org http://www.metafor-project.org/
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics. New York: Wiley.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman and Hall.
Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://www.jstatsoft.org/v036/i03.
Viechtbauer, W., & Cheung, M. W.-L. (2010). Outlier and influence diagnostics for meta-analysis. Research Synthesis Methods, 1, 112–125.
### load the package that provides rma.mv() and the example data
library(metafor)

### copy data from Konstantopoulos (2011) into 'dat'
dat <- dat.konstantopoulos2011

### multilevel random-effects model
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
print(res, digits=3)

### Cook's distances for each observed outcome
x <- cooks.distance(res)
x
plot(x, type="o", pch=19, xlab="Observed Outcome", ylab="Cook's Distance")

### Cook's distances for each district
x <- cooks.distance(res, cluster=dat$district)
x
plot(x, type="o", pch=19, xlab="District", ylab="Cook's Distance", xaxt="n")
axis(side=1, at=seq_along(x), labels=as.numeric(names(x)))

### hat values
hatvalues(res)