
parlmice

Wrapper function that runs MICE in parallel


Description

This is a wrapper function for mice that uses multiple cores to run mice in parallel. This can speed up the imputation procedure.

Usage

parlmice(
  data,
  m = 5,
  seed = NA,
  cluster.seed = NA,
  n.core = NULL,
  n.imp.core = NULL,
  cl.type = "PSOCK",
  ...
)

Arguments

data

A data frame or matrix containing the incomplete data. Similar to the first argument of mice.

m

The number of desired imputed datasets. By default m = 5, as in mice.

seed

A scalar to be used as the seed value for the mice algorithm within each parallel stream. Please note that the imputations will then be the same for all streams. Hence, this argument should be used if and only if n.core = 1 and the same output as under mice is desired.

cluster.seed

A scalar to be used as the seed value. It is recommended to put the seed value here and not outside this function, as otherwise the parallel processes will be performed with separate, random seeds.

n.core

A scalar indicating the number of cores that should be used.

n.imp.core

A scalar indicating the number of imputations per core.

cl.type

The cluster type. Default value is "PSOCK". POSIX machines (Linux, Mac) generally benefit from much faster cluster computation if cl.type is set to "FORK".

...

Named arguments that are passed down to function mice or makeCluster.
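
The example on this page obtains 150 imputations by combining n.core = 3 with n.imp.core = 50. The sketch below illustrates that arithmetic. The helper imps_per_core() is hypothetical (it is not part of mice) and simply splits a desired total across cores, rounding up.

```r
# Hypothetical helper, not part of mice: number of imputations each core
# must produce so that the combined total is at least `m_total`.
imps_per_core <- function(m_total, n_core) {
  stopifnot(m_total >= 1, n_core >= 1)
  ceiling(m_total / n_core)
}

n_core <- 3
n_imp_core <- imps_per_core(150, n_core)

# Number of imputed datasets after the per-core results are combined.
total <- n_core * n_imp_core
```

These values could then be passed as parlmice(data, n.core = n_core, n.imp.core = n_imp_core).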

Details

This function relies on package parallel, which is a base package for R versions 2.14.0 and later. We have chosen to use parallel function parLapply to allow the use of parlmice on Mac, Linux and Windows systems. For the same reason, we use the Parallel Socket Cluster (PSOCK) type by default.
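
As a minimal illustration of the pattern described above, the following sketch runs parLapply on a PSOCK cluster using only package parallel. The squaring function and the choice of two workers are placeholders, not what parlmice runs internally.

```r
library(parallel)

# Start two worker processes. "PSOCK" is available on Windows, Mac and Linux.
cl <- makeCluster(2, type = "PSOCK")

# Each element of X is handled by a worker; results are returned as a list
# in the same order as X.
res <- parLapply(cl, X = 1:4, fun = function(i) i^2)

# Release the workers when done.
stopCluster(cl)

squares <- unlist(res)
```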

On systems other than Windows, it can be hugely beneficial to change the cluster type to FORK, as it generally results in improved memory handling. When memory issues arise on a Windows system, we advise storing the multiply imputed datasets, cleaning the memory with rm and gc, and making another run using the same settings.
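
One way to follow this advice is to pick the cluster type from the operating system. This is a sketch only: the variable name cl_type and the file name are illustrative, and the parlmice call is left commented out because it needs package mice and several cores.

```r
# FORK clusters are not available on Windows, so fall back to PSOCK there.
cl_type <- if (.Platform$OS.type == "windows") "PSOCK" else "FORK"

## Not run:
# library(mice)
# imp <- parlmice(data = nhanes, n.core = 2, n.imp.core = 5, cl.type = cl_type)
#
# If memory runs short: store the result, free memory, then run again.
# saveRDS(imp, "imp_run1.rds")  # illustrative file name
# rm(imp)
# gc()
## End(Not run)
```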

This wrapper function combines the output of parLapply with function ibind in mice. A mids object is returned and can be used for further analyses.
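
The combining step can be tried on its own. The sketch below assumes package mice is installed and binds two separately created mids objects with ibind. The two sequential mice calls stand in for the per-core results and are not how parlmice produces them.

```r
library(mice)

# Two independent runs on the nhanes data that ships with mice.
imp_a <- mice(nhanes, m = 2, seed = 1, printFlag = FALSE)
imp_b <- mice(nhanes, m = 3, seed = 2, printFlag = FALSE)

# ibind() appends the imputations of imp_b to those of imp_a.
imp <- ibind(imp_a, imp_b)

# Total number of imputations in the combined mids object.
imp$m
```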

Note that if a seed value is desired, the seed should be passed to this function with argument cluster.seed (argument seed gives every parallel stream the same seed, as described under Arguments). Seed values set outside the wrapper function (in an R script or passed to mice) will not lead to reproducible results. We refer to the manual of parallel for an explanation on this matter.
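
To see why the seed has to reach the workers, consider the following sketch, which uses only package parallel. It assumes the seed is distributed with clusterSetRNGStream; this page does not state how parlmice does this internally. The function draw_once() is a hypothetical helper.

```r
library(parallel)

# Hypothetical helper: seed a two-worker cluster, then draw one random
# number on each worker.
draw_once <- function(cluster_seed) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, cluster_seed)  # one separate RNG stream per worker
  unlist(parLapply(cl, 1:2, function(i) runif(1)))
}

run1 <- draw_once(123)
run2 <- draw_once(123)

# The same cluster seed reproduces the draws across runs.
identical(run1, run2)
```

Calling set.seed in the main session instead would not affect the workers, since each worker process has its own random number generator state.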

Value

A mids object as defined by mids-class.

Author(s)

Gerko Vink, Rianne Schouten

References

Schouten, R. and Vink, G. (2017). parlmice: faster, paraleller, micer. https://www.gerkovink.com/parlMICE/Vignette_parlMICE.html

Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.

See Also

Examples

## Not run: 
library(mice)
# 150 imputations in dataset nhanes, performed by 3 cores (3 x 50)
imp1 <- parlmice(data = nhanes, n.core = 3, n.imp.core = 50)
# Making use of arguments in mice, here the imputation method.
imp2 <- parlmice(data = nhanes, method = "norm.nob", m = 100)
imp2$method
# Fit the analysis model on each imputed dataset and pool the results.
fit <- with(imp2, lm(bmi ~ hyp))
pool(fit)

## End(Not run)

mice

Multivariate Imputation by Chained Equations

v3.13.0
GPL-2 | GPL-3
Authors
Stef van Buuren [aut, cre], Karin Groothuis-Oudshoorn [aut], Gerko Vink [ctb], Rianne Schouten [ctb], Alexander Robitzsch [ctb], Patrick Rockenschaub [ctb], Lisa Doove [ctb], Shahab Jolani [ctb], Margarita Moreno-Betancur [ctb], Ian White [ctb], Philipp Gaffert [ctb], Florian Meinfelder [ctb], Bernie Gray [ctb], Vincent Arel-Bundock [ctb]
Initial release
2021-01-26
