
hotdeck

Hot-Deck Imputation


Description

Implementation of the popular sequential, random (within a domain) hot-deck algorithm for imputation.

Usage

hotdeck(
  data,
  variable = NULL,
  ord_var = NULL,
  domain_var = NULL,
  makeNA = NULL,
  NAcond = NULL,
  impNA = TRUE,
  donorcond = NULL,
  imp_var = TRUE,
  imp_suffix = "imp"
)

Arguments

data

data.frame or matrix

variable

variables where missing values should be imputed (not overlapping with ord_var)

ord_var

variables for sorting the data set before imputation (not overlapping with variable)

domain_var

variables used to build domains; imputation is carried out within each domain

makeNA

list of length equal to the number of variables, giving for each variable the values that should be converted to NA

NAcond

list of length equal to the number of variables, giving for each variable a condition for imputing an NA

impNA

TRUE/FALSE; whether NA values should be imputed

donorcond

list of length equal to the number of variables, giving for each variable a condition that donors must satisfy, e.g. ">5"

imp_var

TRUE/FALSE; whether a TRUE/FALSE variable should be created for each imputed variable to show the imputation status

imp_suffix

suffix for the TRUE/FALSE variables showing the imputation status
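
As an illustration of how a donorcond string such as ">5" can be read, the following base R sketch turns such a string into a logical mask over candidate donor values. The helper donor_mask is hypothetical and is not part of VIM; how hotdeck() evaluates these conditions internally may differ.

```r
# Hypothetical helper (not part of VIM): interpret a condition string
# such as ">5" as a filter on candidate donor values.
donor_mask <- function(values, cond) {
  # Build and evaluate the expression "values <cond>", e.g. "values >5"
  ok <- eval(parse(text = paste("values", cond)))
  # Missing values can never act as donors
  !is.na(values) & ok
}

span <- c(2, 8.5, NA, 12, 40)
donor_mask(span, ">5")
# FALSE TRUE FALSE TRUE TRUE
```

Only the values flagged TRUE would be eligible donors under this reading; the missing entry is excluded regardless of the condition.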

Value

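The imputed data set. If imp_var = TRUE, it also contains one TRUE/FALSE variable per imputed variable, named <variable>_<imp_suffix> (for example NonD_imp), showing the imputation status.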

Note

If the sequential hot-deck does not lead to a suitable donor, a random donor in the group will be used.
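
The idea of a sequential hot-deck with a random fallback can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not VIM's implementation: the helper seq_hotdeck is hypothetical, it handles a single variable, one sorting variable and one domain variable, and it takes the preceding row in sort order as the donor.

```r
# Hypothetical helper (not VIM's code): sequential hot-deck for one variable,
# with a random donor from the same domain as fallback.
seq_hotdeck <- function(df, variable, ord_var, domain_var) {
  # Sort so that neighbouring rows within a domain are similar in ord_var
  df <- df[order(df[[domain_var]], df[[ord_var]]), , drop = FALSE]
  # Process each domain separately
  for (rows in split(seq_len(nrow(df)), df[[domain_var]])) {
    x <- df[[variable]][rows]
    donors <- x[!is.na(x)]
    if (length(donors) == 0) next  # no donor available in this domain
    for (i in which(is.na(x))) {
      if (i > 1) {
        # Sequential step: the preceding row (observed or already imputed) donates
        x[i] <- x[i - 1]
      } else {
        # Fallback: no preceding row, so draw a random donor from the domain
        x[i] <- donors[sample.int(length(donors), 1)]
      }
    }
    df[[variable]][rows] <- x
  }
  df
}

set.seed(1)
toy <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  o = c(3, 1, 2, 2, 3, 1),
  v = c(10, NA, 30, NA, 50, 60)
)
seq_hotdeck(toy, variable = "v", ord_var = "o", domain_var = "g")
```

In domain "b" the missing value at o = 2 receives 60 from the preceding row (o = 1). In domain "a" the missing value is first in sort order, so it has no sequential donor and is filled with a random draw from the observed values 10 and 30.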

Author(s)

Alexander Kowarik

References

A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.

See Also

Other imputation methods: irmi(), kNN(), matchImpute(), rangerImpute(), regressionImp()

Examples

library(VIM)
data(sleep, package = "VIM")
sleepI <- hotdeck(sleep)
sleepI2 <- hotdeck(sleep, ord_var = "BodyWgt", domain_var = "Pred")

# Usage of donorcond in a simple example
sleepI3 <- hotdeck(
  sleep,
  variable = c("NonD", "Dream", "Sleep", "Span", "Gest"),
  ord_var = "BodyWgt", domain_var = "Pred",
  donorcond = list(">4", "<17", ">1.5", "%between%c(8,13)", ">5")
)

set.seed(132)
nRows <- 1e3
# Generate a data set with nRows rows and several variables
x <- data.frame(
  x = rnorm(nRows), y = rnorm(nRows),
  z = sample(LETTERS, nRows, replace = TRUE),
  d1 = sample(LETTERS[1:3], nRows, replace = TRUE),
  d2 = sample(LETTERS[1:2], nRows, replace = TRUE),
  o1 = rnorm(nRows), o2 = rnorm(nRows), o3 = rnorm(100)
)
origX <- x
# Set 10% of the values in each of the first four columns to NA
x[sample(1:nRows, nRows / 10), 1] <- NA
x[sample(1:nRows, nRows / 10), 2] <- NA
x[sample(1:nRows, nRows / 10), 3] <- NA
x[sample(1:nRows, nRows / 10), 4] <- NA
xImp <- hotdeck(x, ord_var = c("o1", "o2", "o3"), domain_var = "d2")

VIM

Visualization and Imputation of Missing Values

v6.1.0
GPL (>= 2)
Authors
Matthias Templ [aut, cre], Alexander Kowarik [aut] (<https://orcid.org/0000-0001-8598-4130>), Andreas Alfons [aut], Gregor de Cillia [aut], Bernd Prantner [ctb], Wolfgang Rannetbauer [aut]
