mi: 02missing_data.frame – R documentation

02missing_data.frame

Class "missing_data.frame"

Description

This class is similar to a data.frame but is customized for the situation in which variables with missing data are being modeled for multiple imputation. This class primarily consists of a list of missing_variables plus slots containing metadata indicating how the missing_variables relate to each other. Most operations that work for a data.frame also work for a missing_data.frame.

Usage

missing_data.frame(y, ...)
## Hidden arguments not included in the signature
## favor_ordered = TRUE, favor_positive = FALSE, 
## subclass = NA_character_,
## include_missingness = TRUE, skip_correlation_check = FALSE

Arguments

y

Usually a data.frame, possibly a numeric matrix, possibly a list of missing_variables.

...

Hidden arguments. The favor_ordered and favor_positive arguments are passed to the missing_variable function and are documented under the type argument. Briefly, they affect the heuristics that are used to guess what class a variable should be coerced to. The subclass argument defaults to NA and can be used to specify that the resulting object should inherit from the missing_data.frame class rather than be an object of missing_data.frame class.

Any further arguments are passed to the initialize-methods for a missing_data.frame. Currently these are include_missingness and skip_correlation_check. The include_missingness argument defaults to TRUE and indicates that the missingness pattern of the other variables should be included when modeling a particular missing_variable. The skip_correlation_check argument defaults to FALSE and indicates whether to skip the default check for whether the observed values of any pair of missing_variables have a perfect absolute Spearman correlation.
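The idea behind the correlation check can be sketched in base R. This is only an illustration under stated assumptions, not the code that the mi package runs internally: the toy data, the numerical tolerance of 1e-8, and the object names (toy, rho, flagged) are all hypothetical.

```r
# Hypothetical toy data; 'b' is a monotone transformation of 'a'
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, 4, 6, NA, 10, 12),
  c = c(3, 1, 2, 6, NA, 4)
)

# Spearman correlations among the observed values of each pair of variables
rho <- cor(toy, method = "spearman", use = "pairwise.complete.obs")

# Flag pairs whose absolute correlation is numerically equal to one
# (the tolerance is an assumption made for this sketch)
perfect <- abs(rho) > 1 - 1e-8 & upper.tri(rho)
flagged <- which(perfect, arr.ind = TRUE)

data.frame(var1 = rownames(rho)[flagged[, "row"]],
           var2 = colnames(rho)[flagged[, "col"]])
```

A flagged pair carries no independent information about one another's ordering, which is why such pairs tend to cause trouble when each variable is modeled conditional on the others.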

Details

In most cases, the first step of an analysis is for a useR to call the missing_data.frame function on a data.frame whose variables have some NA values, which will call the missing_variable function on each column of the data.frame and return the list that fills the variables slot. The classes of the list elements will depend on the nature of the column of the data.frame and various fallible heuristics. The success rate can be enhanced by making sure that columns of the original data.frame that are intended to be categorical variables are (ordered if appropriate) factors with labels. Even in the best case scenario, it will often be necessary to use the change function to modify various discretionary aspects of the missing_variables in the variables slot of the missing_data.frame. The show method for a missing_data.frame should be used to get a quick overview of the missing_variables in a missing_data.frame and to recognize what needs to be changed.
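As a minimal sketch of the preparation step described above, categorical columns can be encoded as labeled (and, where appropriate, ordered) factors in base R before the constructor is called. The data.frame raw and its column names are hypothetical, and the final call is left commented out because it requires the mi package.

```r
# Hypothetical raw data in which categorical variables are stored as plain values
raw <- data.frame(
  income    = c(35.2, NA, 51.0, 42.7, NA, 60.3),
  education = c("hs", "college", NA, "hs", "grad", "college"),
  employed  = c(1, 0, 1, NA, 1, 0),
  stringsAsFactors = FALSE
)

# An ordered factor for a variable whose categories have a natural ordering
raw$education <- factor(raw$education,
                        levels = c("hs", "college", "grad"),
                        ordered = TRUE)

# An unordered factor with explicit labels for a binary variable
raw$employed <- factor(raw$employed, levels = c(0, 1),
                       labels = c("no", "yes"))

str(raw)

# The prepared data.frame would then be passed to the constructor:
# mdf <- missing_data.frame(raw)  # requires the mi package
# show(mdf)                       # review the guessed classes
```

Encoding the variables this way leaves less for the heuristics to guess, although the output of show should still be reviewed afterwards.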

Value

The missing_data.frame constructor function returns an object of class missing_data.frame or that inherits from the missing_data.frame class.

Objects from the Class

Objects can be created by calls of the form new("missing_data.frame", ...). However, useRs almost always will pass a data.frame to the missing_data.frame constructor function to produce an object of missing_data.frame class.

Slots

This section is primarily aimed at developeRs. A missing_data.frame inherits from data.frame but has the following additional slots:

variables: Object of class "list" in which each list element is an object that inherits from the missing_variable-class
no_missing: Object of class "logical", which is a vector whose length is the same as the length of the variables slot indicating whether the corresponding missing_variable is fully observed
patterns: Object of class "factor" whose length is equal to the number of observations and whose elements indicate the missingness pattern for that observation
DIM: Object of class "integer" of length two indicating first the number of observations and second the length of the variables slot
DIMNAMES: Object of class "list" of length two providing the appropriate number of row names and column names
postprocess: Object of class "function" used to create additional variables from existing variables, such as interactions between two missing_variables once their missing values have been imputed. Does not work at the moment.
index: Object of class "list" whose length is equal to the number of missing_variables with some missing values. Each list element is an integer vector indicating which columns of the X slot must be dropped when modeling the corresponding missing_variable
X: Object of MatrixTypeThing-class with as many rows as there are observations; it is loosely related to a model.matrix. Rather than repeatedly parsing a formula during the multiple imputation process, this X matrix is created once and some of its columns are dropped, according to the index slot, when modeling a missing_variable. The columns of the X matrix consist of numeric representations of the missing_variables plus (by default) the unique missingness patterns
weights: Object of class "list" whose length is equal to one or the number of missing_variables with some missing values. Each list element is passed to the corresponding argument of bayesglm and similar functions. In particular, some observations can be given a weight of zero, which should drop them when modeling some missing_variables
priors: Object of class "list" whose length is equal to the number of missing_variables and whose elements give appropriate values for the priors used by the model fitting function wrapped by the fit_model-methods; see, e.g., bayesglm
correlations: Object of class "matrix" with rows and columns equal to the length of the variables slot. Its strict upper triangle contains Spearman correlations between pairs of variables (ignoring missing values), and its strict lower triangle contains Squared Multiple Correlations (SMCs) between a variable and all other variables (ignoring missing values). If either a Spearman correlation or a SMC is very close to unity, there may be difficulty or error messages during the multiple imputation process.
done: Object of class "logical" of length one indicating whether the missing values have been imputed
workpath: Object of class "character" of length one indicating the path to a working directory that is used to store some objects
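To make the no_missing and patterns slots more concrete, the following base R sketch computes analogous quantities for an ordinary data.frame. It is an illustration only: the toy data, the helper label_row, and the pattern labels (including "nothing" for fully observed rows) are assumptions made here and need not match what the mi package stores.

```r
# Hypothetical toy data with missing values in two of three columns
toy <- data.frame(
  x = c(1.5, NA, 3.2, NA, 5.0),
  y = c("a", "b", NA, NA, "a"),
  z = 1:5
)

# Logical matrix marking the missing cells
miss <- is.na(toy)

# Analogue of no_missing: TRUE for variables that are fully observed
no_missing <- colSums(miss) == 0

# Analogue of patterns: one label per observation naming its missing variables
label_row <- function(r) {
  if (any(r)) paste(colnames(miss)[r], collapse = ", ") else "nothing"
}
patterns <- factor(apply(miss, 1, label_row))

no_missing
table(patterns)
```

Observations that share a label have the same set of missing variables, which is the sense in which a single factor can summarize the missingness pattern of every row.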

Methods

There are many methods that are defined for a missing_data.frame, although some are primarily intended for developers. The most relevant ones for users are:

change: signature(data = "missing_data.frame", y = "ANY", what = "character", to = "ANY") which is used to change discretionary aspects of the missing_variables in the variables slot of a missing_data.frame
hist: signature(x = "missing_data.frame") which shows histograms of the observed variables that have missingness
image: signature(x = "missing_data.frame") which plots an image of the missingness slot to visualize the pattern of missingness when grayscale = FALSE or the pattern of missingness in light of the observed values (grayscale = TRUE, the default)
mi: signature(y = "missing_data.frame", model = "missing") which multiply imputes the missing values
show: signature(object = "missing_data.frame") which gives an overview of the salient characteristics of the missing_variables in the variables slot of a missing_data.frame
summary: signature(object = "missing_data.frame") which produces the same result as the summary method for a data.frame

There are also S3 methods for the dim, dimnames, and names generics, which allow functions like nrow, ncol, rownames, colnames, etc. to work as expected on missing_data.frames. Also, accessing and changing elements of a missing_data.frame mostly works the same way as for a data.frame.

Author(s)

Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima, Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman.

Examples

# STEP 0: Get data
data(CHAIN, package = "mi")

# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)

# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")

# STEP 3: look deeper
summary(mdf)
hist(mdf)
image(mdf)

# STEP 4: impute
## Not run: 
imputations <- mi(mdf)

## End(Not run)

## An example with subsetting on a fully observed variable
data(nlsyV, package = "mi")
mdfs <- missing_data.frame(nlsyV, favor_positive = TRUE, favor_ordered = FALSE, by = "first")
mdfs <- change(mdfs, y = "momed", what = "type", to = "ord")
show(mdfs)

mi

Missing Data Imputation and Model Checking

v1.0

GPL (>= 2)

Authors

Andrew Gelman [ctb], Jennifer Hill [ctb], Yu-Sung Su [aut], Masanao Yajima [ctb], Maria Pittau [ctb], Ben Goodrich [cre, aut], Yajuan Si [ctb], Jon Kropko [aut]

Initial release

2015-04-16