Imputation under MNAR mechanism by NARFCS
Imputes univariate data under a user-specified MNAR mechanism by linear or logistic regression and NARFCS. Sensitivity analysis under different model specifications may shed light on the impact of different MNAR assumptions on the conclusions.
mice.impute.mnar.logreg(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...) mice.impute.mnar.norm(y, ry, x, wy = NULL, ums = NULL, umx = NULL, ...)
y |
Vector to be imputed |
ry |
Logical vector of length |
x |
Numeric design matrix with |
wy |
Logical vector of length |
ums |
A string containing the specification of the unidentifiable part of the imputation model (the *unidentifiable model specification”), that is, the desired δ-adjustment (offset) as a function of other variables and values for the corresponding deltas (sensitivity parameters). See details. |
umx |
An auxiliary data matrix containing variables that do
not appear in the identifiable part of the imputation procedure
but that have been specified via |
... |
Other named arguments. |
This function imputes data that are thought to be Missing Not at
Random (MNAR) by the NARFCS method. The NARFCS procedure
(Leacy, 2016; Tompsett et al, 2018) generalises the so-called
δ-adjustment sensitivity analysis method of Van Buuren,
Boshuizen & Knook (1999) to the case with multiple incomplete
variables within the FCS framework. In practical terms, the
NARFCS procedure shifts the imputations drawn at each
iteration of mice
by a user-specified quantity that can
vary across subjects, to reflect systematic departures of the
missing data from the data distribution imputed under MAR.
Specification of the NARFCS model is done by the blots
argument of mice()
. The blots
parameter is a named
list. For each variable to be imputed by
mice.impute.mnar.norm()
or mice.impute.mnar.logreg()
the corresponding element in blots
is a list with
at least one argument ums
and, optionally, a second
argument umx
.
For example, the high-level call might like something like
mice(nhanes[, c(2, 4)], method = c("pmm", "mnar.norm"), blots = list(chl = list(ums = "-3+2*bmi")))
.
The ums
parameter is required, and might look like this:
"-4+1*Y"
. The ums
specifcation must have the
following characteristics:
A single term corresponding to the intercept (constant) term, not multiplied by any variable name, must be included in the expression;
Each term in the expression (corresponding to the intercept
or a predictor variable) must be separated by either a "+"
or "-"
sign, depending on the sign of the sensitivity
parameter;
Within each non-intercept term, the sensitivity parameter
value comes first and the predictor variable comes second, and these
must be separated by a "*"
sign;
For categorical predictors, for example a variable Z
with K + 1 categories ("Cat0","Cat1", ...,"CatK")
, K
category-specific terms are needed, and those not in umx
(see below) must be specified by concatenating the variable name
with the name of the category (e.g. ZCat1
) as this is how
they are named in the design matrix (argument x
) passed
to the univariate imputation function. An example is
"2+1*ZCat1-3*ZCat2"
.
If given, the umx
specification must have the following
characteristics:
It contains only complete variables, with no missing values;
It is a numeric matrix. In particular, categorical variables
must be represented as dummy indicators with names corresponding
to what is used in ums
to refer to the category-specific terms
(see above);
It has the same number of rows as the data
argument
passed on to the main mice
function;
It does not contain variables that were already predictors in the identifiable part of the model for the variable under imputation.
Limitation: The present implementation can only condition on variables
that appear in the identifiable part of the imputation model (x
) or
in complete auxiliary variables passed on via the umx
argument.
It is not possible to specify models where the offset depends on
incomplete auxiliary variables.
For an MNAR alternative see also mice.impute.ri
.
Vector with imputed data, same type as y
, and of length
sum(wy)
Margarita Moreno-Betancur, Stef van Buuren, Ian R. White, 2020.
Leacy, F.P. (2016). Multiple imputation under missing not at random assumptions via fully conditional specification. Dissertation, University of Cambridge, UK.
Tompsett, D. M., Leacy, F., Moreno-Betancur, M., Heron, J., & White, I. R. (2018). On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Statistics in Medicine, 37(15), 2338-2353. doi: 10.1002/sim.7643.
Van Buuren, S., Boshuizen, H.C., Knook, D.L. (1999) Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681–694.
Other univariate imputation functions:
mice.impute.cart()
,
mice.impute.lda()
,
mice.impute.logreg.boot()
,
mice.impute.logreg()
,
mice.impute.mean()
,
mice.impute.midastouch()
,
mice.impute.norm.boot()
,
mice.impute.norm.nob()
,
mice.impute.norm.predict()
,
mice.impute.norm()
,
mice.impute.pmm()
,
mice.impute.polr()
,
mice.impute.polyreg()
,
mice.impute.quadratic()
,
mice.impute.rf()
,
mice.impute.ri()
# 1: Example with no auxiliary data: only pass unidentifiable model specification (ums) # Specify argument to pass on to mnar imputation functions via "blots" argument mnar.blot <- list(X = list(ums = "-4"), Y = list(ums = "2+1*ZCat1-3*ZCat2")) # Run NARFCS by using mnar imputation methods and passing argument via blots impNARFCS <- mice(mnar_demo_data, method = c("mnar.logreg", "mnar.norm", ""), blots = mnar.blot, seed = 234235, print = FALSE ) # Obtain MI results: Note they coincide with those from old version at # https://github.com/moreno-betancur/NARFCS pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate # 2: Example passing also auxiliary data to MNAR procedure (umx) # Assumptions: # - Auxiliary data are complete, no missing values # - Auxiliary data are a numeric matrix # - Auxiliary data have same number of rows as x # - Auxiliary data have no overlapping variable names with x # Specify argument to pass on to mnar imputation functions via "blots" argument aux <- matrix(0:1, nrow = nrow(mnar_demo_data)) dimnames(aux) <- list(NULL, "even") mnar.blot <- list( X = list(ums = "-4"), Y = list(ums = "2+1*ZCat1-3*ZCat2+0.5*even", umx = aux) ) # Run NARFCS by using mnar imputation methods and passing argument via blots impNARFCS <- mice(mnar_demo_data, method = c("mnar.logreg", "mnar.norm", ""), blots = mnar.blot, seed = 234235, print = FALSE ) # Obtain MI results: As expected they differ (slightly) from those # from old version at https://github.com/moreno-betancur/NARFCS pool(with(impNARFCS, lm(Y ~ X + Z)))$pooled$estimate
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.