matchindex

Find index of matched donor units


Description

Find index of matched donor units

Usage

matchindex(d, t, k = 5L)

Arguments

d

Numeric vector with values from donor cases.

t

Numeric vector with values from target cases.

k

Integer, number of unique donors from which a random draw is made. For k = 1 the function returns the index in d corresponding to the closest unit. For multiple imputation, the advice is to set values in the range of k = 5 to k = 10.

Details

For each element in t, the method finds the k nearest neighbours in d, randomly draws one of these neighbours, and returns its position in vector d.

Fast predictive mean matching algorithm in seven steps:

1. Shuffle records to remove effects of ties

2. Obtain sorting order on shuffled data

3. Calculate index on input data and sort it

4. Pre-sample vector h with values between 1 and k

For each of the n0 elements in t:

5. find the two adjacent neighbours

6. find the h_i'th nearest neighbour

7. store the index of that neighbour

Return vector of n0 positions in d.
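
The steps above describe the fast compiled routine. To illustrate what it returns, here is a brute-force base R sketch of the same idea: for each target, rank the donors by distance and draw one of the k closest. It is not the implementation used in mice. The helper name matchindex_naive is hypothetical, and breaking ties by a random shuffle is an assumption modelled on step 1.

```r
# Hypothetical brute-force reference; NOT the algorithm used by mice.
matchindex_naive <- function(d, t, k = 5L) {
  # Cannot draw from more donors than are available
  k <- min(as.integer(k), length(d))
  vapply(t, function(ti) {
    # Shuffle donor positions first so that ties are broken at random
    pos <- sample.int(length(d))
    # Order the shuffled donors by absolute distance to the target value
    nearest <- pos[order(abs(d[pos] - ti))]
    # Draw one position among the k closest donors
    nearest[sample.int(k, 1L)]
  }, integer(1))
}

set.seed(1)
d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)

# With k = 1 the draw is deterministic here because no target is tied
idx <- matchindex_naive(d, t, k = 1L)
d[idx]  # -5 -5 0 0 5 0 5
```

The sketch costs on the order of length(d) * log(length(d)) per target, which is why a pre-sorted search such as the one outlined above is preferable for large inputs.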

We may use the function to perform predictive mean matching under a given predictive model. To do so, specify both d and t as predictions from the same model. Suppose that y contains the observed outcomes of the donor cases (in the same sequence as d), then y[matchindex(d, t)] returns one matched outcome for every target case.
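
As a small illustration of the y[matchindex(d, t)] idiom, the sketch below imputes missing outcomes in simulated data. The data, the missingness pattern, and the variable names (x, y, miss, obs, y_imp) are made up for this example, and it assumes that a version of mice providing matchindex() is installed.

```r
set.seed(42)

# Simulated data: y depends linearly on x (illustrative values only)
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Hypothetical missingness: treat 10 outcomes as unobserved
miss <- sample(n, 10)
obs <- setdiff(seq_len(n), miss)

# Fit the predictive model on the donor (observed) cases only
fit <- lm(y ~ x, subset = obs)

# Predictions from the same model for donors (d) and targets (t)
d <- fitted(fit)
t <- predict(fit, newdata = data.frame(x = x[miss]))

# Assumes the mice package is installed; skipped otherwise
if (requireNamespace("mice", quietly = TRUE)) {
  # y[obs] is in the same sequence as d, so indexing it by the
  # matched positions borrows one observed outcome per target case
  y_imp <- y[obs][mice::matchindex(d, t)]
  y_imp
}
```

Every imputed value is an observed outcome of some donor, so imputations always stay within the range of the observed data.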

See https://github.com/amices/mice/issues/236. This function replaces the matcher() function, which has been the default in mice since version 2.22 (June 2014).

Value

An integer vector with length(t) elements. Each element is a position in the vector d.

Author(s)

Stef van Buuren, Nasinski Maciej, Alexander Robitzsch

Examples

set.seed(1)

# Inputs need not be sorted
d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)

# Index (in vector d) of closest match
idx <- matchindex(d, t, 1)
idx

# To check: show values of closest match
d[idx]

# Random draw among indices of the 5 closest predictors
matchindex(d, t)

# An example: predictive mean matching under a linear model
train <- mtcars[1:20, ]
test <- mtcars[21:32, ]
fit <- lm(mpg ~ disp + cyl, data = train)
d <- fitted.values(fit)
t <- predict(fit, newdata = test)  # note: not using mpg
idx <- matchindex(d, t)

# Borrow values from train to produce 12 synthetic values for mpg in test.
# Synthetic values are plausible values that could have been observed if
# they had been measured.
train$mpg[idx]

# Exercise: Create a distribution of 1000 plausible values for each of the
# twelve mpg entries in test, and count how many times the true value
# (which we know here) is located within the inter-quartile range of each
# distribution. Is your count anywhere close to 500? Why? Why not?

mice

Multivariate Imputation by Chained Equations

v3.13.0
GPL-2 | GPL-3
Authors
Stef van Buuren [aut, cre], Karin Groothuis-Oudshoorn [aut], Gerko Vink [ctb], Rianne Schouten [ctb], Alexander Robitzsch [ctb], Patrick Rockenschaub [ctb], Lisa Doove [ctb], Shahab Jolani [ctb], Margarita Moreno-Betancur [ctb], Ian White [ctb], Philipp Gaffert [ctb], Florian Meinfelder [ctb], Bernie Gray [ctb], Vincent Arel-Bundock [ctb]
Initial release
2021-01-26