Find index of matched donor units
matchindex(d, t, k = 5L)
d | Numeric vector with values from donor cases.
t | Numeric vector with values from target cases.
k | Integer, number of unique donors from which a random draw is made.
For each element in t, the method finds the k nearest neighbours in d, randomly draws one of these neighbours, and returns its position in vector d.
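As a point of reference, the behaviour described above can be written as a brute-force sketch in plain R. The function name naive_matchindex is hypothetical and not part of mice. It sorts all donor distances for every target, so it is much slower than the fast algorithm, and it breaks distance ties by position rather than at random.

```r
# Hypothetical brute-force reference, not the mice implementation.
naive_matchindex <- function(d, t, k = 5L) {
  k <- min(as.integer(k), length(d))   # cannot draw from more donors than exist
  vapply(t, function(ti) {
    # positions of the k donors closest to this target (ties kept in input order)
    nearest <- order(abs(d - ti))[seq_len(k)]
    # draw one of them uniformly; sample.int avoids the sample(x, 1) pitfall
    nearest[sample.int(k, 1L)]
  }, integer(1))
}

naive_matchindex(c(-5, 5, 0, 10, 12), c(-6, -4, 0, 2, 4, -2, 6), k = 1L)
```

With k = 1 the draw is trivial, so the result is simply the position of the closest donor for each target.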
Fast predictive mean matching algorithm in seven steps:
1. Shuffle records to remove effects of ties
2. Obtain sorting order on shuffled data
3. Calculate index on input data and sort it
4. Pre-sample vector h with values between 1 and k
For each of the n0 elements in t:
5. find the two adjacent neighbours
6. find the h_i-th nearest neighbour
7. store the index of that neighbour
Return vector of n0 positions in d.
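The seven steps can be transcribed into plain R roughly as follows. This is a hypothetical sketch (fast_matchindex_sketch is not a mice function), and the actual matchindex() may differ in its details; using findInterval() for step 5 and an outward walk over the sorted donors for step 6 are assumptions about how those steps might be realised.

```r
# Hypothetical transcription of the seven-step outline; not the mice source.
fast_matchindex_sketch <- function(d, t, k = 5L) {
  n1 <- length(d)
  n0 <- length(t)
  k <- min(as.integer(k), n1)
  # 1. Shuffle records so that tied donor values end up in random order
  shuffle <- sample.int(n1)
  # 2. Obtain sorting order on shuffled data (order() is stable)
  ord <- order(d[shuffle])
  # 3. Map sorted positions back to positions in the input d
  idx <- shuffle[ord]
  ds <- d[idx]                                  # donor values, sorted
  # 4. Pre-sample h with values between 1 and k
  h <- sample.int(k, n0, replace = TRUE)
  result <- integer(n0)
  for (i in seq_len(n0)) {
    # 5. Two adjacent neighbours: ds[lo] <= t[i] < ds[hi]
    lo <- findInterval(t[i], ds)
    hi <- lo + 1L
    # 6. Walk outward h[i] times, each time taking the closer frontier donor
    for (step in seq_len(h[i])) {
      take_lo <- lo >= 1L && (hi > n1 || t[i] - ds[lo] <= ds[hi] - t[i])
      if (take_lo) {
        pick <- lo
        lo <- lo - 1L
      } else {
        pick <- hi
        hi <- hi + 1L
      }
    }
    # 7. Store the index of that neighbour as a position in the original d
    result[i] <- idx[pick]
  }
  result
}
```

Because each step takes whichever of the two frontier donors is closer, donors are visited in order of increasing distance, so h_i selects the h_i-th nearest neighbour. After the initial sort, each target costs a binary search plus at most k steps.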
We may use the function to perform predictive mean matching under a given predictive model. To do so, specify both d and t as predictions from the same model. Suppose that y contains the observed outcomes of the donor cases (in the same sequence as d); then y[matchindex(d, t)] returns one matched outcome for every target case.
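As an illustration of this recipe, the sketch below fills the missing Ozone values in the built-in airquality data with matched observed values. The choice of data set and of the predictors Temp and Wind is an assumption made for the example. It is a single imputation from one fixed model fit, so unlike a proper multiple imputation it does not reflect uncertainty in the model parameters.

```r
library(mice)  # provides matchindex()

set.seed(123)
df <- airquality                      # Ozone contains missing values
obs <- !is.na(df$Ozone)               # donors: rows with observed Ozone

# Fit the predictive model on the donors only
fit <- lm(Ozone ~ Temp + Wind, data = df[obs, ])

# d and t are predictions from the same model
d <- predict(fit, newdata = df[obs, ])    # donor predictions
t <- predict(fit, newdata = df[!obs, ])   # target predictions
y <- df$Ozone[obs]                        # observed outcomes, same order as d

# One matched observed Ozone value for every missing entry
df$Ozone[!obs] <- y[matchindex(d, t)]
summary(df$Ozone)
```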
See https://github.com/amices/mice/issues/236.
This function replaces the matcher() function that had been the default in mice since version 2.22 (June 2014).
An integer vector with length(t) elements. Each element is a position in the vector d.
Stef van Buuren, Nasinski Maciej, Alexander Robitzsch
library(mice)
set.seed(1)

# Inputs need not be sorted
d <- c(-5, 5, 0, 10, 12)
t <- c(-6, -4, 0, 2, 4, -2, 6)

# Index (in vector d) of closest match
idx <- matchindex(d, t, 1)
idx

# To check: show values of closest match
d[idx]

# Random draw among indices of the 5 closest predictors
matchindex(d, t)

# An example
train <- mtcars[1:20, ]
test <- mtcars[21:32, ]
fit <- lm(mpg ~ disp + cyl, data = train)
d <- fitted.values(fit)
t <- predict(fit, newdata = test) # note: not using mpg
idx <- matchindex(d, t)

# Borrow values from train to produce 12 synthetic values for mpg in test.
# Synthetic values are plausible values that could have been observed if
# they had been measured.
train$mpg[idx]

# Exercise: Create a distribution of 1000 plausible values for each of the
# twelve mpg entries in test, and count how many times the true value
# (which we know here) is located within the inter-quartile range of each
# distribution. Is your count anywhere close to 500? Why? Why not?
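One possible way to set up the computation for the exercise is sketched below; it rebuilds train, test, d and t so that it runs on its own. Treating 1000 repeated calls to matchindex() as the distribution of plausible values for each test row is an assumption about what the exercise intends, and the sketch only computes the count, leaving the "why" to the reader.

```r
library(mice)  # provides matchindex()

set.seed(1)
train <- mtcars[1:20, ]
test <- mtcars[21:32, ]
fit <- lm(mpg ~ disp + cyl, data = train)
d <- fitted.values(fit)
t <- predict(fit, newdata = test)

# 12 x 1000 matrix: row j holds 1000 plausible values for test$mpg[j]
draws <- replicate(1000, train$mpg[matchindex(d, t)])

# 2 x 12 matrix: first and third quartile of each row
q <- apply(draws, 1, quantile, probs = c(0.25, 0.75))

# Is the true mpg inside the inter-quartile range of its distribution?
inside <- test$mpg >= q[1, ] & test$mpg <= q[2, ]
sum(inside)
```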