Genetic Matching
This function finds optimal balance using multivariate matching where
a genetic search algorithm determines the weight each covariate is
given. Balance is determined by examining cumulative probability
distribution functions of a variety of standardized statistics. By
default, these statistics include t-tests and Kolmogorov-Smirnov
tests. A variety of descriptive statistics based on empirical-QQ
(eQQ) plots can also be used, as can any user-provided measure of balance.
The statistics are not used to conduct formal hypothesis tests,
because no measure of balance is a monotonic function of bias and
because balance should be maximized without limit. The object
returned by GenMatch can be supplied to the Match function (via the
Weight.matrix option) to obtain causal estimates. GenMatch uses genoud
to perform the genetic search. Using the cluster option, one may use
multiple computers, CPUs or cores to perform parallel computations.
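To make the default balance measure concrete, the sketch below computes a t-test and a Kolmogorov-Smirnov p-value for each column of a balance matrix and sorts them. It is an illustration only, not GenMatch's internal code: the data are simulated, the helper name balance_pvals is hypothetical, and the tests are run with base R's t.test and ks.test on unmatched groups, whereas GenMatch evaluates balance on the matched data.

```r
# Illustrative sketch only: simulated data, hypothetical helper name.
set.seed(42)
n  <- 200
Tr <- rbinom(n, 1, 0.4)                      # 1 = treated, 0 = control
BalanceMatrix <- cbind(x1 = rnorm(n, mean = 0.3 * Tr),
                       x2 = rnorm(n),
                       x3 = rexp(n))

# For every column, collect one t-test and one KS-test p-value,
# then sort all p-values from smallest to largest.
balance_pvals <- function(Tr, B) {
  treated <- Tr == 1
  p <- unlist(lapply(seq_len(ncol(B)), function(j) {
    c(t.test(B[treated, j], B[!treated, j])$p.value,
      ks.test(B[treated, j], B[!treated, j])$p.value)
  }))
  sort(p)
}

pvals <- balance_pvals(Tr, BalanceMatrix)
print(pvals)   # two p-values per column of the balance matrix
```

The smallest entries of such a vector point at the covariates that are currently worst balanced, which is why the p-values serve here as a quantity to improve rather than as formal hypothesis tests.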
GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1, weights=NULL,
         pop.size = 100, max.generations=100,
         wait.generations=4, hard.generation.limit=FALSE,
         starting.values=rep(1,ncol(X)),
         fit.func="pvals",
         MemoryMatrix=TRUE,
         exact=NULL, caliper=NULL, replace=TRUE, ties=TRUE,
         CommonSupport=FALSE, nboots=0, ks=TRUE, verbose=FALSE,
         distance.tolerance=1e-05,
         tolerance=sqrt(.Machine$double.eps),
         min.weight=0, max.weight=1000,
         Domains=NULL, print.level=2,
         project.path=NULL,
         paired=TRUE, loss=1,
         data.type.integer=FALSE,
         restrict=NULL,
         cluster=FALSE, balance=TRUE, ...)
Tr |
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. |
X |
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. |
BalanceMatrix |
A matrix containing the variables we wish
to achieve balance on. This is by default equal to |
estimand |
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect, and "ATC" is the sample average treatment effect for the controls. |
M |
A scalar for the number of matches which should be
found. The default is one-to-one matching. Also see the |
weights |
A vector the same length as |
pop.size |
Population Size. This is the number of individuals
|
max.generations |
Maximum Generations. This is the maximum
number of generations that |
wait.generations |
If there is no improvement in the objective
function in this number of generations, optimization will stop. The
other options controlling termination are |
hard.generation.limit |
This logical variable determines if the
|
starting.values |
This vector's length is equal to the number of variables in |
fit.func |
The balance metric |
MemoryMatrix |
This variable controls if
|
exact |
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is
provided, that logical value is applied to all covariates in
|
caliper |
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in |
replace |
A logical flag for whether matching should be done with
replacement. Note that if |
ties |
A logical flag for whether ties should be handled deterministically. By
default |
CommonSupport |
This logical flag implements the usual procedure
by which observations outside of the common support of a variable
(usually the propensity score) across treatment and control groups are
discarded. The |
nboots |
The number of bootstrap samples to be run for the
|
ks |
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If the ks option
is set to true, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. By default, the bootstrap KS test is
not used. To change this see the |
verbose |
A logical flag for whether details of each
fitness evaluation should be printed. Verbose is set to FALSE if
the |
distance.tolerance |
This is a scalar which is used to determine
if distances between two observations are different from zero. Values
less than |
tolerance |
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. |
min.weight |
This is the minimum weight any variable may be given. |
max.weight |
This is the maximum weight any variable may be given. |
Domains |
This is a |
print.level |
This option controls the level of printing. There
are four possible levels: 0 (minimal printing), 1 (normal), 2
(detailed), and 3 (debug). If level 2 is selected, |
project.path |
This is the path of the
|
paired |
A flag for whether the paired |
loss |
The loss function to be optimized. The default value, If the value of |
data.type.integer |
By default, floating-point weights are considered. If this option is
set to |
restrict |
A matrix which restricts the possible matches. This
matrix has one row for each restriction and three
columns. The first two columns contain the two observation numbers
which are to be restricted (for example 4 and 20), and the third
column is the restriction imposed on the observation-pair.
Negative numbers in the third column imply that the two observations
cannot be matched under any circumstances, and positive numbers are
passed on as the distance between the two observations for the
matching algorithm. The most commonly used positive restriction is
Exclusion restrictions are even more common. For example, if we want
to exclude the observation pair 4 and 20 and the pair 6 and 55 from
being matched, the restrict matrix would be:
|
cluster |
This
can either be an object of the 'cluster' class returned by one of
the |
balance |
This logical flag controls if load balancing is done
across the cluster. Load balancing can result in better cluster
utilization; however, increased communication can reduce
performance. This option is best used if each individual call to
|
... |
Other options which are passed on to
|
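The exclusion example described under restrict (never match observations 4 and 20, nor observations 6 and 55) can be sketched as a three-column matrix. The value -1 is an assumption made for illustration; the description above only requires the third column to be negative for an exclusion.

```r
# One row per restriction: observation number, observation number, restriction.
# A negative third entry forbids the pair from being matched;
# -1 is an arbitrary negative value chosen for this sketch.
restrict <- rbind(c(4, 20, -1),
                  c(6, 55, -1))
print(restrict)
```

A matrix built this way would then be passed to GenMatch through the restrict argument.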
value |
The fit
values at the solution. By default, this is a vector of p-values
sorted from the smallest to the largest. There will generally be
twice as many p-values as there are variables in
|
par |
A vector
of the weights given to each variable in |
Weight.matrix |
A matrix whose diagonal corresponds to the
weight given to each variable in |
matches |
A matrix where the first column contains the row
numbers of the treated observations in the matched dataset. The
second column contains the row numbers of the control
observations. And the third column contains the weight that each
matched pair is given. These objects may not correspond
respectively to the |
ecaliper |
The
size of the enforced caliper on the scale of the |
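The three-column layout of the matches component (treated row, control row, pair weight) can be used to pull matched pairs out of a dataset. The sketch below uses a small made-up data frame and a made-up matches matrix rather than real GenMatch output, so every value in it is hypothetical.

```r
# Hypothetical data: six observations with a treatment flag and one covariate.
dat <- data.frame(treat = c(1, 1, 0, 0, 0, 1),
                  age   = c(30, 41, 29, 40, 55, 52))

# Hypothetical 'matches' matrix in the layout described above:
# column 1 = treated row number, column 2 = control row number, column 3 = weight.
matches <- rbind(c(1, 3, 1),
                 c(2, 4, 1),
                 c(6, 5, 1))

# Line up each treated observation with its matched control.
matched <- data.frame(treated_row = matches[, 1],
                      control_row = matches[, 2],
                      weight      = matches[, 3],
                      age_treated = dat$age[matches[, 1]],
                      age_control = dat$age[matches[, 2]])

# Weighted mean covariate difference across the matched pairs.
wdiff <- weighted.mean(matched$age_treated - matched$age_control,
                       matched$weight)
print(matched)
print(wdiff)
```

Indexing the original data by the first two columns in this way gives a quick per-pair view of how close the matched observations are on any covariate.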
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.
Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization." Journal of Statistical Software 42(7): 1-52. doi: 10.18637/jss.v042.i07
Diamond, Alexis and Jasjeet S. Sekhon. 2013. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Review of Economics and Statistics 95(3): 932-945. http://sekhon.berkeley.edu/papers/GenMatch.pdf
Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. "Genetic Optimization Using Derivatives: Theory and Application to Nonlinear Models." Political Analysis 7: 187-210. http://sekhon.berkeley.edu/genoud/genoud.pdf
Also see Match, summary.Match, MatchBalance, genoud, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
data(lalonde)
attach(lalonde)

#The covariates we want to match on
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

#The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

#
#Let's call GenMatch() to find the optimal weight to give each
#covariate in 'X' so as we have achieved balance on the covariates in
#'BalanceMat'. This is only an example so we want GenMatch to be quick
#so the population size has been set to be only 16 via the 'pop.size'
#option. This is *WAY* too small for actual problems.
#For details see http://sekhon.berkeley.edu/papers/MatchingJSS.pdf.
#
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)

#The outcome variable
Y=re78/1000

#
# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
#
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)

#
#Let's determine if balance has actually been obtained on the variables of interest
#
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
                   re75+ re74+ I(re74*re75),
                   match.out=mout, nboots=500)

# For more examples see: http://sekhon.berkeley.edu/matching/R.