Generalized Propensity Score Estimation using GBM
ps.cont calculates generalized propensity scores and corresponding weights using boosted linear regression as implemented in gbm. This function extends ps in twang to continuous treatments. The syntax and output are largely the same. The GBM parameter defaults are those found in Zhu, Coffman, & Ghosh (2015).
Note: ps.cont will be phased out when twang adds functionality for continuous treatments. All of its functionality and more is already available in weightit with method = "gbm" (see method_gbm).
ps.cont(formula, data,
n.trees = 20000,
interaction.depth = 4,
shrinkage = 0.0005,
bag.fraction = 1,
print.level = 0,
verbose = FALSE,
stop.method,
sampw = NULL,
optimize = 1,
use.kernel = FALSE,
...)
## S3 method for class 'ps.cont'
summary(object, ...)
## S3 method for class 'ps.cont'
plot(x, ...)
## S3 method for class 'ps.cont'
boxplot(x, ...)
formula |
A formula for the generalized propensity score model with the continuous treatment variable on the left side of the formula and the potential confounding variables on the right side. |
data |
The dataset in the form of a data frame, which should include treatment assignment as well as the covariates specified in |
n.trees |
The number of GBM iterations passed on to |
interaction.depth |
The |
shrinkage |
The |
bag.fraction |
The |
print.level |
Currently ignored. |
verbose |
If |
stop.method |
A method or methods of measuring and summarizing balance across pretreatment variables. Current options are |
sampw |
Optional sampling weights. |
optimize |
A numeric value, either |
use.kernel |
Whether to use kernel density estimation as implemented in |
object, x |
A |
... |
For For |
ps.cont extends ps in twang to continuous treatments. It estimates weights from a series of trees and then outputs the weights that optimize a user-set criterion. The criterion employed involves the correlation between the treatment and each covariate. In a fully balanced sample, the treatment will have a correlation of 0 with covariates sufficient for removing confounding. Zhu, Coffman, & Ghosh (2015), who were the first to describe GBM for propensity score weighting with continuous treatments, recommend this procedure and provided R code to implement the methods they describe. ps.cont adapts their syntax to make it consistent with that of ps in twang. As in Zhu et al. (2015), when the Pearson correlation is requested, weighted biserial correlations will be computed for binary covariates.
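The balance criterion described above can be illustrated with a small base R sketch. It is not the code used inside ps.cont: weighted_cors is a hypothetical helper, the data are simulated, and reading stop methods such as "p.mean" and "p.max" as the mean and maximum of the absolute weighted Pearson correlations is an assumption made for illustration.

```r
# Hypothetical helper (not part of ps.cont): weighted Pearson correlations
# between a continuous treatment and each covariate, via stats::cov.wt().
weighted_cors <- function(treat, covs, w = rep(1, length(treat))) {
  cw <- cov.wt(cbind(treat = treat, covs), wt = w, cor = TRUE)
  cw$cor[1, -1]  # first row, dropping the treatment's correlation with itself
}

set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))             # simulated covariates
a <- 0.5 * X[, "x1"] - 0.3 * X[, "x2"] + rnorm(n)    # treatment depends on both

# Treatment-covariate correlations under uniform weights
r <- weighted_cors(a, X)

# Summaries across covariates (assumed reading of "p.mean" and "p.max")
mean_abs <- mean(abs(r))
max_abs  <- max(abs(r))
```

Passing a vector of estimated weights as w gives the weighted version of the same statistics; values closer to 0 indicate better balance.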
The weights are estimated as the marginal density of the treatment divided by the conditional density of the treatment given the covariates for each unit. For the marginal density, a kernel density estimator can be used via the density function. For the conditional density, a Gaussian density is assumed. Note that when the treatment has outlying values, extreme weights can be produced, so it is important to examine the weights and trim them if necessary.
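The weight construction can be sketched in base R as follows. This is an illustration under stated assumptions, not the ps.cont implementation: lm() stands in for the GBM fit to keep the example light, the data are simulated, and the 99th-percentile cap is only one possible trimming rule.

```r
set.seed(2)
n <- 1000
x <- rnorm(n)             # simulated confounder
a <- 0.4 * x + rnorm(n)   # simulated continuous treatment

# Conditional density f(a | x): Gaussian around the fitted conditional mean.
# Assumption: lm() replaces the boosted model used by ps.cont.
fit <- lm(a ~ x)
res <- residuals(fit)
dens_cond <- dnorm(res, mean = 0, sd = sd(res))

# Marginal density f(a), Gaussian version
dens_marg_norm <- dnorm(a, mean = mean(a), sd = sd(a))

# Marginal density f(a), kernel version: evaluate the density() estimate
# at each observed treatment value by linear interpolation
kd <- density(a)
dens_marg_kern <- approx(kd$x, kd$y, xout = a)$y

# Weights: marginal density divided by conditional density
w      <- dens_marg_norm / dens_cond
w_kern <- dens_marg_kern / dens_cond

# Examine the weights, then cap them at the 99th percentile
# (the cap is a hypothetical choice for this sketch)
summary(w)
cap    <- quantile(w, 0.99)
w_trim <- pmin(w, cap)

# Weighted treatment-covariate correlation after weighting
r_weighted <- cov.wt(cbind(a, x), wt = w, cor = TRUE)$cor[1, 2]
```

Comparing r_weighted with cor(a, x) shows how much of the treatment-covariate association the weights remove in this simulated case.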
It is recommended to use as many trees as possible, though this requires more computation time, especially with optimize set to 0. There is little difference between using Pearson and Spearman correlations or between using the raw correlations and the Z-transformed correlations. Typically, the only gbm-related options that should be changed are the interaction depth and the number of trees.
Missing data is not allowed in the covariates because of the ambiguity in computing correlations with missing values.
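Because missing covariate values are not allowed, it can help to check for them before fitting. The sketch below uses a made-up data frame and made-up variable names. Dropping incomplete rows is shown only as the simplest option and is not a recommendation from this page; imputation is another possibility.

```r
# Toy data with one missing covariate value (all names are hypothetical)
dat <- data.frame(
  dose = c(1.2, 0.7, 3.1, 2.4),  # continuous treatment
  age  = c(34, NA, 51, 29),
  educ = c(12, 16, 10, 14)
)
covs <- c("age", "educ")

# Number of missing values in each covariate
n_missing <- colSums(is.na(dat[covs]))

# Keep only rows with complete covariate information
dat_complete <- dat[complete.cases(dat[covs]), , drop = FALSE]
```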
summary.ps.cont compresses the information in the desc component of the ps.cont object into a short summary table describing the size of the dataset and the quality of the generalized propensity score weights, in a similar way to summary.ps.
plot.ps.cont and boxplot.ps.cont function almost identically to plot.ps and boxplot.ps. See the help pages there for more information. Note that for ps.cont objects, only options 1, 2, and 6 are available for the plots argument of plot. When optimize = 2, option 1 is not available.
Returns an object of classes ps and ps.cont, a list containing:
gbm.obj |
The returned |
treat |
The treatment variable. |
desc |
a list containing balance tables for each method selected in |
ps |
a data frame containing the estimated generalized propensity scores. Each column is associated with one of the methods selected in |
w |
a data frame containing the propensity score weights. Each column is associated with one of the methods selected in |
estimand |
|
datestamp |
Records the date of the analysis. |
parameters |
Saves the |
alerts |
|
iters |
A sequence of iterations used in the GBM fits used by |
balance |
The balance summary for each tree examined, with a column for each stop.method. If |
n.trees |
Maximum number of trees considered in GBM fit. |
data |
Data as specified in the |
The NULL entries exist so the output object is similar to that of ps in twang.
Noah Greifer
ps.cont is heavily adapted from the R code in Zhu, Coffman, & Ghosh (2015). In contrast with their code, ps.cont uses weighted Pearson and Spearman correlations rather than probability weighted bootstrapped correlations, allows for different degrees of optimization in searching for the best solution, and allows for the use of kernel density estimation for the generalized propensity score. ps.cont also takes inspiration from ps in twang.
Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A Boosting Algorithm for Estimating Generalized Propensity Scores with Continuous Treatments. Journal of Causal Inference, 3(1). doi: 10.1515/jci-2014-0022
weightit and method_gbm for its implementation using weightit syntax.
gbm for the underlying machinery and explanation of the parameters.
# Examples take a long time
## Not run:
library("cobalt")
data("lalonde", package = "cobalt")
# Balance covariates with respect to the continuous treatment re75
psc.out <- ps.cont(re75 ~ age + educ + married +
                     nodegree + race + re74, data = lalonde,
                   stop.method = c("p.mean", "p.max"),
                   optimize = 2)
# Summarize the sample and the quality of the weights
summary(psc.out)
# Balance table computed with twang's bal.table
twang::bal.table(psc.out)
## End(Not run)