WeightIt: ps.cont – R documentation

ps.cont

Generalized Propensity Score Estimation using GBM

Description

ps.cont calculates generalized propensity scores and corresponding weights using boosted linear regression as implemented in gbm. This function extends ps in twang to continuous treatments. The syntax and output are largely the same. The GBM parameter defaults are those found in Zhu, Coffman, & Ghosh (2015).

Note: ps.cont will be phased out when twang adds functionality for continuous treatments. All of its functionality, and more, is already available in weightit with method = "gbm".

Usage

ps.cont(formula, data,
        n.trees = 20000,
        interaction.depth = 4,
        shrinkage = 0.0005,
        bag.fraction = 1,
        print.level = 0,
        verbose = FALSE,
        stop.method,
        sampw = NULL,
        optimize = 1,
        use.kernel = FALSE,
        ...)
## S3 method for class 'ps.cont'
summary(object, ...)
## S3 method for class 'ps.cont'
plot(x, ...)
## S3 method for class 'ps.cont'
boxplot(x, ...)

Arguments

`formula`	A formula for the generalized propensity score model with the continuous treatment variable on the left side of the formula and the potential confounding variables on the right side.
`data`	The dataset in the form of a data frame, which should include treatment assignment as well as the covariates specified in `formula`.
`n.trees`	The number of GBM iterations passed on to `gbm`. More iterations generally yield a better final solution but take more time to compute.
`interaction.depth`	The `interaction.depth` passed on to `gbm`.
`shrinkage`	The `shrinkage` passed on to `gbm`.
`bag.fraction`	The `bag.fraction` passed on to `gbm`.
`print.level`	Currently ignored.
`verbose`	If `TRUE`, information will be printed to monitor the progress of the fitting.
`stop.method`	A method or methods of measuring and summarizing balance across pretreatment variables. Current options are `p.max`, `p.mean`, `p.rms`, `s.max`, `s.mean`, and `s.rms`. `p` refers to the Pearson correlation and `s` refers to the Spearman correlation. These are summarized across the pretreatment variables by the maximum (`max`), the mean (`mean`), or the square root of the mean of the squares (`rms`).
`sampw`	Optional sampling weights.
`optimize`	A numeric value, either `0`, `1`, or `2`. If `0`, balance will be checked for every tree, and the tree with the best balance will be used to generate the final weights. If `1`, the default, balance will be checked for a subset of trees, and then the `optimize` function will be used to find the tree with the best balance within the chosen tree interval. If `2`, the `optimize` function will be used to find the tree that yields the best balance. `0` takes the longest but is guaranteed to find the best balance among the trees. `2` is the quickest but will often choose a tree that has suboptimal balance, though not by much. `1` is a compromise between speed and comprehensiveness and is the algorithm implemented in twang.
`use.kernel`	Whether to use kernel density estimation as implemented in `density` to estimate the numerator of the weights. If `TRUE`, `density` will be used. If `FALSE`, the default, a normal density will be assumed and will be estimated using `dnorm()`.
`object, x`	A `ps.cont` object.
`...`	For `ps.cont`, if `use.kernel = TRUE`, additional arguments to `density`, which is used to produce the density for the numerator of the weights. These include `bw`, `adjust`, `kernel`, and `n`. The default values are the defaults for `density`, except `n`, which is 10 times the number of units. For `summary.ps.cont`, additional arguments affecting the summary produced.

Details

ps.cont extends ps in twang to continuous treatments. It estimates weights from a series of trees and then outputs the weights that optimize a user-set criterion. The criterion employed involves the correlation between the treatment and each covariate. In a fully balanced sample, the treatment will have a correlation of 0 with covariates sufficient for removing confounding. Zhu, Coffman, & Ghosh (2015), who were the first to describe GBM for propensity score weighting with continuous treatments, recommend this procedure and provided R code to implement the methods they describe. ps.cont adapts their syntax to make it consistent with that of ps in twang. As in Zhu et al. (2015), when the Pearson correlation is requested, weighted biserial correlations will be computed for binary covariates.
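The balance criterion described above can be illustrated with a minimal base R sketch. It computes a weighted Pearson correlation between the treatment and each covariate and then summarizes the absolute correlations by their maximum, mean, and root mean square. The helper names `w_cor` and `balance_summary` are hypothetical and are not part of WeightIt; the sketch uses plain Pearson correlations for all covariates and does not reproduce the biserial handling of binary covariates.

```r
# Weighted Pearson correlation between two numeric vectors
# (hypothetical helper, not a WeightIt function).
w_cor <- function(x, y, w) {
  w <- w / sum(w)                       # normalize weights to sum to 1
  mx <- sum(w * x)
  my <- sum(w * y)
  cov_xy <- sum(w * (x - mx) * (y - my))
  cov_xy / sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

# Summarize absolute treatment-covariate correlations in the spirit of
# the max, mean, and rms stopping rules (hypothetical helper).
balance_summary <- function(treat, covs, w = rep(1, length(treat))) {
  r <- vapply(covs, function(x) w_cor(treat, x, w), numeric(1))
  c(max = max(abs(r)), mean = mean(abs(r)), rms = sqrt(mean(r^2)))
}

# Simulated data: x1 drives the treatment, x2 does not.
set.seed(1)
n <- 200
covs <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
treat <- 0.8 * covs$x1 + rnorm(n)

# Unweighted balance; smaller values indicate better balance.
balance_summary(treat, covs)
```

Passing a vector of estimated weights as `w` would give the corresponding weighted summary, which is the kind of quantity compared across trees.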

The weights are estimated as the marginal density of the treatment divided by the conditional density of the treatment given the covariates for each unit. For the marginal density, a kernel density estimator can be implemented using the density function. For the conditional density, a Gaussian density is assumed. Note that when the treatment has outlying values, extreme weights can be produced, so it is important to examine the weights and trim them if necessary.
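As a rough illustration of this weight construction, the base R sketch below divides a marginal density of the treatment by a Gaussian conditional density. To keep it self-contained, `lm()` stands in for the GBM fit; that substitution, the simulated data, and the 99th-percentile trimming threshold are assumptions made for the example and do not reflect the internals of ps.cont.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
treat <- 1 + 0.5 * x + rnorm(n)

# Conditional density: Gaussian centered on the fitted conditional mean.
# lm() is a stand-in for the GBM model (assumption for brevity).
fit <- lm(treat ~ x)
dens_cond <- dnorm(treat, mean = fitted(fit), sd = sd(residuals(fit)))

# Marginal density assuming normality (analogous to use.kernel = FALSE).
dens_marg <- dnorm(treat, mean = mean(treat), sd = sd(treat))

# Marginal density by kernel estimation (analogous to use.kernel = TRUE):
# evaluate density() on a grid, then interpolate at the observed values.
d <- density(treat, n = 10 * n)
dens_marg_k <- approx(d$x, d$y, xout = treat)$y

# Weights: marginal density over conditional density.
w <- dens_marg / dens_cond
summary(w)

# One simple way to trim extreme weights: cap at an upper quantile
# (the 0.99 threshold is arbitrary).
w_trim <- pmin(w, quantile(w, 0.99))
```

Replacing `dens_marg` with `dens_marg_k` in the numerator gives the kernel-based version of the weights.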

It is recommended to use as many trees as possible, though this requires more computation time, especially with optimize set to 0. There is little difference between using Pearson and Spearman correlations or between using the raw correlations and the Z-transformed correlations. Typically the only gbm-related options that should be changed are the interaction depth and number of trees.
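The trade-off between the three settings of the `optimize` argument can be sketched on a toy problem. Below, `balance_at` is a hypothetical, smooth stand-in for "imbalance after a given number of trees"; in a real fit each evaluation would require computing weights from the GBM fit at that iteration. The grid size of 25 is also an arbitrary choice for the example. The search itself uses `stats::optimize`.

```r
# Toy imbalance curve as a function of the number of trees (hypothetical).
balance_at <- function(tree) (tree - 1234)^2 / 1e6 + 0.02
n.trees <- 20000

# Analogous to optimize = 0: evaluate every tree and take the best.
best0 <- which.min(vapply(seq_len(n.trees), balance_at, numeric(1)))

# Analogous to optimize = 1: evaluate a coarse grid of trees, then refine
# with optimize() inside the interval around the best grid point.
iters <- round(seq(1, n.trees, length.out = 25))
grid_bal <- vapply(iters, balance_at, numeric(1))
i <- which.min(grid_bal)
interval <- c(iters[max(i - 1, 1)], iters[min(i + 1, length(iters))])
best1 <- round(optimize(balance_at, interval = interval)$minimum)

# Analogous to optimize = 2: a single optimize() call over all trees.
best2 <- round(optimize(balance_at, interval = c(1, n.trees))$minimum)

c(best0 = best0, best1 = best1, best2 = best2)
```

On a smooth curve like this one the three strategies agree. Real balance curves tend to be jagged, which is why a single line search can settle on a slightly suboptimal tree while the exhaustive search cannot.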

Missing data is not allowed in the covariates because of the ambiguity in computing correlations with missing values.
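Because of this restriction, it can help to check the covariates for missing values before fitting. The following base R sketch uses a small hypothetical data frame; the column names are placeholders, and dropping incomplete cases is only one possible way to handle missingness.

```r
# Hypothetical data with one missing covariate value.
dat <- data.frame(treat = c(1.2, 0.4, 2.1, 1.7),
                  age   = c(30, NA, 45, 52),
                  educ  = c(12, 16, 10, 14))
covariates <- c("age", "educ")

# Count missing values per covariate.
colSums(is.na(dat[covariates]))

# Keep only rows with complete covariate information.
dat_complete <- dat[complete.cases(dat[covariates]), ]
```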

summary.ps.cont compresses the information in the desc component of the ps.cont object into a short summary table describing the size of the dataset and the quality of the generalized propensity score weights, in a similar way to summary.ps.

plot.ps.cont and boxplot.ps.cont function almost identically to plot.ps and boxplot.ps. See the help pages there for more information. Note that for plot.ps.cont, only options 1, 2, and 6 are available for the plots argument. When optimize = 2, option 1 is not available.

Value

Returns an object of class ps and ps.cont, a list containing

`gbm.obj`	The returned `gbm` object.
`treat`	The treatment variable.
`desc`	A list containing balance tables for each method selected in `stop.method`. Includes a component for the unweighted analysis named "unw". Each `desc` component includes a list with the following components:
	`ess`	The effective sample size.
	`n`	The number of subjects.
	`max.p.cor`	The largest absolute Pearson correlation across the covariates.
	`mean.p.cor`	The mean absolute Pearson correlation of the covariates.
	`rmse.p.cor`	The root mean squared Pearson correlation across the covariates.
	`max.s.cor`	The largest absolute Spearman correlation across the covariates.
	`mean.s.cor`	The mean absolute Spearman correlation of the covariates.
	`rmse.s.cor`	The root mean squared Spearman correlation across the covariates.
	`bal.tab`	A table summarizing the quality of the weights for yielding low treatment-covariate correlations. This table is best extracted using `bal.table`.
	`n.trees`	The estimated optimal number of `gbm` iterations to optimize the loss function for the associated `stop.method`s.
`ps`	A data frame containing the estimated generalized propensity scores. Each column is associated with one of the methods selected in `stop.method`.
`w`	A data frame containing the propensity score weights. Each column is associated with one of the methods selected in `stop.method`. If sampling weights are given then these are incorporated into the weights.
`estimand`	`NULL`
`datestamp`	Records the date of the analysis.
`parameters`	Saves the `ps.cont` call.
`alerts`	`NULL`
`iters`	A sequence of iterations used in the GBM fits used by `plot.ps.cont`.
`balance`	The balance summary for each tree examined, with a column for each stop.method. If `optimize = 0`, this will contain balance summaries for all trees. If `optimize = 1`, this will contain balance summaries for the subset of trees corresponding to `iters`. If `optimize = 2`, this will be NULL.
`n.trees`	Maximum number of trees considered in GBM fit.
`data`	Data as specified in the `data` argument.

The NULL entries exist so the output object is similar to that of ps in twang.
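The effective sample size reported in `desc` is a useful quick check on how much the weights concentrate on a few units. A common way to compute such a quantity is Kish's approximation, sketched below in base R. Whether ps.cont uses exactly this formula is an assumption here, and `ess` is a hypothetical helper name.

```r
# Effective sample size via Kish's approximation (assumed formula).
ess <- function(w) sum(w)^2 / sum(w^2)

# Equal weights: the effective sample size equals the number of units.
ess(rep(1, 100))

# One extreme weight sharply reduces the effective sample size.
ess(c(rep(1, 99), 50))
```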

Author(s)

Noah Greifer

ps.cont is heavily adapted from the R code in Zhu, Coffman, & Ghosh (2015). In contrast with their code, ps.cont uses weighted Pearson and Spearman correlations rather than probability weighted bootstrapped correlations, allows for different degrees of optimization in searching for the best solution, and allows for the use of kernel density estimation for the generalized propensity score. ps.cont also takes inspiration from ps in twang.

References

Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A Boosting Algorithm for Estimating Generalized Propensity Scores with Continuous Treatments. Journal of Causal Inference, 3(1). doi: 10.1515/jci-2014-0022

Examples

# Examples take a long time
## Not run: 
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates with respect to re75
psc.out <- ps.cont(re75 ~ age + educ + married +
                nodegree + race + re74, data = lalonde,
                stop.method = c("p.mean", "p.max"),
                optimize = 2)
summary(psc.out)
twang::bal.table(psc.out) #twang's bal.table

## End(Not run)

WeightIt

Weighting for Covariate Balance in Observational Studies

v0.12.0

GPL (>= 2)

Authors

Noah Greifer [aut, cre] (<https://orcid.org/0000-0003-3067-7154>)

Initial release