Compute Balance Statistics for Covariates
These functions quickly compute balance statistics for the given covariates. They are used in bal.tab(), but they are also available for use in programming without having to call bal.tab() to get them.
col_w_mean computes the (weighted) means for a set of covariates and weights and is essentially a weighted version of colMeans.
col_w_sd computes the (weighted) standard deviations for a set of covariates and weights.
col_w_smd computes the (weighted) (absolute) (standardized) difference in means for a set of covariates, a binary treatment, and weights.
col_w_vr computes the (weighted) variance ratio for a set of covariates, a binary treatment, and weights.
col_w_ks computes the (weighted) Kolmogorov-Smirnov (KS) statistic for a set of covariates, a binary treatment, and weights.
col_w_ovl computes the complement of the (weighted) overlapping coefficient for a set of covariates, a binary treatment, and weights (based on Franklin et al., 2014).
col_w_cov and col_w_corr compute the (weighted) (absolute) treatment-covariate covariance or correlation for a set of covariates, a continuous treatment, and weights.
col_w_mean(mat, weights = NULL, s.weights = NULL, subset = NULL,
           na.rm = TRUE, ...)

col_w_sd(mat, weights = NULL, s.weights = NULL, bin.vars,
         subset = NULL, na.rm = TRUE, ...)

col_w_smd(mat, treat, weights = NULL, std = TRUE,
          s.d.denom = "pooled", abs = FALSE, s.weights = NULL,
          bin.vars, subset = NULL, weighted.weights = weights,
          na.rm = TRUE, ...)

col_w_vr(mat, treat, weights = NULL, abs = FALSE, s.weights = NULL,
         bin.vars, subset = NULL, na.rm = TRUE, ...)

col_w_ks(mat, treat, weights = NULL, s.weights = NULL, bin.vars,
         subset = NULL, na.rm = TRUE, ...)

col_w_ovl(mat, treat, weights = NULL, s.weights = NULL, bin.vars,
          integrate = FALSE, subset = NULL, na.rm = TRUE, ...)

col_w_cov(mat, treat, weights = NULL, type = "pearson", std = FALSE,
          s.d.denom = "all", abs = FALSE, s.weights = NULL, bin.vars,
          subset = NULL, weighted.weights = weights, na.rm = TRUE, ...)

col_w_corr(mat, treat, weights = NULL, type = "pearson",
           s.d.denom = "all", abs = FALSE, s.weights = NULL, bin.vars,
           subset = NULL, weighted.weights = weights, na.rm = TRUE, ...)
mat | a numeric matrix or a data frame containing the covariates for which the statistic is to be computed. If a data frame, |
weights | |
s.weights | |
subset | a |
na.rm | |
treat | a vector of treatment status for each individual. For |
std | |
s.d.denom | for
For |
abs | |
bin.vars | a vector used to denote whether each variable is binary or not. Can be a |
weighted.weights | for |
type | for |
integrate | |
... | for all functions, additional arguments supplied to |
col_w_mean computes column weighted means for a matrix of variables. It is similar to colMeans but (optionally) incorporates weights. The weights and s.weights arguments are multiplied together prior to being used, and there is no distinction between them. This could be used to compute the weighted means of each covariate in the general population to examine the degree to which a weighting method has left the weighted samples resembling the original population.
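As a rough illustration of the computation described above, the following base R sketch multiplies the two sets of weights and takes a weighted column mean. The helper name w_col_means is hypothetical and is not part of the package; it ignores subset and na.rm handling.

```r
# Hypothetical helper (not the package's code): weighted column means,
# with weights and s.weights multiplied together before use.
w_col_means <- function(mat, weights = NULL, s.weights = NULL) {
  mat <- as.matrix(mat)
  if (is.null(weights)) weights <- rep(1, nrow(mat))
  if (is.null(s.weights)) s.weights <- rep(1, nrow(mat))
  w <- weights * s.weights
  # w is recycled down each column, so row i is scaled by w[i]
  colSums(mat * w) / sum(w)
}

X <- cbind(a = c(1, 2, 3, 4), b = c(0, 1, 1, 0))
unweighted <- w_col_means(X)                        # matches colMeans(X)
weighted   <- w_col_means(X, weights = c(1, 1, 2, 2))
```

With all weights equal, the result coincides with colMeans(X), which is the sense in which the function is a weighted version of colMeans.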
col_w_sd computes column weighted standard deviations for a matrix of variables. The weights and s.weights arguments are multiplied together prior to being used, and there is no distinction between them. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula; the standard deviation is the square root of this variance.
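The two variance rules can be sketched in base R as follows. The helper w_sd is hypothetical, and the text does not spell out the "standard formula" for continuous variables; this sketch assumes the weighted variance with the reliability-weights correction, which reduces to var() when all weights are equal.

```r
# Hypothetical helper (not the package's code).
# Assumption: continuous variables use sum(w * (x - m)^2) / (1 - sum(w^2))
# with weights normalized to sum to 1.
w_sd <- function(x, w = rep(1, length(x)), binary = FALSE) {
  w <- w / sum(w)
  m <- sum(w * x)                  # weighted mean (proportion of 1s if binary)
  v <- if (binary) {
    m * (1 - m)                    # p(1 - p) for binary variables
  } else {
    sum(w * (x - m)^2) / (1 - sum(w^2))
  }
  sqrt(v)
}

sd_cont <- w_sd(c(1, 2, 3, 4))                  # equals sd(c(1, 2, 3, 4))
sd_bin  <- w_sd(c(0, 1, 1, 0), binary = TRUE)   # sqrt(0.5 * 0.5) = 0.5
```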
col_w_smd computes the mean difference for each covariate between treatment groups defined by treat. These mean differences can optionally be weighted, standardized, and/or in absolute value. The standardization factor is computed using the unweighted standard deviation or variance when s.weights are absent, and is computed using the s.weights-weighted standard deviation or variance when s.weights are present, except when s.d.denom = "weighted", in which case the product of weighted.weights and s.weights (if present) is used to weight the standardization factor. The standardization factor is computed using the whole sample even when subset is used. Note that unlike bal.tab(), col_w_smd requires the user to specify whether each individual variable should be standardized using std rather than relying on continuous or binary. The weighted mean difference is computed using the product of weights and s.weights, if specified. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula.
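To make the split between the weighted numerator and the unweighted standardization factor concrete, here is a minimal base R sketch for one continuous covariate with no sampling weights. The helper smd_sketch is hypothetical, and the pooled factor is assumed to be the square root of the average of the two group variances.

```r
# Hypothetical helper (not the package's code): weighted mean difference
# divided by an UNWEIGHTED pooled standard deviation.
smd_sketch <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  # Assumption: "pooled" means sqrt of the mean of the group variances
  s <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s
}

x     <- c(1, 2, 3, 4, 5, 6)
treat <- c(0, 0, 0, 1, 1, 1)
smd_unweighted <- smd_sketch(x, treat)                        # (5 - 2) / 1 = 3
smd_weighted   <- smd_sketch(x, treat, w = c(1, 1, 4, 4, 1, 1))
```

Because the denominator does not change with the weights, any change between the two results comes from the weighted means alone.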
col_w_vr computes the variance ratio for each covariate between treatment groups defined by treat. When abs = TRUE, pmax(out, 1/out) is applied to the output so that the ratio is always greater than or equal to 1. For binary variables, the variance is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula. Note that in bal.tab(), variance ratios are not computed for binary variables, while here, they are (but likely should not be interpreted). The weights and s.weights arguments are multiplied together prior to being used, and there is no distinction between them. Because of how the weighted variance is computed, exactly balanced groups may have variance ratios that differ slightly from 1.
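The effect of abs = TRUE can be seen in a small unweighted base R sketch. The helper var_ratio is hypothetical, and placing the treated group in the numerator is an assumption made for illustration.

```r
# Hypothetical helper (not the package's code): unweighted variance ratio.
# Assumption: the ratio is treated variance over control variance.
var_ratio <- function(x, treat, abs = FALSE) {
  out <- var(x[treat == 1]) / var(x[treat == 0])
  if (abs) out <- pmax(out, 1 / out)  # fold ratios below 1 to their reciprocal
  out
}

x     <- c(1, 2, 3, 2, 4, 6)
treat <- c(1, 1, 1, 0, 0, 0)
vr_raw <- var_ratio(x, treat)              # 1 / 4 = 0.25
vr_abs <- var_ratio(x, treat, abs = TRUE)  # 4
```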
col_w_ks computes the KS statistic for each covariate using the method implemented in twang. The KS statistics can optionally be weighted. For binary variables, the KS statistic is just the difference in proportions. The weights and s.weights arguments are multiplied together prior to being used, and there is no distinction between them.
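A weighted KS statistic is the largest absolute gap between the two groups' weighted empirical CDFs. The base R sketch below is one way to compute that gap; the helper w_ks is hypothetical and is not claimed to reproduce the twang implementation line for line.

```r
# Hypothetical helper (not the package's code): maximum absolute difference
# between the weighted ECDFs of the treated and control groups.
w_ks <- function(x, treat, w = rep(1, length(x))) {
  # Normalize weights within each group; give control units a negative sign
  w[treat == 1] <-  w[treat == 1] / sum(w[treat == 1])
  w[treat == 0] <- -w[treat == 0] / sum(w[treat == 0])
  ord <- order(x)
  cs  <- cumsum(w[ord])                     # running ECDF difference
  # Evaluate only at the last occurrence of each distinct value (handles ties)
  last <- !duplicated(x[ord], fromLast = TRUE)
  max(abs(cs[last]))
}

set.seed(1)
x     <- rnorm(50)
treat <- rep(0:1, each = 25)
ks_cont <- w_ks(x, treat)

# For a binary variable, the result is the difference in proportions
ks_bin <- w_ks(c(0, 0, 1, 1, 1, 1), c(0, 0, 0, 1, 1, 1))  # |1 - 1/3| = 2/3
```

Without weights, the continuous result agrees with the two-sample statistic from ks.test().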
col_w_ovl computes the complement of the overlapping coefficient as described by Franklin et al. (2014). It does so by computing the density of the covariate in the treated and control groups, then finding the area where those densities overlap, and subtracting that number from 1, yielding a value between 0 and 1 where 1 indicates complete imbalance and 0 indicates perfect balance. density() is used to estimate the density in each group. The bandwidth of the covariate in the smaller treatment group is used for both groups. The area of overlap can be computed using integrate(), which quickly and accurately computes the integral, or using a midpoint Riemann sum with 1000 partitions, which approximates the area more slowly. A reason to prefer the Riemann sum is that integrate() can fail for unknown reasons, though Riemann sums will fail with some extreme distributions. When either method fails, the resulting value will be NA. For binary variables, the complement of the overlapping coefficient is just the difference in proportions. The weights and s.weights arguments are multiplied together prior to being used, and there is no distinction between them. The weights are used to compute the weighted density by supplying them to the weights argument of density().
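The procedure described above can be approximated in base R as follows. This is only a sketch under stated assumptions: the helper ovl_complement is hypothetical, the bandwidth rule is assumed to be bw.nrd0 (the default of density()), the evaluation range is padded by four bandwidths on each side, and the area is approximated by a simple Riemann-type sum over the density grid rather than a true midpoint rule.

```r
# Hypothetical helper (not the package's code): 1 minus the area under the
# pointwise minimum of the two groups' (weighted) kernel density estimates.
ovl_complement <- function(x, treat, w = rep(1, length(x)), n = 1000) {
  x1 <- x[treat == 1]; x0 <- x[treat == 0]
  w1 <- w[treat == 1] / sum(w[treat == 1])   # density() expects weights summing to 1
  w0 <- w[treat == 0] / sum(w[treat == 0])
  # Assumption: bandwidth from the smaller group via bw.nrd0, used for both
  smaller <- if (length(x1) <= length(x0)) x1 else x0
  bw <- bw.nrd0(smaller)
  lo <- min(x) - 4 * bw
  hi <- max(x) + 4 * bw
  d1 <- density(x1, weights = w1, bw = bw, from = lo, to = hi, n = n)
  d0 <- density(x0, weights = w0, bw = bw, from = lo, to = hi, n = n)
  step <- (hi - lo) / (n - 1)
  1 - sum(pmin(d1$y, d0$y)) * step
}

set.seed(2)
z <- rnorm(200)
treat <- rep(0:1, each = 200)
ovl_same <- ovl_complement(c(z, z), treat)       # identical groups: near 0
ovl_far  <- ovl_complement(c(z, z + 20), treat)  # disjoint groups: near 1
```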
col_w_cov computes the covariance between a continuous treatment and the covariates to assess balance for continuous treatments as recommended in Austin (2019). These covariances can optionally be weighted or in absolute value or can be requested as correlations (i.e., standardized covariances). The correlations are computed as the covariance between the treatment and covariate divided by a standardization factor, which is equal to the square root of the product of the variance of treatment and the variance of the covariate. The standardization factor is computed using the unweighted variances when s.weights are absent, and is computed using the sampling weighted variances when s.weights are present, except when s.d.denom = "weighted", in which case the product of weighted.weights and s.weights (if present) is used to weight the standardization factor. For this reason, the computed correlation can be greater than 1 or less than -1. The standardization factor is always computed using the whole sample even when subset is used. The covariance is computed using the product of weights and s.weights, if specified. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula.
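The reason a standardized value can leave the interval [-1, 1] is that the numerator and denominator use different weights. A minimal base R sketch, assuming no sampling weights and a continuous covariate, is shown below. The helpers w_cov and corr_sketch are hypothetical, and the reliability-weights correction in the covariance is an assumption chosen so that the unweighted case matches cov().

```r
# Hypothetical helpers (not the package's code).
# Assumption: weighted covariance uses the 1 - sum(w^2) correction.
w_cov <- function(x, a, w = rep(1, length(x))) {
  w <- w / sum(w)
  sum(w * (x - sum(w * x)) * (a - sum(w * a))) / (1 - sum(w^2))
}

# Weighted covariance divided by UNWEIGHTED standard deviations
corr_sketch <- function(x, a, w = rep(1, length(x))) {
  w_cov(x, a, w) / (sd(x) * sd(a))
}

set.seed(3)
a <- rnorm(100)              # continuous treatment
x <- 0.5 * a + rnorm(100)    # covariate related to treatment
w <- runif(100, 0.5, 2)      # illustrative weights

r_unweighted <- corr_sketch(x, a)      # equals cor(x, a)
r_weighted   <- corr_sketch(x, a, w)   # not guaranteed to lie in [-1, 1]
```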
col_w_corr
is a wrapper for col_w_cov
with std
set to TRUE
.
A vector of balance statistics, one for each variable in mat. If mat has column names, the output will be named as well.
Franklin, J. M., Rassen, J. A., Ackermann, D., Bartels, D. B., & Schneeweiss, S. (2014). Metrics for covariate balance in cohort studies of causal effects. Statistics in Medicine, 33(10), 1685–1699. doi: 10.1002/sim.6058
Austin, P. C. (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical Methods in Medical Research, 28(5), 1365–1377. doi: 10.1177/0962280218756159
What Works Clearinghouse. (2020). WWC Procedures Handbook (Version 4.1). Retrieved from https://ies.ed.gov/ncee/wwc/Handbooks
library(cobalt)    # provides splitfactor(), the col_w_*() functions, and bal.tab()
library(WeightIt)  # provides weightit()
data("lalonde", package = "cobalt")

treat <- lalonde$treat
covs <- subset(lalonde, select = -c(treat, re78))
covs <- splitfactor(covs, drop.first = "if2")

# One entry per column of covs: TRUE if the variable is binary
bin.vars <- c(FALSE, FALSE, TRUE, TRUE, TRUE,
              TRUE, TRUE, FALSE, FALSE)

# Estimate propensity score weights for the ATE
W <- weightit(treat ~ covs, method = "ps",
              estimand = "ATE")
weights <- W$weights

round(data.frame(
    m0 = col_w_mean(covs, weights = weights, subset = treat == 0),
    sd0 = col_w_sd(covs, weights = weights,
                   bin.vars = bin.vars, subset = treat == 0),
    m1 = col_w_mean(covs, weights = weights, subset = treat == 1),
    sd1 = col_w_sd(covs, weights = weights,
                   bin.vars = bin.vars, subset = treat == 1),
    smd = col_w_smd(covs, treat = treat, weights = weights,
                    std = TRUE, bin.vars = bin.vars),
    vr = col_w_vr(covs, treat = treat, weights = weights,
                  bin.vars = bin.vars),
    ks = col_w_ks(covs, treat = treat, weights = weights,
                  bin.vars = bin.vars),
    row.names = colnames(covs)
), 4)

# Compare to bal.tab():
bal.tab(covs, treat, weights = weights,
        disp = c("m", "sd"), stats = c("m", "v", "ks"),
        estimand = "ATE", method = "weighting",
        binary = "std")