Rescale design weights for multilevel analysis
Most functions to fit multilevel and mixed effects models only
allow to specify frequency weights, but not design (i.e. sampling or
probability) weights, which should be used when analyzing complex samples
and survey data. rescale_weights()
implements an algorithm proposed
by Asparouhov (2006) and Carle (2009) to rescale design
weights in survey data to account for the grouping structure of multilevel
models, which then can be used for multilevel modelling.
rescale_weights(data, group, probability_weights, nest = FALSE)
data |
A data frame. |
group |
Variable names (as character vector, or as formula), indicating the grouping structure (strata) of the survey data (level-2-cluster variable). It is also possible to create weights for multiple group variables; in such cases, each created weighting variable will be suffixed by the name of the group variable. |
probability_weights |
Variable indicating the probability (design or sampling) weights of the survey data (level-1-weight). |
nest |
Logical, if |
Rescaling is based on two methods: For pweights_a
, the sample
weights probability_weights
are adjusted by a factor that represents
the proportion of group size divided by the sum of sampling weights within
each group. The adjustment factor for pweights_b
is the sum of
sample weights within each group divided by the sum of squared sample
weights within each group (see Carle (2009), Appendix B).
Regarding the choice between scaling methods A and B, Carle suggests
that "analysts who wish to discuss point estimates should report results
based on weighting method A. For analysts more interested in residual
between-group variance, method B may generally provide the least biased
estimates". In general, it is recommended to fit a non-weighted model
and weighted models with both scaling methods and when comparing the
models, see whether the "inferential decisions converge", to gain
confidence in the results.
Though the bias of scaled weights decreases with increasing group size,
method A is preferred when insufficient or low group size is a concern.
The group ID and probably PSU may be used as random effects (e.g.
nested design, or group and PSU as varying intercepts), depending
on the survey design that should be mimicked.
data
, including the new weighting variables: pweights_a
and pweights_b
, which represent the rescaled design weights to use
in multilevel models (use these variables for the weights
argument).
Carle A.C. (2009). Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9(49): 1-13
Asparouhov T. (2006). General Multi-Level Modeling with Sampling Weights. Communications in Statistics - Theory and Methods 35: 439-460
if (require("sjstats")) { data(nhanes_sample, package = "sjstats") head(rescale_weights(nhanes_sample, "SDMVSTRA", "WTINT2YR")) # also works with multiple group-variables... head(rescale_weights(nhanes_sample, c("SDMVSTRA", "SDMVPSU"), "WTINT2YR")) # or nested structures. x <- rescale_weights( data = nhanes_sample, group = c("SDMVSTRA", "SDMVPSU"), probability_weights = "WTINT2YR", nest = TRUE ) head(x) } if (require("lme4") && require("sjstats")) { data(nhanes_sample, package = "sjstats") nhanes_sample <- rescale_weights(nhanes_sample, "SDMVSTRA", "WTINT2YR") glmer( total ~ factor(RIAGENDR) * (log(age) + factor(RIDRETH1)) + (1 | SDMVPSU), family = poisson(), data = nhanes_sample, weights = pweights_a ) }
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.