survey: svyglm – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

svyglm

Survey-weighted generalised linear models.

Description

Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.

Usage

## S3 method for class 'survey.design'
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE,
 influence=FALSE)
## S3 method for class 'svyrep.design'
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL,
return.replicates=FALSE, na.action,multicore=getOption("survey.multicore"))
## S3 method for class 'svyglm'
summary(object, correlation = FALSE, df.resid=NULL,
...)
## S3 method for class 'svyglm'
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,...)
## S3 method for class 'svrepglm'
predict(object,newdata=NULL,total=NULL,
                         type=c("link","response","terms"),
                         se.fit=(type != "terms"),vcov=FALSE,
                         return.replicates=!is.null(object$replicates),...)

Arguments

`formula`	Model formula
`design`	Survey design from `svydesign` or `svrepdesign`. Must contain all variables in the formula
`subset`	Expression to select a subpopulation
`family`	`family` object for `glm`
`start`	Starting values for the coefficients (needed for some uncommon link/family combinations)
`rescale`	Rescaling of weights, to improve numerical stability. The default rescales weights to sum to the sample size. Use `FALSE` to not rescale weights. For replicate-weight designs, use `TRUE` to rescale weights to sum to 1, as was the case before version 3.34.
`...`	Other arguments passed to `glm` or `summary.glm`
`rho`	For replicate BRR designs, to specify the parameter for Fay's variance method, giving weights of `rho` and `2-rho`
`return.replicates`	Return the replicates as the `replicates` component of the result? (for `predict`, only possible if they were computed in the `svyglm` fit)
`deff`	Estimate the design effects
`influence`	Return influence functions
`object`	A `svyglm` object
`correlation`	Include the correlation matrix of parameters?
`na.action`	Handling of NAs
`multicore`	Use the `multicore` package to distribute replicates across processors?
`df.resid`	Optional denominator degrees of freedom for Wald tests
`newdata`	new data frame for prediction
`total`	population size when predicting population total
`type`	linear predictor (`link`) or response
`se.fit`	if `TRUE`, return variances of predictions
`vcov`	if `TRUE` and `se=TRUE` return full variance-covariance matrix of predictions

Details

For binomial and Poisson families use family=quasibinomial() and family=quasipoisson() to avoid a warning about non-integer numbers of successes. The ‘quasi’ versions of the family objects give the same point estimates and standard errors and do not give the warning.

If df.resid is not specified the df for the null model is computed by degf and the residual df computed by subtraction. This is recommended by Korn and Graubard and is correct for PSU-level covariates but is potentially very conservative for individual-level covariates. To get tests based on a Normal distribution use df.resid=Inf, and to use number of PSUs-number of strata, specify df.resid=degf(design).

Parallel processing with multicore=TRUE is helpful only for fairly large data sets and on computers with sufficient memory. It may be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.

predict gives fitted values and sampling variability for specific new values of covariates. When newdata are the population mean it gives the regression estimator of the mean, and when newdata are the population totals and total is specified it gives the regression estimator of the population total. Regression estimators of mean and total can also be obtained with calibrate.

Value

svyglm returns an object of class svyglm. The predict method returns an object of class svystat

Note

svyglm always returns 'model-robust' standard errors; the Horvitz-Thompson-type standard errors used everywhere in the survey package are a generalisation of the model-robust 'sandwich' estimators. In particular, a quasi-Poisson svyglm will return correct standard errors for relative risk regression models.

Note

This function does not return the same standard error estimates for the regression estimator of population mean and total as some textbooks, or SAS. However, it does give the same standard error estimator as estimating the mean or total with calibrated weights.

In particular, under simple random sampling with or without replacement there is a simple rescaling of the mean squared residual to estimate the mean squared error of the regression estimator. The standard error estimate produced by predict.svyglm has very similar (asymptotically identical) expected value to the textbook estimate, and has the advantage of being applicable when the supplied newdata are not the population mean of the predictors. The difference is small when the sample size is large, but can be appreciable for small samples.

You can obtain the other standard error estimator by calling predict.svyglm with the covariates set to their estimated (rather than true) population mean values.

Author(s)

Thomas Lumley

References

Lumley T, Scott A (2017) "Fitting Regression Models to Survey Data" Statistical Science 32: 265-278

Examples

data(api)


  dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
  dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
  rstrat<-as.svrepdesign(dstrat)
  rclus2<-as.svrepdesign(dclus2)

  summary(svyglm(api00~ell+meals+mobility, design=dstrat))
  summary(svyglm(api00~ell+meals+mobility, design=dclus2))
  summary(svyglm(api00~ell+meals+mobility, design=rstrat))
  summary(svyglm(api00~ell+meals+mobility, design=rclus2))

  ## use quasibinomial, quasipoisson to avoid warning messages
  summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat,
        family=quasibinomial()))


  ## Compare regression and ratio estimation of totals
  api.ratio <- svyratio(~api.stu,~enroll, design=dstrat)
  pop<-data.frame(enroll=sum(apipop$enroll, na.rm=TRUE))
  npop <- nrow(apipop)
  predict(api.ratio, pop$enroll)

  ## regression estimator is less efficient
  api.reg <- svyglm(api.stu~enroll, design=dstrat)
  predict(api.reg, newdata=pop, total=npop)
  ## same as calibration estimator
  svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll)))

  ## svyglm can also reproduce the ratio estimator
  api.reg2 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu"))
  predict(api.reg2, newdata=pop, total=npop)

  ## higher efficiency by modelling variance better
  api.reg3 <- svyglm(api.stu~enroll-1, design=dstrat,
                    family=quasi(link="identity",var="mu^3"))
  predict(api.reg3, newdata=pop, total=npop)
  ## true value
  sum(apipop$api.stu)

survey

Analysis of Complex Survey Samples

v4.0

GPL-2 | GPL-3

Authors

Thomas Lumley

Initial release

svyglm

Description

Usage

Arguments

Details

Value

Note

Note

Author(s)

References

See Also

Examples

survey

We don't support your browser anymore