survey: svyby – R documentation

survey

svyby

Survey statistics on subsets

Description

Compute survey statistics on subsets of a survey defined by factors.

Usage

svyby(formula, by, design, ...)
## Default S3 method:
svyby(formula, by, design, FUN, ..., deff=FALSE, keep.var=TRUE,
  keep.names=TRUE, verbose=FALSE, vartype=c("se","ci","cv","cvpct","var"),
  drop.empty.groups=TRUE, covmat=FALSE, return.replicates=FALSE,
  na.rm.by=FALSE, na.rm.all=FALSE,
  multicore=getOption("survey.multicore"))
## S3 method for class 'survey.design2'
svyby(formula, by, design, FUN, ..., deff=FALSE, keep.var=TRUE,
  keep.names=TRUE, verbose=FALSE, vartype=c("se","ci","cv","cvpct","var"),
  drop.empty.groups=TRUE, covmat=FALSE, influence=covmat,
  na.rm.by=FALSE, na.rm.all=FALSE,
  multicore=getOption("survey.multicore"))

## S3 method for class 'svyby'
SE(object,...)
## S3 method for class 'svyby'
deff(object,...)
## S3 method for class 'svyby'
coef(object,...)
## S3 method for class 'svyby'
confint(object, parm, level=0.95, df=Inf, ...)
unwtd.count(x, design, ...)

Arguments

`formula,x`	A formula specifying the variables to pass to `FUN` (or a matrix, data frame, or vector)
`by`	A formula specifying factors that define subsets, or a list of factors.
`design`	A `svydesign` or `svrepdesign` object
`FUN`	A function taking a formula and survey design object as its first two arguments.
`...`	Other arguments to `FUN`
`deff`	Request a design effect from `FUN`
`keep.var`	If `FUN` returns a `svystat` object, extract standard errors from it
`keep.names`	Define row names based on the subsets
`verbose`	If `TRUE`, print a label for each subset as it is processed.
`vartype`	Report variability as one or more of standard error, confidence interval, coefficient of variation, percent coefficient of variation, or variance
`drop.empty.groups`	If `FALSE`, report `NA` for empty groups; if `TRUE`, drop them from the output
`na.rm.by`	If `TRUE`, omit groups defined by `NA` values of the `by` variables
`na.rm.all`	If `TRUE`, check for groups with no non-missing observations for variables defined by `formula` and treat these groups as empty
`covmat`	If `TRUE`, compute covariances between estimates for different subsets. Allows `svycontrast` to be used on output. Requires that `FUN` supports either `return.replicates=TRUE` or `influence=TRUE`
`return.replicates`	Only for replicate-weight designs. If `TRUE`, return all the replicates as the "replicates" attribute of the result
`influence`	Return the influence functions of the result
`multicore`	If `TRUE`, use the `multicore` package to distribute subsets over multiple processors
`parm`	A specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	The confidence level required.
`df`	Degrees of freedom for the t-distribution in the confidence interval; use `degf(design)` for the number of PSUs minus the number of strata
`object`	An object of class `"svyby"`

Details

The variance type "ci" asks for confidence intervals, which are produced by confint. In some cases additional options to FUN are needed to produce confidence intervals; for example, svyquantile needs ci=TRUE or keep.var=FALSE.
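As a minimal sketch of the two routes to confidence intervals, the code below uses the api data shipped with the package. The object names by_type and ci_t are illustrative only, and the choice of degf(dclus1) for the degrees of freedom is one option, not a requirement.

```r
library(survey)
data(api)

# One-stage cluster sample built from the package's api data
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Route 1: ask svyby for standard errors and confidence intervals together
by_type <- svyby(~api99, ~stype, dclus1, svymean, vartype = c("se", "ci"))
print(by_type)

# Route 2: call confint() on the svyby result, here with the design
# degrees of freedom instead of the default df = Inf
ci_t <- confint(by_type, df = degf(dclus1))
print(ci_t)
```

Intervals from the second route use a t-distribution, so they are typically somewhat wider than intervals computed with df = Inf.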

unwtd.count is designed to be passed to svyby to report the number of non-missing observations in each subset. Observations with exactly zero weight will also be counted as missing, since that's how subsets are implemented for some designs.
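A short sketch of reporting unweighted counts next to an estimate, again on the api data. The names means and counts are illustrative, and keep.var=FALSE follows the usage shown in the Examples section.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weighted mean of api99 within each school type
means <- svyby(~api99, ~stype, dclus1, svymean)

# Unweighted number of non-missing api99 values within each school type
counts <- svyby(~api99, ~stype, dclus1, unwtd.count, keep.var = FALSE)

print(means)
print(counts)
```

Both results have one row per school type, so the counts can be read alongside the means to judge how many observations support each estimate.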

Parallel processing with multicore=TRUE is useful only for fairly large problems and on computers with sufficient memory. The multicore package is incompatible with some GUIs, although the Mac Aqua GUI appears to be safe.

Value

An object of class "svyby": a data frame showing the factors and the results of FUN.

For unwtd.count, the unweighted number of non-missing observations in the data matrix specified by x for the design.

Note

The function works by making many calls of the form FUN(formula, subset(design, by==i)), where formula is re-evaluated in each subset, so it is unwise to use data-dependent terms in formula. In particular, svyby(~factor(a), ~b, design=d, svymean) will create factor variables whose levels are only those values of a present in each subset. Either use update.survey.design to add variables to the design object instead, or specify the levels explicitly in the call to factor.
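The first workaround can be sketched as follows. The variable hi99 and the cutoff of 600 are hypothetical choices made for illustration; the point is only that the factor is created once on the full design, before svyby splits it.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Create the factor on the full design so its levels ("FALSE", "TRUE")
# are fixed before any subsetting takes place
dclus1 <- update(dclus1, hi99 = factor(api99 >= 600))

# Each subset now reports a proportion for every level of hi99,
# even if one level happens to be absent from that subset
props <- svyby(~hi99, ~stype, dclus1, svymean)
print(props)
```

Writing svyby(~factor(api99 >= 600), ~stype, dclus1, svymean) instead would re-create the factor inside each subset, which is the data-dependent pattern this note warns against.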

Note

Asking for a design effect (deff=TRUE) from a function that does not produce one will cause an error or incorrect formatting of the output. The same will occur with keep.var=TRUE if the function does not compute a standard error.

Examples

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

svyby(~api99, ~stype, dclus1, svymean)
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5,ci=TRUE,vartype="ci")
## without ci=TRUE svyquantile does not compute standard errors
svyby(~api99, ~stype, dclus1, svyquantile, quantiles=0.5, keep.var=FALSE)
svyby(~api99, list(school.type=apiclus1$stype), dclus1, svymean)
svyby(~api99+api00, ~stype, dclus1, svymean, deff=TRUE,vartype="ci")
svyby(~api99+api00, ~stype+sch.wide, dclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, dclus1, unwtd.count, keep.var=FALSE)

rclus1<-as.svrepdesign(dclus1)

svyby(~api99, ~stype, rclus1, svymean)
svyby(~api99, ~stype, rclus1, svyquantile, quantiles=0.5)
svyby(~api99, list(school.type=apiclus1$stype), rclus1, svymean, vartype="cv")
svyby(~enroll,~stype, rclus1,svytotal, deff=TRUE)
svyby(~api99+api00, ~stype+sch.wide, rclus1, svymean, keep.var=FALSE)
## report raw number of observations
svyby(~api99+api00, ~stype+sch.wide, rclus1, unwtd.count, keep.var=FALSE)

## comparing subgroups using covmat=TRUE
mns<-svyby(~api99, ~stype, rclus1, svymean,covmat=TRUE)
vcov(mns)
svycontrast(mns, c(E = 1, M = -1))

str(svyby(~api99, ~stype, rclus1, svymean,return.replicates=TRUE))

tots<-svyby(~enroll, ~stype, dclus1, svytotal,covmat=TRUE)
vcov(tots)
svycontrast(tots, quote(E/H))

## extractor functions
(a<-svyby(~enroll, ~stype, rclus1, svytotal, deff=TRUE, verbose=TRUE, 
  vartype=c("se","cv","cvpct","var")))
deff(a)
SE(a)
cv(a)
coef(a)
confint(a, df=degf(rclus1))

## ratio estimates
svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio)

ratios<-svyby(~api.stu, by=~stype, denominator=~enroll, design=dclus1, svyratio,covmat=TRUE)
vcov(ratios)

## empty groups
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean)
svyby(~api00,~comp.imp+sch.wide,design=dclus1,svymean,drop.empty.groups=FALSE)

survey

Analysis of Complex Survey Samples

v4.0

GPL-2 | GPL-3

Authors

Thomas Lumley
