synthpop: summary.fit.synds – R documentation

summary.fit.synds

Inference from synthetic data

Description

Combines the results of models fitted to each of the m synthetic data sets.

Usage

## S3 method for class 'fit.synds'
summary(object, population.inference = FALSE, msel = NULL,
  real.varcov = NULL, ...)

## S3 method for class 'summary.fit.synds'
print(x, ...)

Arguments

`object`	an object of class `fit.synds` created by fitting a model to a synthesised data set using one of the functions `glm.synds`, `lm.synds`, `multinom.synds` or `polr.synds`.
`population.inference`	a logical value indicating whether inference should be made to population quantities. If `FALSE` (the default), inference is made to the results that would be expected from an analysis of the original data. This option should be selected if the synthetic data are being used for exploratory analysis but the final published results will be obtained by running code on the original confidential data. If `TRUE`, the results allow population inference to be made from the synthetic data. In both cases the inference depends on the synthesising model being correct; this can be checked by running the same analysis on the original data (see `compare.fit.synds`).
`msel`	index or indices of the synthetic data sets (`1`, ..., `m`) for which summaries of the fitted models are to be produced. If `NULL` (the default), only the summary of the combined estimates is produced.
`real.varcov`	the estimated variance-covariance matrix of the fit of the model to the original data. This argument is used by the function `compare.fit.synds`, which takes the original data as one of its arguments.
`...`	additional arguments.
`x`	an object of class `summary.fit.synds`.
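The difference between the two settings of `population.inference` can be seen in a small simulation. The sketch below is a hypothetical base-R illustration, not synthpop code. It assumes the simplest possible case: a single mean estimated from normally distributed data, with each synthetic copy (of size k = n) drawn from a normal distribution whose mean and standard deviation equal the observed estimates, as in simple synthesis. The names `q_obs`, `q_bar`, `ratio_obs` and `ratio_pop` are introduced only for this example.

```r
# Hypothetical illustration (not synthpop code): simple synthesis of a mean.
set.seed(2017)
n <- 200      # size of the "observed" data
m <- 5        # number of synthetic copies, each of size k = n
reps <- 4000  # number of simulated observed data sets

q_obs <- numeric(reps)  # estimate from the observed data
q_bar <- numeric(reps)  # combined estimate from the m synthetic copies
for (r in seq_len(reps)) {
  obs <- rnorm(n)  # population mean 0, variance 1
  q_obs[r] <- mean(obs)
  # simple synthesis: synthesising parameters fixed at the observed estimates
  syn_means <- replicate(m, mean(rnorm(n, mean(obs), sd(obs))))
  q_bar[r] <- mean(syn_means)
}

v <- 1 / n  # variance of one synthetic estimate (sigma^2 / k)
# Spread of q_bar around the observed-data estimate: about v / m
ratio_obs <- var(q_bar - q_obs) / (v / m)
# Spread of q_bar around the population value 0: about v * (1 + 1/m)
ratio_pop <- mean(q_bar^2) / (v * (1 + 1 / m))
round(c(ratio_obs = ratio_obs, ratio_pop = ratio_pop), 2)
```

Both ratios should come out close to 1. When the target is the result from the original data, the relevant uncertainty is only how far the combined estimate lies from the observed estimate; when the target is the population value, the sampling error of the observed data must be added, which is why population inference gives larger standard errors.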

Details

The mean of the estimates from the m synthetic data sets yields asymptotically unbiased estimates of the coefficients if the observed data conform to the distribution used for synthesis. The standard errors are estimated differently depending on whether inference is made to the results that we would expect to obtain from the observed data or to the parameters of the population from which we assume the observed data are sampled. The standard errors also differ according to whether the synthetic data were produced using simple or proper synthesis (for details see Raab et al. (2017)).
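The combining logic described above can be sketched for a single coefficient as follows. This is a hypothetical helper, `combine_synthetic()`, written in base R and not taken from synthpop's source; its formulas are an assumption based on large-sample reasoning (each synthetic estimate has variance v̄ on the scale of k cases, the observed estimate has variance v̄·k/n, and proper synthesis adds the variance of the drawn synthesising parameters). With k = n they reduce to v̄(1 + 1/m) for simple and v̄(1 + 2/m) for proper synthesis. synthpop's own implementation may differ in detail.

```r
# Hypothetical helper (not part of synthpop): combine one coefficient
# estimated separately on each of m synthetic data sets.
#   est, v : estimates and squared standard errors from the m fits
#   n, k   : sizes of the original data and of each synthetic data set
#   proper : TRUE if the synthesis drew parameters from their posterior
combine_synthetic <- function(est, v, n, k = n, proper = FALSE) {
  m <- length(est)
  stopifnot(length(v) == m, m >= 1)
  q_bar <- mean(est)  # combined point estimate
  v_bar <- mean(v)    # average within-synthesis variance (scale of k cases)
  # Target 1: the result expected from an analysis of the original data.
  # Rescale the variance from k synthetic cases to n original cases.
  se_expected <- sqrt(v_bar * k / n)
  # Target 2: the population parameter. Add the variance of q_bar around
  # the observed estimate, which is larger for proper synthesis.
  synth_var <- if (proper) v_bar * (1 + k / n) / m else v_bar / m
  se_population <- sqrt(v_bar * k / n + synth_var)
  c(estimate = q_bar,
    se_expected = se_expected, z_expected = q_bar / se_expected,
    se_population = se_population, z_population = q_bar / se_population)
}

# Made-up estimates from m = 5 fits, each with standard error 0.1
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
round(combine_synthetic(est, v = rep(0.01, 5), n = 1000), 3)
round(combine_synthetic(est, v = rep(0.01, 5), n = 1000, proper = TRUE), 3)
```

With these made-up numbers both calls give an estimate of 0.5 and an expected-data standard error of 0.1, while the population standard error is sqrt(0.012) ≈ 0.110 for simple synthesis and sqrt(0.014) ≈ 0.118 for proper synthesis.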

Value

An object of class `summary.fit.synds`, which is a list with the following components:

`call`	the original call to the fitting function (`glm.synds`, `lm.synds`, `multinom.synds` or `polr.synds`).
`proper`	a logical value indicating whether synthetic data were generated using proper synthesis.
`population.inference`	a logical value indicating whether inference is made to population coefficients or to the results that would be expected from an analysis of the original data (see above).
`incomplete`	a logical value indicating whether any of the variables in the model were not synthesised. It is derived in the synthpop implementations of the fitting functions (`lm.synds`, `glm.synds`, `multinom.synds` and `polr.synds`) and saved with the fitted object. When `TRUE`, inference with `population.inference = TRUE` uses the method proposed by Reiter (2003) for what he terms partially synthetic data. This method requires multiple syntheses (`m > 1`). If `m = 1`, `incomplete = TRUE` and `population.inference = TRUE`, the results are still calculated but are returned with a warning; the standard errors will then usually be larger than they should be.
`fitting.function`	function used to fit the model.
`m`	the number of synthetic versions of the original (observed) data.
`coefficients`	a matrix of combined estimates. If inference is to the results that would be obtained from an analysis of the original data (`population.inference = FALSE`), the coefficients are given by `xpct(Beta)`, their standard errors by `xpct(se.Beta)` and the corresponding Z-statistics by `xpct(Z)`. If the synthetic data are to be used to make inferences to population quantities (`population.inference = TRUE`), the coefficients are given by `Beta.syn`, their standard errors by `se.Beta.syn` and the Z-statistics by `Z.syn` (see the synthpop vignette on inference for more details).
`n`	the number of cases in the original data.
`k`	the number of cases in the synthesised data. Note that if `k` and `n` are not equal and `population.inference = FALSE` (the default), the standard errors produced estimate what would be expected from an analysis of the original data set of size `n`.
`analyses`	the summary of the fitted model (a `summary.glm` or `summary.lm` object for `glm.synds` or `lm.synds` respectively), or a list of `m` such objects.
`msel`	index or indices of synthetic data copies for which summaries of fitted models are produced. If `NULL` only a summary of combined estimates is produced.
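For the `incomplete = TRUE` case, Reiter's (2003) combining rule for partially synthetic data adds the between-synthesis variance of the estimates, divided by m, to their average within-synthesis variance. The sketch below is a hypothetical base-R helper, `reiter_partial()`, not synthpop code, applying that rule to one coefficient. It shows why `m > 1` is needed: the between-synthesis variance cannot be estimated from a single synthesis.

```r
# Hypothetical helper (not part of synthpop): Reiter's (2003) combining rule
# for partially synthetic data, applied to one coefficient.
#   est, v : estimates and squared standard errors from the m fits
reiter_partial <- function(est, v) {
  m <- length(est)
  if (m < 2) stop("the between-synthesis variance needs m > 1 syntheses")
  q_bar <- mean(est)  # combined estimate
  b_m <- var(est)     # between-synthesis variance (denominator m - 1)
  u_bar <- mean(v)    # average within-synthesis variance
  T_p <- u_bar + b_m / m  # total variance of q_bar
  c(estimate = q_bar, se = sqrt(T_p))
}

# Made-up estimates from m = 5 fits, each with standard error 0.1
est <- c(0.52, 0.47, 0.55, 0.50, 0.46)
reiter_partial(est, v = rep(0.01, 5))
```

Here the between-synthesis variance is 0.00135, so the combined standard error is sqrt(0.01 + 0.00135/5) ≈ 0.101. Unlike synthpop, which returns results with a warning when `m = 1`, this sketch simply stops with an error.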

References

Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi: 10.18637/jss.v074.i11.

Raab, G.M., Nowok, B. and Dibben, C. (2017). Practical data synthesis for large samples. Journal of Privacy and Confidentiality, 7(3), 67-97. Available at: https://journalprivacyconfidentiality.org/index.php/jpc/article/view/407

Reiter, J.P. (2003). Inference for partially synthetic, public use microdata sets. Survey Methodology, 29, 181-188.

Examples

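library(synthpop)

### original data: a subset of the SD2011 survey data shipped with synthpop
ods <- SD2011[1:1000, c("sex", "age", "edu", "ls", "smoke")]

### simple synthesis: m = 5 synthetic data sets with default methods
s1 <- syn(ods, m = 5)
f1 <- glm.synds(smoke ~ sex + age + edu + ls, data = s1, family = "binomial")
summary(f1)                               # inference to original-data results
summary(f1, population.inference = TRUE)  # inference to population quantities
summary(f1, msel = 1:2)                   # also show fits to data sets 1 and 2

### proper synthesis with parametric methods
s2 <- syn(ods, m = 5, method = "parametric", proper = TRUE)
f2 <- glm.synds(smoke ~ sex + age + edu + ls, data = s2, family = "binomial")
summary(f2)
summary(f2, population.inference = TRUE)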

synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

v1.6-0

GPL-2 | GPL-3

Authors

Beata Nowok [aut, cre], Gillian M Raab [aut], Chris Dibben [ctb], Joshua Snoke [ctb], Caspar van Lissa [ctb]

Initial release

2020-09-03
