synthpop: compare.synds – R documentation

compare.synds

Compare univariate distributions of synthesised and observed data

Description

Compares a synthesised data set with the original (observed) data set using percent frequency tables and histograms. When more than one synthetic data set has been generated (`object$m > 1`), the pooled synthetic data are used for the comparison by default.

Usage

## S3 method for class 'synds'
compare(object, data, vars = NULL, msel = NULL, 
  breaks = 20, nrow = 2, ncol = 2, rel.size.x = 1, 
  cols = c("#1A3C5A","#4187BF"), stat = "percents", ...)

## S3 method for class 'compare.synds'
print(x, ...)

Arguments

`object`	an object of class `synds`, which stands for 'synthesised data set'. It is typically created by function `syn()` and it includes `object$m` synthesised data set(s).
`data`	an original (observed) data set.
`vars`	variables to be compared. If `vars` is `NULL` (the default) all synthesised variables are compared.
`msel`	index or indices of synthetic data copies for which a comparison is to be made. If `NULL` pooled synthetic data copies are compared with the original data.
`breaks`	the number of cells for the histogram.
`nrow`	the number of rows for the plotting area.
`ncol`	the number of columns for the plotting area.
`rel.size.x`	a number representing the relative size of x-axis labels.
`cols`	bar colors.
`stat`	determines whether tables and plots present percentages (`stat = "percents"`, the default) or counts (`stat = "counts"`). If `m > 1` and `msel = NULL`, average counts for the synthetic data are presented.
`...`	additional parameters.
`x`	an object of class `compare.synds`.
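
As an illustration of how `msel` and `stat` interact, the following is a minimal sketch rather than part of the official examples. It assumes the `SD2011` data set shipped with synthpop and uses the `m` and `seed` arguments of `syn()`; the object names (`s3`, `cmp_pooled`, `cmp_sel`, `cmp_counts`) and the seed value are arbitrary choices made for this example.

```r
library(synthpop)

# Small subset of the SD2011 survey data that ships with synthpop
ods <- SD2011[, c("sex", "age", "edu")]

# Generate three synthetic copies; the seed value is arbitrary
s3 <- syn(ods, m = 3, seed = 2011)

# Default (msel = NULL): the three copies are pooled before comparison
cmp_pooled <- compare(s3, ods, vars = "edu")

# Compare only the first and third synthetic copies with the observed data
cmp_sel <- compare(s3, ods, vars = "edu", msel = c(1, 3))

# Counts instead of percentages; with m > 1 and msel = NULL the
# synthetic counts are averaged over the copies
cmp_counts <- compare(s3, ods, vars = "age", stat = "counts", breaks = 10)
```

Printing any of these objects (for example, typing `cmp_sel` at the console) displays the comparison through the `print` method for class `compare.synds`.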

Details

Missing data categories for numeric variables are plotted on the same plot as non-missing values. They are indicated by the `miss.` suffix.

Value

An object of class `compare.synds`, which is a list including a list of comparative frequency tables (`tables`) and a ggplot object (`plots`) with bar charts/histograms. If multiple plots are produced, they and their corresponding frequency tables are stored as a list.
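
The returned object can be stored and its components inspected separately. The sketch below is illustrative only: it assumes the `SD2011` data set from synthpop, and the names `s1` and `cmp` and the seed value are arbitrary. The exact shape of `tables` and `plots` depends on how many variables and plot pages are requested, so the code only looks at the components without assuming a particular structure.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu")]
s1  <- syn(ods, seed = 2011)   # seed value is arbitrary

# Store the comparison instead of printing it directly
cmp <- compare(s1, ods, vars = "sex")

class(cmp)    # includes "compare.synds"
names(cmp)    # component names, including "tables" and "plots"

cmp$tables    # comparative frequency table(s), observed vs synthetic
cmp$plots     # ggplot object(s) with the bar charts/histograms
```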

References

Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi: 10.18637/jss.v074.i11.

Examples

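library(synthpop)

# Observed data: a subset of the SD2011 survey data shipped with synthpop
ods <- SD2011[ , c("sex","age","edu","marital","ls","income")]
s1  <- syn(ods)
# Percentage comparison for one variable
compare(s1, ods, vars = "ls")
# Count comparison for a numeric variable, using 10 histogram cells
compare(s1, ods, vars = "income", stat = "counts", breaks = 10)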

synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

v1.6-0

GPL-2 | GPL-3

Authors

Beata Nowok [aut, cre], Gillian M Raab [aut], Chris Dibben [ctb], Joshua Snoke [ctb], Caspar van Lissa [ctb]

Initial release

2020-09-03