Tabular utility
Produces tables from observed and synthesised data and calculates utility measures to compare them with their expectation if the synthesising model is correct.
utility.tab(object, data, vars = NULL, ngroups = 5,
            useNA = TRUE, print.tables = length(vars) < 4,
            print.stats = 'VW', print.zdiff = FALSE, digits = 2, ...)

## S3 method for class 'utility.tab'
print(x, print.tables = x$print.tables, print.zdiff = x$print.zdiff,
      print.stats = x$print.stats, digits = x$digits, ...)
object |
an object of class |
data |
the original (observed) data set. |
vars |
a single string or a vector of strings with the names of variables to be used to form the table. |
ngroups |
if numerical (non-factor) variables are included they will be
classified into this number of groups to form tables. Classification is
performed using |
useNA |
determines if NA values are to be included in tables. |
print.tables |
a logical value that determines if tables of observed and synthesised data are to be printed. |
print.stats |
determines which chi-squared statistics to print to compare the observed and synthetic tables: 'VW' for Voas and Williamson, 'FT' for Freeman and Tukey, or c('VW','FT') for both. |
print.zdiff |
a logical value that determines if tables of Z scores for differences between observed and expected are to be printed. |
digits |
an integer indicating the number of decimal places
for printing statistics, |
... |
additional parameters; these can be passed to the classIntervals() function. |
x |
an object of class |
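The effect of ngroups and useNA can be illustrated with base R alone. The sketch below is not the package's own classification code: the data are made up, and the quantile-based break points are an assumption chosen only to show how a numeric variable might be cut into ngroups classes before tabulation.

```r
# Hypothetical data: ten ages and a two-level factor.
age <- c(21, 25, 30, 34, 41, 47, 52, 58, 63, 70)
sex <- factor(c("F", "M", "F", "M", "F", "M", "F", "M", "F", "M"))
ngroups <- 5

# Quantile-based break points (an assumption; utility.tab() may choose
# its breaks differently).
breaks <- quantile(age, probs = seq(0, 1, length.out = ngroups + 1))
age_grp <- cut(age, breaks = breaks, include.lowest = TRUE)

# Cross-tabulate the grouped variable with the factor; useNA = "ifany"
# keeps missing values as their own category when they occur.
tab <- table(age_grp, sex, useNA = "ifany")
print(tab)
```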
Forms tables of observed and synthesised values for the variables
specified in vars. Two utility measures are calculated from the cells
of the tables: a measure of fit proposed by Voas and Williamson,
sum((observed - synthesised)^2 / ((observed + synthesised)/2)),
and one proposed by Freeman and Tukey,
4 * sum((observed^(0.5) - synthesised^(0.5))^2).
In both cases those cells where observed and synthesised are both zero do not
contribute to the sum. If the synthesising model is correct both of these
measures should have chi-square distributions for large samples.
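As a rough illustration of the two measures, the following base R sketch evaluates both sums on a pair of made-up count vectors. It assumes the observed and synthesised tables have the same total count and is not taken from the package source; the names obs, syn and keep are hypothetical.

```r
# Hypothetical cell counts from an observed and a synthesised table,
# flattened to vectors. Both total 100.
obs <- c(20, 30, 0, 50)
syn <- c(25, 25, 0, 50)

# Cells where both counts are zero do not contribute to either sum.
keep <- (obs + syn) > 0
o <- obs[keep]
s <- syn[keep]

# Voas and Williamson measure of fit.
vw <- sum((o - s)^2 / ((o + s) / 2))

# Freeman and Tukey measure.
ft <- 4 * sum((sqrt(o) - sqrt(s))^2)

print(c(VW = vw, FT = ft))
```

Dropping the doubly empty cells before dividing also avoids a 0/0 term in the Voas and Williamson sum.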
An object of class utility.tab
which is a list with the following
components:
m |
number of synthetic data sets in object, i.e. |
tab.obs |
a table from the observed data. |
UtabFT |
a vector with |
UtabVW |
a vector with |
df |
a vector of degrees of freedom for the chi-square tests, equal to the number of cells in the table with any observed or synthesised counts minus one. |
ratioFT |
a vector with ratios of |
ratioVW |
a vector with ratios of |
pvalFT |
a vector with |
pvalVW |
a vector with |
nempty |
a vector of length |
tab.syn |
a table or a list of |
tab.zdiff |
a table or a list of |
n |
number of observations in the original dataset. |
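The relationship between a statistic, its degrees of freedom and a p-value can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the counts and the statistic are invented, the degrees of freedom are taken as the number of cells with any observed or synthesised count minus one, and the p-value is assumed to be the upper tail of the chi-square distribution.

```r
# Hypothetical cell counts and a hypothetical utility statistic.
obs <- c(20, 30, 0, 50)
syn <- c(25, 25, 0, 50)
stat <- 2.02

# Degrees of freedom: cells with any observed or synthesised count, minus one.
df <- sum((obs + syn) > 0) - 1

# Upper-tail chi-square probability (an assumption about how the
# p-value would be obtained).
pval <- pchisq(stat, df = df, lower.tail = FALSE)

print(c(df = df, pval = pval))
```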
Nowok, B., Raab, G.M. and Dibben, C. (2016). synthpop: Bespoke creation of synthetic data in R. Journal of Statistical Software, 74(11), 1-26. doi: 10.18637/jss.v074.i11.
Read, T.R.C. and Cressie, N.A.C. (1988). Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York.
Voas, D. and Williamson, P. (2001). Evaluating goodness-of-fit measures for synthetic microdata. Geographical and Environmental Modelling, 5(2), 177-200.
ods <- SD2011[1:1000, c("sex", "age", "edu", "marital")]

s1 <- syn(ods, m = 10)
utility.tab(s1, ods, vars = c("marital", "sex"))

s2 <- syn(ods, m = 1)
utility.tab(s2, ods, vars = c("marital", "age"), ngroups = 3,
            print.tables = TRUE)
u2 <- utility.tab(s2, ods, vars = c("marital", "age"), style = "pretty")
print(u2, print.tables = TRUE, print.zdiff = TRUE)