Synthesis with classification and regression trees (CART)
Generates univariate synthetic data using classification and regression trees (with or without bootstrap).
syn.ctree(y, x, xp, smoothing = "", proper = FALSE, minbucket = 5, mincriterion = 0.9, ...)
syn.cart(y, x, xp, smoothing = "", proper = FALSE, minbucket = 5, cp = 1e-08, ...)
y |
an original data vector of length |
x |
a matrix ( |
xp |
a matrix ( |
smoothing |
smoothing method for continuous variables. |
proper |
for proper synthesis ( |
minbucket |
the minimum number of observations in
any terminal node. See |
cp |
complexity parameter. Any split that does not
decrease the overall lack of fit by a factor of cp is not
attempted. Small values of |
mincriterion |
|
... |
additional parameters passed to
|
The procedure for synthesis by a CART model is as follows:
1. Fit a classification or regression tree by binary recursive partitioning.
2. For each xp find the terminal node.
3. Randomly draw a donor from the members of the node and take the observed value of y from that draw as the synthetic value.
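The three steps above can be sketched with rpart for a continuous y. This is an illustration only, not the package's own code: the toy data, the variable names (age, sex, leaf_obs, leaf_new, y_syn) and the trick of overwriting the fitted node values with node numbers so that predict() returns the terminal node are all assumptions made for the example.

```r
library(rpart)

set.seed(1)
n <- 200
# Hypothetical original covariates and response
x <- data.frame(age = runif(n, 20, 70),
                sex = factor(sample(c("F", "M"), n, replace = TRUE)))
y <- 100 + 2 * x$age + ifelse(x$sex == "M", 10, 0) + rnorm(n, sd = 5)
# Stand-in for covariates that have already been synthesised
xp <- x[sample(n, 50, replace = TRUE), ]

# Step 1: fit a regression tree by binary recursive partitioning
fit <- rpart(y ~ ., data = data.frame(y = y, x), method = "anova",
             control = rpart.control(minbucket = 5, cp = 1e-08))

# Step 2: find the terminal node of every original and every xp row.
# fit$where indexes rows of fit$frame; the row names are the node numbers.
node_ids <- as.numeric(row.names(fit$frame))
leaf_obs <- node_ids[fit$where]
fit$frame$yval <- node_ids              # make predict() return node numbers
leaf_new <- predict(fit, newdata = xp)

# Step 3: draw a donor at random (with replacement) from the same node
y_syn <- numeric(nrow(xp))
for (leaf in unique(leaf_new)) {
  donors <- y[leaf_obs == leaf]
  idx <- which(leaf_new == leaf)
  y_syn[idx] <- donors[sample.int(length(donors), length(idx), replace = TRUE)]
}
```

Every element of y_syn is an observed value of y, which is why smoothing or a larger minbucket may be worth considering when disclosure risk matters.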
Gaussian kernel smoothing can be applied to continuous variables by setting the smoothing parameter to "density". It is recommended as a tool to decrease the disclosure risk. Increasing minbucket is another means of data protection.
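To illustrate why smoothing lowers disclosure risk, the sketch below perturbs each drawn value with Gaussian noise whose standard deviation is a kernel bandwidth. It assumes that this is a fair picture of the idea; the bandwidth rule (bw.nrd0), the toy vector and the names y_drawn and y_smooth are choices made for this example, and the package's own smoothing routine may differ in its details.

```r
set.seed(2)
# Hypothetical synthetic values drawn from donors: only five distinct
# values, each an exact copy of some observed value
y_drawn <- sample(c(20, 25, 30, 35, 40), 100, replace = TRUE)

# Kernel bandwidth estimated from the drawn values (rule-of-thumb selector)
bw <- stats::bw.nrd0(y_drawn)

# Sampling from a Gaussian kernel density estimate is equivalent to
# adding N(0, bw^2) noise to values resampled from the data
y_smooth <- rnorm(length(y_drawn), mean = y_drawn, sd = bw)
```

After smoothing, the released values no longer coincide exactly with observed ones, at the cost of some added noise.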
CART models were suggested for generation of synthetic data by Reiter (2005) and then evaluated by Drechsler and Reiter (2011).
A list with two components:
res |
a vector of length |
fit |
the fitted model which is an object of class |
Reiter, J.P. (2005). Using CART to generate partially synthetic, public use microdata. Journal of Official Statistics, 21(3), 441–462.
Drechsler, J. and Reiter, J.P. (2011). An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics and Data Analysis, 55(12), 3232–3243.