
numtocat.syn

Group numeric variables before synthesis


Description

Creates a new data frame in which selected numeric variables are grouped into factors, with the ranges chosen from the data.

Usage

numtocat.syn(data, numtocat = NULL, print.flag = TRUE, cont.na = NULL, 
             catgroups = 5, style.groups = "fisher")

Arguments

data

a data frame.

numtocat

a vector of numbers or variable names of numeric variables to be grouped into factors. If NULL all the numeric variables in data will be grouped.

print.flag

if TRUE, a list of the grouped variables is printed.

cont.na

a named list that gives the values of the named variables to be treated as separate categories, often missing values like -8. See the corresponding parameter of syn().

catgroups

a single integer or a vector of integers indicating the target number of groups for the variables in numtocat, in the same order as numtocat or as their relative positions in data. The achieved number of groups may be different if, for example, there are fewer than catgroups distinct values.

style.groups

the style parameter of the function classIntervals() in the classInt package, which determines how the breaks used to categorise each variable are chosen. See the help file for classIntervals() for details.
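The arguments above can be combined as in the following minimal sketch. It assumes the SD2011 data set shipped with synthpop and picks two of its numeric variables (age and income) purely for illustration; the group counts are arbitrary choices, not recommendations.

```r
library(synthpop)

# Variables to group, given by name (positions in the data would also work).
vars <- c("age", "income")

res <- numtocat.syn(
  SD2011,
  numtocat     = vars,
  cont.na      = list(income = -8),  # treat the -8 code as its own category
  catgroups    = c(4, 6),            # targets: 4 groups for age, 6 for income
  style.groups = "fisher",           # the documented default, spelled out
  print.flag   = FALSE               # suppress the printed list of variables
)

# The selected columns should now be factors.
sapply(res$data[vars], is.factor)

# Achieved number of levels; this may differ from the targets in catgroups.
sapply(res$data[vars], nlevels)
```

Names listed in cont.na should refer to variables that are actually being grouped, which is why only income appears there in this sketch.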

Value

a list with the following components:

data

a data frame with the numeric variables replaced by factors grouped into ranges.

breaks

a named list of the breaks used to divide each numeric variable into categories.

levels

a named list of the levels for the categories of each numeric variable.

orig

a data frame with the original numeric data.

cont.na

a named list of the values of each numeric variable that were treated as separate categories (see the cont.na argument).

numtocat

names of the variables changed to categories.

ind

positions in data of the variables changed to categories.
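As a rough illustration of the returned list, the sketch below groups two SD2011 variables and looks at each component in turn. The choice of age and bmi and of three groups is an assumption made for the example; the component names are those listed above.

```r
library(synthpop)

# Group two numeric variables into (at most) three categories each.
res <- numtocat.syn(SD2011, numtocat = c("age", "bmi"),
                    catgroups = 3, print.flag = FALSE)

names(res)            # components of the returned list

res$breaks[["age"]]   # break points used to cut age
res$levels[["age"]]   # labels of the resulting categories

res$numtocat          # names of the variables that were grouped
res$ind               # their positions in the data

head(res$orig)        # original numeric values, kept for reference
table(res$data$age)   # counts per category in the grouped data
```

Keeping orig alongside the grouped data makes it possible to refer back to the ungrouped numeric values later on.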

Examples

library(synthpop)

# Group all numeric variables in SD2011, keeping -8 as a separate category
SD2011.cat <- numtocat.syn(SD2011, cont.na = list(income = -8, unempdur = -8,
                                                  nofriend = -8))
summary(SD2011.cat$data)

synthpop

Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

v1.6-0
GPL-2 | GPL-3
Authors
Beata Nowok [aut, cre], Gillian M Raab [aut], Chris Dibben [ctb], Joshua Snoke [ctb], Caspar van Lissa [ctb]
Initial release
2020-09-03
