
expandCategorical

Expand Data Frame by Re-expressing Categorical Data as Counts


Description

Expands the rows of a data frame by re-expressing observations of a categorical variable specified by catvar, such that the column(s) corresponding to catvar are replaced by a factor specifying the possible categories for each observation and a vector of 0/1 counts over these categories.

Usage

expandCategorical(data, catvar, sep = ".", countvar = "count",
                  idvar = "id", as.ordered = FALSE, group = TRUE)

Arguments

data

a data frame.

catvar

a character vector specifying factors in data whose interaction will form the basis of the expansion.

sep

a character string used to separate the concatenated values of catvar in the name of the new interaction factor.

countvar

(optional) a character string to be used for the name of the new count variable.

idvar

(optional) a character string to be used for the name of the new factor identifying the original rows (cases).

as.ordered

logical: whether the new interaction factor should be of class "ordered".

group

logical: whether or not to group individuals with common values over all covariates.
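
As a rough illustration of how catvar, sep and as.ordered relate, the sketch below uses base R's interaction() on a small made-up data frame. The data and object names (d, f1, f2, combined) are hypothetical, and it is only an assumption that expandCategorical builds its new factor in the same way.

```r
# Two small factors standing in for the columns named in 'catvar'
# (hypothetical data, not from the package).
d <- data.frame(f1 = factor(c("a", "b", "a")),
                f2 = factor(c("x", "x", "y")))

# Base R's interaction() concatenates the level names using 'sep'.
combined <- interaction(d[c("f1", "f2")], sep = ".")
levels(combined)        # "a.x" "b.x" "a.y" "b.y"
as.character(combined)  # "a.x" "b.x" "a.y"

# Changing 'sep' only changes how the level names are pasted together.
levels(interaction(d[c("f1", "f2")], sep = "_"))

# as.ordered() gives a factor of class "ordered" with the same levels.
class(as.ordered(combined))
```

Note that all combinations of levels appear in the result, including those not observed in the data.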

Details

Each row of the data frame is replicated c times, where c is the number of levels of the interaction of the factors specified by catvar. In the expanded data frame, the columns specified by catvar are replaced by a single factor giving the c possible categories for each case, with levels named by the concatenated values of catvar separated by sep. The ordering of factor levels is preserved in the new factor, but the factor is of class "ordered" only if as.ordered = TRUE. A variable named countvar is added to the data frame; it equals 1 for the observed category in each case and 0 elsewhere. Finally, a factor named idvar is added to index the cases.
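
The expansion described above can be sketched in base R for a single categorical variable. This is a hand-written illustration on made-up data, not the package's implementation; the object names (dat, long, cats) are hypothetical and the column order of the real output may differ.

```r
# Hypothetical toy data: one categorical response and one covariate.
dat <- data.frame(pain = factor(c("same", "better", "worse"),
                                levels = c("worse", "same", "better")),
                  x1 = c(1, 2, 1))

cats  <- levels(dat$pain)   # the c possible categories
n     <- nrow(dat)          # number of cases
n_cat <- length(cats)

# Replicate each row c times, dropping the original categorical column.
long <- dat[rep(seq_len(n), each = n_cat), "x1", drop = FALSE]

# New factor listing every possible category for each case.
long$pain <- factor(rep(cats, times = n), levels = cats)

# Count is 1 for the observed category and 0 elsewhere.
observed <- rep(as.character(dat$pain), each = n_cat)
long$count <- as.integer(as.character(long$pain) == observed)

# Factor indexing the original rows (cases).
long$id <- factor(rep(seq_len(n), each = n_cat))

rownames(long) <- NULL
long
```

Each case contributes c rows whose counts sum to 1, so the 3 cases here give 9 rows in total.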

Value

The expanded data frame as described in Details.

Note

Re-expressing categorical data in this way allows a multinomial response to be modelled as a Poisson response; see Examples.
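
A minimal sketch of this equivalence, using only base R's glm() on made-up data (the names obs, long and fit are hypothetical): with one nuisance parameter per case, the fitted Poisson means reproduce the multinomial category proportions. The package examples use gnm with eliminate = id instead, which avoids estimating the id parameters as ordinary coefficients.

```r
# Hypothetical toy data: 6 cases, one observed category each.
obs  <- factor(c("a", "b", "a", "c", "a", "b"))
cats <- levels(obs)
n    <- length(obs)

# Long format: one row per case and category, with a 0/1 count.
long <- data.frame(
  id    = factor(rep(seq_len(n), each = length(cats))),
  cat   = factor(rep(cats, times = n), levels = cats),
  count = as.integer(rep(as.character(obs), each = length(cats)) ==
                       rep(cats, times = n))
)

# Poisson log-linear model with one nuisance parameter per case.
fit <- glm(count ~ id + cat, family = poisson, data = long)

# Fitted means for the first case, one per category ...
fitted_case1 <- unname(fitted(fit)[long$id == "1"])
# ... compared with the observed category proportions.
observed_prop <- as.vector(table(obs)) / n

cbind(category = cats, fitted = round(fitted_case1, 3),
      observed = round(observed_prop, 3))
```

The id terms fix the fitted total for each case at 1, which is what makes the Poisson likelihood match the multinomial one for the remaining parameters.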

Author(s)

Heather Turner

References

Anderson, J. A. (1984) Regression and Ordered Categorical Variables. J. R. Statist. Soc. B, 46(1), 1-30.

See Also

Examples

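### Example from help(multinom, package = "nnet")
library(gnm)      # provides expandCategorical(), gnm() and the backPain data
library(MASS)
example(birthwt)  # creates the 'bwt' data frame used below
library(nnet)
bwt.mu <- multinom(low ~ ., data = bwt)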

## Equivalent using gnm - include unestimable main effects in the model so
## that interactions with low0 are automatically set to zero; otherwise the
## 'constrain' argument could be used.
bwtLong <- expandCategorical(bwt, "low", group = FALSE)
bwt.po <- gnm(count ~ low*(. - id), eliminate = id, data = bwtLong,
              family = "poisson")
summary(bwt.po) # same deviance; df reflect extra id parameters

### Example from ?backPain
set.seed(1)
summary(backPain)
backPainLong <- expandCategorical(backPain, "pain")

## Fit models described in Table 5 of Anderson (1984)

noRelationship <- gnm(count ~ pain, eliminate = id,
                      family = "poisson", data = backPainLong)

oneDimensional <- update(noRelationship,
                         ~ . + Mult(pain, x1 + x2 + x3))

gnm

Generalized Nonlinear Models

v1.1-1
GPL-2 | GPL-3
Authors
Heather Turner [aut, cre] (<https://orcid.org/0000-0002-1256-3375>), David Firth [aut] (<https://orcid.org/0000-0003-0302-2312>), Brian Ripley [ctb], Bill Venables [ctb], Douglas M. Bates [ctb], Martin Maechler [ctb] (<https://orcid.org/0000-0002-8685-9910>)
