Expand Data Frame by Re-expressing Categorical Data as Counts
Expands the rows of a data frame by re-expressing observations of a categorical variable specified by catvar, such that the column(s) corresponding to catvar are replaced by a factor specifying the possible categories for each observation and a vector of 0/1 counts over these categories.
expandCategorical(data, catvar, sep = ".", countvar = "count", idvar = "id", as.ordered = FALSE, group = TRUE)
data | a data frame.
catvar | a character vector specifying factors in
sep | a character string used to separate the concatenated values of
countvar | (optional) a character string to be used for the name of the new count variable.
idvar | (optional) a character string to be used for the name of the new factor identifying the original rows (cases).
as.ordered | logical: whether the new interaction factor should be of class
group | logical: whether or not to group individuals with common values over all covariates.
Each row of the data frame is replicated c times, where c is the number of levels of the interaction of the factors specified by catvar. In the expanded data frame, the columns specified by catvar are replaced by a factor specifying the c possible categories for each case, named by the concatenated values of catvar separated by sep. The ordering of factor levels is preserved in the creation of the new factor, but this factor will not be of class "ordered" unless as.ordered = TRUE. A variable named countvar is added to the data frame; it equals 1 for the observed category in each case and 0 elsewhere. Finally, a factor named idvar is added to index the cases.
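The expansion described above can be sketched by hand in base R. This is only an illustration for a single categorical column, not the package's implementation: the toy data frame and the names toy, lev and long are assumptions made for the example, and grouping of identical cases is not attempted.

```r
# Toy data (hypothetical): one covariate `x` and one categorical column `pain`.
toy <- data.frame(
  x    = c(0.5, 1.2, 2.0),
  pain = factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))
)

lev <- levels(toy$pain)  # the c possible categories
n   <- nrow(toy)         # number of cases
k   <- length(lev)       # c in the text above

# Replicate each row c times, keeping only the covariates.
long <- toy[rep(seq_len(n), each = k), "x", drop = FALSE]

# Factor listing every possible category for each case (level order preserved).
long$pain <- factor(rep(lev, times = n), levels = lev)

# 1 for the observed category of each case, 0 elsewhere.
observed   <- rep(as.character(toy$pain), each = k)
long$count <- as.integer(as.character(long$pain) == observed)

# Factor indexing the original rows (cases).
long$id <- factor(rep(seq_len(n), each = k))

rownames(long) <- NULL
print(long)
```

Each case contributes c rows whose counts sum to 1. With more than one factor in catvar, the categories would be the levels of their interaction rather than of a single column.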
The expanded data frame as described in Details.
Re-expressing categorical data in this way allows a multinomial response to be modelled as a Poisson response; see the examples.
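As a minimal base R sketch of this equivalence, consider the simplest case with no covariates, where the multinomial fit is just the observed category proportions. The response y and the names long, category and fit are assumptions for illustration, and glm with an ordinary id factor stands in here for a model that treats the case index as a nuisance term.

```r
# Hypothetical categorical response with three categories, ten cases.
y   <- factor(c("a", "b", "a", "c", "b", "a", "c", "a", "b", "a"))
lev <- levels(y)
n   <- length(y)
k   <- length(lev)

# Expanded form: one row per case and category, with 0/1 counts.
long <- data.frame(
  id       = factor(rep(seq_len(n), each = k)),
  category = factor(rep(lev, times = n), levels = lev)
)
long$count <- as.integer(as.character(long$category) ==
                           rep(as.character(y), each = k))

# Poisson log-linear model with one parameter per case (id) plus category effects.
fit <- glm(count ~ id + category, family = poisson, data = long)

# Fitted counts for the first case are the estimated category probabilities.
p_hat <- fitted(fit)[long$id == "1"]
names(p_hat) <- lev

# Multinomial estimates without covariates: the observed proportions.
p_obs <- prop.table(table(y))

print(round(p_hat, 4))
print(p_obs)
```

The id parameters fix the fitted total for each case at 1, which is what makes the Poisson fitted values interpretable as multinomial probabilities.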
Heather Turner
Anderson, J. A. (1984) Regression and Ordered Categorical Variables. J. R. Statist. Soc. B, 46(1), 1-30.
### Example from help(multinom, package = "nnet")
library(gnm)   # provides expandCategorical() and gnm()
library(MASS)
example(birthwt)
library(nnet)
bwt.mu <- multinom(low ~ ., data = bwt)

## Equivalent using gnm - include unestimable main effects in model so
## that interactions with low0 automatically set to zero, else could use
## 'constrain' argument.
bwtLong <- expandCategorical(bwt, "low", group = FALSE)
bwt.po <- gnm(count ~ low*(. - id), eliminate = id, data = bwtLong,
              family = "poisson")
summary(bwt.po) # same deviance; df reflect extra id parameters

### Example from ?backPain
set.seed(1)
summary(backPain)
backPainLong <- expandCategorical(backPain, "pain")

## Fit models described in Table 5 of Anderson (1984)
noRelationship <- gnm(count ~ pain, eliminate = id,
                      family = "poisson", data = backPainLong)
oneDimensional <- update(noRelationship,
                         ~ . + Mult(pain, x1 + x2 + x3))