Methods for Storing and Analyzing Multiple Choice Variables
mChoice is a function that is useful for grouping variables that represent individual choices on a multiple choice question. These choices are typically factor or character values but may be of any type. Levels of component factor variables need not be the same; all unique levels (or unique character values) are collected over all of the multiple variables. Then a new character vector is formed with integer choice numbers separated by semicolons. Ideally, a database system would have exported the semicolon-separated character strings with a levels attribute containing strings defining value labels corresponding to the integer choice numbers. mChoice is a function for creating a multiple-choice variable after the fact. mChoice variables are explicitly handled by the describe and summary.formula functions. NAs or blanks in input variables are ignored.
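The encoding can be illustrated with a minimal base-R sketch that collapses several choice columns into one semicolon-separated string of integer codes per observation, with a levels attribute holding the value labels. This is an illustration under stated assumptions, not Hmisc's implementation. The helper name encode_choices is hypothetical. Levels are taken in order of first appearance, codes within a row are sorted and de-duplicated, and NAs and blanks are dropped.

```r
# Hypothetical helper, NOT Hmisc's mChoice(): collapse several choice
# columns into one semicolon-separated string of integer codes per row.
encode_choices <- function(...) {
  cols <- lapply(list(...), as.character)
  # Collect unique non-missing, non-blank values across all columns
  lev <- unique(unlist(cols))
  lev <- lev[!is.na(lev) & lev != ""]
  m <- do.call(cbind, cols)            # one row per observation
  codes <- apply(m, 1, function(row) {
    # match() maps NA/blank to NA; sort() drops NAs; unique() drops repeats
    k <- sort(unique(match(row, lev)))
    paste(k, collapse = ";")
  })
  structure(codes, levels = lev)       # value labels for the integer codes
}

s1 <- c("Headache", "Hangnail", NA)
s2 <- c("Hangnail", "Headache", "Depressed")
enc <- encode_choices(s1, s2)
enc
attr(enc, "levels")
```

With this design, the same set of choices always produces the same string, whatever order the choices were entered in.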
format.mChoice will convert the multiple choice representation to text form by substituting levels for integer codes.
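The decoding step can be sketched in base R by splitting each string on semicolons and indexing the label vector. The function decode_choices is hypothetical and is not the Hmisc code. It assumes every code is a valid 1-based index into the labels.

```r
# Hypothetical decoder, NOT Hmisc's format.mChoice(): swap each integer
# code in a semicolon-separated string for its value label.
decode_choices <- function(codes, labs, sep = ";") {
  vapply(strsplit(codes, ";", fixed = TRUE), function(k) {
    k <- as.integer(k[nzchar(k)])      # ignore empty pieces
    paste(labs[k], collapse = sep)     # "" when no choices were made
  }, character(1))
}

labs <- c("Headache", "Hangnail", "Depressed")
decode_choices(c("1;2", "3", ""), labs)
decode_choices(c("1;2", "3", ""), labs, sep = ", ")
```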
as.double.mChoice converts the mChoice object to a binary numeric matrix, with one column per used level (or all levels if drop=FALSE). Users invoke it by calling as.numeric.

There is a print method and a summary method, and a print method for the summary.mChoice object. The summary method computes the frequencies of all two-way choice combinations, the frequencies of the top ncombos (default 5) combinations, information about which other choices are present when each given choice is present, and the frequency distribution of the number of choices per observation. This summary output is used in the describe function.
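The binary-matrix representation also shows where the two-way combination counts come from. Below is a hypothetical base-R illustration; to_indicator is not an Hmisc function. It keeps one column for every level, as with drop=FALSE, and uses crossprod() on the 0/1 matrix. The diagonal holds the count for each choice, and each off-diagonal cell holds the number of observations that have both choices. rowSums() gives the number of choices per observation. This is a swapped-in technique for illustration and is not necessarily how summary.mChoice computes its tables.

```r
# Hypothetical sketch, NOT Hmisc's as.double.mChoice()/summary.mChoice():
# expand semicolon-separated codes into a 0/1 indicator matrix.
to_indicator <- function(codes, labs) {
  B <- matrix(0, nrow = length(codes), ncol = length(labs),
              dimnames = list(NULL, labs))
  parts <- strsplit(codes, ";", fixed = TRUE)
  for (i in seq_along(parts)) {
    k <- as.integer(parts[[i]][nzchar(parts[[i]])])
    if (length(k)) B[i, k] <- 1        # mark each chosen level
  }
  B
}

labs  <- c("Headache", "Hangnail", "Depressed")
codes <- c("1;2", "1", "2;3", "")
B <- to_indicator(codes, labs)
B
crossprod(B)   # diagonal: count per choice; off-diagonal: pair co-occurrence
rowSums(B)     # number of choices per observation
```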
inmChoice creates a logical vector the same length as x whose elements are TRUE when the observation in x contains at least one of the codes or value labels in the second argument.
match.mChoice creates an integer vector of the indexes of all elements in table which contain any of the specified levels.
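The membership test described for inmChoice and the index lookup described for match.mChoice can be sketched in base R as follows. Both helpers (in_choices and which_choices) are hypothetical illustrations, not the Hmisc functions, and their signatures differ from the real ones. Following the description of the values argument, integer values are treated as choice codes and character values as value labels.

```r
# Hypothetical sketch, NOT Hmisc's inmChoice(): TRUE for each observation
# whose codes include at least one of `values`.
in_choices <- function(codes, labs, values) {
  # Character values are value labels; convert them to integer codes
  if (is.character(values)) values <- match(values, labs)
  vapply(strsplit(codes, ";", fixed = TRUE),
         function(k) any(as.integer(k[nzchar(k)]) %in% values),
         logical(1))
}

# Hypothetical sketch in the spirit of match.mChoice(): indexes of the
# observations that contain any of the requested choices.
which_choices <- function(codes, labs, values) {
  which(in_choices(codes, labs, values))
}

labs  <- c("Headache", "Hangnail", "Depressed")
codes <- c("1;2", "3", "", "2")
in_choices(codes, labs, "Headache")
in_choices(codes, labs, c("Headache", "Depressed"))
which_choices(codes, labs, 2)
```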
is.mChoice returns TRUE if the argument is a multiple choice variable.
mChoice(..., label='', sort.levels=c('original','alphabetic'),
        add.none=FALSE, drop=TRUE)
## S3 method for class 'mChoice'
format(x, minlength=NULL, sep=";", ...)
## S3 method for class 'mChoice'
as.double(x, drop=FALSE, ...)
## S3 method for class 'mChoice'
print(x, quote=FALSE, max.levels=NULL, width=getOption("width"), ...)
## S3 method for class 'mChoice'
as.character(x, ...)
## S3 method for class 'mChoice'
summary(object, ncombos=5, minlength=NULL, drop=TRUE, ...)
## S3 method for class 'summary.mChoice'
print(x, prlabel=TRUE, ...)
## S3 method for class 'mChoice'
x[..., drop=FALSE]
match.mChoice(x, table, nomatch=NA, incomparables=FALSE)
inmChoice(x, values)
is.mChoice(x)
## S3 method for class 'mChoice'
Summary(..., na.rm)
na.rm | Logical: remove NAs from the data.
table | a vector (mChoice) of values to be matched against.
nomatch | value to return if a value for x does not exist in table.
incomparables | logical: whether incomparable values should be compared.
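... | a series of vectors
label | a character string label attribute to attach to the result of mChoice
sort.levels | set to 'alphabetic' to sort the levels alphabetically rather than keeping the original order of levels in the component variables
add.none | Set to TRUE to create a new category 'none' if it does not already exist and at least one observation has no choices selected
drop | set to FALSE to keep unused factor levels
x | an object of class "mChoice", such as one created by mChoice; for is.mChoice, any object
object | an object of class "mChoice", such as one created by mChoice
ncombos | maximum number of choice combinations to report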
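width | width of a line of text to be formatted
quote | quote the output
max.levels | maximum number of levels to be displayed
minlength | By default no abbreviation of levels is done in format and summary. Specify a positive integer to use abbreviation in those functions (see abbreviate).
sep | character to use to separate levels when formatting
prlabel | set to FALSE to keep print.summary.mChoice from printing the variable label and number of unique values
values | a scalar or vector. If values is integer, it is the choice codes; if it is a character vector, it is assumed to be value labels.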
mChoice returns a character vector of class "mChoice" plus attributes "levels" and "label". summary.mChoice returns an object of class "summary.mChoice". inmChoice returns a logical vector. format.mChoice returns a character vector, and as.double.mChoice returns a binary numeric matrix.
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
library(Hmisc)
options(digits=3)
set.seed(3)
n <- 20
sex <- factor(sample(c("m","f"), n, replace=TRUE))
age <- rnorm(n, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), n, replace=TRUE))

# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
          'Muscle Ache','Depressed')
symptom1 <- sample(symp, n, TRUE)
symptom2 <- sample(symp, n, TRUE)
symptom3 <- sample(symp, n, TRUE)
cbind(symptom1, symptom2, symptom3)[1:5,]
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
Symptoms
print(Symptoms, long=TRUE)
format(Symptoms[1:5])
inmChoice(Symptoms,'Headache')
levels(Symptoms)
inmChoice(Symptoms, 3)
inmChoice(Symptoms, c('Headache','Hangnail'))
# Note: In this example, some subjects have the same symptom checked
# multiple times; in practice these redundant selections would be NAs.
# mChoice ignores these redundant selections.

meanage <- N <- numeric(5)
for(j in 1:5) {
  meanage[j] <- mean(age[inmChoice(Symptoms,j)])
  N[j] <- sum(inmChoice(Symptoms,j))
}
names(meanage) <- names(N) <- levels(Symptoms)
meanage
N

# Manually compute mean age for 2 symptoms
mean(age[symptom1=='Headache' | symptom2=='Headache' | symptom3=='Headache'])
mean(age[symptom1=='Hangnail' | symptom2=='Hangnail' | symptom3=='Hangnail'])

summary(Symptoms)

# Frequency table sex*treatment, sex*Symptoms
summary(sex ~ treatment + Symptoms, fun=table)
# Check:
ma <- inmChoice(Symptoms, 'Muscle Ache')
table(sex[ma])

# could also do:
# summary(sex ~ treatment + mChoice(symptom1,symptom2,symptom3), fun=table)

# Compute mean age, separately by 3 variables
summary(age ~ sex + treatment + Symptoms)

summary(age ~ sex + treatment + Symptoms, method="cross")

f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
f
# trio of numbers represent 25th, 50th, 75th percentile
print(f, long=TRUE)