
expand.bpairs

Expand binomial-pair data from short to long form


Description

Expand binomial-pair data from “short” to “long” form.

The short form specifies the response with two columns giving the numbers of successes and failures. Example short form:

survived died dose    sex
       3    0   10   male
       2    1   10 female
       1    2   20   male
       1    2   20 female

The long form specifies the response as a single column of TRUEs and FALSEs. For example, the long form of the above data (spaces and comments added):

survived dose    sex
    TRUE   10   male     # row 1 of short data: 0 died, 3 survived
    TRUE   10   male
    TRUE   10   male

   FALSE   10 female     # row 2 of short data: 1 died, 2 survived
    TRUE   10 female
    TRUE   10 female

   FALSE   20   male     # row 3 of short data: 2 died, 1 survived
   FALSE   20   male
    TRUE   20   male

   FALSE   20 female     # row 4 of short data: 2 died, 1 survived
   FALSE   20 female
    TRUE   20 female

In this example every row of the short data has the same total (survived + died = 3), but in general the totals may differ from row to row.
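The expansion itself can be sketched in base R. This is a minimal illustration under stated assumptions, not earth's implementation: the function name expand_sketch is hypothetical, and it assumes the success and failure columns are passed by name. Within each short row it emits the failures (FALSE) before the successes (TRUE), matching the order in the long-form example above.

```r
# Hypothetical sketch (not earth's code): expand short binomial-pair data
# into long form, one logical response row per success or failure.
expand_sketch <- function(short, succ, fail) {
  n_succ <- short[[succ]]
  n_fail <- short[[fail]]
  # Index of the originating short row for every long row
  idx <- rep(seq_len(nrow(short)), times = n_succ + n_fail)
  # Per short row: the failures (FALSE) first, then the successes (TRUE)
  resp <- unlist(mapply(function(s, f) rep(c(FALSE, TRUE), times = c(f, s)),
                        n_succ, n_fail, SIMPLIFY = FALSE))
  # Copy the predictor columns down to the long rows
  preds <- short[idx, setdiff(names(short), c(succ, fail)), drop = FALSE]
  long <- data.frame(resp, preds, row.names = NULL)
  names(long)[1] <- succ  # response column takes the success column's name
  long
}

short.data <- data.frame(survived = c(3, 2, 1, 1), died = c(0, 1, 2, 2),
                         dose = c(10, 10, 20, 20),
                         sex = factor(c("male", "female", "male", "female")))
long.data <- expand_sketch(short.data, "survived", "died")
```

For the short data above this gives 12 long rows (3 per short row), 7 of them TRUE. The number of long rows is always the sum of successes plus failures over all short rows.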

Usage

## S3 method for class 'formula'
expand.bpairs(formula = stop("no 'formula' argument"), data = NULL, sort = FALSE, ...)

## Default S3 method:
expand.bpairs(data = stop("no 'data' argument"), y = NULL, sort = FALSE, ...)

Arguments

formula

Model formula such as survived + died ~ dose + temp.
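A two-sided formula of this kind names the two response columns on its left-hand side. One way to recover those names in base R is sketched below; formula_vars_sketch is a hypothetical helper, not part of earth, and it only handles either an explicit list of predictors or a bare "." on the right-hand side.

```r
# Hypothetical helper (not part of earth): split a two-sided formula such as
# survived + died ~ dose + temp into response and predictor column names.
formula_vars_sketch <- function(formula, data) {
  lhs <- all.vars(formula[[2]])   # left-hand side, e.g. c("survived", "died")
  rhs <- all.vars(formula[[3]])   # right-hand side variable names
  # A bare "." stands for every column not used in the response
  if (identical(rhs, ".")) rhs <- setdiff(names(data), lhs)
  list(response = lhs, predictors = rhs)
}

d <- data.frame(survived = c(3, 2), died = c(0, 1),
                dose = c(10, 20), temp = c(5, 6))
formula_vars_sketch(survived + died ~ dose + temp, d)
```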

data

Matrix or dataframe containing the data.

y

Model response. One of:
o Two-column matrix or dataframe of binomial pairs, e.g. cbind(survived, died=20-survived)
o Two-element numeric vector specifying the response columns in data, e.g. c(1,2)
o Two-element character vector specifying the response column names in data, e.g. c("survived", "died"). The full names must be used (partial matching isn't supported).
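The three accepted forms of y can be illustrated with a small base R sketch that returns the two response columns in each case. The helper resolve_y_sketch is hypothetical and is not how earth necessarily processes y; it uses match() on the column names so that, as documented, only exact names are accepted.

```r
# Hypothetical helper (not part of earth): return the two response columns
# for each of the three documented forms of 'y'.
resolve_y_sketch <- function(data, y) {
  data <- as.data.frame(data)
  if (is.matrix(y) || is.data.frame(y)) {         # e.g. cbind(survived, died)
    if (ncol(y) != 2) stop("'y' must have two columns")
    return(as.data.frame(y))
  }
  if (length(y) != 2) stop("'y' must have two elements")
  if (is.numeric(y)) return(data[, y, drop = FALSE])  # e.g. c(1, 2)
  if (is.character(y)) {                          # e.g. c("survived", "died")
    cols <- match(y, names(data))                 # exact names only, no partial matching
    if (anyNA(cols)) stop("no such column: ", paste(y[is.na(cols)], collapse = ", "))
    return(data[, cols, drop = FALSE])
  }
  stop("unsupported type of 'y'")
}

d <- data.frame(survived = c(3, 2, 1, 1), died = c(0, 1, 2, 2),
                dose = c(10, 10, 20, 20))
resolve_y_sketch(d, c("survived", "died"))
```

A partial name such as "surv" raises an error here rather than silently matching "survived".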

sort

Default FALSE. Use TRUE to sort the rows of the long data so that it is returned in canonical form, independent of the row order of the short data. The long data is sorted on the predictor values, with predictors on the left taking precedence in the sort order.
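Sorting on the predictors with the leftmost predictor as the primary key can be sketched with base R's order(). This is an assumption about how such a canonical ordering might be produced, not a description of earth's code; sort_long_sketch is hypothetical and assumes the response is in column 1, as in the returned long data. Because order() is stable, rows that tie on every predictor keep their existing relative order.

```r
# Hypothetical sketch (not earth's code): put long data in a canonical order
# by sorting on the predictor columns, leftmost predictor first.
sort_long_sketch <- function(long) {
  keys <- unname(as.list(long[-1]))  # predictors; column 1 is the response
  if (length(keys) == 0) return(long) # nothing to sort on
  o <- do.call(order, keys)          # later columns break ties in earlier ones;
                                     # factors sort by their level order
  out <- long[o, , drop = FALSE]
  rownames(out) <- NULL              # renumber rows 1..n
  out
}

long <- data.frame(survived = c(TRUE, FALSE, TRUE),
                   dose = c(20, 10, 10),
                   sex = factor(c("male", "male", "female")))
sort_long_sketch(long)
```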

...

Unused, but provided for generic/method consistency.

Value

A dataframe of the data in the long form, with expanded binomial pairs. The first column of the data will be the response column (a column of TRUEs and FALSEs).

Additionally, the returned value has two attached attributes:

bpairs.index

A vector of row indices into the returned data. It can be used to reconstruct the short data from the long data (although this package does not yet provide a function to do so).

ynames

Column names of the original response (a two-element character vector).
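Since the package does not yet provide the reverse operation, one hedged way to rebuild short data from long data in base R is sketched below. It assumes a grouping vector stating which short row each long row came from; the exact contents of bpairs.index may differ from that, so the hypothetical collapse_sketch takes the grouping as an explicit argument rather than reading the attribute.

```r
# Hypothetical sketch (not part of earth): rebuild short binomial-pair data
# from long data, given a vector saying which short row each long row came from.
collapse_sketch <- function(long, group, ynames) {
  y <- long[[1]]                                # logical response column
  g <- factor(group, levels = unique(group))    # keep groups in first-seen order
  n_true  <- as.vector(tapply(y, g, sum))       # successes per short row
  n_false <- as.vector(tapply(!y, g, sum))      # failures per short row
  preds <- long[!duplicated(group), -1, drop = FALSE]  # one predictor row per group
  out <- data.frame(n_true, n_false, preds, row.names = NULL)
  names(out)[1:2] <- ynames                     # e.g. c("survived", "died")
  out
}

long <- data.frame(
  survived = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
               FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  dose = rep(c(10, 10, 20, 20), each = 3),
  sex  = factor(rep(c("male", "female", "male", "female"), each = 3)))
collapse_sketch(long, rep(1:4, each = 3), c("survived", "died"))
```

A short row whose successes and failures are both zero produces no long rows, so it cannot be recovered this way.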

Examples

survived <- c(3,2,1,1) # short data for demo (too short to build a real model)
died     <- c(0,1,2,2)
dose <- c(10,10,20,20)
sex  <- factor(c("male", "female", "male", "female"))

short.data <- data.frame(survived, died, dose, sex)

expand.bpairs(survived + died ~ ., short.data) # returns long form of the data

# expand.bpairs(data=short.data, y=cbind(survived, died)) # equivalent
# expand.bpairs(short.data, c(1,2))                       # equivalent
# expand.bpairs(short.data, c("survived", "died"))        # equivalent

# For example models, see the earth vignette
# section "Short versus long binomial data".

earth

Multivariate Adaptive Regression Splines

v5.3.0
GPL-3
Authors
Stephen Milborrow. Derived from mda:mars by Trevor Hastie and Rob Tibshirani. Uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper.
