Regression for Grouped Data - Coarse Data
grouped
is used to fit regression models for grouped or coarse data under the assumption
that the data are Coarsened At Random.
grouped(formula, link = c("identity", "log", "logit"), distribution = c("normal", "t", "logistic"), data, subset, na.action, str.values, df = NULL, iter = 3, ...)
formula |
a two-sided formula describing the model structure. In the left-hand side, a two-column response
matrix must be supplied, specifying the lower and upper limits (1st and 2nd column, respectively)
of the interval in which the true response lies. They can be defined arbitrarily or you can use the
functions |
link |
the link function under which the underlying response variable follows the distribution given by the
|
distribution |
the assumed distribution for the true latent response variable. Available choices are
|
data |
an optional |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain |
str.values |
a numeric vector of starting values. |
df |
a scalar numeric value denoting the degrees of freedom when the underlying distribution for the response variable is assumed to be Student's-t. |
iter |
the number of extra times to call |
... |
additional arguments; currently none is used. |
Let Z_i, i = 1, ..., n be a random sample from a response variable of interest. In many
problems one can think of the sample space S_i of Z_i as being partitioned into a number of groups; one
then observes not the exact value of Z_i but the group into which it falls. Data generated in this way are called
grouped (Heitjan, 1989). The function grouped
and this package are devoted to the analysis of such data in the
case where the data are Coarsened At Random (Heitjan and Rubin, 1991).
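As a minimal illustration of grouping, the base R sketch below coarsens simulated exact values into intervals of width 0.5 and records the lower and upper limit of each interval. The names `z`, `breaks`, `lo`, `up` and `limits` and the grouping scheme itself are illustrative assumptions, not part of the package.

```r
# Simulate exact (latent) values; in practice these would be unobserved.
set.seed(1)
z <- rnorm(10, mean = 2, sd = 1)

# Hypothetical grouping scheme: intervals of width 0.5 covering the sample.
breaks <- seq(floor(min(z)), ceiling(max(z)), by = 0.5)

# findInterval() gives, for each z, the index k such that
# breaks[k] <= z < breaks[k + 1].
k <- findInterval(z, breaks)
lo <- breaks[k]
up <- breaks[k + 1]

# Two-column matrix of interval limits: lower limit first, upper limit second.
limits <- cbind(lo, up)
```

A matrix laid out like `limits` matches the two-column response described for the formula argument, with the lower limit in the first column and the upper limit in the second.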
The framework we use assumes a latent variable Z_i which is coarsely measured and for which we only know Y_{li} and Y_{ui}, i.e., the limits of the interval in which Z_i lies. Given some covariates X_i, Z_i|X_i may assume either a Normal, a Logistic or a (generalized) Student's-t distribution. In addition, three link functions are available for greater flexibility. In particular, the likelihood contribution of the ith observation is of the following form
L_i(β, σ) = F[(y_u^* - xβ)/σ] - F[(y_l^* - xβ)/σ],
where F(.) denotes the cdf of the assumed distribution given by the argument distribution
and
y_l^* = φ(y_l), where φ(.) denotes the link function,
and y_u^* is defined analogously.
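To make the formula concrete, the base R sketch below evaluates the log-likelihood for the normal distribution with the identity link and maximizes it with `optim()` for an intercept-only model. It illustrates the formula under those assumptions and is not the package's implementation; the function names `interval_loglik` and `negll` and the simulated data are hypothetical.

```r
# Log-likelihood of interval-valued responses under a normal latent variable
# with identity link:
#   sum_i log{ F[(yu_i - x_i beta) / sigma] - F[(yl_i - x_i beta) / sigma] }
interval_loglik <- function(beta, sigma, X, yl, yu) {
  eta <- drop(X %*% beta)
  p <- pnorm((yu - eta) / sigma) - pnorm((yl - eta) / sigma)
  sum(log(p))
}

# Simulated example: latent N(2, 1) values rounded to the nearest integer,
# so each value is only known to lie within 0.5 of the recorded integer.
set.seed(123)
z <- rnorm(500, mean = 2, sd = 1)
yl <- round(z) - 0.5
yu <- round(z) + 0.5
X <- matrix(1, nrow = length(z), ncol = 1)  # intercept-only design matrix

# Optimize over (beta, log(sigma)); the log scale keeps sigma positive.
negll <- function(par) -interval_loglik(par[1], exp(par[2]), X, yl, yu)
fit <- optim(c(mean(yl + 0.5), 0), negll, method = "BFGS")
est <- c(beta = fit$par[1], sigma = exp(fit$par[2]))
```

With this setup `est` should land near the simulating values (mean 2, standard deviation 1), although the exact figures depend on the random seed and the optimizer.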
Quality of life indexes are an interesting example of coarse data. The observed value of such an index can be thought of as a rounded version of the true latent quality of life that the index attempts to capture. Applications of this approach can be found in Lesaffre et al. (2007) and Tsonaka et al. (2006). Various other examples of grouped and coarse data can be found in Heitjan (1989; 1993).
An object of class grouped
is a list with the following components:
coefficients |
the estimated coefficients, including the standard deviation σ. |
hessian |
the approximate Hessian matrix at convergence returned by |
fitted |
the fitted values. |
details |
a list with components: (i) |
call |
the matched call. |
Dimitris Rizopoulos d.rizopoulos@erasmusmc.nl
Heitjan, D. (1989) Inference from grouped continuous data: A review (with discussion). Statistical Science, 4, 164–183.
Heitjan, D. (1993) Ignorability and coarse data: some biomedical examples. Biometrics, 49, 1099–1109.
Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Annals of Statistics, 19, 2244–2253.
Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2007) The logistic-transform for bounded outcome scores. Biostatistics, 8, 72–85.
Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2006) Power and sample size calculations for discrete bounded outcomes. Statistics in Medicine, 25, 4241–4252.
grouped(cbind(lo, up) ~ treat * x, link = "logit", data = Sdata)

grouped(equispaced(r, n) ~ x1 * x2, link = "logit", data = Seeds)

# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Width"]
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))
n <- length(y)
a <- b <- numeric(n)
for(i in 1:n){
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}

summary(grouped(cbind(a, b) ~ 1))

# See Figure 1 and Table 1 in Heitjan (1989)
y <- iris[iris$Species == "setosa", "Petal.Length"]
index <- cbind(seq(0.95, 1.75, 0.2), seq(1.15, 1.95, 0.2))
n <- length(y)
a <- b <- numeric(n)
for(i in 1:n){
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}

summary(grouped(cbind(a, b) ~ 1))