VGAM: betabinomial – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

betabinomial

Beta-binomial Distribution Family Function

Description

Fits a beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.

Usage

betabinomial(lmu = "logitlink", lrho = "logitlink", irho = NULL, imethod = 1,
             ishrinkage = 0.95, nsimEIM = NULL, zero = "rho")

Arguments

`lmu, lrho`	Link functions applied to the two parameters. See `Links` for more choices. The defaults ensure the parameters remain in (0,1), however, see the warning below.
`irho`	Optional initial value for the correlation parameter. If given, it must be in (0,1), and is recyled to the necessary length. Assign this argument a value if a convergence failure occurs. Having `irho = NULL` means an initial value is obtained internally, though this can give unsatisfactory results.
`imethod`	An integer with value `1` or `2` or ..., which specifies the initialization method for mu. If failure to converge occurs try the another value and/or else specify a value for `irho`.
`zero`	Specifyies which linear/additive predictor is to be modelled as an intercept only. If assigned, the single value can be either `1` or `2`. The default is to have a single correlation parameter. To model both parameters as functions of the covariates assign `zero = NULL`. See `CommonVGAMffArguments` for more information.
`ishrinkage, nsimEIM`	See `CommonVGAMffArguments` for more information. The argument `ishrinkage` is used only if `imethod = 2`. Using the argument `nsimEIM` may offer large advantages for large values of N and/or large data sets.

Details

There are several parameterizations of the beta-binomial distribution. This family function directly models the mean and correlation parameter, i.e., the probability of success. The model can be written T|P=p ~ Binomial(N,p) where P has a beta distribution with shape parameters alpha and beta. Here, N is the number of trials (e.g., litter size), T=NY is the number of successes, and p is the probability of a success (e.g., a malformation). That is, Y is the proportion of successes. Like binomialff, the fitted values are the estimated probability of success (i.e., E[Y] and not E[T]) and the prior weights N are attached separately on the object in a slot.

The probability function is

P(T=t) = choose(N,t) Be(alpha+t, beta+N-t) / Be(alpha, beta)

where t=0,1,…,N, and Be is the beta function with shape parameters alpha and beta. Recall Y = T/N is the real response being modelled.

The default model is eta1 =logit(mu) and eta2 = logit(rho) because both parameters lie between 0 and 1. The mean (of Y) is p = mu = alpha / (alpha + beta) and the variance (of Y) is mu(1-mu)(1+(N-1)rho)/N. Here, the correlation rho is given by 1/(1 + alpha + beta) and is the correlation between the N individuals within a litter. A litter effect is typically reflected by a positive value of rho. It is known as the over-dispersion parameter.

This family function uses Fisher scoring. Elements of the second-order expected derivatives with respect to alpha and beta are computed numerically, which may fail for large alpha, beta, N or else take a long time.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm.

Suppose fit is a fitted beta-binomial model. Then fit@y contains the sample proportions y, fitted(fit) returns estimates of E(Y), and weights(fit, type="prior") returns the number of trials N.

Warning

If the estimated rho parameter is close to 0 then it pays to try lrho = "rhobitlink". One day this may become the default link function.

This family function is prone to numerical difficulties due to the expected information matrices not being positive-definite or ill-conditioned over some regions of the parameter space. If problems occur try setting irho to some numerical value, nsimEIM = 100, say, or else use etastart argument of vglm, etc.

Note

This function processes the input in the same way as binomialff. But it does not handle the case N=1 very well because there are two parameters to estimate, not one, for each row of the input. Cases where N=1 can be omitted via the subset argument of vglm.

The extended beta-binomial distribution of Prentice (1986) is currently not implemented in the VGAM package as it has range-restrictions for the correlation parameter that are currently too difficult to handle in this package. However, try lrho = "rhobitlink".

Author(s)

T. W. Yee

References

Moore, D. F. and Tsiatis, A. (1991). Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics, 47, 383–401.

Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321–327.

Examples

# Example 1
bdata <- data.frame(N = 10, mu = 0.5, rho = 0.8)
bdata <- transform(bdata,
                   y = rbetabinom(100, size = N, prob = mu, rho = rho))
fit <- vglm(cbind(y, N-y) ~ 1, betabinomial, data = bdata, trace = TRUE)
coef(fit, matrix = TRUE)
Coef(fit)
head(cbind(depvar(fit), weights(fit, type = "prior")))


# Example 2
fit <- vglm(cbind(R, N-R) ~ 1, betabinomial, lirat,
            trace = TRUE, subset = N > 1)
coef(fit, matrix = TRUE)
Coef(fit)
t(fitted(fit))
t(depvar(fit))
t(weights(fit, type = "prior"))


# Example 3, which is more complicated
lirat <- transform(lirat, fgrp = factor(grp))
summary(lirat)  # Only 5 litters in group 3
fit2 <- vglm(cbind(R, N-R) ~ fgrp + hb, betabinomial(zero = 2),
             data = lirat, trace = TRUE, subset = N > 1)
coef(fit2, matrix = TRUE)
## Not run:  with(lirat, plot(hb[N > 1], fit2@misc$rho,
                 xlab = "Hemoglobin", ylab = "Estimated rho",
                 pch = as.character(grp[N > 1]), col = grp[N > 1])) 
## End(Not run)
## Not run:   # cf. Figure 3 of Moore and Tsiatis (1991)
with(lirat, plot(hb, R / N, pch = as.character(grp), col = grp,
                 xlab = "Hemoglobin level", ylab = "Proportion Dead",
                 main = "Fitted values (lines)", las = 1))
smalldf <- with(lirat, lirat[N > 1, ])
for (gp in 1:4) {
  xx <- with(smalldf, hb[grp == gp])
  yy <- with(smalldf, fitted(fit2)[grp == gp])
  ooo <- order(xx)
  lines(xx[ooo], yy[ooo], col = gp)
} 
## End(Not run)

VGAM

Vector Generalized Linear and Additive Models

v1.1-5

GPL-3

Authors

Thomas Yee [aut, cre], Cleve Moler [ctb] (author of several LINPACK routines)

Initial release

2021-01-13