Beta-binomial Distribution Family Function
Fits a beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.
betabinomial(lmu = "logitlink", lrho = "logitlink", irho = NULL, imethod = 1, ishrinkage = 0.95, nsimEIM = NULL, zero = "rho")
lmu, lrho |
Link functions applied to the two parameters.
See |
irho |
Optional initial value for the correlation parameter.
If given, it must be in (0,1), and is recyled to the necessary
length. Assign this argument a value if a convergence failure occurs.
Having |
imethod |
An integer with value |
zero |
Specifyies which
linear/additive predictor is to be modelled as an intercept only.
If assigned, the single value can be either |
ishrinkage, nsimEIM |
See |
There are several parameterizations of the beta-binomial distribution.
This family function directly models the mean and correlation
parameter, i.e.,
the probability of success.
The model can be written
T|P=p ~ Binomial(N,p)
where P has a beta distribution with shape parameters
alpha and beta. Here,
N is the number of trials (e.g., litter size),
T=NY is the number of successes, and
p is the probability of a success (e.g., a malformation).
That is, Y is the proportion of successes. Like
binomialff
, the fitted values are the
estimated probability
of success (i.e., E[Y] and not E[T])
and the prior weights N are attached separately on the
object in a slot.
The probability function is
P(T=t) = choose(N,t) Be(alpha+t, beta+N-t) / Be(alpha, beta)
where t=0,1,…,N, and Be is the
beta
function
with shape parameters alpha and beta.
Recall Y = T/N is the real response being modelled.
The default model is eta1 =logit(mu) and eta2 = logit(rho) because both parameters lie between 0 and 1. The mean (of Y) is p = mu = alpha / (alpha + beta) and the variance (of Y) is mu(1-mu)(1+(N-1)rho)/N. Here, the correlation rho is given by 1/(1 + alpha + beta) and is the correlation between the N individuals within a litter. A litter effect is typically reflected by a positive value of rho. It is known as the over-dispersion parameter.
This family function uses Fisher scoring. Elements of the second-order expected derivatives with respect to alpha and beta are computed numerically, which may fail for large alpha, beta, N or else take a long time.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
.
Suppose fit
is a fitted beta-binomial model. Then
fit@y
contains the sample proportions y,
fitted(fit)
returns estimates of E(Y), and
weights(fit, type="prior")
returns the number
of trials N.
If the estimated rho parameter is close to 0 then it pays to
try lrho = "rhobitlink"
. One day this may become the default
link function.
This family function is prone to numerical difficulties due to
the expected information matrices not being positive-definite
or ill-conditioned over some regions of the parameter space.
If problems occur try setting irho
to some numerical
value, nsimEIM = 100
, say, or else use etastart
argument of vglm
, etc.
This function processes the input in the same way
as binomialff
. But it does not handle
the case N=1 very well because there are two
parameters to estimate, not one, for each row of the input.
Cases where N=1 can be omitted via the
subset
argument of vglm
.
The extended beta-binomial distribution of Prentice (1986)
is currently not implemented in the VGAM package as it has
range-restrictions for the correlation parameter that are currently
too difficult to handle in this package.
However, try lrho = "rhobitlink"
.
T. W. Yee
Moore, D. F. and Tsiatis, A. (1991). Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics, 47, 383–401.
Prentice, R. L. (1986). Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321–327.
# Example 1 bdata <- data.frame(N = 10, mu = 0.5, rho = 0.8) bdata <- transform(bdata, y = rbetabinom(100, size = N, prob = mu, rho = rho)) fit <- vglm(cbind(y, N-y) ~ 1, betabinomial, data = bdata, trace = TRUE) coef(fit, matrix = TRUE) Coef(fit) head(cbind(depvar(fit), weights(fit, type = "prior"))) # Example 2 fit <- vglm(cbind(R, N-R) ~ 1, betabinomial, lirat, trace = TRUE, subset = N > 1) coef(fit, matrix = TRUE) Coef(fit) t(fitted(fit)) t(depvar(fit)) t(weights(fit, type = "prior")) # Example 3, which is more complicated lirat <- transform(lirat, fgrp = factor(grp)) summary(lirat) # Only 5 litters in group 3 fit2 <- vglm(cbind(R, N-R) ~ fgrp + hb, betabinomial(zero = 2), data = lirat, trace = TRUE, subset = N > 1) coef(fit2, matrix = TRUE) ## Not run: with(lirat, plot(hb[N > 1], fit2@misc$rho, xlab = "Hemoglobin", ylab = "Estimated rho", pch = as.character(grp[N > 1]), col = grp[N > 1])) ## End(Not run) ## Not run: # cf. Figure 3 of Moore and Tsiatis (1991) with(lirat, plot(hb, R / N, pch = as.character(grp), col = grp, xlab = "Hemoglobin level", ylab = "Proportion Dead", main = "Fitted values (lines)", las = 1)) smalldf <- with(lirat, lirat[N > 1, ]) for (gp in 1:4) { xx <- with(smalldf, hb[grp == gp]) yy <- with(smalldf, fitted(fit2)[grp == gp]) ooo <- order(xx) lines(xx[ooo], yy[ooo], col = gp) } ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.