Negative Binomial Distribution Family Function
Maximum likelihood estimation of the two parameters of a negative binomial distribution.
negbinomial(zero = "size", parallel = FALSE, deviance.arg = FALSE,
            type.fitted = c("mean", "quantiles"),
            percentiles = c(25, 50, 75), mds.min = 1e-3,
            nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
            max.support = 4000, max.chunk.MB = 30,
            lmu = "loglink", lsize = "loglink", imethod = 1,
            imu = NULL, iprobs.y = NULL, gprobs.y = ppoints(6),
            isize = NULL, gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
polya(zero = "size", type.fitted = c("mean", "prob"), mds.min = 1e-3,
      nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
      max.support = 4000, max.chunk.MB = 30, lprob = "logitlink",
      lsize = "loglink", imethod = 1, iprob = NULL, iprobs.y = NULL,
      gprobs.y = ppoints(6), isize = NULL,
      gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
polyaR(zero = "size", type.fitted = c("mean", "prob"), mds.min = 1e-3,
       nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-7,
       max.support = 4000, max.chunk.MB = 30, lsize = "loglink",
       lprob = "logitlink", imethod = 1, iprob = NULL, iprobs.y = NULL,
       gprobs.y = ppoints(6), isize = NULL,
       gsize.mux = exp(c(-30, -20, -15, -10, -6:3)), imunb = NULL)
zero
Can be an integer-valued vector, and if so, then it is usually assigned -2 or 2. Specifies which of the two linear/additive predictors are modelled as an intercept only. By default, the k parameter (after lsize is applied) is modelled as a single unknown number that is estimated; setting zero = NULL allows it to depend on the explanatory variables. See CommonVGAMffArguments for more information.
lmu, lsize, lprob
Link functions applied to the mu, k and p parameters. See Links for more choices.
imu, imunb, isize, iprob
Optional initial values for the mean and k and p. For k, if failure to converge occurs then try different values (and/or use imethod).
nsimEIM
This argument is used for computing the diagonal element of the expected information matrix (EIM) corresponding to k, based on the simulated Fisher scoring (SFS) algorithm. See CommonVGAMffArguments for more information.
cutoff.prob
Fed into the p argument of qnbinom in order to obtain an upper limit for the approximate support of the distribution. It should be a numeric close to, but never exactly, 1; it effectively controls how many terms of the infinite series for the second diagonal element of the EIM are used.
max.chunk.MB, max.support
max.support determines the eligibility of individual observations to have their EIM computed by the exact method; observations whose approximate support exceeds max.support are handled by SFS instead. The exact-method computations are done in chunks, so that no more than about max.chunk.MB megabytes of memory is used at a time.
mds.min
Numeric. Minimum value of the NBD mean divided by k. When an estimate of k becomes so large relative to the mean that this ratio falls below the threshold, the fit is effectively Poisson and this is treated as a boundary of the parameter space.
eps.trig
Numeric. A small positive value used in the computation of the EIMs. It focusses on the denominator of the terms of a series: each term in the series (that is used to approximate an infinite series) has a value greater than size / eps.trig, so that very small terms are ignored. This is a tradeoff between accuracy and speed.
gsize.mux
Similar to the grid-search arguments described in CommonVGAMffArguments; however, this grid is multiplied by the initial estimates of the NBD mean parameter, i.e., it is on a relative rather than an absolute scale.
type.fitted, percentiles
See CommonVGAMffArguments for more information.
deviance.arg
Logical. If TRUE, the deviance is computed after convergence; this works only for the NB-2 model, and either crit = "coef" or half.step = FALSE must also be used (see Example 1 below).
imethod
An integer with value 1 or 2 which specifies the initialization method for the mu parameter. If failure to converge occurs then try another value.
parallel
See CommonVGAMffArguments for more information. Setting parallel = TRUE (together with zero = NULL) fits the NB-1 model described below.
gprobs.y
A vector representing a grid; passed into the probs argument of quantile when imethod = 1 to obtain an initial value for the mean of each response.
iprobs.y
Passed into the probs argument of quantile when imethod = 1 to obtain an initial value for the mean of each response; it overwrites any value of gprobs.y.
The negative binomial distribution (NBD)
can be motivated in several ways,
e.g., as a Poisson distribution with a mean that is gamma
distributed.
There are several common parametrizations of the NBD. The one used by negbinomial() uses the mean mu and an index parameter k, both of which are positive.
Specifically, the density of a random variable Y is
f(y; mu, k) = choose(y+k-1, y) * [mu/(mu+k)]^y * [k/(k+mu)]^k
where y = 0, 1, 2, …, mu > 0 and k > 0, and choose(a, b) denotes the binomial coefficient.
Note that the dispersion parameter is
1/k, so that as k approaches infinity the
NBD approaches a Poisson distribution.
The response has variance Var(Y)=mu*(1+mu/k).
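As a quick numerical check, the density and variance above can be verified against stats::dnbinom() in its (mu, size) parametrization; the values below are arbitrary:

# Verify the density formula against dnbinom() for arbitrary mu and k.
mu <- 2.5; k <- 1.3; y <- 0:10
manual <- choose(y + k - 1, y) * (mu / (mu + k))^y * (k / (k + mu))^k
stopifnot(isTRUE(all.equal(manual, dnbinom(y, size = k, mu = mu))))
# Verify Var(Y) = mu * (1 + mu/k) by simulation; expect roughly 7.31 here.
set.seed(1)
var(rnbinom(1e6, size = k, mu = mu))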
When fitted, the fitted.values
slot of the object contains
the estimated value of the mu parameter, i.e., of the mean
E(Y).
Some authors use alpha = 1/k as the ancillary or heterogeneity parameter, so common alternatives for lsize are negloglink and reciprocallink.
For polya() the density is
f(y; p, k) = choose(y+k-1, y) * (1-p)^y * p^k
where y = 0, 1, 2, …, k > 0 and 0 < p < 1.
Family function polyaR() is the same as polya() except that the order of the two parameters is switched. The reason is that polyaR() tries to match rnbinom closely in terms of its argument order, etc. If the probability parameter is of primary interest then users will probably prefer polya() over polyaR().
Possibly polyaR()
will be decommissioned one day.
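The link between the two parametrizations is p = k/(k+mu); a short check (arbitrary values) of polya()'s density against dnbinom()'s (prob, size) form:

p <- 0.4; k <- 1.3; y <- 0:10
manual <- choose(y + k - 1, y) * (1 - p)^y * p^k
stopifnot(isTRUE(all.equal(manual, dnbinom(y, size = k, prob = p))))
mu <- k * (1 - p) / p   # equivalent negbinomial() mean, since p = k/(k+mu)
stopifnot(isTRUE(all.equal(dnbinom(y, size = k, prob = p),
                           dnbinom(y, size = k, mu = mu))))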
The NBD can be coerced into the
classical GLM framework with one of the parameters being
of interest and the other treated as a nuisance/scale
parameter (this is implemented in the MASS library). The
VGAM family function negbinomial()
treats both
parameters on the same footing, and estimates them both
by full maximum likelihood estimation.
The parameters mu and k are independent (diagonal EIM), and the confidence region for k is extremely skewed so that its standard error is often of no practical use. The parameter 1/k has been used as a measure of aggregation. For the NB-C the EIM is not diagonal.
These VGAM family functions handle multiple responses, so that a response matrix can be inputted. The number of columns is the number of species, say, and setting zero = -2 means that all species have a k that is modelled as a (different) intercept only.
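For instance, a minimal sketch of a two-species fit (simulated data; the column names spp1 and spp2 are made up for illustration):

library(VGAM)
set.seed(1)
zdata <- data.frame(x2 = runif(nn <- 100))
zdata <- transform(zdata,
                   spp1 = rnbinom(nn, mu = exp(2 + x2), size = exp(1)),
                   spp2 = rnbinom(nn, mu = exp(1 - x2), size = exp(0)))
# zero = -2: each species gets its own intercept-only k
zfit <- vglm(cbind(spp1, spp2) ~ x2, negbinomial(zero = -2),
             data = zdata, trace = TRUE)
coef(zfit, matrix = TRUE)  # 4 columns: a (mu, size) pair per species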
Conlisk et al. (2007) show that fitting the NBD to presence-absence data will result in identifiability problems. However, the model is identifiable if the response values include 0, 1 and 2.
An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.
Poisson regression corresponds to k equalling infinity. If the data is Poisson or close to Poisson, numerical problems may occur. Some corrective measures are taken: for example, k is effectively capped (relative to the mean) during estimation to some large value, and a warning is issued. Setting stepsize = 0.5 for half stepping is probably a good idea too when the data are extreme.
The NBD is a strictly unimodal
distribution. Any data set that does not exhibit a mode (somewhere
in the middle) makes the estimation problem difficult.
Set trace = TRUE
to monitor convergence.
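A sketch of these corrective measures on simulated near-Poisson counts (stepsize is an argument of vglm.control()):

library(VGAM)
set.seed(2)
pdata <- data.frame(x2 = runif(nn <- 100))
pdata <- transform(pdata, y1 = rpois(nn, lambda = exp(1 + x2)))  # Poisson counts
pfit <- vglm(y1 ~ x2, negbinomial, data = pdata,
             trace = TRUE,     # monitor convergence
             stepsize = 0.5)   # half stepping
coef(pfit, matrix = TRUE)  # expect a very large loglink(size), i.e., a huge k-hat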
These functions are fragile; maximum likelihood estimation of the index parameter is fraught with difficulty (see Lawless, 1987). Other alternatives to negbinomial are to fit an NB-1 or RR-NB (aka NB-P) model; see Yee (2014). Also available are the NB-C, NB-H and NB-G. Assigning values to the isize argument may lead to a local solution, and smaller values are preferred over large values when using this argument.
If one wants to force SFS
to be used on all observations, then
set max.support = 0
or max.chunk.MB = 0
.
If one wants to force the exact method
to be used for all observations, then
set max.support = Inf
.
If the computer has plenty of memory, then trying max.chunk.MB = Inf and max.support = Inf may provide a small speed increase.
If SFS is used at all, then the @weights
slot of the
fitted object will be a matrix;
otherwise that slot will be a 0 x 0
matrix.
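A sketch of these settings on simulated data; the comments state the expected, not guaranteed, outcome:

library(VGAM)
set.seed(3)
wdata <- data.frame(x2 = runif(nn <- 100))
wdata <- transform(wdata, y1 = rnbinom(nn, mu = exp(2 + x2), size = exp(1)))
sfs.fit <- vglm(y1 ~ x2, negbinomial(max.support = 0),  # force SFS
                data = wdata)
dim(sfs.fit@weights)   # a nonempty matrix, since SFS was used
exa.fit <- vglm(y1 ~ x2, negbinomial(max.support = Inf),  # force exact method
                data = wdata)
dim(exa.fit@weights)   # 0 x 0 if SFS was never used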
An alternative to the NBD is the generalized Poisson distribution (genpoisson1, genpoisson2 and genpoisson0), since it also handles overdispersion with respect to the Poisson.
It has one advantage in that its EIM can be computed straightforwardly.
Yet to do: write a family function which uses the method of moments estimator for k.
These three functions implement two common parameterizations of the negative binomial (NB). Some people call the NB with integer k the Pascal distribution, whereas if k is real then this is the Polya distribution. I don't. The one matching the details of rnbinom in terms of p and k is polya().
Suppose the response is called ymat
.
For negbinomial()
the diagonal element of the expected information matrix
(EIM) for parameter k
involves an infinite series; consequently SFS
(see nsimEIM
) is used as the backup algorithm only.
SFS should be better if max(ymat)
is large,
e.g., max(ymat) > 1000
,
or if there are any outliers in ymat
.
The default algorithm involves a finite series approximation to the support 0:Inf; the arguments max.support, max.chunk.MB and cutoff.prob are pertinent.
Regardless of the algorithm used, convergence problems may occur, especially when the response has large outliers or is large in magnitude. If convergence failure occurs, try using arguments (in recommended decreasing order) max.support, nsimEIM, cutoff.prob, iprobs.y, imethod, isize, zero, max.chunk.MB.
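For example, a hedged sketch of such tuning; the particular values below are arbitrary illustrations, not recommendations:

library(VGAM)
set.seed(4)
tdata <- data.frame(x2 = runif(nn <- 100))
tdata <- transform(tdata, y1 = rnbinom(nn, mu = exp(2 + x2), size = exp(1)))
tfit <- vglm(y1 ~ x2,
             negbinomial(max.support = 2000,    # first: reduce max.support
                         nsimEIM = 1000,        # then: more SFS simulations
                         cutoff.prob = 0.9999), # then: adjust the approx. support
             data = tdata, trace = TRUE)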
The function negbinomial can be used by the fast algorithm in cqo; however, setting eq.tolerances = TRUE and I.tolerances = FALSE is recommended.
In the first example below (Bliss and Fisher, 1953), from each of 6 McIntosh apple trees in an orchard that had been sprayed, 25 leaves were randomly selected. On each of the leaves, the number of adult female European red mites was counted.
There are two special uses of negbinomial for handling count data. Firstly, when used by rrvglm this results in a continuum of models in between and inclusive of quasi-Poisson and negative binomial regression. This is known as a reduced-rank negative binomial model (RR-NB). It fits a negative binomial log-linear regression with variance function Var(Y) = mu + delta1 * mu^delta2, where delta1 and delta2 are parameters to be estimated by MLE. Confidence intervals are available for delta2, so it is possible to decide whether the data are closer to quasi-Poisson or to negative binomial, if either.
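A minimal sketch of an RR-NB fit on simulated data (cf. Yee, 2014); Confint.rrnb() is assumed available, as in recent VGAM versions:

library(VGAM)
set.seed(5)
rdata <- data.frame(x2 = runif(nn <- 200))
rdata <- transform(rdata, y1 = rnbinom(nn, mu = exp(2 + x2), size = exp(1)))
# zero = NULL frees k to depend on covariates; rrvglm() then imposes the
# reduced-rank (RR-NB, aka NB-P) structure.
rrnb <- rrvglm(y1 ~ x2, negbinomial(zero = NULL), data = rdata, trace = TRUE)
coef(rrnb, matrix = TRUE)
Confint.rrnb(rrnb)  # includes an approximate CI for delta2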
Secondly, the use of negbinomial with parallel = TRUE inside vglm can result in a model similar to quasipoisson. This is named the NB-1 model. The dispersion parameter is estimated by MLE, whereas glm uses the method of moments. In particular, it fits a negative binomial log-linear regression with variance function Var(Y) = phi0 * mu, where phi0 is a parameter to be estimated by MLE. Confidence intervals are available for phi0 (see Example 4 below).
Thomas W. Yee, with a lot of help from Victor Miranda to get it going with nbcanlink (NB-C).
Lawless, J. F. (1987). Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics 15, 209–225.
Hilbe, J. M. (2011). Negative Binomial Regression, 2nd Edition. Cambridge: Cambridge University Press.
Bliss, C. and Fisher, R. A. (1953). Fitting the negative binomial distribution to biological data. Biometrics 9, 174–200.
Conlisk, E., Conlisk, J. and Harte, J. (2007). The impossibility of estimating a negative binomial clustering parameter from presence-absence data: A comment on He and Gaston. The American Naturalist 170, 651–654.
Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
Yee, T. W. (2020). The VGAM package for negative binomial regression. Australian & New Zealand Journal of Statistics, 61, in press.
quasipoisson, poissonff, zinegbinomial, negbinomial.size (e.g., NB-G), nbcanlink (NB-C), posnegbinomial, genpoisson1, genpoisson2, genpoisson0, inv.binomial, NegBinomial, nbordlink, rrvglm, cao, cqo, CommonVGAMffArguments, simulate.vlm, ppoints.
# Example 1: apple tree data (Bliss and Fisher, 1953)
appletree <- data.frame(y = 0:7, w = c(70, 38, 17, 10, 9, 3, 2, 1))
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, crit = "coef")  # Obtain the deviance
fit <- vglm(y ~ 1, negbinomial(deviance = TRUE), data = appletree,
            weights = w, half.step = FALSE)  # Alternative method
summary(fit)
coef(fit, matrix = TRUE)
Coef(fit)  # For intercept-only models
deviance(fit)  # NB2 only; needs 'crit = "coef"' & 'deviance = TRUE' above

# Example 2: simulated data with multiple responses
## Not run:
ndata <- data.frame(x2 = runif(nn <- 200))
ndata <- transform(ndata,
                   y1 = rnbinom(nn, mu = exp(3+x2), size = exp(1)),
                   y2 = rnbinom(nn, mu = exp(2-x2), size = exp(0)))
fit1 <- vglm(cbind(y1, y2) ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit1, matrix = TRUE)
## End(Not run)

# Example 3: large counts implies SFS is used
## Not run:
ndata <- transform(ndata, y3 = rnbinom(nn, mu = exp(10+x2), size = exp(1)))
with(ndata, range(y3))  # Large counts
fit2 <- vglm(y3 ~ x2, negbinomial, data = ndata, trace = TRUE)
coef(fit2, matrix = TRUE)
head(fit2@weights)  # Non-empty; SFS was used
## End(Not run)

# Example 4: a NB-1 to estimate a negative binomial with Var(Y) = phi0 * mu
nn <- 200       # Number of observations
phi0 <- 10      # Specify this; should be greater than unity
delta0 <- 1 / (phi0 - 1)
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata, y3 = rnbinom(nn, mu = mu, size = delta0 * mu))
## Not run:
plot(y3 ~ x2, data = mydata, pch = "+", col = "blue",
     main = paste("Var(Y) = ", phi0, " * mu", sep = ""), las = 1)
## End(Not run)
nb1 <- vglm(y3 ~ x2 + x3, negbinomial(parallel = TRUE, zero = NULL),
            data = mydata, trace = TRUE)
# Extracting out some quantities:
cnb1 <- coef(nb1, matrix = TRUE)
mydiff <- (cnb1["(Intercept)", "loglink(size)"] -
           cnb1["(Intercept)", "loglink(mu)"])
delta0.hat <- exp(mydiff)
(phi.hat <- 1 + 1 / delta0.hat)  # MLE of phi
summary(nb1)
# Obtain a 95 percent confidence interval for phi0:
myvec <- rbind(-1, 1, 0, 0)
(se.mydiff <- sqrt(t(myvec) %*% vcov(nb1) %*% myvec))
ci.mydiff <- mydiff + c(-1.96, 1.96) * c(se.mydiff)
ci.delta0 <- ci.exp.mydiff <- exp(ci.mydiff)
(ci.phi0 <- 1 + 1 / rev(ci.delta0))  # The 95 percent conf. interval for phi0
Confint.nb1(nb1)  # Quick way to get it
summary(glm(y3 ~ x2 + x3, quasipoisson, mydata))$disper  # cf. moment estimator