Positive Negative Binomial Distribution Family Function
Maximum likelihood estimation of the two parameters of a positive negative binomial distribution.
posnegbinomial(zero = "size", type.fitted = c("mean", "munb", "prob0"), mds.min = 0.001, nsimEIM = 500, cutoff.prob = 0.999, eps.trig = 1e-07, max.support = 4000, max.chunk.MB = 30, lmunb = "loglink", lsize = "loglink", imethod = 1, imunb = NULL, iprobs.y = NULL, gprobs.y = ppoints(8), isize = NULL, gsize.mux = exp(c(-30, -20, -15, -10, -6:3)))
lmunb |
Link function applied to the |
lsize |
Parameter link function applied to the dispersion parameter,
called |
isize |
Optional initial value for |
nsimEIM, zero, eps.trig |
|
mds.min, iprobs.y, cutoff.prob |
Similar to |
imunb, max.support |
Similar to |
max.chunk.MB, gsize.mux |
Similar to |
imethod, gprobs.y |
See |
type.fitted |
See |
The positive negative binomial distribution is an ordinary negative binomial distribution but with the probability of a zero response being zero. The other probabilities are scaled to sum to unity.
This family function is based on negbinomial
and most
details can be found there. To avoid confusion, the parameter
munb
here corresponds to the mean of an ordinary negative
binomial distribution negbinomial
. The mean of
posnegbinomial
is
munb / (1-p(0))
where p(0) = (k/(k + munb))^k is the probability an ordinary negative binomial distribution has a zero value.
The parameters munb
and k
are not independent in the
positive negative binomial distribution, whereas they are in the
ordinary negative binomial distribution.
This function handles multiple responses, so that a matrix
can be used as the response. The number of columns is the number
of species, say, and setting zero = -2
means that all
species have a k
equalling a (different) intercept only.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
,
rrvglm
and vgam
.
This family function is fragile;
at least two cases will lead to numerical problems.
Firstly,
the positive-Poisson model corresponds to k
equalling infinity.
If the data is positive-Poisson or close to positive-Poisson,
then the estimated k
will diverge to Inf
or some
very large value.
Secondly, if the data is clustered about the value 1 because
the munb
parameter is close to 0
then numerical problems will also occur.
Users should set trace = TRUE
to monitor convergence.
In the situation when both cases hold, the result returned
(which will be untrustworthy) will depend on the initial values.
The negative binomial distribution (NBD) is a strictly unimodal distribution.
Any data set that does not exhibit a mode (in the middle) makes the
estimation problem difficult.
The positive NBD inherits this feature.
Set trace = TRUE
to monitor convergence.
See the example below of a data set where posbinomial()
fails; the so-called solution is extremely poor.
This is partly due to a lack of a
unimodal shape because the number of counts decreases only.
This long tail makes it very difficult to estimate the mean
parameter with any certainty. The result too is that the
size
parameter is numerically fraught.
This VGAM family function inherits the same warnings as
negbinomial
.
And if k
is much less than 1 then the estimation may
be slow.
If the estimated k is very large then fitting a
pospoisson
model is a good idea.
If both munb
and k are large then it may be
necessary to decrease eps.trig
and increase
max.support
so that the EIMs are positive-definite,
e.g.,
eps.trig = 1e-8
and max.support = Inf
.
Thomas W. Yee
Barry, S. C. and Welsh, A. H. (2002). Generalized additive modelling and zero inflated count data. Ecological Modelling, 157, 179–188.
Williamson, E. and Bretherton, M. H. (1964). Tables of the logarithmic series distribution. Annals of Mathematical Statistics, 35, 284–297.
pdata <- data.frame(x2 = runif(nn <- 1000)) pdata <- transform(pdata, y1 = rgaitnbinom(nn, exp(1), munb.p = exp(0+2*x2), truncate = 0), y2 = rgaitnbinom(nn, exp(3), munb.p = exp(1+2*x2), truncate = 0)) fit <- vglm(cbind(y1, y2) ~ x2, posnegbinomial, data = pdata, trace = TRUE) coef(fit, matrix = TRUE) dim(depvar(fit)) # Using dim(fit@y) is not recommended # Another artificial data example pdata2 <- data.frame(munb = exp(2), size = exp(3)); nn <- 1000 pdata2 <- transform(pdata2, y3 = rgaitnbinom(nn, size, munb.p = munb, truncate = 0)) with(pdata2, table(y3)) fit <- vglm(y3 ~ 1, posnegbinomial, data = pdata2, trace = TRUE) coef(fit, matrix = TRUE) with(pdata2, mean(y3)) # Sample mean head(with(pdata2, munb/(1-(size/(size+munb))^size)), 1) # Population mean head(fitted(fit), 3) head(predict(fit), 3) # Example: Corbet (1943) butterfly Malaya data fit <- vglm(ofreq ~ 1, posnegbinomial, weights = species, data = corbet) coef(fit, matrix = TRUE) Coef(fit) (khat <- Coef(fit)["size"]) pdf2 <- dgaitnbinom(with(corbet, ofreq), khat, munb.p = fitted(fit), truncate = 0) print(with(corbet, cbind(ofreq, species, fitted = pdf2*sum(species))), dig = 1) ## Not run: with(corbet, matplot(ofreq, cbind(species, fitted = pdf2*sum(species)), las = 1, xlab = "Observed frequency (of individual butterflies)", type = "b", ylab = "Number of species", col = c("blue", "orange"), main = "blue 1s = observe; orange 2s = fitted")) ## End(Not run) ## Not run: # This data (courtesy of Maxim Gerashchenko) causes posbinomial() to fail pnbd.fail <- data.frame( y1 = c(1:16, 18:21, 23:28, 33:38, 42, 44, 49:51, 55, 56, 58, 59, 61:63, 66, 73, 76, 94, 107, 112, 124, 190, 191, 244), ofreq = c(130, 80, 38, 23, 22, 11, 21, 14, 6, 7, 9, 9, 9, 4, 4, 5, 1, 4, 6, 1, 3, 2, 4, 3, 4, 5, 3, 1, 2, 1, 1, 4, 1, 2, 2, 1, 3, 1, 1, 2, 2, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1)) fit.fail <- vglm(y1 ~ 1, weights = ofreq, posnegbinomial, trace = TRUE, data = pnbd.fail) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.