VGAM: dirmultinomial – R documentation

Pricing

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

Get Started for Free

Documentation

VGAM

dirmultinomial

Fitting a Dirichlet-Multinomial Distribution

Description

Fits a Dirichlet-multinomial distribution to a matrix response.

Usage

dirmultinomial(lphi = "logitlink", iphi = 0.10, parallel = FALSE, zero = "M")

Arguments

`lphi`	Link function applied to the phi parameter, which lies in the open unit interval (0,1). See `Links` for more choices.
`iphi`	Numeric. Initial value for phi. Must be in the open unit interval (0,1). If a failure to converge occurs then try assigning this argument a different value.
`parallel`	A logical (formula not allowed here) indicating whether the probabilities pi_1,…,pi_{M-1} are to be equal via equal coefficients. Note pi_M will generally be different from the other probabilities. Setting `parallel = TRUE` will only work if you also set `zero = NULL` because of interference between these arguments (with respect to the intercept term).
`zero`	An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set \{1,2,…,M\}. If the character `"M"` then this means the numerical value M, which corresponds to linear/additive predictor associated with phi. Setting `zero = NULL` means none of the values from the set \{1,2,…,M\}.

Details

The Dirichlet-multinomial distribution arises from a multinomial distribution where the probability parameters are not constant but are generated from a multivariate distribution called the Dirichlet distribution. The Dirichlet-multinomial distribution has probability function

P(Y_1=y_1,…,Y_M=y_M) = C_{y_1,…,y_M}^{N_{*}} prod_{j=1}^{M} prod_{r=1}^{y_{j}} (pi_j (1-phi) + (r-1)phi) / prod_{r=1}^{N_{*}} (1-phi + (r-1)phi)

where phi is the over-dispersion parameter and N_* = y_1+\cdots+y_M. Here, C_b^a means “a choose b” and refers to combinations (see choose). The above formula applies to each row of the matrix response. In this VGAM family function the first M-1 linear/additive predictors correspond to the first M-1 probabilities via

eta_j = log(P[Y=j]/ P[Y=M]) = log(pi_j/pi_M)

where eta_j is the jth linear/additive predictor (eta_M=0 by definition for P[Y=M] but not for phi) and j=1,…,M-1. The Mth linear/additive predictor corresponds to lphi applied to phi.

Note that E(Y_j) = N_* pi_j but the probabilities (returned as the fitted values) pi_j are bundled together as a M-column matrix. The quantities N_* are returned as the prior weights.

The beta-binomial distribution is a special case of the Dirichlet-multinomial distribution when M=2; see betabinomial. It is easy to show that the first shape parameter of the beta distribution is shape1=pi*(1/phi-1) and the second shape parameter is shape2=(1-pi)*(1/phi-1). Also, phi=1/(1+shape1+shape2), which is known as the intra-cluster correlation coefficient.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

If the model is an intercept-only model then @misc (which is a list) has a component called shape which is a vector with the M values pi_j * (1/phi-1).

Warning

This VGAM family function is prone to numerical problems, especially when there are covariates.

Note

The response can be a matrix of non-negative integers, or else a matrix of sample proportions and the total number of counts in each row specified using the weights argument. This dual input option is similar to multinomial.

To fit a ‘parallel’ model with the phi parameter being an intercept-only you will need to use the constraints argument.

Currently, Fisher scoring is implemented. To compute the expected information matrix a for loop is used; this may be very slow when the counts are large. Additionally, convergence may be slower than usual due to round-off error when computing the expected information matrices.

Author(s)

Thomas W. Yee

References

Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.

Tvedebrink, T. (2010). Overdispersion in allelic counts and θ-correction in forensic genetics. Theoretical Population Biology, 78, 200–210.

Yu, P. and Shaw, C. A. (2014). An Efficient Algorithm for Accurate Computation of the Dirichlet-Multinomial Log-Likelihood Function. Bioinformatics, 30, 1547–54.

Examples

nn <- 5; M <- 4; set.seed(1)
ydata <- data.frame(round(matrix(runif(nn * M, max = 100), nn, M)))  # Integer counts
colnames(ydata) <- paste("y", 1:M, sep = "")

fit <- vglm(cbind(y1, y2, y3, y4) ~ 1, dirmultinomial,
            data = ydata, trace = TRUE)
head(fitted(fit))
depvar(fit)  # Sample proportions
weights(fit, type = "prior", matrix = FALSE)  # Total counts per row

## Not run: 
ydata <- transform(ydata, x2 = runif(nn))
fit <- vglm(cbind(y1, y2, y3, y4) ~ x2, dirmultinomial,
            data = ydata, trace = TRUE)
Coef(fit)
coef(fit, matrix = TRUE)
(sfit <- summary(fit))
vcov(sfit)

## End(Not run)

VGAM

Vector Generalized Linear and Additive Models

v1.1-5

GPL-3

Authors

Thomas Yee [aut, cre], Cleve Moler [ctb] (author of several LINPACK routines)

Initial release

2021-01-13

dirmultinomial

Description

Usage

Arguments

Details

Value

Warning

Note

Author(s)

References

See Also

Examples

VGAM

We don't support your browser anymore