Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

dirmul.old

Fitting a Dirichlet-Multinomial Distribution


Description

Fits a Dirichlet-multinomial distribution to a matrix of non-negative integers.

Usage

dirmul.old(link = "loglink", ialpha = 0.01, parallel = FALSE, zero = NULL)

Arguments

link

Link function applied to each of the M (positive) shape parameters alpha_j for j=1,…,M. See Links for more choices. Here, M is the number of columns of the response matrix.

ialpha

Numeric vector. Initial values for the alpha vector. Must be positive. Recycled to length M.

parallel

A logical, or formula specifying which terms have equal/unequal coefficients.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,...,M}.

Details

The Dirichlet-multinomial distribution, which is somewhat similar to a Dirichlet distribution, has probability function

P(Y_1=y_1,…,Y_M=y_M) = C_{y_1,…,y_M}^{2y_{*}} Gamma(alpha_+) / Gamma( 2y_* + alpha_+) prod_{j=1}^M [ Gamma( y_j+ alpha_j) / Gamma( alpha_j)]

for alpha_j > 0, alpha_+ = alpha_1 + \cdots + alpha_M, and 2y_* = y_1 + \cdots + y_M. Here, C_b^a means “a choose b” and refers to combinations (see choose). The (posterior) mean is

E(Y_j) = (y_j + alpha_j) / (2y_{*} + alpha_+)

for j=1,…,M, and these are returned as the fitted values as a M-column matrix.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Note

The response should be a matrix of non-negative values. Convergence seems to slow down if there are zero values. Currently, initial values can be improved upon.

This function is almost defunct and may be withdrawn soon. Use dirmultinomial instead.

Author(s)

Thomas W. Yee

References

Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.

Paul, S. R., Balasooriya, U. and Banerjee, T. (2005). Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230–236.

Tvedebrink, T. (2010). Overdispersion in allelic counts and θ-correction in forensic genetics. Theoretical Population Biology, 78, 200–210.

See Also

Examples

# Data from p.50 of Lange (2002)
alleleCounts <- c(2, 84, 59, 41, 53, 131, 2, 0,
       0, 50, 137, 78, 54, 51, 0, 0,
       0, 80, 128, 26, 55, 95, 0, 0,
       0, 16, 40, 8, 68, 14, 7, 1)
dim(alleleCounts) <- c(8, 4)
alleleCounts <- data.frame(t(alleleCounts))
dimnames(alleleCounts) <- list(c("White","Black","Chicano","Asian"),
                    paste("Allele", 5:12, sep = ""))

set.seed(123)  # @initialize uses random numbers
fit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
                  Allele10,Allele11,Allele12) ~ 1, dirmul.old,
             trace = TRUE, crit = "c", data = alleleCounts)

(sfit <- summary(fit))
vcov(sfit)
round(eta2theta(coef(fit), fit@misc$link, fit@misc$earg), digits = 2)  # not preferred
round(Coef(fit), digits = 2)  # preferred
round(t(fitted(fit)), digits = 4)  # 2nd row of Table 3.5 of Lange (2002)
coef(fit, matrix = TRUE)


pfit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
                   Allele10,Allele11,Allele12) ~ 1,
             dirmul.old(parallel = TRUE), trace = TRUE,
             data = alleleCounts)
round(eta2theta(coef(pfit, matrix = TRUE), pfit@misc$link,
                pfit@misc$earg), digits = 2)  # 'Right' answer
round(Coef(pfit), digits = 2)  # 'Wrong' answer due to parallelism constraint

VGAM

Vector Generalized Linear and Additive Models

v1.1-5
GPL-3
Authors
Thomas Yee [aut, cre], Cleve Moler [ctb] (author of several LINPACK routines)
Initial release
2021-01-13

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.