Vector Generalized Linear and Additive Models and Associated Models
VGAM provides functions for fitting vector generalized linear and additive models (VGLMs and VGAMs), and associated models (Reduced-rank VGLMs, Quadratic RR-VGLMs, Reduced-rank VGAMs). This package fits many models and distributions by maximum likelihood estimation (MLE) or penalized MLE. Also fits constrained ordination models in ecology such as constrained quadratic ordination (CQO).
This package centers on the
iteratively reweighted least squares (IRLS)
algorithm.
Other key words include
Fisher scoring,
additive models,
reduced-rank regression,
penalized likelihood,
and constrained ordination.
The central modelling functions are
vglm
,
vgam
,
rrvglm
,
rcim
,
cqo
,
cao
.
Function
vglm
operates very similarly to
glm
but is much more general,
and many methods functions
such as coef
and
predict
are available.
The package uses S4 (see methods-package
).
Some companion packages:
(1) VGAMdata contains data sets useful for
illustrating VGAM.
Some of the big ones were initially from VGAM.
Recently, some older VGAM family functions have been shifted
into VGAMdata too.
(2) VGAMextra written by Victor Miranda has some additional
VGAM family and link functions,
with a bent towards time series models.
(3) svyVGAM provides design-based inference,
e.g., to survey sampling settings.
This is because the weights
argument of
vglm
can be assigned any positive values including
survey weights.
Compared to other similar packages, such as gamlss and mgcv, VGAM has more models implemented (150+ of them) and they are not restricted to a location-scale-shape framework or (largely) the 1-parameter exponential family. There is a general statistical framework behind it all, that once grasped, makes regression modelling quite unified. Some features of the package are: (i) most family functions handle multiple responses; (ii) reduced-rank regression is available by operating on latent variables (optimal linear combinations of the explanatory variables); (iii) basic automatic smoothing parameter selection is implemented for VGAMs, although it has to be refined; (iv) smart prediction allows correct prediction of nested terms in the formula provided smart functions are used.
The GLM and GAM classes are special cases of VGLMs and VGAMs. The VGLM/VGAM framework is intended to be very general so that it encompasses as many distributions and models as possible. VGLMs are limited only by the assumption that the regression coefficients enter through a set of linear predictors. The VGLM class is very large and encompasses a wide range of multivariate response types and models, e.g., it includes univariate and multivariate distributions, categorical data analysis, extreme values, correlated binary data, quantile and expectile regression, time series problems. Potentially, it can handle generalized estimating equations, survival analysis, bioassay data and nonlinear least-squares problems.
Crudely, VGAMs are to VGLMs what GAMs are to GLMs.
Two types of VGAMs are implemented:
1st-generation VGAMs with s
use vector backfitting,
while
2nd-generation VGAMs with sm.os
and
sm.ps
use O-splines and P-splines,
do not use the backfitting algorithm,
and have automatic smoothing parameter selection.
The former is older and is based on Yee and Wild (1996).
The latter is more modern
(Yee, Somchit and Wild, 2020)
but it requires a reasonably large number of observations
to work well.
This package is the first to check for the Hauck-Donner effect
(HDE) in regression models; see hdeff
. This is an
aberration of the Wald statistics when the parameter estimates are too
close to the boundary of the parameter space. When present the p-value
of a regression coefficient is biased upwards so that a highly
significant variable might be deemed nonsignificant. Thus the HDE can
create havoc for variable selection!
Somewhat related to the previous paragraph, hypothesis testing
using the likelihood ratio test,
Rao's score test (Lagrange multiplier test) and
(modified) Wald's test are all available; see summaryvglm
.
For all regression coefficients of a model, taken one at a time,
all three methods require further IRLS iterations to obtain
new values of the other regression coefficients after one
of the coefficients has had its value set (usually to 0).
Hence the computation load is overall significant.
For a complete list of this package, use library(help = "VGAM")
.
New VGAM family functions are continually being written and
added to the package.
This package is undergoing continual development and improvement,
therefore users should treat everything as subject to change.
This includes the
family function names,
argument names,
many of the internals,
the use of link functions,
and slot names.
For example, all link functions may be renamed so that they
end in "link"
,
e.g., loglink()
instead of loglink()
.
Some future pain can be avoided by using good programming
techniques, e.g., using extractor/accessor functions such as
coef()
, weights()
, vcov()
,
predict()
.
Nevertheless, please expect changes in all aspects of the package.
See the NEWS
file for a list of changes from version to
version.
Thomas W. Yee, t.yee@auckland.ac.nz.
Maintainer: Thomas Yee t.yee@auckland.ac.nz.
Yee, T. W. (2015). Vector Generalized Linear and Additive Models: With an Implementation in R. New York, USA: Springer.
Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. and Stephenson, A. G. (2007) Vector generalized linear and additive extreme value models. Extremes, 10, 1–19.
Yee, T. W. and Wild, C. J. (1996) Vector generalized additive models. Journal of the Royal Statistical Society, Series B, Methodological, 58, 481–493.
Yee, T. W. (2004) A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Yee, T. W. (2006) Constrained additive ordination. Ecology, 87, 203–213.
Yee, T. W. (2008)
The VGAM
Package.
R News, 8, 28–39.
Yee, T. W. (2010) The VGAM package for categorical data analysis. Journal of Statistical Software, 32, 1–34. https://www.jstatsoft.org/v32/i10/.
Yee, T. W. (2014) Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
Yee, T. W. (2018) On the Hauck-Donner effect in Wald tests: Detection, and parameter space characterization (under review).
Yee, T. W. and Somchit, C. and Wild, C. J. (2019) Penalized vector generalized additive models. Manuscript in preparation.
My website for the VGAM package and book is at https://www.stat.auckland.ac.nz/~yee/. There are some resources there, especially as relating to my book and new features added to VGAM.
# Example 1; proportional odds model pneumo <- transform(pneumo, let = log(exposure.time)) (fit1 <- vglm(cbind(normal, mild, severe) ~ let, propodds, data = pneumo)) depvar(fit1) # Better than using fit1@y; dependent variable (response) weights(fit1, type = "prior") # Number of observations coef(fit1, matrix = TRUE) # p.179, in McCullagh and Nelder (1989) constraints(fit1) # Constraint matrices summary(fit1) # HDE could affect these results summary(fit1, lrt0 = TRUE, score0 = TRUE, wald0 = TRUE) # No HDE hdeff(fit1) # Check for any Hauck-Donner effect # Example 2; zero-inflated Poisson model zdata <- data.frame(x2 = runif(nn <- 2000)) zdata <- transform(zdata, pstr0 = logitlink(-0.5 + 1*x2, inverse = TRUE), lambda = loglink( 0.5 + 2*x2, inverse = TRUE)) zdata <- transform(zdata, y = rzipois(nn, lambda, pstr0 = pstr0)) with(zdata, table(y)) fit2 <- vglm(y ~ x2, zipoisson, data = zdata, trace = TRUE) coef(fit2, matrix = TRUE) # These should agree with the above values # Example 3; fit a two species GAM simultaneously fit3 <- vgam(cbind(agaaus, kniexc) ~ s(altitude, df = c(2, 3)), binomialff(multiple.responses = TRUE), data = hunua) coef(fit3, matrix = TRUE) # Not really interpretable ## Not run: plot(fit3, se = TRUE, overlay = TRUE, lcol = 3:4, scol = 3:4) ooo <- with(hunua, order(altitude)) with(hunua, matplot(altitude[ooo], fitted(fit3)[ooo, ], type = "l", lwd = 2, col = 3:4, xlab = "Altitude (m)", ylab = "Probability of presence", las = 1, main = "Two plant species' response curves", ylim = c(0, 0.8))) with(hunua, rug(altitude)) ## End(Not run) # Example 4; LMS quantile regression fit4 <- vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), data = bmi.nz, trace = TRUE) head(predict(fit4)) head(fitted(fit4)) head(bmi.nz) # Person 1 is near the lower quartile among people his age head(cdf(fit4)) ## Not run: par(mfrow = c(1,1), bty = "l", mar = c(5,4,4,3)+0.1, xpd=TRUE) qtplot(fit4, percentiles = c(5,50,90,99), main = "Quantiles", las = 1, xlim = c(15, 90), ylab = "BMI", lwd=2, lcol=4) # Quantile plot ygrid <- seq(15, 43, len = 100) # BMI ranges par(mfrow = c(1, 1), lwd = 2) # Density plot aa <- deplot(fit4, x0 = 20, y = ygrid, xlab = "BMI", col = "black", main = "Density functions at Age=20 (black), 42 (red) and 55 (blue)") aa aa <- deplot(fit4, x0 = 42, y = ygrid, add = TRUE, llty = 2, col = "red") aa <- deplot(fit4, x0 = 55, y = ygrid, add = TRUE, llty = 4, col = "blue", Attach = TRUE) aa@post$deplot # Contains density function values ## End(Not run) # Example 5; GEV distribution for extremes (fit5 <- vglm(maxtemp ~ 1, gevff, data = oxtemp, trace = TRUE)) head(fitted(fit5)) coef(fit5, matrix = TRUE) Coef(fit5) vcov(fit5) vcov(fit5, untransform = TRUE) sqrt(diag(vcov(fit5))) # Approximate standard errors ## Not run: rlplot(fit5)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.