VGAM: rrvglm – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

rrvglm

Fitting Reduced-Rank Vector Generalized Linear Models (RR-VGLMs)

Description

A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs but some of the constraint matrices are estimated. In this documentation, M is the number of linear predictors.

Usage

rrvglm(formula, family = stop("argument 'family' needs to be assigned"),
       data = list(), weights = NULL, subset = NULL,
       na.action = na.fail, etastart = NULL, mustart = NULL,
       coefstart = NULL, control = rrvglm.control(...), offset = NULL,
       method = "rrvglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE,
       contrasts = NULL, constraints = NULL, extra = NULL,
       qr.arg = FALSE, smart = TRUE, ...)

Arguments

`formula, family, weights`	See `vglm`.
`data`	an optional data frame containing the variables in the model. By default the variables are taken from `environment(formula)`, typically the environment from which `rrvglm` is called.
`subset, na.action`	See `vglm`.
`etastart, mustart, coefstart`	See `vglm`.
`control`	a list of parameters for controlling the fitting process. See `rrvglm.control` for details.
`offset, model, contrasts`	See `vglm`.
`method`	the method to be used in fitting the model. The default (and presently only) method `rrvglm.fit` uses iteratively reweighted least squares (IRLS).
`x.arg, y.arg`	logical values indicating whether the model matrix and response vector/matrix used in the fitting process should be assigned in the `x` and `y` slots. Note the model matrix is the LM model matrix; to get the VGLM model matrix type `model.matrix(vglmfit)` where `vglmfit` is a `vglm` object.
`constraints`	See `vglm`.
`extra, smart, qr.arg`	See `vglm`.
`...`	further arguments passed into `rrvglm.control`.

Details

The central formula is given by

eta = B_1^T x_1 + A nu

where x1 is a vector (usually just a 1 for an intercept), x2 is another vector of explanatory variables, and nu = C^T x_2 is an R-vector of latent variables. Here, eta is a vector of linear predictors, e.g., the mth element is eta_m = log(E[Y_m]) for the mth Poisson response. The matrices B_1, A and C are estimated from the data, i.e., contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. It means that the response is a monotonically increasing or decreasing function of the latent variables.

For identifiability it is common to enforce corner constraints on A: by default, the top R by R submatrix is fixed to be the order-R identity matrix and the remainder of A is estimated.

The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).

In theory, any VGAM family function that works for vglm and vgam should work for rrvglm too. The function that actually does the work is rrvglm.fit; it is vglm.fit with some extra code.

Value

An object of class "rrvglm", which has the the same slots as a "vglm" object. The only difference is that the some of the constraint matrices are estimates rather than known. But VGAM stores the models the same internally. The slots of "vglm" objects are described in vglm-class.

Note

The arguments of rrvglm are in general the same as those of vglm but with some extras in rrvglm.control.

The smart prediction (smartpred) library is packed with the VGAM library.

In an example below, a rank-1 stereotype model of Anderson (1984) is fitted to some car data. The reduced-rank regression is performed, adjusting for two covariates. Setting a trivial constraint matrix (diag(M)) for the latent variable variables in x2 avoids a warning message when it is overwritten by a (common) estimated constraint matrix. It shows that German cars tend to be more expensive than American cars, given a car of fixed weight and width.

If fit <- rrvglm(..., data = mydata) then summary(fit) requires corner constraints and no missing values in mydata. Often the estimated variance-covariance matrix of the parameters is not positive-definite; if this occurs, try refitting the model with a different value for Index.corner.

For constrained quadratic ordination (CQO) see cqo for more details about QRR-VGLMs.

With multiple binary responses, one must use binomialff(multiple.responses = TRUE) to indicate that the response is a matrix with one response per column. Otherwise, it is interpreted as a single binary response variable.

Author(s)

Thomas W. Yee

References

Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.

Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.

Anderson, J. A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.

Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.

Examples

## Not run: 
# Example 1: RR negative binomial with Var(Y) = mu + delta1 * mu^delta2
nn <- 1000       # Number of observations
delta1 <- 3.0    # Specify this
delta2 <- 1.5    # Specify this; should be greater than unity
a21 <- 2 - delta2
mydata <- data.frame(x2 = runif(nn), x3 = runif(nn))
mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3))
mydata <- transform(mydata,
                    y2 = rnbinom(nn, mu = mu, size = (1/delta1)*mu^a21))
plot(y2 ~ x2, data = mydata, pch = "+", col = 'blue', las = 1,
     main = paste("Var(Y) = mu + ", delta1, " * mu^", delta2, sep = ""))
rrnb2 <- rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL),
                data = mydata, trace = TRUE)

a21.hat <- (Coef(rrnb2)@A)["loglink(size)", 1]
beta11.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(mu)"]
beta21.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(size)"]
(delta1.hat <- exp(a21.hat * beta11.hat - beta21.hat))
(delta2.hat <- 2 - a21.hat)
# exp(a21.hat * predict(rrnb2)[1,1] - predict(rrnb2)[1,2])  # delta1.hat
summary(rrnb2)

# Obtain a 95 percent confidence interval for delta2:
se.a21.hat <- sqrt(vcov(rrnb2)["I(latvar.mat)", "I(latvar.mat)"])
ci.a21 <- a21.hat +  c(-1, 1) * 1.96 * se.a21.hat
(ci.delta2 <- 2 - rev(ci.a21))  # The 95 percent confidence interval

Confint.rrnb(rrnb2)  # Quick way to get it

# Plot the abundances and fitted values against the latent variable
plot(y2 ~ latvar(rrnb2), data = mydata, col = "blue",
     xlab = "Latent variable", las = 1)
ooo <- order(latvar(rrnb2))
lines(fitted(rrnb2)[ooo] ~ latvar(rrnb2)[ooo], col = "orange")

# Example 2: stereotype model (reduced-rank multinomial logit model)
data(car.all)
scar <- subset(car.all,
               is.element(Country, c("Germany", "USA", "Japan", "Korea")))
fcols <- c(13,14,18:20,22:26,29:31,33,34,36)  # These are factors
scar[, -fcols] <- scale(scar[, -fcols])  # Standardize all numerical vars
ones <- matrix(1, 3, 1)
clist <- list("(Intercept)" = diag(3), Width = ones, Weight = ones,
              Disp. = diag(3), Tank = diag(3), Price = diag(3),
              Frt.Leg.Room = diag(3))
set.seed(111)
fit <- rrvglm(Country ~ Width + Weight + Disp. + Tank +
              Price + Frt.Leg.Room,
              multinomial, data = scar, Rank = 2, trace = TRUE,
              constraints = clist, noRRR = ~ 1 + Width + Weight,
              Uncor = TRUE, Corner = FALSE, Bestof = 2)
fit@misc$deviance  # A history of the fits
Coef(fit)
biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2,
       ccol = "blue", scol = "orange", Ccol = "darkgreen", Clwd = 2,
       main = "1=Germany, 2=Japan, 3=Korea, 4=USA")

## End(Not run)

VGAM

Vector Generalized Linear and Additive Models

v1.1-5

GPL-3

Authors

Thomas Yee [aut, cre], Cleve Moler [ctb] (author of several LINPACK routines)

Initial release

2021-01-13