Fitting Reduced-Rank Vector Generalized Linear Models (RR-VGLMs)
A reduced-rank vector generalized linear model (RR-VGLM) is fitted. RR-VGLMs are VGLMs but some of the constraint matrices are estimated. In this documentation, M is the number of linear predictors.
rrvglm(formula, family = stop("argument 'family' needs to be assigned"), data = list(), weights = NULL, subset = NULL, na.action = na.fail, etastart = NULL, mustart = NULL, coefstart = NULL, control = rrvglm.control(...), offset = NULL, method = "rrvglm.fit", model = FALSE, x.arg = TRUE, y.arg = TRUE, contrasts = NULL, constraints = NULL, extra = NULL, qr.arg = FALSE, smart = TRUE, ...)
formula, family, weights |
See |
data |
an optional data frame containing the variables in the model.
By default the variables are taken from |
subset, na.action |
See |
etastart, mustart, coefstart |
See |
control |
a list of parameters for controlling the fitting process.
See |
offset, model, contrasts |
See |
method |
the method to be used in fitting the model.
The default (and presently only) method |
x.arg, y.arg |
logical values indicating whether
the model matrix and response vector/matrix used in the fitting
process should be assigned in the |
constraints |
See |
extra, smart, qr.arg |
See |
... |
further arguments passed into |
The central formula is given by
eta = B_1^T x_1 + A nu
where x1 is a vector (usually just a 1 for an intercept), x2 is another vector of explanatory variables, and nu = C^T x_2 is an R-vector of latent variables. Here, eta is a vector of linear predictors, e.g., the mth element is eta_m = log(E[Y_m]) for the mth Poisson response. The matrices B_1, A and C are estimated from the data, i.e., contain the regression coefficients. For ecologists, the central formula represents a constrained linear ordination (CLO) since it is linear in the latent variables. It means that the response is a monotonically increasing or decreasing function of the latent variables.
For identifiability it is common to enforce corner constraints on A: by default, the top R by R submatrix is fixed to be the order-R identity matrix and the remainder of A is estimated.
The underlying algorithm of RR-VGLMs is iteratively reweighted least squares (IRLS) with an optimizing algorithm applied within each IRLS iteration (e.g., alternating algorithm).
An object of class "rrvglm"
, which has the the same
slots as a "vglm"
object. The only difference is
that the some of the constraint matrices are estimates
rather than known. But VGAM stores the models the
same internally. The slots of "vglm"
objects are
described in vglm-class
.
The arguments of rrvglm
are in general the same as
those of vglm
but with some extras in
rrvglm.control
.
The smart prediction (smartpred
) library
is packed with the VGAM library.
In an example below, a rank-1 stereotype
model of Anderson (1984) is fitted to some car data.
The reduced-rank regression is performed, adjusting for
two covariates. Setting a trivial constraint matrix
(diag(M)
)
for the latent variable variables in x2 avoids
a warning message when it is overwritten by a (common)
estimated constraint matrix. It shows that German cars
tend to be more expensive than American cars, given a
car of fixed weight and width.
If fit <- rrvglm(..., data = mydata)
then
summary(fit)
requires corner constraints and no
missing values in mydata
. Often the estimated
variance-covariance matrix of the parameters is not
positive-definite; if this occurs, try refitting the
model with a different value for Index.corner
.
For constrained quadratic ordination (CQO) see
cqo
for more details about QRR-VGLMs.
With multiple binary responses, one must use
binomialff(multiple.responses = TRUE)
to indicate
that the response is a matrix with one response per column.
Otherwise, it is interpreted as a single binary response
variable.
Thomas W. Yee
Yee, T. W. and Hastie, T. J. (2003). Reduced-rank vector generalized linear models. Statistical Modelling, 3, 15–41.
Yee, T. W. (2004). A new technique for maximum-likelihood canonical Gaussian ordination. Ecological Monographs, 74, 685–701.
Anderson, J. A. (1984). Regression and ordered categorical variables. Journal of the Royal Statistical Society, Series B, Methodological, 46, 1–30.
Yee, T. W. (2014). Reduced-rank vector generalized linear models with two linear predictors. Computational Statistics and Data Analysis, 71, 889–902.
rrvglm.control
,
lvplot.rrvglm
(same as biplot.rrvglm
),
rrvglm-class
,
grc
,
cqo
,
vglmff-class
,
vglm
,
vglm-class
,
smartpred
,
rrvglm.fit
.
Special family functions include
negbinomial
zipoisson
and zinegbinomial
.
(see Yee (2014) and COZIGAM).
Methods functions include
Coef.rrvglm
,
calibrate.rrvglm
,
summary.rrvglm
,
etc.
Data include
crashi
.
## Not run: # Example 1: RR negative binomial with Var(Y) = mu + delta1 * mu^delta2 nn <- 1000 # Number of observations delta1 <- 3.0 # Specify this delta2 <- 1.5 # Specify this; should be greater than unity a21 <- 2 - delta2 mydata <- data.frame(x2 = runif(nn), x3 = runif(nn)) mydata <- transform(mydata, mu = exp(2 + 3 * x2 + 0 * x3)) mydata <- transform(mydata, y2 = rnbinom(nn, mu = mu, size = (1/delta1)*mu^a21)) plot(y2 ~ x2, data = mydata, pch = "+", col = 'blue', las = 1, main = paste("Var(Y) = mu + ", delta1, " * mu^", delta2, sep = "")) rrnb2 <- rrvglm(y2 ~ x2 + x3, negbinomial(zero = NULL), data = mydata, trace = TRUE) a21.hat <- (Coef(rrnb2)@A)["loglink(size)", 1] beta11.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(mu)"] beta21.hat <- Coef(rrnb2)@B1["(Intercept)", "loglink(size)"] (delta1.hat <- exp(a21.hat * beta11.hat - beta21.hat)) (delta2.hat <- 2 - a21.hat) # exp(a21.hat * predict(rrnb2)[1,1] - predict(rrnb2)[1,2]) # delta1.hat summary(rrnb2) # Obtain a 95 percent confidence interval for delta2: se.a21.hat <- sqrt(vcov(rrnb2)["I(latvar.mat)", "I(latvar.mat)"]) ci.a21 <- a21.hat + c(-1, 1) * 1.96 * se.a21.hat (ci.delta2 <- 2 - rev(ci.a21)) # The 95 percent confidence interval Confint.rrnb(rrnb2) # Quick way to get it # Plot the abundances and fitted values against the latent variable plot(y2 ~ latvar(rrnb2), data = mydata, col = "blue", xlab = "Latent variable", las = 1) ooo <- order(latvar(rrnb2)) lines(fitted(rrnb2)[ooo] ~ latvar(rrnb2)[ooo], col = "orange") # Example 2: stereotype model (reduced-rank multinomial logit model) data(car.all) scar <- subset(car.all, is.element(Country, c("Germany", "USA", "Japan", "Korea"))) fcols <- c(13,14,18:20,22:26,29:31,33,34,36) # These are factors scar[, -fcols] <- scale(scar[, -fcols]) # Standardize all numerical vars ones <- matrix(1, 3, 1) clist <- list("(Intercept)" = diag(3), Width = ones, Weight = ones, Disp. = diag(3), Tank = diag(3), Price = diag(3), Frt.Leg.Room = diag(3)) set.seed(111) fit <- rrvglm(Country ~ Width + Weight + Disp. + Tank + Price + Frt.Leg.Room, multinomial, data = scar, Rank = 2, trace = TRUE, constraints = clist, noRRR = ~ 1 + Width + Weight, Uncor = TRUE, Corner = FALSE, Bestof = 2) fit@misc$deviance # A history of the fits Coef(fit) biplot(fit, chull = TRUE, scores = TRUE, clty = 2, Ccex = 2, ccol = "blue", scol = "orange", Ccol = "darkgreen", Clwd = 2, main = "1=Germany, 2=Japan, 3=Korea, 4=USA") ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.