Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

gllvm

Generalized Linear Latent Variable Models


Description

Fits generalized linear latent variable model for multivariate data. The model can be fitted using Laplace approximation method or variational approximation method.

Usage

gllvm(
  y = NULL,
  X = NULL,
  TR = NULL,
  data = NULL,
  formula = NULL,
  num.lv = 2,
  family,
  row.eff = FALSE,
  offset = NULL,
  quadratic = FALSE,
  sd.errors = TRUE,
  method = "VA",
  randomX = NULL,
  dependent.row = FALSE,
  beta0com = FALSE,
  zeta.struc = "species",
  plot = FALSE,
  link = "probit",
  Power = 1.1,
  seed = NULL,
  scale.X = TRUE,
  return.terms = TRUE,
  gradient.check = FALSE,
  control = list(reltol = 1e-10, TMB = TRUE, optimizer = "optim", max.iter = 200, maxit
    = 4000, trace = FALSE, optim.method = NULL),
  control.va = list(Lambda.struc = "unstructured", Ab.struct = "unstructured",
    diag.iter = 1, Ab.diag.iter = 0, Lambda.start = c(0.3, 0.3, 0.3)),
  control.start = list(starting.val = "res", n.init = 1, jitter.var = 0, start.fit =
    NULL, start.lvs = NULL, randomX.start = "zero", quad.start = 0.01, start.struc =
    "LV"),
  ...
)

Arguments

y

(n x m) matrix of responses.

X

matrix or data.frame of environmental covariates.

TR

matrix or data.frame of trait covariates.

data

data in long format, that is, matrix of responses, environmental and trait covariates and row index named as "id". When used, model needs to be defined using formula. This is alternative data input for y, X and TR.

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.

num.lv

number of latent variables, d, in gllvm model. Non-negative integer, less than number of response variables (m). Defaults to 2.

family

distribution function for responses. Options are poisson(link = "log"), "negative.binomial" (with log link), binomial(link = "probit") (and also binomial(link = "logit") when method = "LA"), zero inflated poisson ("ZIP"), gaussian(link = "identity"), "gamma" (with log link), "exponential" (with log link), Tweedie ("tweedie") (with log link, for "LA" and "EVA"-method), beta ("beta") (with logit and probit link, for "LA" and "EVA"-method) and "ordinal" (only with "VA"-method).

row.eff

FALSE, fixed or "random", Indicating whether row effects are included in the model as a fixed or as a random effects. Defaults to FALSE when row effects are not included.

offset

vector or matrix of offset terms.

quadratic

either FALSE(default), TRUE, or LV. If FALSE models species responses as a linear function of the latent variables. If TRUE models species responses as a quadratic function of the latent variables. If LV assumes species all have the same quadratic coefficient per latent variable.

sd.errors

logical. If TRUE (default) standard errors for parameter estimates are calculated.

method

model can be fitted using Laplace approximation method (method = "LA") or variational approximation method (method = "VA"), or with extended variational approximation method (method = "EVA") when VA is not applicable. If particular model has not been implemented using the selected method, model is fitted using the alternative method as a default. Defaults to "VA".

randomX

formula for species specific random effects of environmental variables in fourth corner model. Defaults to NULL, when random slopes are not included.

dependent.row

logical, whether or not random row effects are correlated (dependent) with the latent variables. Defaults to FALSE when correlation terms are not included.

beta0com

logical, if FALSE column-specific intercepts are assumed. If TRUE, a common intercept is used which is allowed only for fourth corner models.

zeta.struc

Structure for cut-offs in the ordinal model. Either "common", for the same cut-offs for all species, or "species" for species-specific cut-offs. For the latter, classes are arbitrary per species, each category per species needs to have at least one observations. Defaults to "species".

plot

logical, if TRUE ordination plots will be printed in each iteration step when TMB = FALSE. Defaults to FALSE.

link

link function for binomial family if method = "LA" and beta family. Options are "logit" and "probit.

Power

fixed power parameter in Tweedie model. Scalar from interval (1,2). Defaults to 1.1.

seed

a single seed value, defaults to NULL.

scale.X

if TRUE, covariates are scaled when fourth corner model is fitted.

return.terms

logical, if TRUE 'terms' object is returned.

gradient.check

logical, if TRUE gradients are checked for large values (>0.01) even if the optimization algorithm did converge.

control

A list with the following arguments controlling the optimization:

  • reltol: convergence criteria for log-likelihood, defaults to 1e-8.

  • TMB: logical, if TRUE model will be fitted using Template Model Builder (TMB). TMB is always used if method = "LA". Defaults to TRUE.

  • optimizer: if TMB=TRUE, log-likelihood can be optimized using "optim" (default) or "nlminb".

  • max.iter: maximum number of iterations when TMB = FALSE or for optimizer = "nlminb" when TMB = TRUE, defaults to 200.

  • maxit: maximum number of iterations for optimizer, defaults to 4000.

  • trace: logical, if TRUE in each iteration step information on current step will be printed. Defaults to FALSE. Only with TMB = FALSE.

  • optim.method: optimization method to be used if optimizer is "optim". Defaults to "BFGS", and "L-BFGS-B" to Tweedie family due the limited-memory use.

control.va

A list with the following arguments controlling the variational approximation method:

  • Lambda.struc: covariance structure of VA distributions for latent variables when method = "VA", "unstructured" or "diagonal".

  • Ab.struct: covariance structure of VA distributions for random slopes when method = "VA", "unstructured" or "diagonal".

  • diag.iter: non-negative integer which can sometimes be used to speed up the updating of variational (covariance) parameters in VA method. Can sometimes improve the accuracy. If TMB = TRUE either 0 or 1. Defaults to 1.

  • Ab.diag.iter: As above, but for variational covariance of random slopes.

  • Lambda.start: starting values for variances in VA distributions for latent variables, random row effects and random slopes in variational approximation method. Defaults to 0.2.

control.start

A list with the following arguments controlling the starting values:

  • starting.val: starting values can be generated by fitting model without latent variables, and applying factorial analysis to residuals to get starting values for latent variables and their coefficients (starting.val = "res"). Another options are to use zeros as a starting values (starting.val = "zero") or initialize starting values for latent variables with (n x num.lv) matrix. Defaults to "res", which is recommended.

  • n.init: number of initial runs. Uses multiple runs and picks up the one giving highest log-likelihood value. Defaults to 1.

  • start.fit: object of class 'gllvm' which can be given as starting parameters for count data (poisson, NB, or ZIP).

  • start.lvs: initialize starting values for latent variables with (n x num.lv) matrix. Defaults to NULL.

  • jitter.var: jitter variance for starting values of latent variables. Defaults to 0, meaning no jittering.

  • randomX.start: Starting value method for the random slopes. Options are "zero" and "res". Defaults to "zero".

  • start.struc: Starting value method for the quadratic term. Options are "LV"(default) and "all".

  • quad.start: Starting values for quadratic coefficients. Defaults to 0.01.

...

Not used.

Details

Fits generalized linear latent variable models as in Hui et al. (2015 and 2017) and Niku et al. (2017). Method can be used with two types of latent variable models depending on covariates. If only site related environmental covariates are used, the expectation of response Y_{ij} is determined by

g(μ_{ij}) = η_{ij} = α_i + β_{0j} + x_i'β_j + u_i'θ_j,

where g(.) is a known link function, u_i are d-variate latent variables (d<<m), α_i is an optional row effect at site i, and it can be fixed or random effect, β_{0j} is an intercept term for species j, β_j and θ_j are column specific coefficients related to covariates and the latent variables, respectively.

Alternatively, a more complex version of the model can be fitted with quadratic = TRUE, where species are modeled as a quadratic function of the latent variables:

g(μ_{ij}) = η_{ij} = α_i + β_{0j} + x_i'β_j + u_i'θ_j - u_i' D_j u_i

. Here, D_j is a diagonal matrix of positive only quadratic coefficients, so that the model generates concave shapes only. This implementation follows the ecological theoretical model where species are generally recognized to exhibit non-linear response curves. For a model with quadratic responses, quadratic coefficients are assumed to be the same for all species:

D_j = D

. This model requires less information per species and can be expected to be more applicable to most datasets. The quadratic coefficients D can be used to calculate the length of ecological gradients.

An alternative model is the fourth corner model (Brown et al., 2014, Warton et al., 2015) which will be fitted if also trait covariates are included. The expectation of response Y_{ij} is

g(μ_{ij}) = α_i + β_{0j} + x_i'(β_x + b_j) + TR_j'β_t + vec(B)*kronecker(TR_j,X_i) + u_i'θ_j - u_i'D_ju_i

where g(.), u_i, β_{0j} and θ_j are defined as above. Vectors β_x and β_t are the main effects or coefficients related to environmental and trait covariates, respectively, matrix B includes interaction terms. Vectors b_j are optional species-specific random slopes for environmental covariates. The interaction/fourth corner terms are optional as well as are the main effects of trait covariates.

The method is sensitive for the choices of initial values of the latent variables. Therefore it is recommendable to use multiple runs and pick up the one giving the highest log-likelihood value. However, sometimes this is computationally too demanding, and default option starting.val = "res" is recommended. For more details on different starting value methods, see Niku et al., (2018).

For quadratic responses, it can be useful to provide the latent variables estimated with a GLLVM with linear responses, or estimated with (Detrended) Correspondence Analaysis. The latent variables can then be passed to the start.lvs argument inside the control.start list, which in many cases gives good results. For a GLLVM with quadratic responses and poisson distribution, it is recommended to fit GLLVM with linear responses and a negative binomial distribution, or using a Poisson distribution with random row-effects, instead. This is because the quadratic term can account for overdispersion in the Poisson case, which needs to be separately accounted for with linear responses. As a result, it should rarely be required to fit a GLLVM with quadratic responses and negative binomial distribution (or row-effects).

Models are implemented using TMB (Kristensen et al., 2015) applied to variational approximation (Hui et al., 2017) and Laplace approximation (Niku et al., 2017).

With ordinal family response classes must start from 0 or 1.

Distributions

Mean and variance for distributions are defined as follows.

  • For count data family = poisson(): Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}, or

  • family = "negative.binomial": Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}+μ_{ij}^2φ_j, or

  • family = "ZIP": Expectation E[Y_{ij}] = (1-p)μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1-p)(1+μ_{ij}p).

  • For binary data family = binomial(): Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1-μ_{ij}).

  • For percent cover data 0 < Y_{ij} < 1 family = "beta": Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}(1-μ_{ij})//(1+φ_j).

  • For positive continuous data family = "gamma":Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}^2/φ_j, where φ_j is species specific shape parameter.

  • For non-negative continuous data family = "exponential":Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = μ_{ij}^2.

  • For non-negative continuous or biomass datafamily = "tweedie" Expectation E[Y_{ij}] = μ_{ij}, variance V(μ_{ij}) = φ_j*μ_{ij}^ν, where ν is a power parameter of Tweedie distribution. See details Dunn and Smyth (2005).

  • For ordinal data family = "ordinal": Cumulative probit model, see Hui et.al. (2016).

  • For normal distributed data family = gaussian(): Expectation E[Y_{ij}] = μ_{ij}, variance V(y_{ij}) = φ_j^2.

Value

An object of class "gllvm" includes the following components:

call

function call

logL

log likelihood

lvs

latent variables

params

list of parameters

  • theta coefficients related to latent variables

  • beta0 column specific intercepts

  • Xcoef coefficients related to environmental covariates X

  • B coefficients in fourth corner model

  • row.params row-specific intercepts

  • phi dispersion parameters φ for negative binomial or Tweedie family, probability of zero inflation for ZIP family, standard deviation for gaussian family or shape parameter for gamma family

  • inv.phi dispersion parameters 1/φ for negative binomial

Power

power parameter ν for Tweedie family

sd

list of standard errors of parameters

prediction.errors

list of prediction covariances for latent variables and variances for random row effects when method "LA" is used

A, Ar

covariance matrices for variational densities of latent variables and variances for random row effects

Note

If function gives warning: 'In f(x, order = 0) : value out of range in 'lgamma”, optimizer have visited an area where gradients become too big. It is automatically fixed by trying another step in the optimization process, and can be ignored if errors do not occur.

Author(s)

Jenni Niku <jenni.m.e.niku@jyu.fi>, Wesley Brooks, Riki Herliansyah, Francis K.C. Hui, Sara Taskinen, David I. Warton, Bert van der Veen

References

Brown, A. M., Warton, D. I., Andrew, N. R., Binns, M., Cassis, G., and Gibb, H. (2014). The fourth-corner solution - using predictive models to understand how species traits interact with the environment. Methods in Ecology and Evolution, 5:344-352.

Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of tweedie exponential dispersion model densities. Statistics and Computing, 15:267-280.

Hui, F. K. C., Taskinen, S., Pledger, S., Foster, S. D., and Warton, D. I. (2015). Model-based approaches to unconstrained ordination. Methods in Ecology and Evolution, 6:399-411.

Hui, F. K. C., Warton, D., Ormerod, J., Haapaniemi, V., and Taskinen, S. (2017). Variational approximations for generalized linear latent variable models. Journal of Computational and Graphical Statistics. Journal of Computational and Graphical Statistics, 26:35-43.

Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21.

Niku, J., Warton, D. I., Hui, F. K. C., and Taskinen, S. (2017). Generalized linear latent variable models for multivariate count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 22:498-522.

Niku, J., Brooks, W., Herliansyah, R., Hui, F. K. C., Taskinen, S., and Warton, D. I. (2018). Efficient estimation of generalized linear latent variable models. PLoS One, 14(5):1-20.

Warton, D. I., Guillaume Blanchet, F., O'Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C. and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766-779.

See Also

Examples

# Extract subset of the microbial data to be used as an example
data(microbialdata)
X <- microbialdata$Xenv
y <- microbialdata$Y[, order(colMeans(microbialdata$Y > 0), 
                     decreasing = TRUE)[21:40]]
fit <- gllvm(y, X, formula = ~ pH + Phosp, family = poisson())
fit$logL
ordiplot(fit)
coefplot(fit)


## Load a dataset from the mvabund package
library(mvabund)
data(antTraits)
y <- as.matrix(antTraits$abund)
X <- as.matrix(antTraits$env)
TR <- antTraits$traits
# Fit model with environmental covariates Bare.ground and Shrub.cover
fit <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
            family = poisson())
ordiplot(fit)
coefplot(fit)

## Example 1: Fit model with two latent variables
# Using variational approximation:
fitv0 <- gllvm(y, family = "negative.binomial", method = "VA")
ordiplot(fitv0)
plot(fitv0, mfrow = c(2,2))
summary(fitv0)
confint(fitv0)
# Using Laplace approximation: (this line may take about 30 sec to run)
fitl0 <- gllvm(y, family = "negative.binomial", method = "LA")
ordiplot(fitl0)

# Poisson family:
fit.p <- gllvm(y, family = poisson(), method = "LA")
ordiplot(fit.p)
# Use poisson model as a starting parameters for ZIP-model, this line may take few minutes to run
fit.z <- gllvm(y, family = "ZIP", method = "LA", control.start =list(start.fit = fit.p))
ordiplot(fit.z)


## Example 2: gllvm with environmental variables
# Fit model with two latent variables and all environmental covariates,
fitvX <- gllvm(formula = y ~ X, family = "negative.binomial")
ordiplot(fitvX, biplot = TRUE)
coefplot(fitvX)
# Fit model with environmental covariates Bare.ground and Shrub.cover
fitvX2 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial")
ordiplot(fitvX2)
coefplot(fitvX2)
# Use 5 initial runs and pick the best one
fitvX_5 <- gllvm(y, X, formula = ~ Bare.ground + Shrub.cover,
 family = "negative.binomial", control.start=list(n.init = 5, jitter.var = 0.1))
ordiplot(fitvX_5)
coefplot(fitvX_5)

## Example 3: Data in long format
# Reshape data to long format:
datalong <- reshape(data.frame(cbind(y,X)), direction = "long",
                   varying = colnames(y), v.names = "y")
head(datalong)
fitvLong <- gllvm(data = datalong, formula = y ~ Bare.ground + Shrub.cover,
               family = "negative.binomial")

## Example 4: Fourth corner model
# Fit fourth corner model with two latent variables
fitF1 <- gllvm(y = y, X = X, TR = TR, family = "negative.binomial")
coefplot(fitF1)
# Fourth corner can be plotted also with next lines
#fourth = fitF1$fourth.corner
#library(lattice)
#a = max( abs(fourth) )
#colort = colorRampPalette(c("blue","white","red"))
#plot.4th = levelplot(t(as.matrix(fourth)), xlab = "Environmental Variables",
#              ylab = "Species traits", col.regions = colort(100),
#              at = seq( -a, a, length = 100), scales = list( x = list(rot = 45)))
#print(plot.4th)

# Specify model using formula
fitF2 <- gllvm(y = y, X = X, TR = TR,
 formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
 family = "negative.binomial")
ordiplot(fitF2)
coefplot(fitF2)

## Include species specific random slopes to the fourth corner model
fitF3 <- gllvm(y = y, X = X, TR = TR,
 formula = ~ Bare.ground + Canopy.cover * (Pilosity + Webers.length),
 family = "negative.binomial", randomX = ~ Bare.ground + Canopy.cover, 
 control.start = list(n.init = 3))
ordiplot(fitF3)
coefplot(fitF3)


## Example 5: Fit Tweedie model
# Load coral data
data(tikus)
ycoral <- tikus$abund
# Let's consider only years 1981 and 1983
ycoral <- ycoral[((tikus$x$time == 81) + (tikus$x$time == 83)) > 0, ]
# Exclude species which have observed at less than 4 sites
ycoral <- ycoral[-17, (colSums(ycoral > 0) > 4)]
# Fit Tweedie model for coral data (this line may take few minutes to run)
fit.twe <- gllvm(y = ycoral, family = "tweedie", method = "LA")
ordiplot(fit.twe)

## Example 6: Random row effects
fitRand <- gllvm(y, family = "negative.binomial", row.eff = "random")
ordiplot(fitRand, biplot = TRUE)

gllvm

Generalized Linear Latent Variable Models

v1.3.0
GPL-2
Authors
Jenni Niku [aut, cre], Wesley Brooks [aut], Riki Herliansyah [aut], Francis K.C. Hui [aut], Sara Taskinen [aut], David I. Warton [aut], Bert van der Veen [aut]
Initial release
2021-4-26

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.