spaMM: HLfit – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

HLfit

Fit mixed models with given correlation matrix

Description

This function fits GLMMs as well as some hierarchical generalized linear models (HGLM; Lee and Nelder 2001). HLfit fits both fixed effects parameters, and dispersion parameters i.e. the variance of the random effects (full covariance for random-coefficient models), and the variance of the residual error. The linear predictor is of the standard form offset+ X beta + Z b, where X is the design matrix of fixed effects and Z is a design matrix of random effects (typically an incidence matrix with 0s and 1s, but not necessarily). Models are fitted by an iterative algorithm alternating estimation of fixed effects and of dispersion parameters. The residual dispersion may follow a “structured-dispersion model” modeling heteroscedasticity. Estimation of the latter parameters is performed by a form of fit of debiased residuals, which allows fitting a structured-dispersion model (Smyth et al. 2001). However, evaluation of the debiased residuals can be slow in particular for large datasets. For models without structured dispersion, it is then worth using the fitme function (or the corrHLfit function with non-default arguments). These functions can optimize the likelihood of HLfit fits for different given values of the dispersion parameters (“outer optimization”), thereby avoiding the need to estimate debiased residuals.

Usage

HLfit(formula, data, family = gaussian(), rand.family = gaussian(), 
      resid.model = ~1, REMLformula = NULL, verbose = c(trace = FALSE), 
      HLmethod = "HL(1,1)", method="REML", control.HLfit = list(), 
      control.glm = list(), init.HLfit = list(), ranFix = list(), 
      etaFix = list(), prior.weights = NULL, processed = NULL)
## see 'rand.family' argument for inverse.Gamma

Arguments

`formula`	A `formula`; or a `predictor`, i.e. a formula with attributes created by `Predictor`, if design matrices for random effects have to be provided. See Details in `spaMM` for allowed terms in the formula (except spatial ones).
`data`	A data frame containing the variables named in the model formula.
`family`	A `family` object describing the distribution of the response variable. See Details in `spaMM` for handled families.
`rand.family`	A `family` object describing the distribution of the random effect, or a `list` of family objects for different random effects (see Examples). Possible options are `gaussian()`, `Gamma(log)`, `Gamma(identity)` (see Details), `Beta(logit)`, `inverse.Gamma(-1/mu)`, and `inverse.Gamma(log)`. For discussion of these alternatives see Lee and Nelder 2001 or Lee et al. 2006, p. 178-. Here the family gives the distribution of a random effect u and the link gives `v` as function of u (see Details). If there are several random effects and only one family is given, this family holds for all random effects.
`resid.model`	Either a formula (without left-hand side) for the dispersion parameter `phi` of the residual error. A log link is assumed by default; or a list, with at most three possible elements if its formula involves only fixed effects: formula model formula as in formula-only case, without left-hand side family Always Gamma, with by default a log link. `Gamma(identity)` can be tried but may fail because only the log link ensures that the fitted φ is positive. fixed can be used to specify the residual dispersion parameter of the residual dispersion model itself. The default value is 1; this argument can be used to set another value, and `fixed=list(phi=NA)` will force estimation of this parameter. and additional possible elements (all named as `fitme` arguments) if its formula involves random effects: see `phiHGLM`.
`REMLformula`	A model `formula` that controls the estimation of dispersion parameters and the computation of restricted likelihood (`p_bv`), where the conditioning inherent in REML is defined by a model different from the predictor `formula`. A simple example (useless in practice) of its effect is to replicate an ML fit by specifying `method="REML"` and an `REMLformula` with no fixed effect. The latter implies that no conditioning is performed and that `p_bv` equals the marginal likelihood (or its approximation), `p_v`. One of the examples in `update.HLfit` shows how `REMLformula` can be useful, but otherwise this argument may never be needed for standard REML or ML fits. For non-standard likelihood ratio tests using `REMLformula`, see `fixedLRT`.
`verbose`	A vector of booleans. The `trace` element controls various diagnostic messages (possibly messy) about the iterations. This should be distinguished from the `TRACE` element, meaningful in `fitme` or `corrHLfit` calls. `phifit` (which defaults to `TRUE`) controls messages about the progress of residual dispersion fits in DHGLMs.
`method`	Character: the fitting method. allowed values are `"REML"`, `"ML"`, `"EQL-"` and `"EQL+"` for all models; `"PQL"` (=`"REPQL"`) and `"PQL/L"` for GLMMs only; and further values for those curious to experiment (see `method`). The default is REML (standard REML for LMMs, an extended definition for other models). REML can be viewed as a form of conditional inference, and non-standard conditionings can be called by using a non-standard `REMLformula`.
`HLmethod`	Same as `method`. It is useless to specify `HLmethod` when `method` is specified. The default value `"HL(1,1)"` means the same as `method="REML"`, but more accurately relates to definitions of approximations of likelihood in the h-likelihood literature.
`control.HLfit`	A list of parameters controlling the fitting algorithms, which should be ignored in routine use. Suche a parameter, `resid.family`, was previously documented here (before version 2.6.40), and will still operate as previously documented, but should not be used in new code. Possible parameters are: `conv.threshold` and `spaMM_tol`: `spaMM_tol` is a list of tolerance values, with elements `Xtol_rel` and `Xtol_abs` that define thresholds for relative and absolute changes in parameter values in iterative algorithms (used in tests of the form “d(param)< Xtol_rel * param + Xtol_abs”, so that `Xtol_abs` is operative only for small parameter values). `conv.threshold` is the older way to control `Xtol_rel`. Default values are given by spaMM.getOption("spaMM_tol"); `break_conv_logL`, a boolean specifying whether the iterative algorithm should terminate when log-likelihood appears to have converged (roughly, when its relative variation over on iteration is lower than 1e-8). Default is FALSE (convergence is then assessed on the parameter estimates rather than on log-likelihood). `iter.mean.dispFix`, the number of iterations of the iterative algorithm for coefficients of the linear predictor, if no dispersion parameters are estimated by the iterative algorithm. Defaults to 200 except for Gamma(log)-family models; `iter.mean.dispVar`, the number of iterations of the iterative algorithm for coefficients of the linear predictor, if some dispersion parameter(s) is estimated by the iterative algorithm. Defaults to 50 except for Gamma(log)-family models; `max.iter`, the number of iterations of the iterative algorithm for joint estimation of dispersion parameters and of coefficients of the linear predictor. Defaults to 200. This is typically much more than necessary, unless there is little information to separately estimate λ and φ parameters. `algebra, sparse_precision`See `algebra`.
`control.glm`	List of parameters controlling GLM fits, passed to `glm.control`; e.g. `control.glm=list(maxit=100)`. See `glm.control` for further details.
`init.HLfit`	A list of initial values for the iterative algorithm, with possible elements of the list are `fixef` for fixed effect estimates (beta), `v_h` for random effects vector v in the linear predictor, `lambda` for the parameter determining the variance of random effects u as drawn from the `rand.family` distribution `phi` for the residual variance. However, this argument can be ignored in routine use.

`ranFix`	A list of fixed values of random effect parameters. See `ranFix` for further information.
`etaFix`	A list of given values of the coefficients of the linear predictor. See `etaFix` for further information.
`prior.weights`	An optional vector of prior weights as in `glm`. This fits the data to a probability model with residual variance `phi/prior.weights`, and all further outputs are defined to be consistent with this (see section IV in Details).
`processed`	A list of preprocessed arguments, for programming purposes only (as in `corrHLfit`).

Details

I. Approximations of likelihood: see method.

II. Possible structure of Random effects: see random-effects, but note that HLfit does not fit models with autocorrelated random effects).

III. The standard errors reported may sometimes be misleading. For each set of parameters among β, λ, and φ parameters these are computed assuming that the other parameters are known without error. This is why they are labelled Cond. SE (conditional standard error). This is most uninformative in the unusual case where λ and φ are not separately estimable parameters. Further, the SEs for λ and φ are rough approximations as discussed in particular by Smyth et al. (2001; V_1 method).

IV. prior weights. This controls the likelihood analysis of heteroscedastic models. In particular, changing the weights by a constant factor f should, and will, yield a fit with unchanged likelihood and (Intercept) estimates of phi also increased by f (except if a non-trivial resid.formula with log link is used). This is consistent with what glm does, but other packages may not follow this logic (whatever their documentation may say: check by yourself by changing the weights by a constant factor).

Value

An object of class HLfit, which is a list with many elements, not all of which are documented.

A few extractor functions are available (see extractors), and should be used as far as possible as they should be backward-compatible from version 1.4 onwards, while the structure of the return object may still evolve. The following information will be useful for extracting further elements of the object.

Elements include descriptors of the fit:

`eta`	Fitted values on the linear scale (including the predicted random effects);
`fv`	Fitted values (μ=<inverse-link>(η)) of the response variable (returned by the `fitted` function);
`fixef`	The fixed effects coefficients, β (returned by the `fixef` function);
`v_h`	The random effects on the linear scale, v, with atttribute the random effects u (returned by `ranef(*,type="uncorrelated")`;
`phi`	The residual variance φ;
`phi.object`	A possibly more complex object describing φ;
`lambda`	The random-effect (u) variance(s) λ in compact form;
`lambda.object`	A possibly more complex object describing λ;
`corrPars`	Agglomerates information on correlation parameters, either fixed, or estimated by `HLfit`, `corrHLfit` or `fitme`;
`APHLs`	A list which elements are various likelihood components, include conditional likelihood, h-likelihood, and the two adjusted profile h-likelihoods: the (approximate) marginal likelihood `p_v` and the (approximate) restricted likelihood `p_bv` (the latter two available through the `logLik` function). See the extractor function `get_any_IC` for information criteria (“AIC”) and effective degrees of freedom;
	The covariance matrix of β estimates is not included as such, but can be extracted by `vcov`;

Information about the input is contained in output elements named as HLfit or corrHLfit arguments (data,family,resid.family,ranFix,prior.weights), with the following notable exceptions or modifications:

`predictor`	The `formula`, possibly reformatted;
`resid.predictor`	Analogous to `predictor`, for the residual variance;
`rand.families`	corresponding to the `rand.family` input;

Further miscellaneous diagnostics and descriptors of model structure:

`X.pv`	The design matrix for fixed effects;
`ZAlist,strucList`	Two lists of matrices, respectively the design matrices “Z”, and the “L” matrices, for the different random-effect terms. The extractor `get_ZALMatrix` can be used to reconstruct a single “ZL” matrix for all terms.
`BinomialDen`	(binomial data only) the binomial denominators;
`y`	the response vector; for binomial data, the frequency response.
`models`	Additional information on model structure for η, λ and φ;
`HL`	A set of indices that characterize the approximations used for likelihood;
`leve_phi,lev_lambda`	Leverages;
`dfs`	degrees of freedom for different components of the model;
`warnings`	A list of warnings for events that may have occurred during the fit.

Finally, the object includes programming tools: call, spaMM.version, fit_time and envir.

References

Lee, Y., Nelder, J. A. (2001) Hierarchical generalised linear models: A synthesis of generalised linear models, random-effect models and structured dispersions. Biometrika 88, 987-1006.

Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Generalized linear models with random effects: unified analysis via h-likelihood. Chapman & Hall: London.

Smyth GK, Huele AF, Verbyla AP (2001). Exact and approximate REML for heteroscedastic regression. Statistical Modelling 1, 161-175.

Examples

data("wafers")
## Gamma GLMM with log link

HLfit(y ~ X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch), family=Gamma(log),
          resid.model = ~ X3+I(X3^2) ,data=wafers)

## Gamma - inverseGamma HGLM with log link
HLfit(y ~ X1+X2+X1*X3+X2*X3+I(X2^2)+(1|batch), family=Gamma(log),
          rand.family=inverse.Gamma(log),
          resid.model = ~ X3+I(X3^2) , data=wafers)

spaMM

Mixed-Effect Models, with or without Spatial Random Effects

v3.10.0

CeCILL-2

Authors

François Rousset [aut, cre, cph] (<https://orcid.org/0000-0003-4670-0371>), Jean-Baptiste Ferdy [aut, cph], Alexandre Courtiol [aut] (<https://orcid.org/0000-0003-0637-2959>), GSL authors [ctb] (src/gsl_bessel.*)

Initial release

2022-02-06