Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

corrFamily

Defining parametric correlation families


Description

The corrFamily keyword can be used to declare and fit correlated random effects belonging to a user-defined correlation model (i.e., a parametric family of correlation matrices), or a covariance model. This feature is experimental in the sense that it has been tested in a limited number of applications and a limited number of matrix formats (see recommendations below). In the long term it should facilitate the addition of new “canned” correlation models in spaMM.

In the model formula, the random effect is declared as a term of the form
corrFamily(1|<grouping factor>); and the covStruct argument is used to provide

  • a function f that returns a correlation (or covariance) matrix from an input parameter vector (see below further information on class and attributes of the returned matrix).

  • a template parameter vector tpar, also with some constraints described below.

lower and upper value must be specified for the correlation parameters to be estimated.

Using corrFamily terms will be more efficient than the equally general method, of maximizing an objective function of the correlation parameters that calls (say) fitme() on a model including a corrMatrix itself function of the correlation parameters. But this may still be inefficient if a few issues are ignored. The Details should be read before conceiving a corrFamily model.

Usage

## 'covStruct' element
# corrFamily=list(f=<.>, tpar=<.>)

## 'lower', 'upper', 'init' and 'fixed' values in fitting-function call
## are numeric vectors in the 'corrPars', provided 
## following the general syntax of these arguments; e.g.,
# lower=list(corrPars=list("1"= <numeric vector> ))

Arguments

f

(required): function returning the correlation (or covariance) matrix, given its first argument, a parameter vector. The function value should (1) be of constant class for all parameter values. For families of mathematically sparse matrices, the CsparseMatrix class is recommended (and more specifically the dsCMatrix class since the matrix is symmetric); (2) have row names that match the levels of the grouping factor (the nested random effect Example shows the code needed when this nested effect is defined from two variables).

tpar

(required): a feasible argument of f. If it is a named vector, its names will be used for the parameters, otherwise default names "p1", "p2"... are provided. In order to favor the automatic selection of suitable algorithms, tpar should be chosen so that f(tpar) is least sparse in the correlation family, in terms of its sparsity and of the sparsity of its inverse. A tpar yielding an identity matrix is often a *bad* template as least sparse correlation matrices and their inverses are denser for most families except diagonal ones.

lower,upper

(required): complete vector of lower and upper bounds. Parameter names (as determined by the tpar argument) are not required but will be used if present.

init

(optional): a vector of initial values. If it is incomplete (not providing values for all parameters), names of the specified initial parameter values must be provided.

fixed

(optional): vector of fixed values. Currently only a complete vector is formally handled and there is no check that the vector is complete.

Details

  • spaMM will regularize invalid or nearly-singular correlation or covariance matrices internally if the correlation function has not done so already, but it it better to control this in the correlation function. The regularize convenience function is available for that purpose, but parametrizations that avoid the need for regularization are even better, since fitting models with nearly-singular correlation matrices is prone to various difficulties (The Toeplitz example below is poor from that perspective, although it is good to illustrate potential problems).

  • Users should make sure that any regularized matrix still belongs to the intended parametric family of matrices, and they should keep in mind that the spaMM output will show the input parameters of the unregularized matrix, not the parameters of the regularized one (e.g., in the Toeplitz example below, the fitted matrix is a regularized Toepliz matrix with slightly different coefficients than the input parameters).

    And for efficiency,

  • Let us repeat that the correlation function should return matrices of constant class, and in sparse format when the matrices are indeed mathematically sparse. For mathematically dense matrices (as in the Toeplitz example below), the dsyMatrix class may be suitable.

  • Let us repeat that in order to favor the automatic selection of suitable algorithms, tpar should be chosen so that f(tpar) is least sparse in the correlation family. For matrices of CsparseMatrix, a check is implemented to catch wrong choices of tpar.

  • For challenging problems (large data, many parameters...) it may pay to optimize a bit the correlation function. The Example of nested effects with heterogenous variance below illustrates a possible trick. In the same cases, It may also pay to try the alternative algebraic methods, by first comparing speed of the different methods (control.HLfit=list(algebra= <"spprec"|"spcorr"|"decorr">)) for given correlation parameter values, rather than to assume that spaMM will find the best method (even if it often does so).

Examples

if (spaMM.getOption("example_maxtime")>2 &&
      requireNamespace("agridat", quietly = TRUE)) {

data("onofri.winterwheat", package="agridat")

##### Fitting a Toeplitz correlation model for temporal correlations #####

# A Toeplitz correlation matrix of dimension d*d has d-1 parameters 
# (by symmetry, and with 1s on the main diagonal). These d-1 parameters 
# can be fitted as follows:

Toepfn <- function(v) {
  toepmat <- Matrix::forceSymmetric(toeplitz(c(1,v))) # dsyMatrix
  # Many of the matrices in this family are not valid correlation matrices;
  #   the regularize() function is handy here:
  toepmat <- regularize(toepmat, maxcondnum=1e12)
  # And don't forget the rownames!
  rownames(toepmat) <- unique(onofri.winterwheat$year)
  toepmat
}

(Toepfit <- spaMM::fitme(
  yield ~ gen + corrFamily(1|year), data=onofri.winterwheat, method="REML",
  covStruct=list(corrFamily=list(f=Toepfn, tpar=rep(1e-4,6))), 
         # (Note the gentle warning if one instead uses tpar=rep(0,6) here)  
  lower=list(corrPars=list("1"=rep(-0.999,6))), 
  upper=list(corrPars=list("1"=rep(0.999,6))))) 

# The fitted matrix is (nearly) singular, and was regularized:

eigen(Corr(Toepfit)[[1]])$values

# which means that the returned likelihood may be inaccurate, 
# and also that the actual matrix elements differ from input parameters:

Corr(Toepfit)[[1]][1,-1]  

### The usual rules for specifying covStruct, 'lower', 'upper' and 'init' apply
# here when the corrFamily term is the second random-effect:

(Toep2 <- spaMM::fitme(
        yield ~ 1 + (1|gen) + corrFamily(1|year), data=onofri.winterwheat, method=method,
        covStruct=list("1"=NULL, corrFamily=list(f=Toepfn, tpar=rep(1e-4,6))), 
        , init=list(corrPars=list("2"=rep(0.1,6))),
        lower=list(corrPars=list("2"=rep(-0.999,6))), 
        upper=list(corrPars=list("2"=rep(0.999,6)))))

##### Fitting one variance among years per each of 8 genotypes. #####

# First, note that this can be *more efficiently* fitted by another syntax:

### Fit as a constrained random-coefficient model: 
    
# Diagonal matrix of NA's, represented as vector for its lower triangle:
ranCoefs_for_diag <- function(nlevels) { 
  vec <- rep(0,nlevels*(nlevels+1L)/2L)
  vec[cumsum(c(1L,rev(seq(nlevels-1L)+1L)))] <- NA
  vec
} 
    
(by_rC <- spaMM::fitme(yield ~ 1 + (0+gen|year), data=onofri.winterwheat, method="REML",
                       fixed=list(ranCoefs=list("1"=ranCoefs_for_diag(8)))))
                         
### Fit as a corrFamily model:   

gy_levels <- paste0(gl(8,1,length =56,labels=levels(onofri.winterwheat$gen)),":",
                        gl(7,8,labels=unique(onofri.winterwheat$year)))
                        
# A log scale is often suitable for variances, hence is used below;

# a correct but crude implementation of the model is
diagf <- function(logvar) {
  corr_map <- kronecker(Matrix::.symDiagonal(n=7),diag(x=exp(logvar)))
  rownames(corr_map) <- gy_levels
  corr_map
}

# but we can minimize matrix operations as follows:

corr_map <- Matrix::forceSymmetric(kronecker(Matrix::.symDiagonal(n=7),diag(x=seq(8))))
rownames(corr_map) <- gy_levels

diagf <- function(logvar) {
  corr_map@x <- exp(logvar)[corr_map@x]
  corr_map
}                 # (and this returns a dsCMatrix)

(by_cF <- spaMM::fitme(
  yield ~ 1 + corrFamily(1|gen %in% year), data=onofri.winterwheat, method="REML",
  covStruct=list(corrFamily = list(f=diagf, tpar=rep(1,8))), 
  fixed=list(lambda=1),            # Don't forget to fix this redundant parameter!
  # init=list(corrPars=list("1"=rep(log(O.1),8))), # 'init' optional 
  lower=list(corrPars=list("1"=rep(log(1e-6),8))), # 'lower' and 'upper' required
  upper=list(corrPars=list("1"=rep(log(1e6),8)))))

exp(get_ranPars(by_cF)$corrPars[[1]]) # fitted variances 
  
}

spaMM

Mixed-Effect Models, with or without Spatial Random Effects

v3.10.0
CeCILL-2
Authors
François Rousset [aut, cre, cph] (<https://orcid.org/0000-0003-4670-0371>), Jean-Baptiste Ferdy [aut, cph], Alexandre Courtiol [aut] (<https://orcid.org/0000-0003-0637-2959>), GSL authors [ctb] (src/gsl_bessel.*)
Initial release
2022-02-06

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.