Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

good-practice

Clear and trustworthy formulas


Description

Base fitting functions in R will seek variables in the environment where the formula was defined (i.e., typically in the global environment), if they are not in the data. This increases the memory size of fit objects (as the formula and attached environment are part of such objects). This also easily leads to errors (see example in the discussion of update.HLfit). Indeed Chambers (2008, p.221), after describing how the environment is defined, comments that “Where clear and trustworthy software is a priority, I would personally avoid such tricks. Ideally, all the variables in the model frame should come from an explicit, verifiable data source...”. Fitting functions in spaMM try to adhere to such a principle, as they assume by default that all variables from the formula should be in the data argument (and then, one never needs to specify “data$” in the formula.. The variables defining the prior.weights should also be in the data. However, variables used in other arguments such as ranFix are looked up neither in the data nor in the formula environment, but in the calling environment as usual.

spaMM implements this by default by stripping the formula environment from any variable. It is also possible to assign a given environment to the formula, through the control control.HLfit$formula_env: see Examples. However, the search mechanism of R is such that variables present in the formula but not in the data nor in the formula environment will still be sought in the global environment, so bugs are not entirely preventable.

References

Chambers J.M. (2008) Software for data analysis: Programming with R. Springer-Verlag New York

Examples

set.seed(123)
d2 <- data.frame(y = seq(10)/2+rnorm(5)[gl(5,2)], x1 = sample(10), grp=gl(5,2), seq10=seq(10))
# Using only variables in the data: basic usage
# HLfit(y ~ x1 + seq10+(1|grp), data = d2)
# is practically equivalent to
HLfit(y ~ x1 + seq10+(1|grp), data = d2, 
      control.HLfit = list(formula_env=list2env(list(data=d2))))
# 
# The 'formula_env' avoids the need for the 'seq10' variable:
HLfit(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2, 
      control.HLfit = list(formula_env=list2env(list(data=d2))))
#
# Internal imprementation exploits partial matching of argument names
#  so that this can also be in 'control' if 'control.HLfit' is absent:      
fitme(y ~ x1 + I(seq_len(nrow(data)))+(1|grp), data = d2, 
      control = list(formula_env=list2env(list(data=d2))))

spaMM

Mixed-Effect Models, with or without Spatial Random Effects

v3.10.0
CeCILL-2
Authors
François Rousset [aut, cre, cph] (<https://orcid.org/0000-0003-4670-0371>), Jean-Baptiste Ferdy [aut, cph], Alexandre Courtiol [aut] (<https://orcid.org/0000-0003-0637-2959>), GSL authors [ctb] (src/gsl_bessel.*)
Initial release
2022-02-06

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.