BiodiversityR: ensemble.blockCV – R documentation


ensemble.blockCV

Spatially or environmentally separated folds for cross-validation via blockCV::spatialBlock or blockCV::envBlock

Description

The functions internally call blockCV::spatialBlock and blockCV::envBlock to create spatially or environmentally separated folds. Their syntax is very similar to that of BiodiversityR::ensemble.calibrate.weights.

Usage

ensemble.spatialBlock(x = NULL, p = NULL, 
    a = NULL, an = 1000, EPSG=NULL,
    excludep = FALSE, target.groups = FALSE, k = 4,
    factors = NULL,
    theRange = NULL, return.object = FALSE, ...)

ensemble.envBlock(x = NULL, p = NULL, 
    a = NULL, an = 1000, EPSG=NULL,
    excludep = FALSE, target.groups = FALSE, k = 4,
    factors = NULL,
    return.object = FALSE, ...)

Arguments

`x`	RasterStack object (`stack`) containing all layers that correspond to explanatory variables
`p`	presence points used for calibrating the suitability models, typically available in 2-column (lon, lat) dataframe; see also `prepareData` and `extract`
`a`	background points used for calibrating the suitability models, typically available in 2-column (lon, lat) dataframe; see also `prepareData` and `extract`
`an`	number of background points for calibration to be selected with `randomPoints` in case argument `a` is missing
`EPSG`	EPSG number (https://spatialreference.org/) to be assigned internally to the coordinate reference system of the locations via `st_crs`. Although the function internally first assigns the coordinate reference from the RasterStack `x` via `crs`, this method fails in some situations as in the example shown below. In such cases, manually assigning the EPSG could resolve this problem.
`excludep`	parameter that indicates (if `TRUE`) that presence points will be excluded from the background points; see also `randomPoints`
`target.groups`	Parameter that indicates (if `TRUE`) that the provided background points (argument `a`) represent presence points from a target group sensu Phillips et al. 2009 (these are species that are all collected or observed using the same methods or equipment). Setting the parameter to `TRUE` results in selecting the centres of cells of the target groups as background points, without selecting the same cell twice. Via argument `excludep`, it is possible to filter out cells with presence observations (argument `p`).
`k`	Integer value. The number of desired folds for cross-validation. The default is `k = 4`. The interpretation of the argument is exactly the same as in `ensemble.calibrate.models` and `kfold`.
`factors`	vector that indicates which variables are factors; see also `prepareData`
`theRange`	Numeric value of the specified range by which blocks are created and training/testing data are separated. This distance should be in metres. See also `spatialBlock`.
`return.object`	If `TRUE`, also return the complete result of `spatialBlock` or `envBlock` ('block.object') and the species data ('speciesData') that was created for `blockCV`. These results can be visualized with `foldExplorer`, as in the examples below.
`...`	Other arguments to pass to `spatialBlock` or `envBlock`, such as `numLimit` (the minimum number of points in each fold for training-presence, training-absence, testing-presence and testing-absence) and `iteration` (the number of attempts to create folds that fulfil the `numLimit` requirement).
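
The roles of `theRange`, `k`, `numLimit` and `iteration` can be illustrated with a small base R sketch. This is not the algorithm used by `blockCV`; it is a toy version that assumes projected coordinates in metres, square blocks aligned to the origin, and random allocation of whole blocks to folds. All object names (`pts`, `the_range`, `num_limit`, `iterations`) are hypothetical.

```r
# Toy illustration only: NOT the blockCV implementation.
set.seed(1)

# hypothetical points with projected coordinates in metres
pts <- data.frame(x = runif(200, 0, 4e6), y = runif(200, 0, 4e6))

the_range  <- 1e6   # side of a square block, analogous to theRange
k          <- 4     # number of folds
num_limit  <- 20    # minimum number of points per fold, analogous to numLimit
iterations <- 100   # maximum number of attempts, analogous to iteration

# identify the square block that contains each point
block_id <- paste(floor(pts$y / the_range), floor(pts$x / the_range), sep = "_")
blocks <- unique(block_id)

# allocate whole blocks (not single points) to folds,
# retrying until every fold holds at least num_limit points
for (i in seq_len(iterations)) {
  block_fold <- sample(rep_len(seq_len(k), length(blocks)))
  names(block_fold) <- blocks
  fold <- unname(block_fold[block_id])
  if (min(tabulate(fold, nbins = k)) >= num_limit) break
}

pts$fold <- fold
table(pts$fold)
```

Because folds are assigned per block, all points within one block always end up in the same fold, which is what separates training and testing data in space.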

Details

The functions internally call spatialBlock or envBlock.
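
For environmental blocking, the general idea is that points are grouped by similarity of their environmental conditions rather than by location. The following base R sketch clusters made-up covariate values with `stats::kmeans` and uses the cluster membership as fold identity. It is only a sketch: the data, the variable values and the choice to standardize with `scale` are assumptions for illustration and may differ from what `envBlock` does with the given arguments.

```r
# Toy illustration only: NOT the blockCV implementation.
set.seed(42)

# hypothetical environmental values at 300 locations
env <- data.frame(bio5 = rnorm(300, mean = 300, sd = 30),
                  bio6 = rnorm(300, mean = 150, sd = 40))

k <- 4  # number of folds

# standardize so that each variable contributes equally to the distances
env_scaled <- scale(env)

# cluster the locations in environmental space; each cluster becomes a fold
clusters <- kmeans(env_scaled, centers = k, nstart = 10)
fold <- clusters$cluster

table(fold)
```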

The result of the function includes a list (k) with the following elements. This list can be passed directly to ensemble.calibrate.weights as argument k, but only elements groupp and groupa will be used.

- p : Presence locations, created by ensemble.calibrate.models where points with missing data were excluded and possibly points were added for missing factor levels

- a : Background locations, created by ensemble.calibrate.models where points with missing data were excluded and possibly points were added for missing factor levels

- groupp : k-fold identities for the presence locations

- groupa : k-fold identities for the background locations

Optionally the function also returns elements block.object and speciesData. These can be used to visualize data with foldExplorer.
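
The structure of list k can be mimicked with made-up data to show how the fold identities relate to the locations. In this sketch the coordinates and the random fold assignment are hypothetical, and the train/test split follows the usual `kfold` convention (fold i is held out for testing), which is assumed rather than taken from the internals of `ensemble.calibrate.weights`.

```r
# Hypothetical data that mimic the structure of element k
set.seed(7)
p <- data.frame(lon = runif(40, -80, -50), lat = runif(40, -20, 10))
a <- data.frame(lon = runif(100, -80, -50), lat = runif(100, -20, 10))

k_list <- list(
  p = p,                                    # presence locations
  a = a,                                    # background locations
  groupp = sample(rep_len(1:4, nrow(p))),   # fold identity per presence row
  groupa = sample(rep_len(1:4, nrow(a)))    # fold identity per background row
)

# for fold i, rows with group == i are held out for testing
i <- 1
p_train <- k_list$p[k_list$groupp != i, ]
p_test  <- k_list$p[k_list$groupp == i, ]
a_train <- k_list$a[k_list$groupa != i, ]
a_test  <- k_list$a[k_list$groupa == i, ]

c(p_train = nrow(p_train), p_test = nrow(p_test),
  a_train = nrow(a_train), a_test = nrow(a_test))
```

The fold vectors must have one entry per row of the corresponding locations, which is why the examples pass the p and a elements of the returned list rather than the original point sets.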

Value

The function returns a list with the following elements:

`k`	A list with data on folds that can be directly used by `ensemble.calibrate.weights`.
`block.object`	the results of `spatialBlock` or `envBlock`
`speciesData`	a `SpatialPointsDataFrame` with species data

Author(s)

Roeland Kindt (World Agroforestry Centre)

References

Roberts et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography. 40: 913-929.

Examples

## Not run: 

library(blockCV)
library(sf)

# get predictor variables
library(dismo)
predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
    pattern='grd', full.names=TRUE)
predictors <- stack(predictor.files)
# subset based on Variance Inflation Factors
predictors <- subset(predictors, subset=c("bio5", "bio6", 
    "bio16", "bio17"))
predictors
predictors@title <- "base"

# presence points
presence_file <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
pres <- read.table(presence_file, header=TRUE, sep=',')[, -1]

# choose background points
background <- randomPoints(predictors, n=1000, p=pres, excludep=TRUE, extf=1.00)
background <- data.frame(background)
colnames(background) <- c('lon', 'lat')

# spatial blocking with square blocks of 1000 km and minimum 20 points in each category
# this first call fails because EPSG is not assigned
block.data <- ensemble.spatialBlock(x=predictors, p=pres, a=background, 
    EPSG=NULL,
    showBlocks=FALSE, theRange=1000000, k=4, numLimit=20, iteration=1000, return.object=TRUE)

# assigning the EPSG manually resolves the problem
block.data <- ensemble.spatialBlock(x=predictors, p=pres, a=background, 
    EPSG=4326,
    showBlocks=FALSE, theRange=1000000, k=4, numLimit=20, iteration=1000, return.object=TRUE)

# explore the results
foldExplorer(blocks=block.data$block.object, rasterLayer=predictors, 
    speciesData=block.data$speciesData)

# apply in calibration of ensemble weights
# make sure that folds apply to subset of points
p.spatial <- block.data$k$p 
a.spatial <- block.data$k$a
k.spatial <- block.data$k

ensemble.w1 <- ensemble.calibrate.weights(x=predictors, 
    p=p.spatial, a=a.spatial, k=k.spatial, 
    species.name="Bradypus",
    SINK=FALSE, PROBIT=TRUE,
    MAXENT=0, MAXNET=1, MAXLIKE=1, GBM=1, GBMSTEP=0, RF=0, CF=1,
    GLM=1, GLMSTEP=0, GAM=1, GAMSTEP=0, MGCV=0, MGCVFIX=0, 
    EARTH=0, RPART=0, NNET=1, FDA=0, SVM=0, SVME=0, GLMNET=0,
    BIOCLIM.O=1, BIOCLIM=1, DOMAIN=0, MAHAL=0, MAHAL01=0,
    ENSEMBLE.tune=TRUE,
    ENSEMBLE.best=0, ENSEMBLE.exponent=c(1, 2, 3),
    ENSEMBLE.min=0.7,
    Yweights="BIOMOD", 
    formulae.defaults=TRUE)

# confirm that correct folds were used
all.equal(ensemble.w1$groupp, block.data$k$groupp)
all.equal(ensemble.w1$groupa, block.data$k$groupa)

# environmental blocking with minimum 5 points in each category
block.data2 <- ensemble.envBlock(x=predictors, p=pres, a=background, 
    factors="biome",
    k=4, numLimit=5, return.object=TRUE)

# explore the results
foldExplorer(blocks=block.data2$block.object, rasterLayer=predictors, 
    speciesData=block.data2$speciesData)

# apply in calibration of ensemble weights
# make sure that folds apply to subset of points
p.env <- block.data2$k$p 
a.env <- block.data2$k$a
k.env <- block.data2$k

ensemble.w2 <- ensemble.calibrate.weights(x=predictors, 
    p=p.env, a=a.env, k=k.env, 
    species.name="Bradypus",
    SINK=FALSE, PROBIT=TRUE,
    MAXENT=0, MAXNET=1, MAXLIKE=1, GBM=1, GBMSTEP=0, RF=0, CF=1,
    GLM=1, GLMSTEP=0, GAM=1, GAMSTEP=0, MGCV=0, MGCVFIX=0, 
    EARTH=0, RPART=0, NNET=1, FDA=0, SVM=0, SVME=0, GLMNET=0,
    BIOCLIM.O=1, BIOCLIM=1, DOMAIN=0, MAHAL=0, MAHAL01=0,
    ENSEMBLE.tune=TRUE,
    ENSEMBLE.best=0, ENSEMBLE.exponent=c(1, 2, 3),
    ENSEMBLE.min=0.7,
    factors="biome",
    Yweights="BIOMOD", 
    formulae.defaults=TRUE)

# confirm that correct folds were used
all.equal(ensemble.w2$groupp, block.data2$k$groupp)
all.equal(ensemble.w2$groupa, block.data2$k$groupa)


## End(Not run)

BiodiversityR

Package for Community Ecology and Suitability Analysis

v2.13-1

GPL-3

Authors

Roeland Kindt [cre, aut] (<https://orcid.org/0000-0002-7672-0712>)

Initial release

2021-4-15