buffering

Use distance (buffer) around records to separate train and test folds


Description

This function generates spatially separated train and test folds by considering buffers of the specified distance around each observation point. This approach is a form of leave-one-out cross-validation. Each fold is generated by excluding nearby observations around each testing point within the specified distance (ideally the range of spatial autocorrelation, see spatialAutoRange). In this method, the testing set never directly abuts a training presence or absence (0s and 1s i.e. the response class). For more information see the details section.

Usage

buffering(
  speciesData,
  species = NULL,
  theRange,
  spDataType = "PA",
  addBG = TRUE,
  progress = TRUE
)

Arguments

speciesData

A simple features (sf) or SpatialPoints object containing species data (response variable).

species

Character. The name of the field in which species data (binary response, i.e. 0 and 1) is stored. If species = NULL, presence and absence data (response variable) are treated the same and only training and testing records are counted. This can be used for multi-class responses such as land cover classes for remote sensing image classification, but it is not necessary. Do not use this argument when the response variable is continuous or count data.

theRange

Numeric value of the specified range by which the training and testing datasets are separated. This distance should be in metres no matter what the coordinate system is. The range can be explored by spatialAutoRange.

spDataType

Character input indicating the type of species data. It can take two values, PA for presence-absence data and PB for presence-background data, when species argument is not NULL. See the details section for more information on these two approaches.

addBG

Logical. Add background points to the test set when spDataType = "PB".

progress

Logical. If TRUE a progress bar will be shown.

Details

When working with presence-background (presence and pseudo-absence) data (specified by the spDataType argument), only presence records are used for specifying the folds. Consider a target presence point. The buffer is defined around this target point, using the specified range (theRange). The testing fold comprises the target presence point and all background points within the buffer (this is the default; if addBG = FALSE, the background points are ignored). Any non-target presence points inside the buffer are excluded. All points (presence and background) outside the buffer are used for the training set. The method cycles through all the presence data, so the number of folds is equal to the number of presence points in the dataset.
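The presence-background procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data frame pb, its coordinates, and the names the_range, add_bg and folds_pb are hypothetical, coordinates are assumed to be projected in metres so that Euclidean distance applies, and points lying exactly on the buffer edge are assumed to count as inside.

```r
# Illustrative sketch only (not the blockCV implementation).
# Hypothetical data: 1 = presence, 0 = background; coordinates in metres.
pb <- data.frame(
  x = c(0, 20000, 50000, 150000, 160000, 300000),
  y = 0,
  Species = c(1, 0, 1, 0, 1, 0)
)
the_range <- 70000   # buffer distance in metres
add_bg <- TRUE       # mirrors the role of addBG

# Pairwise Euclidean distances between all records
d <- as.matrix(dist(pb[, c("x", "y")]))

# Only presence records act as target (test) points
presences <- which(pb$Species == 1)

folds_pb <- lapply(presences, function(i) {
  inside <- d[i, ] <= the_range          # records within the buffer (assumed inclusive)
  test <- i
  if (add_bg) {
    # add background points that fall inside the buffer to the test set
    test <- c(i, unname(which(inside & pb$Species == 0)))
  }
  # training set: every record (presence or background) outside the buffer
  list(train = unname(which(!inside)), test = test)
})

length(folds_pb)  # one fold per presence point
```

In this toy data the non-target presence at x = 50000 lies inside the buffer of the first presence, so it appears in neither the training nor the testing set of that fold.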

For presence-absence data, folds are created based on all records, both presences and absences. As above, a target observation (presence or absence) forms a test point, all presence and absence points other than the target point within the buffer are ignored, and the training set comprises all presences and absences outside the buffer. Apart from the folds, the number of training-presence, training-absence, testing-presence and testing-absence records is stored and returned in the records table. If species = NULL (no column with 0s and 1s is defined), the procedure is the same as for presence-absence data. All other types of data (continuous, count or multi-class responses) should be handled this way, with species = NULL.
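As a rough illustration of the presence-absence case, the following base R sketch builds one fold per record and tabulates the record counts. It is not the package's code: the data frame pts, the names the_range, folds_pa and records, and the column names of the counts table are hypothetical, and the coordinates are assumed to be projected in metres.

```r
# Illustrative sketch only (not the blockCV implementation).
# Hypothetical projected coordinates in metres and a 0/1 response.
pts <- data.frame(
  x = c(0, 30000, 80000, 200000, 210000),
  y = c(0, 0, 0, 0, 0),
  Species = c(1, 0, 1, 0, 1)
)
the_range <- 70000   # buffer distance in metres

# Pairwise Euclidean distances between all records
d <- as.matrix(dist(pts[, c("x", "y")]))

# One fold per record: the target point is the test set,
# records beyond the buffer form the training set,
# and records inside the buffer (other than the target) are dropped.
folds_pa <- lapply(seq_len(nrow(pts)), function(i) {
  list(train = unname(which(d[i, ] > the_range)), test = i)
})

# Count presences (1) and absences (0) in each training and testing set
records <- t(sapply(folds_pa, function(f) {
  c(train_1 = sum(pts$Species[f$train] == 1),
    train_0 = sum(pts$Species[f$train] == 0),
    test_1  = sum(pts$Species[f$test] == 1),
    test_0  = sum(pts$Species[f$test] == 0))
}))
records
```

Because every record is a target once, the number of folds equals the number of rows, which is why the approach behaves like leave-one-out cross-validation.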

Value

An S3 object: a list containing:

  • folds - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices

  • k - number of the folds

  • range - the distance band used to separate training and testing folds

  • species - the name of the species (column), if provided

  • dataType - species data type

  • records - a table with the number of points in each category of training and testing
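To show how the folds element might be consumed, here is a minimal sketch that loops over a list shaped like the one described above (training indices first, testing indices second). The object bf is a hand-built, hypothetical stand-in rather than real output of buffering(), and dat is made-up data.

```r
# Hypothetical stand-in mimicking the documented structure of the result;
# a real object would come from a call to buffering().
bf <- list(
  folds = list(
    list(c(3L, 4L, 5L), 1L),   # fold 1: training indices, testing index
    list(c(4L, 5L), 2L)        # fold 2
  ),
  k = 2
)

# Made-up modelling data with one row per record
dat <- data.frame(y = c(1, 0, 1, 0, 1),
                  v = c(0.2, 0.4, 0.1, 0.9, 0.7))

n_train <- integer(bf$k)
for (i in seq_len(bf$k)) {
  train_idx <- bf$folds[[i]][[1]]   # first vector: training indices
  test_idx  <- bf$folds[[i]][[2]]   # second vector: testing indices
  train <- dat[train_idx, ]
  test  <- dat[test_idx, ]
  # a model would be fitted on `train` and evaluated on `test` here
  n_train[i] <- nrow(train)
}
n_train
```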

See Also

spatialAutoRange for selecting buffer distance; spatialBlock and envBlock for alternative blocking strategies; foldExplorer for visualisation of the generated folds.

Examples

# import presence-absence species data
PA <- read.csv(system.file("extdata", "PA.csv", package = "blockCV"))
# coordinate reference system
Zone55s <- "+proj=utm +zone=55 +south +ellps=GRS80 +units=m +no_defs"
# make a sf object from data.frame
pa_data <- sf::st_as_sf(PA, coords = c("x", "y"), crs = Zone55s)

# buffering with presence-absence data
bf1 <- buffering(speciesData = pa_data,
                 species = "Species",
                 theRange = 70000,
                 spDataType = "PA",
                 progress = TRUE)


# import presence-background species data
PB <- read.csv(system.file("extdata", "PB.csv", package = "blockCV"))
# make a sf object from data.frame
pb_data <- sf::st_as_sf(PB, coords = c("x", "y"), crs = Zone55s)

# buffering with presence-background data
bf2 <- buffering(speciesData = pb_data,
                 species = "Species",
                 theRange = 70000,
                 spDataType = "PB",
                 addBG = TRUE, # add background data to testing folds
                 progress = TRUE)

# buffering with no species attribute
bf3 <- buffering(speciesData = pa_data,
                 theRange = 70000)

blockCV

Spatial and Environmental Blocking for K-Fold Cross-Validation

v2.1.1
GPL-3
Authors
Roozbeh Valavi [aut, cre], Jane Elith [aut], José Lahoz-Monfort [aut], Gurutzeta Guillera-Arroita [aut]
Initial release
2020-02-16