Filter and aggregate the raw source dataset
This function prepares the available dataset (e.g. GNIPDataDE) to be used for creating the isoscape. It allows the data to be filtered by month, year and location, and to be aggregated per location, per location:month combination, or per location:year combination. The function can also be used to randomly exclude some observations.
prepsources(
  data,
  month = 1:12,
  year,
  long_min,
  long_max,
  lat_min,
  lat_max,
  split_by = NULL,
  prop_random = 0,
  random_level = "source",
  col_source_value = "source_value",
  col_source_ID = "source_ID",
  col_lat = "lat",
  col_long = "long",
  col_elev = "elev",
  col_month = "month",
  col_year = "year"
)
data
A dataframe containing raw isotopic measurements of sources.

month
A numeric vector indicating the months to select from. Should be a vector of round numbers between 1 and 12. The default is 1:12, selecting all months.

year
A numeric vector indicating the years to select from. Should be a vector of round numbers. The default is to select all years available.

long_min
A numeric indicating the minimum longitude to select from. Should be a number between -180 and 180. If not provided, -180 will be considered.

long_max
A numeric indicating the maximum longitude to select from. Should be a number between -180 and 180. If not provided, 180 will be considered.

lat_min
A numeric indicating the minimum latitude to select from. Should be a number between -90 and 90. If not provided, -90 will be considered.

lat_max
A numeric indicating the maximum latitude to select from. Should be a number between -90 and 90. If not provided, 90 will be considered.
split_by
A string indicating whether the data should be aggregated per location (split_by = NULL, the default), per location:month combination (split_by = "month"), or per location:year combination (split_by = "year").

prop_random
A numeric indicating the proportion of observations or sampling locations (depending on the argument for random_level) that are drawn at random. If prop_random is greater than 0, the function returns a list of two dataframes instead of a single dataframe (see the description of the return value below).

random_level
A string indicating the level at which random draws are performed. The two possibilities are "obs", to draw among individual observations, and "source" (the default), to draw among sampling locations.
col_source_value
A string indicating the column containing the isotopic measurements.

col_source_ID
A string indicating the column containing the ID of each sampling location.

col_lat
A string indicating the column containing the latitude of each sampling location.

col_long
A string indicating the column containing the longitude of each sampling location.

col_elev
A string indicating the column containing the elevation of each sampling location.

col_month
A string indicating the column containing the month of sampling.

col_year
A string indicating the column containing the year of sampling.
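The col_* arguments make it possible to use a dataframe whose columns do not follow the default naming scheme without renaming them first. Below is a minimal sketch of the mapping; the dataframe mydata and its column names are hypothetical and serve only as an illustration.

## Hypothetical dataframe 'mydata' whose columns use non-default names
mydata_agg <- prepsources(
  data = mydata,
  col_source_value = "d2H",       # isotopic measurements
  col_source_ID = "station",      # ID of each sampling location
  col_lat = "latitude",
  col_long = "longitude",
  col_elev = "elevation",
  col_month = "sampling_month",
  col_year = "sampling_year"
)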
This function aggregates the data as required for the IsoriX workflow. Three aggregation schemes are currently possible. The simplest one, used by default, aggregates the data so as to obtain a single row per sampling location. Datasets prepared in this way can be readily fitted with the function isofit to build an isoscape. It is also possible to aggregate the data differently in order to build sub-isoscapes representing temporal variation in isotope composition, or to produce isoscapes weighted by the amount of precipitation (for isoscapes built on precipitation data only). The two options are to split the data from each location either by month or by year; this is set with the split_by argument of the function. Datasets prepared in this way should be fitted with the function isomultifit, as sketched below.
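As a minimal sketch (not run), the two aggregation schemes feed into the two fitting functions roughly as follows; the calls assume the example dataset GNIPDataDE shipped with IsoriX and omit all optional arguments of isofit and isomultifit, which are documented on their own help pages.

## Default aggregation: one row per sampling location, fitted with isofit
GNIPDataDEagg <- prepsources(data = GNIPDataDE)
GermanFit <- isofit(data = GNIPDataDEagg)

## Aggregation per location:month combination, fitted with isomultifit
GNIPDataDEmonthly <- prepsources(data = GNIPDataDE, split_by = "month")
GermanMultiFit <- isomultifit(data = GNIPDataDEmonthly)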
The function also allows the user to filter the sampling locations based on time (years and/or months) and space (locations given in geographic coordinates, i.e. longitude and latitude) in order to compute tailored isoscapes matching, for example, the time of sampling, and to speed up the model fit by cropping/clipping to a certain area. The dataframe produced by this function can be used as input to fit the isoscape (see isofit and isomultifit).
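For instance, spatial cropping can be combined with temporal filtering as in the sketch below; the bounding box is an arbitrary illustration (roughly covering Germany) and is not part of the original examples.

## Restrict the data to warm months within an illustrative bounding box
GNIPDataDEbox <- prepsources(
  data = GNIPDataDE,
  month = 5:8,                   # warm months only
  long_min = 5, long_max = 15,   # illustrative longitude bounds
  lat_min = 47, lat_max = 55     # illustrative latitude bounds
)
head(GNIPDataDEbox)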
This function returns a dataframe containing the filtered data aggregated by sampling location, or a list of two such dataframes (see the argument prop_random above). For each sampling location, the sample mean and sample variance of the isotopic measurements are computed.
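As a quick sketch of the second return type (the object name below is chosen for illustration), setting prop_random above 0 turns the output into a list of two dataframes, which can be inspected as follows:

GNIPDataDEhalf <- prepsources(data = GNIPDataDE, prop_random = 0.5, random_level = "source")
class(GNIPDataDEhalf)        # "list" rather than "data.frame"
length(GNIPDataDEhalf)       # two dataframes (see the examples below)
sapply(GNIPDataDEhalf, nrow) # rows in each part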
## Create a processed dataset for Germany
GNIPDataDEagg <- prepsources(data = GNIPDataDE)
head(GNIPDataDEagg)

## Create a processed dataset for Germany per month
GNIPDataDEmonthly <- prepsources(data = GNIPDataDE, split_by = "month")
head(GNIPDataDEmonthly)

## Create a processed dataset for Germany per year
GNIPDataDEyearly <- prepsources(data = GNIPDataDE, split_by = "year")
head(GNIPDataDEyearly)

## Create an isoscape dataset for warm months in Germany between 1995 and 1996
GNIPDataDEwarm <- prepsources(data = GNIPDataDE, month = 5:8, year = 1995:1996)
head(GNIPDataDEwarm)

## Create a dataset with 90% of obs
GNIPDataDE90pct <- prepsources(data = GNIPDataDE, prop_random = 0.9, random_level = "obs")
lapply(GNIPDataDE90pct, head) # show beginning of both datasets

## Create a dataset with half the weather sources
GNIPDataDE50pctsources <- prepsources(data = GNIPDataDE, prop_random = 0.5, random_level = "source")
lapply(GNIPDataDE50pctsources, head)

## Create a dataset with half the weather sources split per month
GNIPDataDE50pctsourcesMonthly <- prepsources(
  data = GNIPDataDE,
  split_by = "month",
  prop_random = 0.5,
  random_level = "source"
)
lapply(GNIPDataDE50pctsourcesMonthly, head)