Filter and aggregate the raw source dataset
This function prepares the available dataset (e.g. GNIPDataDE) to be used for creating the isoscape. It allows the data to be filtered by month, year and location, and to be aggregated per location, per location:month combination, or per location:year combination. The function can also be used to randomly exclude some observations.
prepsources(
  data,
  month = 1:12,
  year,
  long_min,
  long_max,
  lat_min,
  lat_max,
  split_by = NULL,
  prop_random = 0,
  random_level = "source",
  col_source_value = "source_value",
  col_source_ID = "source_ID",
  col_lat = "lat",
  col_long = "long",
  col_elev = "elev",
  col_month = "month",
  col_year = "year"
)
data
A dataframe containing raw isotopic measurements of sources.

month
A numeric vector indicating the months to select from. Should be a vector of round numbers between 1 and 12. The default is 1:12, selecting all months.

year
A numeric vector indicating the years to select from. Should be a vector of round numbers. The default is to select all years available.

long_min
A numeric indicating the minimum longitude to select from. Should be a number between -180 and 180. If not provided, -180 will be considered.

long_max
A numeric indicating the maximum longitude to select from. Should be a number between -180 and 180. If not provided, 180 will be considered.

lat_min
A numeric indicating the minimum latitude to select from. Should be a number between -90 and 90. If not provided, -90 will be considered.

lat_max
A numeric indicating the maximum latitude to select from. Should be a number between -90 and 90. If not provided, 90 will be considered.
split_by
A string indicating whether the data should be aggregated per location (split_by = NULL, the default), per location:month combination (split_by = "month"), or per location:year combination (split_by = "year").

prop_random
A numeric indicating the proportion of observations or sampling locations (depending on the argument for random_level) that are drawn at random. If prop_random is greater than 0, the function returns a list of two dataframes instead of a single dataframe (see the description of the return value below).

random_level
A string indicating the level at which random draws are performed. The two possibilities are "obs", to draw among individual observations, and "source" (the default), to draw among sampling locations.
col_source_value
A string indicating the column containing the isotopic measurements.

col_source_ID
A string indicating the column containing the ID of each sampling location.

col_lat
A string indicating the column containing the latitude of each sampling location.

col_long
A string indicating the column containing the longitude of each sampling location.

col_elev
A string indicating the column containing the elevation of each sampling location.

col_month
A string indicating the column containing the month of sampling.

col_year
A string indicating the column containing the year of sampling.
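The col_* arguments make it possible to use a dataframe whose columns do not follow the default naming scheme without renaming them first. Below is a minimal sketch of the mapping; the dataframe mydata and its column names are hypothetical and serve only as an illustration.

## Hypothetical dataframe 'mydata' whose columns use non-default names
mydata_agg <- prepsources(
  data = mydata,
  col_source_value = "d2H",       # isotopic measurements
  col_source_ID = "station",      # ID of each sampling location
  col_lat = "latitude",
  col_long = "longitude",
  col_elev = "elevation",
  col_month = "sampling_month",
  col_year = "sampling_year"
)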
This function aggregates the data as required for the IsoriX workflow. Three aggregation schemes are currently possible. The simplest one, used by default, aggregates the data so as to obtain a single row per sampling location. Datasets prepared in this way can be readily fitted with the function isofit to build an isoscape. It is also possible to aggregate the data differently in order to build sub-isoscapes representing temporal variation in isotope composition, or to produce isoscapes weighted by the amount of precipitation (for isoscapes built on precipitation data only). The two options are to split the data from each location either by month or by year; this is set with the split_by argument of the function. Datasets prepared in this way should be fitted with the function isomultifit, as sketched below.
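As a minimal sketch (not run), the two aggregation schemes feed into the two fitting functions roughly as follows; the calls assume the example dataset GNIPDataDE shipped with IsoriX and omit all optional arguments of isofit and isomultifit, which are documented on their own help pages.

## Default aggregation: one row per sampling location, fitted with isofit
GNIPDataDEagg <- prepsources(data = GNIPDataDE)
GermanFit <- isofit(data = GNIPDataDEagg)

## Aggregation per location:month combination, fitted with isomultifit
GNIPDataDEmonthly <- prepsources(data = GNIPDataDE, split_by = "month")
GermanMultiFit <- isomultifit(data = GNIPDataDEmonthly)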
The function also allows the user to filter the sampling locations based on time (years and/or months) and space (locations given in geographic coordinates, i.e. longitude and latitude) in order to compute tailored isoscapes matching, for example, the time of sampling, and to speed up the model fit by cropping/clipping to a certain area. The dataframe produced by this function can be used as input to fit the isoscape (see isofit and isomultifit).
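For instance, spatial cropping can be combined with temporal filtering as in the sketch below; the bounding box is an arbitrary illustration (roughly covering Germany) and is not part of the original examples.

## Restrict the data to warm months within an illustrative bounding box
GNIPDataDEbox <- prepsources(
  data = GNIPDataDE,
  month = 5:8,                   # warm months only
  long_min = 5, long_max = 15,   # illustrative longitude bounds
  lat_min = 47, lat_max = 55     # illustrative latitude bounds
)
head(GNIPDataDEbox)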
This function returns a dataframe containing the filtered data aggregated by sampling location, or a list of two such dataframes (see the argument prop_random above). For each sampling location, the sample mean and sample variance of the isotopic measurements are computed.
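As a quick sketch of the second return type (the object name below is chosen for illustration), setting prop_random above 0 turns the output into a list of two dataframes, which can be inspected as follows:

GNIPDataDEhalf <- prepsources(data = GNIPDataDE, prop_random = 0.5, random_level = "source")
class(GNIPDataDEhalf)        # "list" rather than "data.frame"
length(GNIPDataDEhalf)       # two dataframes (see the examples below)
sapply(GNIPDataDEhalf, nrow) # rows in each part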
## Create a processed dataset for Germany
GNIPDataDEagg <- prepsources(data = GNIPDataDE)
head(GNIPDataDEagg)

## Create a processed dataset for Germany per month
GNIPDataDEmonthly <- prepsources(data = GNIPDataDE, split_by = "month")
head(GNIPDataDEmonthly)

## Create a processed dataset for Germany per year
GNIPDataDEyearly <- prepsources(data = GNIPDataDE, split_by = "year")
head(GNIPDataDEyearly)

## Create an isoscape dataset for warm months in Germany between 1995 and 1996
GNIPDataDEwarm <- prepsources(data = GNIPDataDE, month = 5:8, year = 1995:1996)
head(GNIPDataDEwarm)

## Create a dataset with 90% of obs
GNIPDataDE90pct <- prepsources(data = GNIPDataDE, prop_random = 0.9, random_level = "obs")
lapply(GNIPDataDE90pct, head) # show beginning of both datasets

## Create a dataset with half the weather sources
GNIPDataDE50pctsources <- prepsources(data = GNIPDataDE, prop_random = 0.5, random_level = "source")
lapply(GNIPDataDE50pctsources, head)

## Create a dataset with half the weather sources split per month
GNIPDataDE50pctsourcesMonthly <- prepsources(
  data = GNIPDataDE,
  split_by = "month",
  prop_random = 0.5,
  random_level = "source"
)
lapply(GNIPDataDE50pctsourcesMonthly, head)