climatol: homogsplit – R documentation


homogsplit

Apply homogen() on overlapping rectangular areas.

Description

If the number of series is too large to be homogenized all at once (normally several thousand, depending on the computer resources), this function can homogenize them by splitting the geographical domain into overlapping rectangular areas.

Usage

homogsplit(varcli, anyi, anyf, xc=NULL, yc=NULL, xo=.5, yo=.38, maponly=FALSE,
suf=NA, nm=NA, nref=c(10,10,4), swa=NA, std=3, ndec=1, dz.max=5,
dz.min=-dz.max, wd=c(0,0,100), snht1=25, snht2=snht1, tol=.02, maxdif=NA,
mxdif=maxdif, force=FALSE, wz=.001, trf=0, mndat=NA, gp=3, ini=NA,
na.strings="NA", maxite=999, vmin=NA, vmax=NA, nclust=100, grdcol=grey(.4),
mapcol=grey(.4), hires=TRUE, expl=FALSE, metad=FALSE, sufbrk='m', tinc=NA,
tz='UTC', cex=1.2, verb=TRUE, x=NA)

Arguments

`varcli`	Acronym of the name of the studied climatic variable, as in the data file name.
`anyi`	Initial year of the data present in the file.
`anyf`	Final year of the data present in the file.
`xc`	Vector of X axis coordinates setting the domain splitting meridians.
`yc`	Vector of Y axis coordinates setting the domain splitting parallels.
`xo`	Overlapping width in the East-West direction.
`yo`	Overlapping width in the North-South direction.
`maponly`	Do not homogenize. Only draw a map with stations locations and domain partitioning.
`suf`	Optional suffix appended with a '-' to the name of the variable in the input files.
`nm`	Number of data per year in each station. (Defaults to NA, and then it will be computed from the total number of data).
`nref`	Maximum number of references for data estimation. (Defaults to 10 in the detection stages, and to 4 in the final series adjustments).
`swa`	Size of the step forward to be applied to the staggered window application of SNHT. If not set (the default), 365 terms (one year) will be used for daily data, and 60 otherwise.
`std`	Type of normalization: 1: deviations from the mean, 2: ratios to the mean (only for means greater than 1), 3: standardization (subtract the mean and divide by the sample standard deviation).
`ndec`	Number of decimal digits to which the homogenized data must be rounded.
`dz.max`	Threshold of outlier tolerance, in standard deviations. (5 by default).
`dz.min`	Lower threshold of outlier tolerance, if different from the upper one. (By default, they will be the same, with opposite signs).
`wd`	Distance (in km) at which reference data will weigh half of that of another located at zero distance. (Defaults to `c(0,0,100)`, meaning that no weighting will be applied in the first two stages, and 100 km in the third).
`snht1`	Threshold value for the stepped SNHT window test applied in stage 1. (25 by default. No SNHT analysis will be performed if `snht1=0`).
`snht2`	Threshold value for the SNHT test when applied to the complete series in stage 2 (same value as snht1 by default).
`tol`	Tolerance factor to split several series at a time. The default is 0.02, meaning that a 2% tolerance will be allowed for every reference used. (E.g.: if the maximum SNHT test value in a series is 30 and 10 references were used to compute the anomalies, the series will be split if the maximum test of the reference series is lower than 30*(1+0.02*10)=36. Set `tol=0` to disable further splits when any reference series has already been split at the same iteration).
`maxdif`	Maximum difference of any data item in consecutive iterations. If not set, defaults to half of the data precision (defined by the number of decimals).
`mxdif`	Old maxdif parameter (maintained for compatibility).
`force`	Force a break even when only one reference is available. (`FALSE` by default).
`wz`	Scale parameter of the vertical coordinate `Z`. 0.001 by default, which gives the vertical coordinate (in m) the same weight as the horizontal coordinates (internally managed in km).
`trf`	By default, data are not transformed (`trf=0`), but if the data frequency distribution is very skewed, the user can choose to apply a log(x+1) transformation (`trf=1`) or any root of index `trf>1` (2 for square root, 3 for cubic root, etc. Fractional numbers are allowed).
`mndat`	Minimum number of data for a split fragment to become a new series. It defaults to half of the `swa` value for daily data, or to `nm` otherwise, with a minimum of 5 terms.
`gp`	Graphic parameter: 0: no graphic output, 1: only descriptive graphics of the input data, 2: as with 1, plus diagnostic graphics of anomalies, 3: as with 2, plus graphics of running annual means and applied corrections, 4: as with 3, but running annual totals (instead of means) will be plotted in the last set of graphics. (Better when working with precipitation data).
`ini`	Initial date, with format `'YYYY-MM-DD'`. If not set, it will be assumed that the series begin the first of January of the initial year `anyi`.
`na.strings`	Character string to be treated as a missing value. (It can be a vector of strings, if more than one is needed). Defaults to 'NA', the standard missing data code in R.
`maxite`	Maximum number of iterations when computing the means of the series. (999 by default).
`vmin`	Minimum possible value (lower limit) of the studied variable. Unset by default, but note that `vmin=0` will be applied if `std` is set to 2.
`vmax`	Maximum possible value (upper limit) of the studied variable. (E.g., for relative humidity or relative sunshine hours it is advisable to set `vmax=100`).
`nclust`	Maximum number of stations for the cluster analysis. (If much greater than 100, the default value, the process may be too long and the graphic too dense).
`grdcol`	Color of the graphic background grids. (Gray by default.)
`mapcol`	Color of the background map. (Gray by default).
`hires`	By default, the background map will be drawn in high resolution. Set this parameter to `FALSE` if you are studying a big geographical area (>1000 km).
`expl`	Set this to `TRUE` to perform an exploratory analysis.
`metad`	Set this to `TRUE` if a metadata file is provided (see the details in the `homogen` documentation).
`sufbrk`	Suffix to add to `varcli` to form the name of the provided metadata file. This parameter is only relevant when `metad=TRUE`. Its default value `'m'` is meant to read the file of break-points detected at the monthly scale.
`tinc`	Time increment between data. Not set by default, but can be defined for subdaily data, as in e.g.: `tinc='3 hour'`.
`tz`	Time zone. Only relevant for subdaily data. (`'UTC'` by default.)
`cex`	Character expansion factor for graphic labels and titles. (Defaults to 1.2. Note that if station names are long, they will not fit in titles when increasing this parameter too much.)
`verb`	Verbosity. Set to `FALSE` to avoid messages being output to the console. (They will be in the output log file anyway).
`x`	Vector of dates. (To be read from the *.rda file.)
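Several of the arguments above are defined by small numeric rules. The following is a minimal base-R sketch of four of them (`tol`, `wd`, `std` and `trf`), written only to make the descriptions concrete. The helper names `tol_threshold`, `ref_weight`, `normalize` and `transform_data` are hypothetical and are not part of climatol; in particular, the distance-weighting formula is an assumption chosen only because it gives weight 1 at zero distance and 1/2 at distance `wd`, and climatol's internal formula may differ.

```r
# Hypothetical helpers (NOT climatol functions) illustrating argument rules.

# tol: the reference series' maximum test must stay below this value
# for the series to be split at the same iteration.
tol_threshold <- function(max_test, tol = 0.02, nref_used = 10) {
  max_test * (1 + tol * nref_used)
}

# wd: weight equal to 1 at zero distance and 1/2 at distance wd (km).
# ASSUMPTION: wd^2/(wd^2+d^2) is just one function with that property.
# wd = 0 means that no distance weighting is applied.
ref_weight <- function(d, wd) {
  if (wd == 0) return(rep(1, length(d)))
  wd^2 / (wd^2 + d^2)
}

# std: the three normalization types.
normalize <- function(x, std = 3) {
  m <- mean(x, na.rm = TRUE)
  switch(as.character(std),
    "1" = x - m,                          # deviations from the mean
    "2" = x / m,                          # ratios to the mean
    "3" = (x - m) / sd(x, na.rm = TRUE),  # standardization
    stop("std must be 1, 2 or 3"))
}

# trf: optional transformation of skewed data
# (0: none, 1: log(x+1), >1: root of index trf).
transform_data <- function(x, trf = 0) {
  if (trf == 0) x
  else if (trf == 1) log(x + 1)
  else x^(1 / trf)
}

tol_threshold(30, 0.02, 10)     # 36, as in the tol example above
ref_weight(c(0, 100), wd = 100) # 1.0 0.5
```

With `std=2` the ratios are only meaningful for positive data, which is why the argument list notes that `vmin=0` is then applied.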

Details

Note first that this is an experimental function, and it will fail if any sub-area contains time steps completely void of data.
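Because of this limitation, it may be worth checking each candidate sub-area before running the homogenization. A minimal sketch, assuming the data have already been arranged as a matrix with one row per time step and one column per station (missing values as `NA`); `void_steps` is a hypothetical helper, not a climatol function:

```r
# Hypothetical pre-check (not part of climatol): return the time steps at
# which every station of a sub-area is missing.
# dat:     matrix, rows = time steps, columns = stations, NA = missing
# in_area: logical vector flagging the stations inside the sub-area
void_steps <- function(dat, in_area) {
  sub <- dat[, in_area, drop = FALSE]
  which(rowSums(!is.na(sub)) == 0)
}

# 4 time steps x 3 stations; step 2 is missing at stations 1 and 2.
dat <- matrix(c(1, NA, 3, 4,
                5, NA, 7, 8,
                9, 10, NA, 12), ncol = 3)
void_steps(dat, c(TRUE, TRUE, FALSE))  # returns 2
void_steps(dat, c(TRUE, TRUE, TRUE))   # returns integer(0)
```

A non-empty result suggests that the splitting limits should be changed so that the affected stations share an area with series covering those time steps.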

If you have not yet decided on the splitting meridians and parallels, leave xc and yc unset, and the function will draw a map to help you select the areas.

If you set the xc and yc splitting borders, setting maponly=TRUE will produce a map with the stations plus the required overlapping areas, without doing any homogenization. In this way you can review the limits of the areas and choose new ones if you are not satisfied with the current partitioning.
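The partitioning itself can be pictured as a grid of rectangles bounded by the `xc` meridians and `yc` parallels, each widened by the overlaps. The sketch below is only an illustration under an explicit assumption: each rectangle is widened by `xo` across its East-West borders and by `yo` across its North-South borders, which may not match the exact convention used by homogsplit. `overlapping_areas` is a hypothetical helper, not a climatol function.

```r
# Hypothetical sketch (not climatol code): list the stations that fall in
# each rectangle defined by xc and yc, widened by the overlaps xo and yo.
# ASSUMPTION: the overlap is added on both sides of every internal border.
overlapping_areas <- function(X, Y, xc, yc, xo, yo) {
  xb <- c(-Inf, sort(xc), Inf)  # East-West borders
  yb <- c(-Inf, sort(yc), Inf)  # North-South borders
  areas <- list()
  for (i in seq_len(length(xb) - 1)) {
    for (j in seq_len(length(yb) - 1)) {
      inside <- X >= xb[i] - xo & X <= xb[i + 1] + xo &
                Y >= yb[j] - yo & Y <= yb[j + 1] + yo
      areas[[length(areas) + 1]] <- which(inside)
    }
  }
  areas
}

# Four stations, one splitting meridian and one parallel, default overlaps.
X <- c(1.0, 2.8, 3.0, 4.5)
Y <- c(39.0, 39.5, 40.0, 39.7)
a <- overlapping_areas(X, Y, xc = 2.9, yc = 39.6, xo = 0.5, yo = 0.38)
# a: SW = 1 2, NW = 2 3, SE = 2 4, NE = 2 3 4
```

Station 2 lies close to both borders and therefore appears in every area, which is the kind of overlap that lets neighbouring areas share references.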

All parameters except xc, yc, xo, yo and maponly are the same as in the homogen function, and will be passed to it to perform the homogenization.

If a rectangular area includes fewer than 10 stations, they will be added to the next area. Warning! If it is the last area, they will not be processed, and the homogenization results will have an inconsistent number of stations. In this case the user should try a new set of cutting limits.
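The rule above can be sketched as follows, assuming the areas are processed in a fixed order and that stations carried over are merged (without duplicates) with those of the next area. `merge_small_areas` is a hypothetical helper written for illustration; it is not how climatol necessarily implements the rule.

```r
# Hypothetical sketch (not climatol code): an area with fewer than min_st
# stations passes them to the next area; stations left over after the
# last area are reported as unprocessed.
merge_small_areas <- function(areas, min_st = 10) {
  kept <- list()
  carry <- integer(0)
  for (k in seq_along(areas)) {
    st <- union(carry, areas[[k]])  # add stations carried from before
    if (length(st) < min_st) {
      carry <- st                   # too few: hand them to the next area
    } else {
      kept[[length(kept) + 1]] <- st
      carry <- integer(0)
    }
  }
  list(areas = kept, unprocessed = carry)
}

res <- merge_small_areas(list(1:4, 3:15, 14:30, 31:35))
# res$areas: 1:15 and 14:30; res$unprocessed: 31:35 (small last area)
```

The non-empty `unprocessed` element corresponds to the warning above: those stations would be missing from the results, so the cutting limits should be revised.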

One graphic output file will be produced for every area containing stations, but the rest of the output will be merged into single files.

Value

This function does not return any value.

Examples

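#Set a temporary working directory and write the input files:
wd <- tempdir()
wd0 <- setwd(wd)
data(Ptest) #loads the test data 'dat' and the station list 'est.c'
dim(dat) <- c(720,20) #720 months (1951-2010) x 20 stations
dat[601:720,5] <- dat[601:720,5]*1.8 #artificial break in station 5 from 2001 on
write(dat[481:720,1:12],'pcp_1991-2010.dat') #years 1991-2010, first 12 stations
write.table(est.c[1:12,1:5],'pcp_1991-2010.est',row.names=FALSE,col.names=FALSE)
#Now run the example (xc=2.9, yc=39.6, xo=0, yo=0, passed by position):
homogsplit('pcp',1991,2010,2.9,39.6,0,0,std=2)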
#Return to user's working directory:
setwd(wd0)
#Input and output files can be found in directory:
print(wd)
#
# Note that this is just a trivial example; this function is intended to
# homogenize a network of thousands of long series which might overload
# computer resources, and is still experimental.

climatol

Climate Tools (Series Homogenization and Derived Products)

v3.1.2

GPL (>= 2)

Authors

Jose A. Guijarro <jguijarrop@aemet.es>

Initial release

2019-08-05