Apply homogen() on overlapping rectangular areas.
If there are too many series to homogenize all at once (normally several thousand, depending on the computer resources), this function can homogenize them by splitting the geographical domain into overlapping rectangular areas.
homogsplit(varcli, anyi, anyf, xc=NULL, yc=NULL, xo=.5, yo=.38, maponly=FALSE, suf=NA, nm=NA, nref=c(10,10,4), swa=NA, std=3, ndec=1, dz.max=5, dz.min=-dz.max, wd=c(0,0,100), snht1=25, snht2=snht1, tol=.02, maxdif=NA, mxdif=maxdif, force=FALSE, wz=.001, trf=0, mndat=NA, gp=3, ini=NA, na.strings="NA", maxite=999, vmin=NA, vmax=NA, nclust=100, grdcol=grey(.4), mapcol=grey(.4), hires=TRUE, expl=FALSE, metad=FALSE, sufbrk='m', tinc=NA, tz='UTC', cex=1.2, verb=TRUE, x=NA)
varcli |
Acronym of the name of the studied climatic variable, as in the data file name. |
anyi |
Initial year of the data present in the file. |
anyf |
Final year of the data present in the file. |
xc |
Vector of X axis coordinates setting the domain splitting meridians. |
yc |
Vector of Y axis coordinates setting the domain splitting parallels. |
xo |
Overlapping width in the East-West direction. |
yo |
Overlapping width in the North-South direction. |
maponly |
Do not homogenize. Only draw a map with stations locations and domain partitioning. |
suf |
Optional suffix appended with a '-' to the name of the variable in the input files. |
nm |
Number of data per year in each station. (Defaults to NA, and then it will be computed from the total number of data). |
nref |
Maximum number of references for data estimation. (Defaults to 10 in the detection stages, and to 4 in the final series adjustments). |
swa |
Size of the step forward to be applied to the staggered window application of SNHT. If not set (the default), 365 terms (one year) will be used for daily data, and 60 otherwise. |
std |
Type of normalization:
|
ndec |
Number of decimal digits to which the homogenized data must be rounded. |
dz.max |
Threshold of outlier tolerance, in standard deviations. (5 by default). |
dz.min |
Lower threshold of outlier tolerance if different from the higher one. (By default, they will be the same, with opposite signs). |
wd |
Distance (in km) at which reference data will weigh half of that
of another located at zero distance. (Defaults to |
snht1 |
Threshold value for the stepped SNHT window test applied in stage
1. (25 by default. No SNHT analysis will be performed if |
snht2 |
Threshold value for the SNHT test when applied to the complete series in stage 2 (same value as snht1 by default). |
tol |
Tolerance factor to split several series at a time. The default is 0.02, meaning that a 2% margin is allowed for every reference. (E.g.: if the maximum SNHT test value in a series is 30 and 10 references were used to compute the anomalies, the series will be split if the maximum test of the reference series is lower than 30*(1+0.02*10)=36. Set |
maxdif |
Maximum difference of any data item in consecutive iterations. If not set, defaults to half of the data precision (defined by the number of decimals). |
mxdif |
Old maxdif parameter (maintained for compatibility). |
force |
Force break even when only one reference is available.
( |
wz |
Scale parameter of the vertical coordinate |
trf |
By default, data are not transformed ( |
mndat |
Minimum number of data for a split fragment to become a new
series. It defaults to half of the |
gp |
Graphic parameter:
|
ini |
Initial date, with format |
na.strings |
Character string to be treated as a missing value. (It can be a vector of strings, if more than one is needed). Defaults to 'NA', the standard missing data code in R. |
maxite |
Maximum number of iterations when computing the means of the series. (999 by default). |
vmin |
Minimum possible value (lower limit) of the studied variable.
Unset by default, but note that |
vmax |
Maximum possible value (upper limit) of the studied variable.
(E.g., for relative humidity or relative sunshine hours it is advisable to set
|
nclust |
Maximum number of stations for the cluster analysis. (If much greater than 100, the default value, the process may be too long and the graphic too dense). |
grdcol |
Color of the graphic background grids. (Gray by default.) |
mapcol |
Color of the background map. (Gray by default). |
hires |
By default, the background map will be drawn in high resolution. Set this parameter to |
expl |
Set this to |
metad |
Set this to |
sufbrk |
Suffix to add to |
tinc |
Time increment between data. Not set by default, but can be defined
for subdaily data, as in e.g.: |
tz |
Time zone. Only relevant for subdaily data. ( |
cex |
Character expansion factor for graphic labels and titles. (Defaults to 1.2. Note that if station names are long, they will not fit in titles when increasing this parameter too much.) |
verb |
Verbosity. Set to |
x |
Vector of dates. (To be read from the *.rda file.) |
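The arithmetic in the description of tol can be checked directly. The snippet below is only a sketch of that worked example: the variable names max_snht and split_limit are illustrative and are not objects used by homogen.

```r
# Worked example from the description of 'tol' (illustrative names only)
tol      <- 0.02  # default tolerance factor
n_refs   <- 10    # references used to compute the anomalies
max_snht <- 30    # maximum SNHT value found in the series

# Limit below which the reference series' maximum test still allows a split
split_limit <- max_snht * (1 + tol * n_refs)
print(split_limit)
```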
Note first that this function is experimental, and that it will fail if any time step is completely void of data in any sub-area.
If you have not decided the splitting meridians and parallels, do not set them, and the function will provide a map to help in selecting the areas.
If you set the xc and yc splitting borders, setting maponly=TRUE will also produce a map with the stations, plus the required overlapping areas, without doing any homogenization. In this way you can review the limits of the areas and choose new ones if you are not happy with the current partitioning.
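To get a feeling for how splitting borders and overlaps select stations, the partitioning can be sketched in base R. This is not the code used by homogsplit: overlap_areas is a hypothetical helper, the station coordinates are made up, and the sketch assumes that every area is simply widened by xo and yo on each side.

```r
# Hypothetical helper (not part of climatol): list, for every rectangular
# area, the stations inside it ('core') and inside its widened version ('all').
overlap_areas <- function(x, y, xc, yc, xo = 0.5, yo = 0.38) {
  xc <- sort(xc); yc <- sort(yc)
  xlo <- c(-Inf, xc); xhi <- c(xc, Inf)   # West/East limits of each column
  ylo <- c(-Inf, yc); yhi <- c(yc, Inf)   # South/North limits of each row
  areas <- list()
  for (i in seq_along(xlo)) {
    for (j in seq_along(ylo)) {
      core <- x >= xlo[i] & x < xhi[i] & y >= ylo[j] & y < yhi[j]
      # Assumption: the overlap widens the area by xo and yo on every side
      wide <- x >= xlo[i] - xo & x < xhi[i] + xo &
              y >= ylo[j] - yo & y < yhi[j] + yo
      areas[[length(areas) + 1]] <- list(core = which(core), all = which(wide))
    }
  }
  areas
}

# Made-up station coordinates (degrees), one meridian and one parallel
x <- c(2.0, 2.7, 3.1, 3.8)
y <- c(39.0, 39.5, 39.7, 40.2)
a <- overlap_areas(x, y, xc = 3, yc = 39.6)
str(a[[1]])  # south-west area: core stations plus neighbours in the overlap
```

Stations listed in all but not in core are the ones borrowed from neighbouring areas through the overlap.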
All parameters except xc, yc, xo, yo and maponly are the same as in the homogen function, and will be passed to it to perform the homogenization.
If a rectangular area includes fewer than 10 stations, these will be added to the next area. Warning! If it is the last area, they will not be processed, and the homogenization results will have an inconsistent number of stations. In this case the user should try a new set of cutting limits.
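The merging rule can be illustrated with station counts alone. The sketch below is an assumption about the bookkeeping, not the package's implementation: merge_small is a hypothetical helper and the counts are invented.

```r
# Hypothetical helper (not part of climatol): carry the stations of any
# area with fewer than min_n stations over to the next area.
merge_small <- function(n, min_n = 10) {
  processed <- numeric(0)  # sizes of the areas that would be homogenized
  carry <- 0               # stations waiting to be added to the next area
  for (k in seq_along(n)) {
    total <- n[k] + carry
    if (total < min_n) {
      carry <- total
    } else {
      processed <- c(processed, total)
      carry <- 0
    }
  }
  # Whatever is still carried after the last area is never processed
  list(processed = processed, dropped = carry)
}

# Invented station counts for four areas
res <- merge_small(c(25, 6, 30, 4))
print(res)
```

A non-zero dropped value corresponds to the warning above: the last area is too small, so a new set of cutting limits should be tried.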
One graphic output file will be produced for every area containing stations, but the rest of the output will be merged into single files.
This function does not return any value.
#Set a temporary working directory and write input files:
wd <- tempdir()
wd0 <- setwd(wd)
data(Ptest)
dim(dat) <- c(720,20)
dat[601:720,5] <- dat[601:720,5]*1.8
write(dat[481:720,1:12],'pcp_1991-2010.dat')
write.table(est.c[1:12,1:5],'pcp_1991-2010.est',row.names=FALSE,col.names=FALSE)
#Now run the example:
homogsplit('pcp',1991,2010,2.9,39.6,0,0,std=2)
#Return to user's working directory:
setwd(wd0)
#Input and output files can be found in directory:
print(wd)
#
# Note that this is just a trivial example; this function is intended to
# homogenize a network of thousands of long series which might overload
# computer resources, and is still experimental.