Function to split data in different ways for conditioning
Utility function to split data frames up in various ways for
conditioning plots. Users would generally not be expected to call
this function directly. It is widely used by many openair
functions, usually through the option type.
cutData(
  x,
  type = "default",
  hemisphere = "northern",
  n.levels = 4,
  start.day = 1,
  is.axis = FALSE,
  local.tz = NULL,
  latitude = 51,
  longitude = -0.5,
  ...
)
x | A data frame containing a field |
type | A string giving the way in which the data frame should be split. Pre-defined values are: “default”, “year”, “hour”, “month”, “season”, “weekday”, “site”, “weekend”, “monthyear”, “daylight”, “dst” (daylight saving time). |
hemisphere | Can be |
n.levels | Number of quantiles to split numeric data into. |
start.day | What day of the week should the |
is.axis | A logical ( |
local.tz | Used for identifying whether a date has daylight savings time (DST) applied or not. Examples include |
latitude | The decimal latitude used in |
longitude | The decimal longitude. Note that locations west of Greenwich are negative. |
... | All additional parameters are passed on to next function(s). |
This section gives a brief description of each of the defined levels
of type. Note that all time-dependent types require a column date.
"default" does not split the data but will describe the levels as a date range in the format "day month year".
"year" splits the data by each year.
"month" splits the data by month of the year.
"hour" splits the data by hour of the day.
"monthyear" splits the data by year and month. It differs from month in that a level is defined for each month of the data set. This is useful sometimes to show an ordered sequence of months if the data set starts half way through a year; rather than starting in January.
"weekend" splits the data by weekday and weekend.
"weekday" splits the data by day of the week - ordered to start Monday.
"season" splits data up by season. In the northern hemisphere
winter = December, January, February; spring = March, April, May
etc. These definitions will change if hemisphere = "southern".
"seasonyear" (or "yearseason") will split the data into year-season
intervals, keeping the months of a season together. For example,
December 2010 is considered as part of winter 2011 (with January
and February 2011). This makes it easier to consider contiguous
seasons. In contrast, type = "season" will just split the
data into four seasons regardless of the year.
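The year-season grouping described above can be sketched in base R. This is only an illustration of the idea, not openair's implementation: the helper season_year is hypothetical, it assumes the northern hemisphere, and the labels it produces are an assumption (cutData may format its levels differently).

```r
# Hypothetical helper, not part of openair: label dates by
# northern-hemisphere year-season, rolling December forward into the
# following year's winter so that each winter stays contiguous.
season_year <- function(dates) {
  lt <- as.POSIXlt(dates)
  month <- lt$mon + 1L       # POSIXlt months run 0..11
  year <- lt$year + 1900L    # POSIXlt years count from 1900

  # Season for each calendar month, January first
  season <- c("winter", "winter",
              "spring", "spring", "spring",
              "summer", "summer", "summer",
              "autumn", "autumn", "autumn",
              "winter")[month]

  # December belongs to the winter of the next calendar year
  year <- ifelse(month == 12L, year + 1L, year)

  paste(season, year)
}

dates <- as.Date(c("2010-11-30", "2010-12-15", "2011-01-10", "2011-03-01"))
labels <- season_year(dates)
print(labels)
```

In this sketch December 2010 and January 2011 receive the same label, whereas a plain season split would ignore the year altogether.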
"daylight" splits the data relative to estimated sunrise and
sunset to give either daylight or nighttime. The cut is made by
cutDaylight but is more conveniently accessed via cutData,
e.g. cutData(mydata, type = "daylight", latitude = my.latitude,
longitude = my.longitude). The daylight estimation, which is valid
for dates between 1901 and 2099, is made using the measurement
location, date, time and astronomical algorithms to estimate the
relative positions of the Sun and the measurement location on the
Earth's surface, and is based on NOAA methods. Measurement location
should be set using latitude (+ to North; - to South) and
longitude (+ to East; - to West).
"dst" will split the data by hours that are in daylight saving
time (DST) and hours that are not, for appropriate time zones. The
option "dst" also requires that the local time zone is given,
e.g. local.tz = "Europe/London" or local.tz = "America/New_York".
Each of the two periods will be in local time. The main purpose of
this option is to test whether there is a shift in the diurnal
profile when DST and non-DST hours are compared. This option is
particularly useful with the timeVariation function. For example,
close to the source of road vehicle emissions, ‘rush-hour’ will tend
to occur at the same local time throughout the year, e.g. 8 am and
5 pm. Therefore, comparing non-DST hours with DST hours will tend to
show similar diurnal patterns (at least in the timing of the peaks,
if not magnitude) when expressed in local time. By contrast, a
variable such as wind speed or temperature should show a clear shift
when expressed in local time. In essence, this option, when used
with timeVariation, may help determine whether the variation in a
pollutant is driven by man-made emissions or natural processes.
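The reasoning behind the local-time comparison can be seen in a small base R sketch. It does not call openair; it only shows how the same UTC clock time maps to different local hours depending on DST. The dates are arbitrary, and the sketch assumes the system time zone database includes "Europe/London".

```r
# Two timestamps at the same UTC clock time, one in winter and one in
# summer (dates chosen arbitrarily for illustration).
utc <- as.POSIXct(c("2023-01-15 08:00:00", "2023-07-15 08:00:00"), tz = "UTC")

# Convert to local time; assumes the system time zone database
# knows "Europe/London".
london <- as.POSIXlt(utc, tz = "Europe/London")

# isdst is positive when daylight saving time is in effect
result <- data.frame(
  utc_hour = as.POSIXlt(utc)$hour,
  local_hour = london$hour,
  dst = london$isdst > 0
)
print(result)
```

An activity tied to the local clock, such as commuting, keeps the same local hour in both rows, while a process tied to the Sun keeps roughly the same UTC hour and so appears shifted by one hour in local time.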
"wd" splits the data by 8 wind sectors and requires a column
wd: "NE", "E", "SE", "S", "SW", "W", "NW", "N".
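One way to bin wind directions into eight 45-degree sectors is sketched below in base R. The helper wind_sector is hypothetical, and the exact sector boundaries (north centred on 0 degrees, lower bounds inclusive) are assumptions rather than a statement of how cutData handles edge cases.

```r
# Hypothetical helper, not part of openair: assign wind directions
# (degrees clockwise from north) to eight 45-degree sectors.
wind_sector <- function(wd) {
  wd <- wd %% 360

  # Shift directions below 22.5 degrees up by a full circle so that the
  # northern sector (337.5 to 22.5) forms one contiguous bin.
  wd <- ifelse(wd < 22.5, wd + 360, wd)

  cut(
    wd,
    breaks = seq(22.5, 382.5, by = 45),
    labels = c("NE", "E", "SE", "S", "SW", "W", "NW", "N"),
    right = FALSE  # intervals closed on the left: [22.5, 67.5), ...
  )
}

sectors <- wind_sector(c(0, 45, 90, 180, 270, 350))
print(sectors)
```

Starting the breaks at 22.5 degrees is what puts "NE" first and "N" last in the level order, matching the order listed above.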
"ws" splits the data by 8 quantiles of wind speed and requires a
column ws.
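Splitting a numeric variable into quantile bands can be sketched with base R's quantile and cut. The wind speeds here are simulated and the variable names are hypothetical; the breaks and labels that cutData itself produces may differ.

```r
# Simulated wind speeds for illustration only
set.seed(1)
ws <- runif(200, min = 0, max = 20)

# Nine break points define eight quantile bands
breaks <- quantile(ws, probs = seq(0, 1, length.out = 9), na.rm = TRUE)

# unique() guards against duplicated breaks when many values are tied;
# include.lowest keeps the minimum value inside the first band
ws_band <- cut(ws, breaks = unique(breaks), include.lowest = TRUE)

print(table(ws_band))
```

Each band holds roughly the same number of observations, which is the point of cutting by quantiles rather than by equal-width intervals.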
"site" splits the data by site and therefore requires a column
site.
Note that all the date-based types, e.g. month/year, are derived
from a column date. If a user already has a column with the
name of one of the date-based types, it will not be used.
Returns a data frame with a column cond that is defined by type.
David Carslaw (cutData) and Karl Ropkins (cutDaylight)
## split data by day of the week
mydata <- cutData(mydata, type = "weekday")