data.table: froll – R documentation

Pricing

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

Get Started for Free

Documentation

data.table

froll

Rolling functions

Description

Fast rolling functions to calculate aggregates on sliding window. Function name and arguments are experimental.

Usage

frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right",
  "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left",
  "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))

Arguments

`x`	vector, list, data.frame or data.table of numeric or logical columns.
`n`	integer vector, for adaptive rolling function also list of integer vectors, rolling window size.
`fill`	numeric, value to pad by. Defaults to `NA`.
`algo`	character, default `"fast"`. When set to `"exact"`, then slower algorithm is used. It suffers less from floating point rounding error, performs extra pass to adjust rounding error correction and carefully handles all non-finite values. If available it will use multiple cores. See details for more information.
`align`	character, define if rolling window covers preceding rows (`"right"`), following rows (`"left"`) or centered (`"center"`). Defaults to `"right"`.
`na.rm`	logical. Should missing values be removed when calculating window? Defaults to `FALSE`. For details on handling other non-finite values, see details below.
`hasNA`	logical. If it is known that `x` contains `NA` then setting to `TRUE` will speed up. Defaults to `NA`.
`adaptive`	logical, should adaptive rolling function be calculated, default `FALSE`. See details below.
`FUN`	the function to be applied in rolling fashion; see Details for restrictions
`...`	extra arguments passed to `FUN` in `frollapply`.

Details

froll* functions accepts vectors, lists, data.frames or data.tables. They always return a list except when the input is a vector and length(n)==1 in which case a vector is returned, for convenience. Thus rolling functions can be used conveniently within data.table syntax.

Argument n allows multiple values to apply rolling functions on multiple window sizes. If adaptive=TRUE, then it expects a list. Each list element must be integer vector of window sizes corresponding to every single observation in each column.

When algo="fast" then on-line algorithm is used, also any NaN, +Inf, -Inf is treated as NA. Setting algo="exact" will make rolling functions to use compute-intensive algorithm that suffers less from floating point rounding error. It also handles NaN, +Inf, -Inf consistently to base R. In case of some functions (like mean), it will additionally make extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.

Adaptive rolling functions are special cases where for each single observation has own corresponding rolling window width. Due to the logic of adaptive rolling functions, following restrictions apply:

align only "right".
if list of vectors is passed to x, then all list vectors must have equal length.

When multiple columns or multiple windows width are provided, then they are run in parallel. Except for the algo="exact" which runs in parallel already.

frollapply computes rolling aggregate on arbitrary R functions. The input x (first argument) to the function FUN is coerced to numeric beforehand and FUN has to return a scalar numeric value. Checks for that are made only during the first iteration when FUN is evaluated. Edge cases can be found in examples below. Any R function is supported, but it is not optimized using our own C implementation – hence, for example, using frollapply to compute a rolling average is inefficient. It is also always single-threaded because there is no thread-safe API to R's C eval. Nevertheless we've seen the computation speed up vis-a-vis versions implemented in base R.

Value

A list except when the input is a vector and length(n)==1 in which case a vector is returned.

Note

Users coming from most popular package for rolling functions zoo might expect following differences in data.table implementation.

rolling function will always return result of the same length as input.
fill defaults to NA.
fill accepts only constant values. It does not support for na.locf or other functions.
align defaults to "right".
na.rm is respected, and other functions are not needed when input contains NA.
integers and logical are always coerced to double.
when adaptive=FALSE (default), then n must be a numeric vector. List is not accepted.
when adaptive=TRUE, then n must be vector of length equal to nrow(x), or list of such vectors.
partial window feature is not supported, although it can be accomplished by using adaptive=TRUE, see examples.

Be aware that rolling functions operates on the physical order of input. If the intent is to roll values in a vector by a logical window, for example an hour, or a day, one has to ensure that there are no gaps in input. For details see issue #3241.

References

Round-off error

Examples

d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available

# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)

# frollsum
frollsum(d, 3:4)

# frollapply
frollapply(d, 3:4, sum)
f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean))
anserr = list(
  fastma = ans2-ans1,
  froll_fast = ans3-ans1,
  froll_exact = ans4-ans1,
  frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff

# frollapply corner cases
f = function(x) head(x, 2)     ## FUN returns non length 1
try(frollapply(1:5, 3, f))
f = function(x) {              ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN)     ## FUN returns integer
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA)       ## FUN returns logical
as.logical(frollapply(x, 3, anyNA))
options(datatable.verbose=FALSE)
f = function(x) {             ## FUN returns character
  if (sum(x)>5) "big" else "small"
}
try(frollapply(1:5, 3, f))
f = function(x) {             ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))

data.table

Extension of `data.frame`

v1.14.0

MPL-2.0 | file LICENSE

Authors

Matt Dowle [aut, cre], Arun Srinivasan [aut], Jan Gorecki [ctb], Michael Chirico [ctb], Pasha Stetsenko [ctb], Tom Short [ctb], Steve Lianoglou [ctb], Eduard Antonyan [ctb], Markus Bonsch [ctb], Hugh Parsonage [ctb], Scott Ritchie [ctb], Kun Ren [ctb], Xianying Tan [ctb], Rick Saporta [ctb], Otto Seiskari [ctb], Xianghui Dong [ctb], Michel Lang [ctb], Watal Iwasaki [ctb], Seth Wenchel [ctb], Karl Broman [ctb], Tobias Schmidt [ctb], David Arenburg [ctb], Ethan Smith [ctb], Francois Cocquemas [ctb], Matthieu Gomez [ctb], Philippe Chataignon [ctb], Nello Blaser [ctb], Dmitry Selivanov [ctb], Andrey Riabushenko [ctb], Cheng Lee [ctb], Declan Groves [ctb], Daniel Possenriede [ctb], Felipe Parages [ctb], Denes Toth [ctb], Mus Yaramaz-David [ctb], Ayappan Perumal [ctb], James Sams [ctb], Martin Morgan [ctb], Michael Quinn [ctb], @javrucebo [ctb], @marc-outins [ctb], Roy Storey [ctb], Manish Saraswat [ctb], Morgan Jacob [ctb], Michael Schubmehl [ctb], Davis Vaughan [ctb], Toby Hocking [ctb], Leonardo Silvestri [ctb], Tyson Barrett [ctb], Jim Hester [ctb], Anthony Damico [ctb], Sebastian Freundt [ctb], David Simons [ctb], Elliott Sales de Andrade [ctb], Cole Miller [ctb], Jens Peder Meldgaard [ctb], Vaclav Tlapak [ctb], Kevin Ushey [ctb], Dirk Eddelbuettel [ctb], Ben Schwen [ctb]

Initial release

froll

Description

Usage

Arguments

Details

Value

Note

References

See Also

Examples

data.table

We don't support your browser anymore