Specialized sliding functions
These functions are specialized variants of the most common ways that slide() is generally used. Notably, slide_sum() can be used for rolling sums, and slide_mean() can be used for rolling averages.
These specialized variants are much faster and more memory efficient than using an otherwise equivalent call constructed with slide_dbl() or slide_lgl(), especially with a very wide window.
slide_sum(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_prod(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_mean(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_min(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_max(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_all(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
slide_any(x, ..., before = 0L, after = 0L, step = 1L, complete = FALSE, na_rm = FALSE)
x | A vector to compute the sliding function on. |
... | These dots are for future extensions and must be empty. |
before | The number of values before or after the current element to include in the sliding window. Set to |
after | The number of values before or after the current element to include in the sliding window. Set to |
step | The number of elements to shift the window forward between function calls. |
complete | Should the function be evaluated on complete windows only? If |
na_rm | Should missing values be removed from the computation? |
Note that these functions are not generic and do not respect method dispatch of the corresponding summary function (i.e. base::sum(), base::mean()). Input will always be cast to a double or logical vector using vctrs::vec_cast(), and an internal method for computing the summary function will be used.
Due to the structure of segment trees, slide_mean() does not perform the same "two pass" mean that mean() does (the intention of the second pass is to perform a floating point error correction). Because of this, there may be small differences between slide_mean(x) and slide_dbl(x, mean) in some cases.
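As a rough illustration of the "two pass" idea mentioned above, the following base R sketch computes a naive mean and then adds back the mean of the residuals. The helper name two_pass_mean() is hypothetical, and this is not the actual implementation used by mean(); it only shows the general shape of the correction step.

```r
# Hypothetical helper: a two-pass mean written in plain R.
two_pass_mean <- function(x) {
  n <- length(x)
  # First pass: the naive mean.
  first <- sum(x) / n
  # Second pass: add the mean of the residuals, which compensates
  # for rounding error accumulated in the first pass.
  first + sum(x - first) / n
}

x <- c(1, 5, 3, 2, 6, 10)
two_pass_mean(x)
```

For well-behaved input like this, the second pass changes nothing visible; it only matters when the first pass has accumulated rounding error.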
A vector the same size as x containing the result of applying the summary function over the sliding windows.
For sliding sum, mean, prod, min, and max, a double vector will be returned.
For sliding any and all, a logical vector will be returned.
These variants are implemented using a data structure known as a segment tree, which allows for extremely fast repeated range queries without loss of precision.
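To make the idea of a segment tree more concrete, here is a minimal base R sketch of one for range sums. It is an illustration only and not the package's internal implementation; the helper names build_tree(), query_sum(), and rolling_sum_sketch() are hypothetical. Leaves hold the input values, each internal node holds the sum of its two children, and a range query combines a small number of nodes instead of touching every element in the window.

```r
# Build an array-based segment tree for sums.
# Leaves live at positions n..(2n - 1); node i has children 2i and 2i + 1.
build_tree <- function(x) {
  n <- length(x)
  tree <- numeric(2 * n)
  tree[n + seq_len(n) - 1] <- x
  if (n > 1) {
    for (i in (n - 1):1) {
      tree[i] <- tree[2 * i] + tree[2 * i + 1]
    }
  }
  tree
}

# Sum of x[from:to] (1-based, inclusive), visiting O(log n) nodes.
query_sum <- function(tree, from, to) {
  n <- length(tree) / 2
  l <- from - 1 + n  # first leaf of the range
  r <- to + n        # one past the last leaf of the range
  total <- 0
  while (l < r) {
    if (l %% 2 == 1) {
      total <- total + tree[l]
      l <- l + 1
    }
    if (r %% 2 == 1) {
      r <- r - 1
      total <- total + tree[r]
    }
    l <- l %/% 2
    r <- r %/% 2
  }
  total
}

# Rolling sum over the current element and `before` elements before it.
rolling_sum_sketch <- function(x, before) {
  tree <- build_tree(x)
  vapply(
    seq_along(x),
    function(i) query_sum(tree, max(1, i - before), i),
    numeric(1)
  )
}

x <- c(1, 5, 3, 2, 6, 10)
rolling_sum_sketch(x, before = 2)
```

Each window is answered from partial sums stored when the tree was built, so no value is ever subtracted back out of a running total.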
One alternative to segment trees is to directly recompute the summary function on each full window. This is what is done by using, for example, slide_dbl(x, sum). This is extremely slow with large window sizes and wastes a lot of effort recomputing nearly the same information on each window. It can be made slightly faster by moving the sum to C to avoid intermediate allocations, but it is still fairly slow.
A second alternative is to use an online algorithm, which uses information from the previous window to compute the next window. These are extremely fast, only requiring a single pass through the data, but often suffer from numerical instability issues.
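The numerical instability of online algorithms can be seen in a small base R sketch. The code below is an assumption-laden illustration, not code from the package: it compares a direct rolling sum against an online one that adds the element entering the window and subtracts the one leaving it. The input is chosen so that a very large value passes through the window.

```r
# Window: the current element plus `before` elements before it.
x <- c(1e17, 1, 1, 1)
before <- 1
n <- length(x)

# Direct approach: recompute the sum on every window.
direct <- vapply(
  seq_len(n),
  function(i) sum(x[max(1, i - before):i]),
  numeric(1)
)

# Online approach: keep a running total, add the element that enters
# the window, and subtract the element that leaves it.
online <- numeric(n)
running <- 0
for (i in seq_len(n)) {
  running <- running + x[i]
  leaving <- i - before - 1
  if (leaving >= 1) {
    running <- running - x[leaving]
  }
  online[i] <- running
}

direct
online
```

With standard double precision arithmetic, the small values are absorbed when added to 1e17 and are not recovered once 1e17 is subtracted again, so the online result for the last two windows differs from the direct result even though those windows contain only small values.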
Segment trees are an attempt to reconcile the performance issues of the direct approach with the numerical issues of the online approach. The performance of segment trees isn't quite as fast as online algorithms, but is close enough that it should be usable on most large data sets without any issues. Unlike online algorithms, segment trees don't suffer from any extra numerical instability issues.
Leis, Kundhikanjana, Kemper, and Neumann (2015). "Efficient Processing of Window Functions in Analytical SQL Queries". https://dl.acm.org/doi/10.14778/2794367.2794375
x <- c(1, 5, 3, 2, 6, 10)

# `slide_sum()` can be used for rolling sums.
# The following are equivalent, but `slide_sum()` is much faster.
slide_sum(x, before = 2)
slide_dbl(x, sum, .before = 2)

# `slide_mean()` can be used for rolling averages
slide_mean(x, before = 2)

# Only evaluate the sum on complete windows
slide_sum(x, before = 2, after = 1, complete = TRUE)

# Skip every other calculation
slide_sum(x, before = 2, step = 2)