Apply a Function over a List or Vector via Futures
future_lapply() implements base::lapply() using futures with perfect replication of results, regardless of future backend used.
Analogously, this is true for all the other future_nnn() functions.
future_eapply(
  env,
  FUN,
  ...,
  all.names = FALSE,
  USE.NAMES = TRUE,
  future.label = "future_eapply-%d"
)

future_lapply(
  X,
  FUN,
  ...,
  future.stdout = TRUE,
  future.conditions = "condition",
  future.globals = TRUE,
  future.packages = NULL,
  future.lazy = FALSE,
  future.seed = FALSE,
  future.scheduling = 1,
  future.chunk.size = NULL,
  future.label = "future_lapply-%d"
)

future_replicate(
  n,
  expr,
  simplify = "array",
  future.seed = TRUE,
  ...,
  future.label = "future_replicate-%d"
)

future_sapply(
  X,
  FUN,
  ...,
  simplify = TRUE,
  USE.NAMES = TRUE,
  future.label = "future_sapply-%d"
)

future_tapply(
  X,
  INDEX,
  FUN = NULL,
  ...,
  default = NA,
  simplify = TRUE,
  future.label = "future_tapply-%d"
)

future_vapply(
  X,
  FUN,
  FUN.VALUE,
  ...,
  USE.NAMES = TRUE,
  future.label = "future_vapply-%d"
)
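env | An R environment.
FUN | A function taking at least one argument.
all.names | If TRUE, FUN is also applied to variables whose names start with a period (.), otherwise not. See base::eapply() for details.
USE.NAMES | See base::sapply().
future.label | If a character string, then each future is assigned a label sprintf(future.label, chunk_idx).
X | A vector-like object to iterate over.
future.stdout | If TRUE (default), the standard output of the underlying futures is captured and relayed. If FALSE, any output is silenced.
future.conditions | A character string of conditions classes to be captured and relayed. The default is the same as the conditions argument of future::Future().
future.globals | A logical, a character vector, or a named list for controlling how globals are handled. For details, see below section.
future.packages | (optional) a character vector specifying packages to be attached in the R environment evaluating the future.
future.lazy | Specifies whether the futures should be resolved lazily or eagerly (default).
future.seed | A logical or an integer (of length one or seven), or a list of length(X) with pre-generated random seeds. For details, see below section.
future.scheduling | Average number of futures ("chunks") per worker. If 0.0, then a single future is used to process all elements of X. If 1.0 or TRUE, then one future per worker is used. If Inf or FALSE, then one future per element of X is used. Only used if future.chunk.size is NULL.
future.chunk.size | The average number of elements per future ("chunk"). If Inf, then all elements are processed in a single future. If NULL, then argument future.scheduling is used.
n | The number of replicates.
expr | An R expression to evaluate repeatedly.
simplify | See base::sapply() and base::tapply(), respectively.
INDEX | A list of one or more factors, each of same length as X.
default | See base::tapply().
FUN.VALUE | A template for the required return value from each FUN(X[ii], ...).
... | (optional) Additional arguments passed to FUN().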
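As a rough illustration of the two chunking arguments, the sketch below runs the same computation with different future.chunk.size and future.scheduling settings and checks that the results agree with base::lapply(). The variable names (X, y_ref, y_a, y_b, y_c) are made up for this example, and it assumes the default sequential plan; chunking only changes how the work is split into futures, not what is returned.

```r
library(future.apply)

X <- 1:10
y_ref <- lapply(X, FUN = sqrt)

# About two elements of X per future ("chunk")
y_a <- future_lapply(X, FUN = sqrt, future.chunk.size = 2L)

# About two futures per worker (future.chunk.size is left at NULL)
y_b <- future_lapply(X, FUN = sqrt, future.scheduling = 2.0)

# All elements processed by a single future
y_c <- future_lapply(X, FUN = sqrt, future.chunk.size = Inf)

# The chunking strategy does not change the returned values
stopifnot(identical(y_a, y_ref), identical(y_b, y_ref), identical(y_c, y_ref))
```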
For future_eapply(), a named (unless USE.NAMES = FALSE) list.
See base::eapply() for details.
For future_lapply(), a list with same length and names as X.
See base::lapply() for details.
future_replicate() is a wrapper around future_sapply() and returns a simplified object according to the simplify argument.
See base::replicate() for details.
Since future_replicate() usually involves random number generation (RNG), it uses future.seed = TRUE by default in order to produce sound random numbers regardless of future backend and number of background workers used.
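A minimal sketch of future_replicate() follows, assuming the default sequential plan. The names r1 and r2 are illustrative only. Because future.seed = TRUE is the default, fixing the RNG state with set.seed() before each call is expected to give the same replicates.

```r
library(future.apply)

# Five replicates of the mean of ten standard normal draws
set.seed(1)
r1 <- future_replicate(5, mean(rnorm(10)))

# Same initial RNG state, same replicates
set.seed(1)
r2 <- future_replicate(5, mean(rnorm(10)))

str(r1)
stopifnot(is.numeric(r1), length(r1) == 5L, identical(r1, r2))
```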
For future_sapply(), a vector with same length and names as X.
See base::sapply() for details.
future_tapply() returns an array with mode "list", unless simplify = TRUE (default) and FUN returns a scalar, in which case the mode of the array is the same as the returned scalars.
See base::tapply() for details.
For future_vapply(), a vector with same length and names as X.
See base::vapply() for details.
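The return values described above can be checked against the base R counterparts. The following sketch does so for future_tapply() and future_eapply(), assuming the default sequential plan; the environment env and the result names are made up for the example.

```r
library(future.apply)

iris <- datasets::iris

# future_tapply(): FUN returns a scalar, so the result is a numeric array
t0 <- tapply(iris$Sepal.Length, INDEX = iris$Species, FUN = mean)
t1 <- future_tapply(iris$Sepal.Length, INDEX = iris$Species, FUN = mean)
stopifnot(isTRUE(all.equal(t1, t0)))

# future_eapply(): a named list with one element per variable in 'env'
env <- new.env()
env$a <- 1:3
env$b <- 4:6
e1 <- future_eapply(env, FUN = sum)
stopifnot(is.list(e1), setequal(names(e1), c("a", "b")))
```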
Argument future.globals may be used to control how globals should be handled, similarly to how the globals argument is used with future().
Since all function calls use the same set of globals, this function can do any gathering of globals upfront (once), which is more efficient than if it would be done for each future independently.
If TRUE, NULL, or not specified (default), then globals are automatically identified and gathered.
If a character vector of names is specified, then those globals are gathered.
If a named list, then those globals are used as is.
In all cases, FUN and any ... arguments are automatically passed as globals to each future created, as they are always needed.
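The three ways of specifying future.globals can be sketched as follows. The global variable a and the result names are hypothetical, and the example assumes the default sequential plan. The named list deliberately carries the same value as the global, so all three calls are expected to agree.

```r
library(future.apply)

a <- 2  # a global variable used inside FUN

# Default: globals are identified and gathered automatically
y_auto <- future_lapply(1:3, FUN = function(x) a * x)

# Character vector: gather the named globals only
y_named <- future_lapply(1:3, FUN = function(x) a * x,
                         future.globals = "a")

# Named list: the supplied values are used as is
y_list <- future_lapply(1:3, FUN = function(x) a * x,
                        future.globals = list(a = a))

y_ref <- lapply(1:3, FUN = function(x) a * x)
stopifnot(identical(y_auto, y_ref),
          identical(y_named, y_ref),
          identical(y_list, y_ref))
```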
Unless future.seed = FALSE, this function guarantees to generate the exact same sequence of random numbers given the same initial seed / RNG state, regardless of type of futures, scheduling ("chunking") strategy, and number of workers.
RNG reproducibility is achieved by pregenerating the random seeds for all iterations (over X) by using L'Ecuyer-CMRG RNG streams. In each iteration, these seeds are set before calling FUN(X[[ii]], ...).
Note that for large length(X) this may introduce a large overhead.
As input (future.seed), a fixed seed (integer) may be given, either as a full L'Ecuyer-CMRG RNG seed (vector of 1+6 integers) or as a seed generating such a full L'Ecuyer-CMRG seed.
If future.seed = TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly.
If future.seed = NA, a L'Ecuyer-CMRG RNG seed is randomly created.
If none of the function calls FUN(X[[ii]], ...) uses random number generation, then future.seed = FALSE may be used.
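The seed options above can be illustrated with a short sketch, assuming the default sequential plan; the result names are made up. It checks that a fixed integer seed gives the same draws across calls and across chunking strategies, and that future.seed = TRUE is reproducible once the calling RNG state is fixed with set.seed().

```r
library(future.apply)

# A fixed integer seed gives identical draws on every call ...
y1 <- future_lapply(1:3, FUN = rnorm, future.seed = 42L)
y2 <- future_lapply(1:3, FUN = rnorm, future.seed = 42L)

# ... and also when the elements are chunked differently
y3 <- future_lapply(1:3, FUN = rnorm, future.seed = 42L,
                    future.chunk.size = 1L)
stopifnot(identical(y1, y2), identical(y1, y3))

# With future.seed = TRUE, the seed is derived from the current RNG state
set.seed(123)
z1 <- future_lapply(1:3, FUN = rnorm, future.seed = TRUE)
set.seed(123)
z2 <- future_lapply(1:3, FUN = rnorm, future.seed = TRUE)
stopifnot(identical(z1, z2))
```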
In addition to the above, it is possible to specify a pre-generated sequence of RNG seeds as a list such that length(future.seed) == length(X) and where each element is an integer seed vector that can be assigned to .Random.seed. One approach to generate a set of valid RNG seeds based on a fixed initial seed (here 42L) is:
seeds <- future_lapply(seq_along(X), FUN = function(x) .Random.seed, future.chunk.size = Inf, future.seed = 42L)
Note that as.list(seq_along(X)) is not a valid set of such .Random.seed values.
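A sketch of how such a pre-generated list might be used is given below, assuming the default sequential plan. X, seeds, u1, and u2 are illustrative names. The seeds are generated with the approach shown above and then passed back through future.seed; reusing the same list is expected to reproduce the same draws.

```r
library(future.apply)

X <- 1:4

# Pre-generate one L'Ecuyer-CMRG seed per element of X
seeds <- future_lapply(seq_along(X), FUN = function(x) .Random.seed,
                       future.chunk.size = Inf, future.seed = 42L)
stopifnot(length(seeds) == length(X))

# Use the pre-generated seeds; the same list gives the same draws
u1 <- future_lapply(X, FUN = function(x) runif(1), future.seed = seeds)
u2 <- future_lapply(X, FUN = function(x) runif(1), future.seed = seeds)
stopifnot(identical(u1, u2))
```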
In all cases but future.seed = FALSE, the RNG state of the calling R process after this function returns is guaranteed to be "forwarded one step" from the RNG state that was before the call, and in the same way regardless of future.seed, future.scheduling, and future strategy used. This is done in order to guarantee that an R script calling future_lapply() multiple times should be numerically reproducible given the same initial seed.
Attribute ordering of future.chunk.size or future.scheduling can be used to control the order in which the elements are iterated over. This only affects the processing order, not the order in which values are returned.
This attribute can take the following values:
index vector - a numeric vector of length length(X)
function - a function taking one argument, which is called as ordering(length(X)) and which must return an index vector of length length(X), e.g. function(n) rev(seq_len(n)) for reverse ordering.
"random" - this will randomize the ordering via the random index vector sample.int(length(X)).
For example, future.scheduling = structure(TRUE, ordering = "random").
Note that when elements are processed out of order, captured standard output and conditions are also relayed in that order, that is, out of order.
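The ordering attribute can be tried out with a small sketch such as the one below, which assumes the default sequential plan and uses made-up result names. It processes the elements in reverse and in random order and checks that the returned values keep the order of X.

```r
library(future.apply)

X <- 1:6
y_default <- future_lapply(X, FUN = function(x) x^2)

# Process the elements in reverse order
y_reverse <- future_lapply(
  X, FUN = function(x) x^2,
  future.scheduling = structure(TRUE, ordering = function(n) rev(seq_len(n)))
)

# Process the elements in random order
y_random <- future_lapply(
  X, FUN = function(x) x^2,
  future.scheduling = structure(TRUE, ordering = "random")
)

# Only the processing order differs; values come back in the order of X
stopifnot(identical(y_reverse, y_default), identical(y_random, y_default))
```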
The implementations of future_replicate(), future_sapply(), and future_tapply() are adopted from the source code of the corresponding base R functions, which are licensed under GPL (>= 2) with 'The R Core Team' as the copyright holder.
library(future.apply)

## ---------------------------------------------------------
## lapply(), sapply(), vapply()
## ---------------------------------------------------------
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE, FALSE, FALSE, TRUE))
y0 <- lapply(x, FUN = quantile, probs = 1:3/4)
y1 <- future_lapply(x, FUN = quantile, probs = 1:3/4)
print(y1)
stopifnot(all.equal(y1, y0))

y0 <- sapply(x, FUN = quantile)
y1 <- future_sapply(x, FUN = quantile)
print(y1)
stopifnot(all.equal(y1, y0))

y0 <- vapply(x, FUN = quantile, FUN.VALUE = double(5L))
y1 <- future_vapply(x, FUN = quantile, FUN.VALUE = double(5L))
print(y1)
stopifnot(all.equal(y1, y0))


## ---------------------------------------------------------
## Parallel Random Number Generation
## ---------------------------------------------------------
## Regardless of the future plan, the number of workers, and
## where they are, the random numbers produced are identical

plan(multisession)
y1 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y1)

plan(sequential)
y2 <- future_lapply(1:5, FUN = rnorm, future.seed = 0xBEEF)
str(y2)

stopifnot(all.equal(y1, y2))


## ---------------------------------------------------------
## Process chunks of data.frame rows in parallel
## ---------------------------------------------------------
iris <- datasets::iris
chunks <- split(iris, seq(1, nrow(iris), length.out = 3L))

y0 <- lapply(chunks, FUN = function(iris) sum(iris$Sepal.Length))
y0 <- do.call(sum, y0)
y1 <- future_lapply(chunks, FUN = function(iris) sum(iris$Sepal.Length))
y1 <- do.call(sum, y1)
print(y1)
stopifnot(all.equal(y1, y0))