Enable multi-core parallel evaluation
This class is used to parameterize single computer multicore parallel
evaluation on non-Windows computers. multicoreWorkers()
chooses
the number of workers.
## constructor
## ------------------------------------
MulticoreParam(workers = multicoreWorkers(), tasks = 0L,
               stop.on.error = TRUE, progressbar = FALSE,
               RNGseed = NULL, timeout = 30L * 24L * 60L * 60L,
               exportglobals = TRUE, log = FALSE, threshold = "INFO",
               logdir = NA_character_, resultdir = NA_character_,
               jobname = "BPJOB",
               manager.hostname = NA_character_,
               manager.port = NA_integer_, ...)

## detect workers
## ------------------------------------
multicoreWorkers()
workers: integer(1) Number of workers. Defaults to the value returned by multicoreWorkers(); see Details.

tasks: integer(1) Number of tasks per job. In this documentation a job is defined as a single call to a function, such as bplapply or bpmapply, and a task is the division of the X argument into chunks. When tasks == 0 (default), X is divided as evenly as possible over the number of workers; a larger value splits X into that many chunks, which workers process as they become available.

stop.on.error: logical(1) Stop all jobs as soon as one job fails (TRUE, default), or attempt all computations and return partial results along with error messages as 'conditions' (FALSE). See Error Handling.

progressbar: logical(1) Enable a text progress bar.

RNGseed: integer(1) Seed for random number generation, passed to parallel::clusterSetRNGStream; when NULL, streams are set from the current seed of the master process. See Random Number Generation.

timeout: numeric(1) Time (in seconds) allowed for a worker to complete a task; the default is 30 days.

exportglobals: logical(1) Export base::options() from the manager to the workers?

log: logical(1) Enable logging; when TRUE the futile.logger package is loaded on the workers. See Logging.

threshold: character(1) Logging threshold, one of the levels defined in the futile.logger package: "TRACE", "DEBUG", "INFO", "WARN", "ERROR", or "FATAL".

logdir: character(1) Path (not file name) where log files are written. When NA (default) log messages are returned to the workspace.

resultdir: character(1) Path (not file name) where results are written as Rda files. When NA (default) results are returned to the session as a list.

jobname: character(1) Job name used to identify log and result files. Default is "BPJOB".

manager.hostname: character(1) Host name of the manager node; see the 'Global Options' section of SnowParam for defaults.

manager.port: integer(1) Port on the manager used by workers to connect; see the 'Global Options' section of SnowParam for defaults.

...: Additional arguments passed to makeCluster.
MulticoreParam is used for shared memory computing. Under the hood the cluster is created with makeCluster(..., type = "FORK") from the parallel package.
The default number of workers is determined by multicoreWorkers(). On Windows, the number of multicore workers is always 1. Otherwise, the default is normally the maximum of 1 and parallel::detectCores() - 2. Machines with 3 or fewer cores, or machines where the number of cores cannot be determined, are assigned a single worker. Machines with more than 127 cores are limited to the number of R connections available when the workers start; this is 128 (a hard-coded limit in R) minus the number of open connections as returned by nrow(showConnections(all = TRUE)). The option mc.cores can be used to specify an arbitrary number of workers, e.g., options(mc.cores = 4L); the Bioconductor build system enforces a maximum of 4 workers.
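For example, a minimal sketch of inspecting and capping the default worker count (the exact numbers depend on the machine and on any mc.cores option already set):

multicoreWorkers()                    ## default worker count on this machine
options(mc.cores = 2L)                ## cap the default at 2 workers
multicoreWorkers()                    ## now 2 on non-Windows machines
param <- MulticoreParam(workers = 2)  ## or request a count explicitly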
A FORK transport starts workers with the mcfork function and communicates between master and workers using socket connections. mcfork builds on fork() and is therefore only available on POSIX (Unix-like) systems; because FORK clusters are POSIX based they are not supported on Windows. When MulticoreParam is created or used on Windows it defaults to SerialParam, which is the equivalent of using a single worker.
stop.on.error is a logical(1). When TRUE (the default), all jobs stop as soon as one job fails; when FALSE, all computations are attempted and partial results are returned along with any error messages as 'conditions'.
The bpok(x) function returns a logical() vector that is FALSE for any jobs that threw an error. The input x is a list output from a bp*apply function such as bplapply or bpmapply.
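A minimal sketch of this pattern, using bptry() to capture partial results when stop.on.error = FALSE:

param <- MulticoreParam(2, stop.on.error = FALSE)
res <- bptry(bplapply(list(1, "two", 3), sqrt, BPPARAM = param))
bpok(res)        ## FALSE for the element that threw an error
res[bpok(res)]   ## successful results only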
When log = TRUE
the futile.logger
package is loaded on
the workers. All log messages written in the futile.logger
format
are captured by the logging mechanism and returned in real-time
(i.e., as each task completes) instead of after all jobs have finished.
Messages sent to stdout and stderr are returned to
the workspace by default. When log = TRUE
these
are diverted to the log output. Those familiar with the outfile
argument to makeCluster
can think of log = FALSE
as
equivalent to outfile = NULL
; providing a logdir
is the
same as providing a name for outfile
except that BiocParallel
writes a log file for each task.
The log output includes additional statistics such as memory use and task runtime. Memory use is computed by calling gc(reset=TRUE) before code evaluation and gc() (no reset) after. The output of the second gc() call is sent to the log file. There are many ways to track memory use; this particular approach was taken because it is consistent with how the BatchJobs package reports memory on the workers.
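A minimal logging sketch (assumes the futile.logger package is installed; because no logdir is set, log output is returned to the session):

param <- MulticoreParam(2, log = TRUE, threshold = "DEBUG")
logfun <- function(i) {
    futile.logger::flog.info("processing element %d", i)  ## captured by the log
    sqrt(i)
}
res <- bplapply(1:4, logfun, BPPARAM = param)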
Results and logs can be written to a file instead of returned to
the workspace. Writing to files is done from the master as each task
completes. Options can be set with the logdir
and
resultdir
fields in the constructor or with the accessors,
bplogdir
and bpresultdir
.
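For example, setting these fields with the accessors (a sketch; the directories must exist before the param is used):

param <- MulticoreParam(2, log = TRUE)
dir.create(logdir <- tempfile())
bplogdir(param) <- logdir        ## log files written here, one per task
dir.create(resultdir <- tempfile())
bpresultdir(param) <- resultdir  ## results written here instead of returned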
MulticoreParam and SnowParam use the random number generation support from the parallel package. Both create snow-derived clusters, so arguments to multicore-derived functions such as mc.set.seed and mc.reset.stream do not apply. Random number generation is controlled through the RNGseed argument, which is passed to parallel::clusterSetRNGStream.
clusterSetRNGStream
uses the L'Ecuyer-CMRG random number
generator and distributes streams to the members of a cluster. If
RNGseed
is not NULL it serves as the seed to the streams,
otherwise the streams are set from the current seed of the master process
after selecting the L'Ecuyer generator. See ?clusterSetRNGStream
for more details.
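For example, a sketch of reproducible random number generation; with the same seed (and the same division of tasks) repeated calls should return identical results:

param <- MulticoreParam(2, RNGseed = 123)
res1 <- bplapply(1:4, function(i) rnorm(1), BPPARAM = param)
res2 <- bplapply(1:4, function(i) rnorm(1), BPPARAM = param)
identical(res1, res2)   ## should be TRUE: streams are re-seeded from RNGseed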
MulticoreParam(workers = multicoreWorkers(), tasks = 0L,
               stop.on.error = TRUE, progressbar = FALSE,
               RNGseed = NULL, timeout = 30L * 24L * 60L * 60L,
               exportglobals = TRUE, log = FALSE, threshold = "INFO",
               logdir = NA_character_, resultdir = NA_character_,
               jobname = "BPJOB",
               manager.hostname = NA_character_,
               manager.port = NA_integer_, ...):
Return an object representing a FORK cluster. The cluster is not created until bpstart is called. Named arguments in ... are passed to makeCluster.
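For example, a sketch of managing the cluster explicitly with bpstart() and bpstop(), which avoids re-creating the FORK cluster on every call:

param <- MulticoreParam(workers = 2)
bpisup(param)    ## FALSE: the cluster is not created yet
bpstart(param)
bpisup(param)    ## TRUE
res <- bplapply(1:4, sqrt, BPPARAM = param)
bpstop(param)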
In the following code, x
is a MulticoreParam
object.
bpprogressbar(x)
, bpprogressbar(x) <- value
:
Get or set the value to enable text progress bar.
value
must be a logical(1)
.
bpjobname(x)
, bpjobname(x) <- value
:
Get or set the job name.
bpRNGseed(x)
, bpRNGseed(x) <- value
:
Get or set the seed for random number generation. value
must be a
numeric(1)
or NULL
.
bplog(x)
, bplog(x) <- value
:
Get or set the value to enable logging. value
must be a
logical(1)
.
bpthreshold(x)
, bpthreshold(x) <- value
:
Get or set the logging threshold. value
must be a
character(1)
string of one of the levels defined in the
futile.logger
package: “TRACE”, “DEBUG”,
“INFO”, “WARN”, “ERROR”, or “FATAL”.
bplogdir(x)
, bplogdir(x) <- value
:
Get or set the directory for the log file. value
must be a
character(1)
path, not a file name. The file is written out as
LOGFILE.out. If no logdir
is provided and bplog=TRUE
log
messages are sent to stdout.
bpresultdir(x)
, bpresultdir(x) <- value
:
Get or set the directory for the result files. value
must be a
character(1)
path, not a file name. Separate files are written for
each job with the prefix JOB (e.g., JOB1, JOB2, etc.). When no
resultdir
is provided the results are returned to the session as a
list
In the code below x
is a MulticoreParam
object. See the
?BiocParallelParam
man page for details on these accessors.
bpworkers(x)
bpnworkers(x)
bptasks(x)
, bptasks(x) <- value
bpstart(x)
bpstop(x)
bpisup(x)
bpbackend(x)
, bpbackend(x) <- value
In the code below x
is a MulticoreParam
object. See the
?BiocParallelParam
man page for details on these accessors.
bpstopOnError(x)
, bpstopOnError(x) <- value
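For example, a short sketch using a few of these accessors:

param <- MulticoreParam(4)
bpworkers(param)              ## 4
bptasks(param) <- 8L          ## split X into 8 tasks instead of one per worker
bpstopOnError(param) <- FALSE ## collect partial results instead of stopping
param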
In the code below BPPARAM
is a MulticoreParam
object.
Full documentation for these functions are on separate man pages: see
?bpmapply
, ?bplapply
, ?bpvec
, ?bpiterate
and
?bpaggregate
.
bpmapply(FUN, ..., MoreArgs=NULL, SIMPLIFY=TRUE,
USE.NAMES=TRUE, BPPARAM=bpparam())
bplapply(X, FUN, ..., BPPARAM=bpparam())
bpvec(X, FUN, ..., AGGREGATE=c, BPPARAM=bpparam())
bpiterate(ITER, FUN, ..., BPPARAM=bpparam())
bpaggregate(x, data, FUN, ..., BPPARAM=bpparam())
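For example, a minimal bpvec() sketch; FUN is applied to chunks of X on each worker and the pieces are combined with AGGREGATE:

param <- MulticoreParam(2)
res <- bpvec(1:10, sqrt, AGGREGATE = c, BPPARAM = param)
all.equal(res, sqrt(1:10))   ## same result as the serial call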
In the code below x
is a MulticoreParam
object.
show(x)
:
Displays the MulticoreParam
object.
See the 'Global Options' section of SnowParam
for
manager host name and port defaults.
Martin Morgan <mtmorgan@fhcrc.org> and Valerie Obenchain
register
for registering parameter classes for use in
parallel evaluation.
SnowParam
for computing in distributed memory
BatchJobsParam
for computing with cluster schedulers
DoparParam
for computing with foreach
SerialParam
for non-parallel evaluation
## -----------------------------------------------------------------------
## Job configuration:
## -----------------------------------------------------------------------

## MulticoreParam supports shared memory computing. The object fields
## control the division of tasks, error handling, logging and
## result format.
bpparam <- MulticoreParam()
bpparam

## By default the param is created with the maximum available workers
## determined by multicoreWorkers().
multicoreWorkers()

## Fields are modified with accessors of the same name:
bplog(bpparam) <- TRUE
dir.create(resultdir <- tempfile())
bpresultdir(bpparam) <- resultdir
bpparam

## -----------------------------------------------------------------------
## Logging:
## -----------------------------------------------------------------------

## When 'log == TRUE' the workers use a custom script (in BiocParallel)
## that enables logging and access to other job statistics. Log messages
## are returned as each job completes rather than waiting for all to finish.

## In 'fun', a value of 'x = 1' will throw a warning, 'x = 2' is ok
## and 'x = 3' throws an error. Because 'x = 1' sleeps, the warning
## should return after the error.
X <- 1:3
fun <- function(x) {
    if (x == 1) {
        Sys.sleep(2)
        if (TRUE & c(TRUE, TRUE))       ## warning
            x
    } else if (x == 2) {
        x                               ## ok
    } else if (x == 3) {
        sqrt("FOO")                     ## error
    }
}

## By default logging is off. Turn it on with the bplog()<- setter
## or by specifying 'log = TRUE' in the constructor.
bpparam <- MulticoreParam(3, log = TRUE, stop.on.error = FALSE)
res <- tryCatch({
    bplapply(X, fun, BPPARAM = bpparam)
}, error = identity)
res

## When a 'logdir' location is given the messages are redirected to a file:
## Not run:
bplogdir(bpparam) <- tempdir()
bplapply(X, fun, BPPARAM = bpparam)
list.files(bplogdir(bpparam))
## End(Not run)

## -----------------------------------------------------------------------
## Managing results:
## -----------------------------------------------------------------------

## By default results are returned as a list. When 'resultdir' is given
## files are saved in the directory specified by job, e.g., 'TASK1.Rda',
## 'TASK2.Rda', etc.
## Not run:
dir.create(resultdir <- tempfile())
bpparam <- MulticoreParam(2, resultdir = resultdir, stop.on.error = FALSE)
bplapply(X, fun, BPPARAM = bpparam)
list.files(bpresultdir(bpparam))
## End(Not run)

## -----------------------------------------------------------------------
## Error handling:
## -----------------------------------------------------------------------

## When 'stop.on.error' is TRUE the job is terminated as soon as an
## error is hit. When FALSE, all computations are attempted and partial
## results are returned along with errors. In this example the number of
## 'tasks' is set to equal the length of 'X' so each element is run
## separately. (Default behavior is to divide 'X' evenly over workers.)

## All results along with error:
bpparam <- MulticoreParam(2, tasks = 4, stop.on.error = FALSE)
res <- bptry(bplapply(list(1, "two", 3, 4), sqrt, BPPARAM = bpparam))
res

## Calling bpok() on the result list returns TRUE for elements with no error.
bpok(res)

## -----------------------------------------------------------------------
## Random number generation:
## -----------------------------------------------------------------------

## Random number generation is controlled with the 'RNGseed' field.
## This seed is passed to parallel::clusterSetRNGStream
## which uses the L'Ecuyer-CMRG random number generator and distributes
## streams to members of the cluster.
bpparam <- MulticoreParam(3, RNGseed = 7739465)
bplapply(seq_len(bpnworkers(bpparam)), function(i) rnorm(1), BPPARAM = bpparam)