Chunk Jobs for Sequential Execution
Jobs can be partitioned into “chunks” to be executed sequentially on the computational nodes. Chunks are defined by providing a data frame with columns “job.id” and “chunk” (integer) to submitJobs. All jobs with the same chunk number will be grouped together on one node to form a single computational job.
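For illustration, a minimal sketch of this workflow using a throwaway registry, as in the examples below (the jobs mapped here are placeholders):

library(batchtools)
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
batchMap(identity, 1:30, reg = tmp)
ids = findJobs(reg = tmp)                       # table with column "job.id"
ids$chunk = chunk(ids$job.id, chunk.size = 10)  # add the "chunk" column
submitJobs(ids, reg = tmp)                      # each chunk runs as one job on a node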
The function chunk simply splits x into either a fixed number of groups, or into a variable number of groups with a fixed maximum number of elements per group.
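A quick sketch of the two modes; the shuffle argument randomizes which elements end up together, so shuffle = FALSE is used here for a deterministic result:

library(batchtools)
chunk(1:9, n.chunks = 3, shuffle = FALSE)    # fixed number of groups
chunk(1:9, chunk.size = 4, shuffle = FALSE)  # groups with at most 4 elements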
The function lpt also groups x into a fixed number of chunks, but uses the actual values of x in a greedy “Longest Processing Time” algorithm. As a result, the maximum sum of elements per chunk is minimized.
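To see the effect, one can compare the largest chunk sum from value-agnostic chunking with the result of lpt; a small sketch using simulated runtimes:

library(batchtools)
set.seed(1)
runtimes = runif(20, min = 1, max = 10)
naive = sapply(split(runtimes, chunk(seq_along(runtimes), n.chunks = 4)), sum)
greedy = sapply(split(runtimes, lpt(runtimes, n.chunks = 4)), sum)
max(naive)   # largest chunk sum when values are ignored
max(greedy)  # typically smaller: lpt balances the sums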
binpack splits x into a variable number of groups whose sums of elements do not exceed the upper limit provided by chunk.size.
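A small sketch illustrating the invariant that no chunk sum exceeds chunk.size:

library(batchtools)
set.seed(1)
x = runif(50)
ch = binpack(x, chunk.size = 2.5)
sums = sapply(split(x, ch), sum)
stopifnot(all(sums <= 2.5))  # the limit is respected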
See the examples of estimateRuntimes for an application of binpack and lpt.
Usage

chunk(x, n.chunks = NULL, chunk.size = NULL, shuffle = TRUE)

lpt(x, n.chunks = 1L)

binpack(x, chunk.size = max(x))
Arguments

x
    [numeric] Vector of values to group, e.g. job ids or estimated runtimes.

n.chunks
    [integer(1)] Requested number of chunks (used by chunk and lpt).

chunk.size
    [numeric(1)] Maximum number of elements per chunk (chunk) or upper limit for the sum of elements per chunk (binpack).

shuffle
    [logical(1)] Shuffle the groups? Default is TRUE.
Value

[integer] giving the chunk number for each element of x.
Examples

ch = chunk(1:10, n.chunks = 2)
table(ch)

ch = chunk(rep(1, 10), chunk.size = 2)
table(ch)

set.seed(1)
x = runif(10)
ch = lpt(x, n.chunks = 2)
sapply(split(x, ch), sum)

set.seed(1)
x = runif(10)
ch = binpack(x, 1)
sapply(split(x, ch), sum)

# Job chunking
tmp = makeRegistry(file.dir = NA, make.default = FALSE)
ids = batchMap(identity, 1:25, reg = tmp)

### Group into chunks with 10 jobs each
library(data.table)
ids[, chunk := chunk(job.id, chunk.size = 10)]
print(ids[, .N, by = chunk])

### Group into 4 chunks
ids[, chunk := chunk(job.id, n.chunks = 4)]
print(ids[, .N, by = chunk])

### Submit to batch system
submitJobs(ids = ids, reg = tmp)

# Grouped chunking
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
prob = addProblem(reg = tmp, "prob1", data = iris, fun = function(job, data) nrow(data))
prob = addProblem(reg = tmp, "prob2", data = Titanic, fun = function(job, data) nrow(data))
# the algorithm simply returns the problem instance
algo = addAlgorithm(reg = tmp, "algo", fun = function(job, data, instance, i, ...) instance)
prob.designs = list(prob1 = data.table(), prob2 = data.table(x = 1:2))
algo.designs = list(algo = data.table(i = 1:3))
addExperiments(prob.designs, algo.designs, repls = 3, reg = tmp)

### Group into chunks of 5 jobs, but do not put multiple problems into the same chunk
# -> only one problem has to be loaded per chunk, and only once because it is cached
ids = getJobTable(reg = tmp)[, .(job.id, problem, algorithm)]
ids[, chunk := chunk(job.id, chunk.size = 5), by = "problem"]
ids[, chunk := .GRP, by = c("problem", "chunk")]
dcast(ids, chunk ~ problem)