Estimate Remaining Runtimes
Estimates the runtimes of jobs using the random forest implemented in ranger. Observed runtimes are retrieved from the Registry, and runtimes are predicted for unfinished jobs.
The estimated remaining time is calculated in the print method. You may also pass n to the print method to set the number of parallel jobs; this number is then used in a simple Longest Processing Time (LPT) algorithm to give an estimate for the parallel runtime.
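To make the LPT idea concrete, here is a minimal sketch in base R. The helper name lpt_makespan and the sample runtimes are hypothetical and exist only for illustration; this is not the package's implementation, just one common way to write the greedy heuristic: sort jobs by decreasing runtime, then always hand the next job to the worker with the smallest accumulated load.

```r
# Hypothetical helper (not part of the package): greedy LPT scheduling.
# Returns the estimated parallel runtime (makespan) for n workers.
lpt_makespan = function(runtimes, n) {
  loads = numeric(n)  # accumulated runtime per worker, all start at 0
  for (rt in sort(runtimes, decreasing = TRUE)) {
    i = which.min(loads)  # least loaded worker (first one on ties)
    loads[i] = loads[i] + rt
  }
  max(loads)  # the busiest worker determines the total wall time
}

# Made-up runtimes in seconds
runtimes = c(600, 500, 400, 300, 200, 100)
lpt_makespan(runtimes, n = 1)  # sequential: the plain sum, 2100
lpt_makespan(runtimes, n = 2)  # two workers: 1100
```

With a single worker the estimate is simply the sum of all runtimes. With more workers the greedy assignment gives an approximation, not necessarily the optimal schedule.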
estimateRuntimes(tab, ..., reg = getDefaultRegistry())

## S3 method for class 'RuntimeEstimate'
print(x, n = 1L, ...)
tab | [
... | [ANY]
reg | [
x | [
n | [
[RuntimeEstimate] which is a list with two named elements:
“runtimes” is a data.table with columns “job.id”, “runtime” (in seconds) and “type” (“estimated” if the runtime is estimated, “observed” if the runtime was observed).
The other element of the list, named “model”, contains the fitted random forest object.
# Create a simple toy registry
set.seed(1)
tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE, seed = 1)
addProblem(name = "iris", data = iris, fun = function(data, ...) nrow(data), reg = tmp)
addAlgorithm(name = "nrow", function(instance, ...) nrow(instance), reg = tmp)
addAlgorithm(name = "ncol", function(instance, ...) ncol(instance), reg = tmp)
addExperiments(algo.designs = list(nrow = data.table::CJ(x = 1:50, y = letters[1:5])), reg = tmp)
addExperiments(algo.designs = list(ncol = data.table::CJ(x = 1:50, y = letters[1:5])), reg = tmp)

# We use the job parameters to predict runtimes
tab = unwrap(getJobPars(reg = tmp))

# First we need to submit some jobs so that the forest can train on some data.
# Thus, we just sample some jobs from the registry while grouping by factor variables.
library(data.table)
ids = tab[, .SD[sample(nrow(.SD), 5)], by = c("problem", "algorithm", "y")]
setkeyv(ids, "job.id")
submitJobs(ids, reg = tmp)
waitForJobs(reg = tmp)

# We "simulate" some more realistic runtimes here to demonstrate the functionality:
# - Algorithm "ncol" is 5 times more expensive than "nrow"
# - x has no effect on the runtime
# - If y is "a" or "b", the runtimes are really high
runtime = function(algorithm, x, y) {
  ifelse(algorithm == "nrow", 100L, 500L) + 1000L * (y %in% letters[1:2])
}
tmp$status[ids, done := done + tab[ids, runtime(algorithm, x, y)]]
rjoin(sjoin(tab, ids), getJobStatus(ids, reg = tmp)[, c("job.id", "time.running")])

# Estimate runtimes:
est = estimateRuntimes(tab, reg = tmp)
print(est)
rjoin(tab, est$runtimes)
print(est, n = 10)

# Submit jobs with longest runtime first:
ids = est$runtimes[type == "estimated"][order(runtime, decreasing = TRUE)]
print(ids)
## Not run: 
submitJobs(ids, reg = tmp)
## End(Not run)

# Group jobs into chunks with runtime < 1h
ids = est$runtimes[type == "estimated"]
ids[, chunk := binpack(runtime, 3600)]
print(ids)
print(ids[, list(runtime = sum(runtime)), by = chunk])
## Not run: 
submitJobs(ids, reg = tmp)
## End(Not run)

# Group jobs into 10 chunks with similar runtime
ids = est$runtimes[type == "estimated"]
ids[, chunk := lpt(runtime, 10)]
print(ids[, list(runtime = sum(runtime)), by = chunk])