Sorting: chunked ordering of integer subscript positions
Function ffindexorder calculates, chunk by chunk, the order positions that sort all positions within each chunk ascending.
Function ffindexordersize calculates the chunksize for ffindexorder.
ffindexordersize(length, vmode, BATCHBYTES = getOption("ffmaxbytes"))
ffindexorder(index, BATCHSIZE, FF_RETURN = NULL, VERBOSE = FALSE)
index | An ff integer vector with integer subscripts
BATCHSIZE | Limit for the chunksize (see details)
BATCHBYTES | Limit for the number of bytes per batch
FF_RETURN | Optionally an ff integer vector in which the chunkwise order positions are stored
VERBOSE | Logical scalar for activating verbose output
length | Number of elements in the index
vmode | The vmode of the index
Accessing integer positions in an ff vector is a non-trivial task, because it could easily lead to random access to a disk file.
We avoid random access by loading batches of the subscript values into RAM, sorting them ascending, and only then accessing the ff values on disk.
Such an ordering can be done on the fly by ffindexget, or it can be created upfront with ffindexorder, stored and reused, similar to storing and reusing hybrid index information with as.hi.
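The chunkwise access pattern described above can be sketched in base R. This is a minimal illustration, assuming an ordinary in-RAM vector stands in for the ff vector on disk; the function chunked_get() and its arguments are hypothetical and not part of the ff API.

```r
# Hypothetical sketch (not ff's ffindexget): emulate chunkwise-sorted access.
# 'x' stands in for the ff vector on disk; 'index' holds integer subscripts.
chunked_get <- function(x, index, batchsize) {
  n <- length(index)
  if (n == 0L) return(x[0L])
  result <- vector(mode = typeof(x), length = n)
  for (from in seq.int(1L, n, by = batchsize)) {
    to <- min(from + batchsize - 1L, n)
    chunk <- index[from:to]             # load one batch of subscripts into RAM
    o <- order(chunk)                   # sort positions within this chunk ascending
    values <- x[chunk[o]]               # read x in ascending position order
    out <- vector(mode = typeof(x), length = length(chunk))
    out[o] <- values                    # put values back in original subscript order
    result[from:to] <- out
  }
  result
}

# Usage: the chunked result equals plain subscripting
set.seed(1)
x <- sample(40)
i <- order(x)
stopifnot(identical(chunked_get(x, i, 7L), x[i]))
```

Within each chunk the values are read in ascending position order, which on a real ff vector would turn scattered reads into a forward pass over the file; the final scatter restores the original subscript order, so the result matches x[i].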
Function ffindexorder returns an ff integer vector with an attribute BATCHSIZE (the chunksize finally used, not the one given with argument BATCHSIZE).
Function ffindexordersize returns a balanced batchsize as returned from bbatch, that is, a list whose component b holds the batchsize.
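To illustrate what a balanced batchsize means, the hypothetical helper balanced_batch() below splits N elements into the fewest batches of at most B elements and then spreads the elements evenly across them. It is a sketch under stated assumptions, not the bbatch implementation; in particular, the usage line assumes the upper limit B is obtained by dividing BATCHBYTES by 4 bytes per integer element.

```r
# Hypothetical sketch of a balanced batch size; NOT the actual bbatch() code.
balanced_batch <- function(N, B) {
  N <- as.integer(N)
  B <- max(1L, as.integer(B))
  if (N == 0L) return(list(b = 0L, nb = 0L, rb = 0L))
  nb <- as.integer(ceiling(N / B))      # fewest batches of size <= B
  b  <- as.integer(ceiling(N / nb))     # spread N evenly across those batches
  rb <- N - (nb - 1L) * b               # size of the last (possibly smaller) batch
  list(b = b, nb = nb, rb = rb)
}

# Assumption: B = BATCHBYTES / 4, i.e. 100 bytes / 4 bytes per integer = 25
str(balanced_batch(40, 100 / 4))        # b = 20, nb = 2, rb = 20
```

With a fixed chunksize of 25 the 40 elements would split into batches of 25 and 15; the balanced version uses two batches of 20 instead, which keeps RAM use per batch even.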
Jens Oehlschlägel
library(ff)
x <- ff(sample(40))
message("fforder requires sorting")
i <- fforder(x)
message("applying this order i is done by ffindexget")
x[i]
message("applying this order i requires random access, therefore ffindexget does chunkwise sorting")
ffindexget(x, i)
message("if we want to apply the order i multiple times, we can do the chunkwise sorting once and store it")
s <- ffindexordersize(length(i), vmode(i), BATCHBYTES = 100)
o <- ffindexorder(i, s$b)
message("this is how the stored chunkwise sorting is used")
ffindexget(x, i, o)
rm(x, i, s, o)
gc()