Reading and writing ffdf data.frame using ff subscripts
Function ffdfindexget
extracts rows from an ffdf data.frame according to positive integer subscripts stored in an ff vector.
Function ffdfindexset
performs the inverse operation: it assigns to rows of an ffdf data.frame according to positive integer subscripts stored in an ff vector.
These functions allow more control than the method dispatch of [
and [<-
if an ff integer subscript is used.
ffdfindexget(x, index, indexorder = NULL, autoindexorder = 3, FF_RETURN = NULL,
             BATCHSIZE = NULL, BATCHBYTES = getOption("ffmaxbytes"), VERBOSE = FALSE)
ffdfindexset(x, index, value, indexorder = NULL, autoindexorder = 3,
             BATCHSIZE = NULL, BATCHBYTES = getOption("ffmaxbytes"), VERBOSE = FALSE)
x |
A |
index |
A |
value |
A |
indexorder |
Optionally the return value of |
autoindexorder |
The minimum number of columns (which need chunked indexordering) for which we switch from on-the-fly ordering to stored |
FF_RETURN |
Optionally an |
BATCHSIZE |
Optional limit for the batch size (see details) |
BATCHBYTES |
Limit for the number of bytes per batch |
VERBOSE |
Logical scalar for verbose output |
Accessing rows of an ffdf data.frame identified by integer positions in an ff vector is a non-trivial task, because it could easily lead to random-access to disk files.
We avoid random access by loading batches of the subscript values into RAM, ordering them ascending, and only then accessing the ff values on disk.
Such ordering is done on the fly for up to autoindexorder-1
columns that need ordering.
For autoindexorder
or more columns we do the batched ordering upfront with ffindexorder
and then re-use it in each call to ffindexget
or ffindexset, respectively.
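The batching idea described above can be illustrated in base R without ff. This is only a minimal sketch under stated assumptions, not the package's implementation: batched_get is a hypothetical helper, a plain in-memory vector stands in for the values on disk, and the batch size of 4 is arbitrary.

```r
# Hypothetical helper illustrating batched, ordered access (not ff's actual code).
# 'data' stands in for the values on disk, 'index' for the ff subscript vector.
batched_get <- function(data, index, batchsize = 4L) {
  out <- vector(mode = typeof(data), length = length(index))
  if (length(index) == 0L) return(out)
  starts <- seq(1L, length(index), by = batchsize)
  for (s in starts) {
    pos <- s:min(s + batchsize - 1L, length(index))
    batch <- index[pos]      # load one batch of subscripts into RAM
    o <- order(batch)        # order the batch ascending
    vals <- data[batch[o]]   # read the positions in ascending order
    out[pos[o]] <- vals      # store the values in the originally requested order
  }
  out
}

data <- (1:26) * 10L
index <- c(9L, 2L, 25L, 7L, 3L, 18L, 1L)
res <- batched_get(data, index)
```

Within each batch the reads happen in ascending position, which is what keeps disk access close to sequential, while the result still follows the order of index.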
Function ffdfindexget
returns an ffdf data.frame with those rows selected by the ff index
vector.
Function ffdfindexset
returns x
with the rows requested by index
replaced by value
.
Jens Oehlschlägel
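library(ff)  # provides ff, ffdf, ffdfindexget and ffdfindexset

message("ff integer subscripts with ffdf return/assign values")
x <- ff(factor(letters))
y <- ff(1:26)
d <- ffdf(x,y)
i <- ff(2:9)   # positive integer subscripts stored in an ff vector
di <- d[i,]    # extract rows via method dispatch of [
di
d[i,] <- di    # assign rows via method dispatch of [<-

message("ff integer subscripts: more control with ffindexget/ffindexset")
di <- ffdfindexget(d, i, FF_RETURN=di)  # re-use di for the return value
d <- ffdfindexset(d, i, di)

# clean up
rm(x, y, d, i, di)
gc()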