Reading and writing vectors and arrays (high-level)
These are the main methods for reading and writing data from ff files.
## S3 method for class 'ff'
x[i, pack = FALSE]
## S3 replacement method for class 'ff'
x[i, add = FALSE, pack = FALSE] <- value
## S3 method for class 'ff_array'
x[..., bydim = NULL, drop = getOption("ffdrop"), pack = FALSE]
## S3 replacement method for class 'ff_array'
x[..., bydim = NULL, add = FALSE, pack = FALSE] <- value
## S3 method for class 'ff'
x[[i]]
## S3 replacement method for class 'ff'
x[[i, add = FALSE]] <- value
x | an ff object
i | missing OR a single index expression OR a
... | missing OR up to length(dim(x)) index expressions OR
drop | logical scalar indicating whether array dimensions shall be dropped
bydim | the dimorder which shall be used in interpreting vector to/from array data
pack | FALSE to prevent rle-packing in hybrid index preprocessing, see
value | the values to be assigned, possibly recycled
add | TRUE if the values should rather increment than overwrite at the target positions, see
The single square bracket operators [ and [<- are the workhorses for accessing the content of an ff object. They support ff_vector and ff_array access (see dim.ff), respect virtual windows (vw), names.ff and dimnames.ff, and retain ramclass and ramattribs, and thus support POSIXct and factor (see levels.ff).
The functionality of [ and [<- can be combined into one efficient operation, see swap.
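As a rough sketch of the combined read-and-write operation, assuming the ff package is installed and that swap() accepts the ff object, the new values, and the index in that order (the names v, old, and now are only illustrative):

```r
library(ff)

# Small integer ff vector backed by a temporary file
v <- ff(1:5)

# swap() writes the new values and returns the values that were stored before
old <- swap(v, c(0L, 0L), 1:2)

# Read back the whole vector to see the effect of the write
now <- v[]
```

Here old should hold the previous content of positions 1 and 2, while now reflects the overwritten vector.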
The double square bracket operator [[ is a shortcut for get.ff, and [[<- for set.ff; however, you should not rely on this in the future, see LimWarn. For programming, please prefer [.
The read operators [ and [[ return data from the ff object, possibly decorated with names, dim, dimnames and further attributes and classes (see ramclass, ramattribs).
The write operators [<- and [[<- return the 'modified' ff object (like all assignment operators do).
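A minimal sketch of the read and write operators on a small ff vector, assuming the ff package is installed; the variable names and values are only illustrative:

```r
library(ff)

# Small integer ff vector backed by a temporary file
v <- ff(1:5)

# Read: [ returns an ordinary R vector held in RAM
first_three <- v[1:3]

# Write: [<- overwrites the selected positions in the ff file
v[1:2] <- c(10L, 20L)

# add = TRUE increments the target positions instead of overwriting them
v[1:2, add = TRUE] <- c(1L, 1L)

# An empty index reads the whole vector
after <- v[]
```

The add = TRUE form avoids a separate read followed by a write when the goal is only to increment stored values.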
x <- ff(1:12, dim=c(3,4), dimnames=list(letters[1:3], NULL))
allowed expression | example
positive integers | x[ 1 ,1]
negative integers | x[ -(2:12) ]
logical | x[ c(TRUE, FALSE, FALSE) ,1]
character | x[ "a" ,1]
integer matrices | x[ rbind(c(1,1)) ]
hybrid index | x[ hi ,1]

disallowed expression | example
zeros | x[ 0 ]
NAs | x[ NA ]
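Several of the allowed index forms can be tried on the example array as follows. This is a sketch assuming the ff package is installed; the result names are illustrative, and every single-element lookup here targets the first element of the array:

```r
library(ff)

x <- ff(1:12, dim = c(3, 4), dimnames = list(letters[1:3], NULL))

by_position <- x[1, 1]                      # positive integers
by_logical  <- x[c(TRUE, FALSE, FALSE), 1]  # logical, one flag per row
by_name     <- x["a", 1]                    # character, matched against dimnames
by_matrix   <- x[rbind(c(1, 1))]            # matrix index, one row per element

# A missing index selects the whole extent of that dimension
first_row <- x[1, ]
```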
Arrays in R always have the standard dimorder seq_along(dim(x)), while ff allows an array to be stored in a different dimorder. Using a nonstandard dimorder (see dimorderStandard) can speed up certain access operations: while matrix dimorder=c(1,2) (column-major order) allows fast extraction of columns, dimorder=c(2,1) allows fast extraction of rows.
While the dimorder, being an attribute of an ff_array, controls how the vector in an ff file is interpreted, the bydim argument to the extractor functions controls how assignment vector values in [<- are translated to the array and how the array is translated to a vector in [ subscripting. Note that bydim=c(2,1) corresponds to matrix(..., byrow=TRUE).
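The correspondence between bydim=c(2,1) and byrow=TRUE can be checked with a short sketch, assuming the ff package is installed; the names m, x, and y are illustrative:

```r
library(ff)

# Reference: base R fills the matrix row by row when byrow = TRUE
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

# bydim = c(2, 1) asks ff to interpret the initial data row by row as well
x <- ff(1:12, dim = c(3, 4), bydim = c(2, 1))
same_on_create <- all(x[] == m)

# The same interpretation applied to the value vector of an assignment
y <- ff(0L, dim = c(3, 4))
y[, , bydim = c(2, 1)] <- 1:12
same_on_assign <- all(y[] == m)
```

Both flags are expected to be TRUE if bydim behaves as described above.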
In case of non-standard dimorder (see dimorderStandard) the vector sequence of array elements in R and in the ff file differs. To access array elements in file order, you can use getset.ff or readwrite.ff, or copy the ff object and set dim(ff)<-NULL to get a vector view into the ff object (using [ then dispatches the vector method [.ff).
To access the array elements in R standard dimorder, simply use [, which dispatches to [.ff_array. Note that in this case as.hi will unpack the complete index, see next section.
Some index expressions do not consume RAM due to the hi representation; for example, 1:n consumes almost no RAM however large n is. However, some index expressions are expanded and require maxindex(i) * .rambytes["integer"] bytes, either because the sorted sequence of index positions cannot be rle-packed efficiently or because hiparse cannot yet parse such an expression and falls back to evaluating/expanding the index expression. If the index positions are not sorted, the index will be expanded and a second vector is needed to store the information for re-ordering; thus the index requires 2 * maxindex(i) * .rambytes["integer"] bytes.
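Why a regular sequence packs well while an irregular one does not can be illustrated in base R. This sketch only mimics the idea by run-length encoding the steps between consecutive positions; it is not the actual hi implementation, and the helper packed_runs is hypothetical:

```r
# Hypothetical helper: number of runs after run-length encoding the
# differences between consecutive index positions
packed_runs <- function(i) length(rle(diff(i))$lengths)

n <- 1000000L

# Regular index: every step equals 1, so one run describes it
regular <- 1:n
runs_regular <- packed_runs(regular)

# Sorted but irregular index: the steps vary, so many runs are needed
set.seed(1)
irregular <- sort(sample(n, 1000L))
runs_irregular <- packed_runs(irregular)
```

A single run can be stored in constant space regardless of n, whereas a large number of runs offers little or no saving over the expanded index.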
Some assignment expressions do not consume RAM for recycling; for example, x[1:n] <- 1:k will not consume RAM however large n is compared to k, when x has standard dimorder. However, if length(value)>1, assignment expressions with non-ascending index positions trigger recycling the value R-side to the full index length. This will happen if dimorder does not match parameter bydim or if the index is not sorted ascending.
Jens Oehlschlägel
See also: ff, swap, add, readwrite.ff, LimWarn
message("look at different dimorders")
x <- ff(1:12, dim=c(3,4), dimorder=c(1,2))
x[]
as.vector(x[])
x[1:12]
x <- ff(1:12, dim=c(3,4), dimorder=c(2,1))
x[]
as.vector(x[])
message("Beware (might be changed)")
x[1:12]

message("look at different bydim")
matrix(1:12, nrow=3, ncol=4, byrow=FALSE)
x <- ff(1:12, dim=c(3,4), bydim=c(1,2))
x
matrix(1:12, nrow=3, ncol=4, byrow=TRUE)
x <- ff(1:12, dim=c(3,4), bydim=c(2,1))
x
x[,, bydim=c(2,1)]
as.vector(x[,, bydim=c(2,1)])

message("even consistent interpretation of vectors in assignments")
x[,, bydim=c(1,2)] <- x[,, bydim=c(1,2)]
x
x[,, bydim=c(2,1)] <- x[,, bydim=c(2,1)]
x
rm(x); gc()

## Not run:
message("some performance implications of different dimorders")
n <- 100
m <- 100000
a <- ff(1L,dim=c(n,m))
b <- ff(1L,dim=c(n,m), dimorder=2:1)
system.time(lapply(1:n, function(i)sum(a[i,])))
system.time(lapply(1:n, function(i)sum(b[i,])))
system.time(lapply(1:n, function(i){i<-(i-1)*(m/n)+1; sum(a[,i:(i+m/n-1)])}))
system.time(lapply(1:n, function(i){i<-(i-1)*(m/n)+1; sum(b[,i:(i+m/n-1)])}))

n <- 100
a <- ff(1L,dim=c(n,n,n,n))
b <- ff(1L,dim=c(n,n,n,n), dimorder=4:1)
system.time(lapply(1:n, function(i)sum(a[i,,,])))
system.time(lapply(1:n, function(i)sum(a[,i,,])))
system.time(lapply(1:n, function(i)sum(a[,,i,])))
system.time(lapply(1:n, function(i)sum(a[,,,i])))
system.time(lapply(1:n, function(i)sum(b[i,,,])))
system.time(lapply(1:n, function(i)sum(b[,i,,])))
system.time(lapply(1:n, function(i)sum(b[,,i,])))
system.time(lapply(1:n, function(i)sum(b[,,,i])))

n <- 100
m <- 100000
a <- ff(1L,dim=c(n,m))
b <- ff(1L,dim=c(n,m), dimorder=2:1)
system.time(ffrowapply(sum(a[i1:i2,]), a, RETURN=TRUE, CFUN="csum", BATCHBYTES=16104816%/%20))
system.time(ffcolapply(sum(a[,i1:i2]), a, RETURN=TRUE, CFUN="csum", BATCHBYTES=16104816%/%20))
system.time(ffrowapply(sum(b[i1:i2,]), b, RETURN=TRUE, CFUN="csum", BATCHBYTES=16104816%/%20))
system.time(ffcolapply(sum(b[,i1:i2]), b, RETURN=TRUE, CFUN="csum", BATCHBYTES=16104816%/%20))
rm(a,b); gc()
## End(Not run)