ff class for data.frames
Function 'ffdf' creates ff data.frames stored on disk, very similar to 'data.frame'.
ffdf(... , row.names = NULL , ff_split = NULL , ff_join = NULL , ff_args = NULL , update = TRUE , BATCHSIZE = .Machine$integer.max , BATCHBYTES = getOption("ffbatchbytes") , VERBOSE = FALSE)
... |
|
row.names |
A |
ff_split |
A vector of character names or integer positions identifying input components to physically split into single ff_vectors. If vector elements have names, these are used as root name for the new ff files. |
ff_join |
A list of vectors with character names or integer positions identifying input components to physically join in the same ff matrix. If list elements have names, these are used to name the new ff files. |
update |
By default (TRUE) new ff files are updated with content of input ff objects. Setting to FALSE prevents this update. |
ff_args |
a list with further arguments passed to |
BATCHSIZE |
passed to |
BATCHBYTES |
passed to |
VERBOSE |
passed to |
By default, creating an 'ffdf' object will NOT create new ff files; instead, existing files are referenced. This differs from data.frame, which always creates copies of the input objects, most notably in data.frame(matrix()), where an input matrix is converted to single columns. ffdf, by contrast, will store an input matrix physically as the same matrix and virtually map it to columns. Physically copying a large ff matrix to single ff vectors can be expensive.
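The referencing behaviour described above can be checked by comparing file names. The following is a minimal sketch, assuming the ff package is installed; the toy inputs and the variable names (`ffm`, `ffv`, `phys_files`) are illustrative only.

```r
library(ff)

# Small in-memory inputs converted to ff objects (this is where ff files are created)
m <- matrix(1:12, 3, 4, dimnames = list(NULL, c("m1", "m2", "m3", "m4")))
ffm <- as.ff(m)
ffv <- as.ff(1:3)

# Default construction: no ff_split / ff_join, so existing files should be referenced
ffd <- ffdf(ffm, v = ffv)

# File names behind the physical components of the ffdf ...
phys_files <- vapply(physical(ffd), filename, character(1))
print(phys_files)

# ... compared with the files backing the original inputs
print(c(filename(ffm), filename(ffv)))

# The matrix is stored once physically but is mapped to several virtual columns
print(dim(ffd))
```

If both printed sets of file names agree, no new ff files were created by the call to `ffdf`.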
More generally, ffdf objects have a physical and a virtual component, which allows very flexible dataframe designs: a physically stored matrix can be virtually mapped to single columns, and a couple of physically stored vectors can be virtually mapped to a single matrix. The means to configure these are I for the virtual representation and the 'ff_split' and 'ff_join' arguments for the physical representation. An ff matrix wrapped into 'I()' will return the input matrix as a single object; using 'ff_split' will store this matrix as single vectors and thus create new ff files. 'ff_join' will copy a couple of input vectors into a unified new ff matrix with dimorder=c(2,1), but virtually they will remain single columns. The returned ffdf object also has a dimorder attribute, which indicates whether the ffdf object contains a matrix with non-standard dimorder c(2,1); see dimorderStandard.
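To see the physical and virtual sides next to each other, the sketch below joins two vectors with 'ff_join' and then inspects both views. It assumes the ff package is installed; the column names `a` and `b` and the join name `ab` are made up for illustration.

```r
library(ff)

ffa <- as.ff(1:3)
ffb <- as.ff(4:6)

# Physically join the two input vectors into one new ff matrix (creates a new ff file)
ffd <- ffdf(a = ffa, b = ffb, ff_join = list(ab = c(1, 2)))

print(names(ffd))      # virtual view: the columns of the ffdf
print(physical(ffd))   # physical view: the ff objects actually stored
print(virtual(ffd))    # virtual attributes, including the virtual-to-physical mapping
print(dimorder(ffd))   # dimorder reported for the ffdf as a whole
```

Comparing the output of `names()` with that of `physical()` shows how the virtual columns relate to the stored ff objects.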
Currently, virtual windows are not supported for ffdf.
A list with components
and class 'ffdf' (NOTE that ffdf does not inherit from ff)
The following methods and functions are available for ffdf objects:
Type | Name | Assign | Comment |
Basic functions | | | |
function | ffdf | | constructor for ffdf objects |
generic | update | | updates one ffdf object with the content of another |
generic | clone | | clones an ffdf object |
method | print | | print ffdf |
method | str | | ffdf object structure |
Class test and coercion | | | |
function | is.ffdf | | check if inherits from ffdf |
generic | as.ffdf | | coerce to ffdf, if not yet |
generic | as.data.frame | | coerce to RAM data.frame |
Virtual storage mode | | | |
generic | vmode | | get virtual modes for all (virtual) columns |
Physical attributes | | | |
function | physical | | get physical attributes |
Virtual attributes | | | |
function | virtual | | get virtual attributes |
method | length | | get length |
method | dim | <- | get dim and set nrow |
generic | dimorder | | get the dimorder (non-standard if any component is non-standard) |
method | names | <- | set and get names |
method | row.names | <- | set and get row.names |
method | dimnames | <- | set and get dimnames |
method | pattern | <- | set pattern (rename/move files) |
Access functions | | | |
method | [ | <- | set and get data.frame content ([,]) or get ffdf with fewer columns ([]) |
method | [[ | <- | set and get single column ff object |
method | $ | <- | set and get single column ff object |
Opening/Closing/Deleting | | | |
generic | is.open | | tri-bool is.open status of the physical ff components |
method | open | | open all physical ff objects (is done automatically on access) |
method | close | | close all physical ff objects |
method | delete | | deletes all physical ff files |
method | finalize | | call finalizer |
Processing | | | |
method | chunk | | create chunked index |
method | sortLevels | | sort and recode levels |
Other | | | |
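A few of the methods listed above can be combined into a typical workflow. This is a minimal sketch, assuming the ff package is installed; the data and variable names are illustrative, and the chunk loop is one common way to use `chunk`, not the only one.

```r
library(ff)

# Coerce a small RAM data.frame to an ffdf
ffd <- as.ffdf(data.frame(x = 1:10, y = (1:10) / 2))

col_x  <- ffd$x        # single column as an ff object
first3 <- ffd[1:3, ]   # [,] returns the selected rows as a RAM data.frame
only_y <- ffd["y"]     # [] returns an ffdf with fewer columns

# Chunked processing: accumulate a column sum one chunk of rows at a time
total <- 0
for (i in chunk(ffd)) {
  total <- total + sum(ffd[i, "x"])
}
print(total)

# Close the physical ff objects, then remove their files
close(ffd)
delete(ffd)
rm(ffd, col_x, only_y); gc()
```

With only ten rows, `chunk` will usually return a single chunk; the loop matters for objects that do not fit into the batch size given by `getOption("ffbatchbytes")`.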
Note that in theory, accessing a chunk of rows from a matrix with dimorder=c(2,1) should be faster than accessing across a bunch of vectors. However, at least under Windows, the OS has difficulties file-caching parts of very large files; therefore, until we have partitioning, the recommended physical storage is in single vectors.
Jens Oehlschlägel
data.frame, ff; for more examples see physical
m <- matrix(1:12, 3, 4, dimnames=list(c("r1","r2","r3"), c("m1","m2","m3","m4")))
v <- 1:3
ffm <- as.ff(m)
ffv <- as.ff(v)

# default: reference the existing ff objects
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm))
all.equal(d, ffd[,])
ffd
physical(ffd)

# ff_split: store the first input (the matrix) physically as single vectors
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_split=1)
all.equal(d, ffd[,])
ffd
physical(ffd)

# ff_join: store both inputs physically in one new ff matrix
d <- data.frame(m, v)
ffd <- ffdf(ffm, v=ffv, row.names=row.names(ffm), ff_join=list(newff=c(1,2)))
all.equal(d, ffd[,])
ffd
physical(ffd)

# I(): keep each input as a single virtual object
d <- data.frame(I(m), I(v))
ffd <- ffdf(m=I(ffm), v=I(ffv), row.names=row.names(ffm))
all.equal(d, ffd[,])
ffd
physical(ffd)

# clean up
rm(ffm,ffv,ffd); gc()