ff: write.table.ffdf – R documentation


write.table.ffdf

Exporting csv files from ff data.frames

Description

write.table.ffdf writes an ffdf object to a separated flat file, much like write.table, which it uses to do the writing. It also works with convenience wrappers such as write.csv, and it provides its own counterparts to R's usual wrappers (e.g. write.csv.ffdf).

Usage

write.table.ffdf(x = NULL
, file, append = FALSE
, nrows = -1, first.rows = NULL, next.rows = NULL
, FUN = "write.table", ...
, transFUN = NULL
, BATCHBYTES = getOption("ffbatchbytes")
, VERBOSE = FALSE
)
write.csv.ffdf(...)
write.csv2.ffdf(...)
write.csv(...)
write.csv2(...)

Arguments

`x`	an `ffdf` object to export to the separated file
`file`	either a character string naming a file or a connection open for writing. `""` indicates output to the console.
`append`	logical. Only relevant if `file` is a character string. If `TRUE`, the output is appended to the file. If `FALSE`, any existing file of the name is destroyed.
`nrows`	integer: the maximum number of rows to write (includes `first.rows` if a 'first' chunk is written). Negative and other invalid values are ignored.
`first.rows`	the number of rows to write with the first chunk (default: next.rows)
`next.rows`	integer: number of rows to write in further chunks, see details. By default calculated as `BATCHBYTES %/% sum(.rambytes[vmode(x)])`
`FUN`	character: name of a function that is called for writing each chunk, see `write.table`, `write.csv`, etc.
`...`	further arguments, passed to `FUN` in `write.table.ffdf`, or passed to `write.table.ffdf` in the convenience wrappers
`transFUN`	NULL or a function that is called on each data.frame chunk before writing with `FUN` (for filtering, transformations etc.)
`BATCHBYTES`	integer: bytes allowed for the size of the `data.frame` storing the result of reading one chunk. Default `getOption("ffbatchbytes")`.
`VERBOSE`	logical: `TRUE` to report timings for each processed chunk (default `FALSE`)
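
The default for `next.rows` divides the batch budget by the RAM bytes needed per row. The arithmetic can be sketched in base R as follows. The per-column byte sizes and the 16 MB budget below are illustrative assumptions, not values taken from ff, which looks them up via `.rambytes[vmode(x)]` and `getOption("ffbatchbytes")`.

```r
# Hypothetical RAM bytes per value for each column (assumed for illustration)
ram_bytes_per_column <- c(log = 4L, int = 4L, dbl = 8L, fac = 4L)

# Assumed batch budget of 16 MB, in bytes
batch_bytes <- 16777216L

# Rows per chunk: integer division of the budget by the bytes per row
bytes_per_row <- sum(ram_bytes_per_column)
next_rows <- batch_bytes %/% bytes_per_row
```

A smaller `BATCHBYTES` or wider rows therefore lead to more, smaller chunks.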

Details

write.table.ffdf is designed to export very large ffdf objects to separated flat files in chunks. The first chunk may be written with col.names; further chunks are appended.
write.table.ffdf is designed to behave as much like write.table as possible. However, note the following difference:

By default, row.names are written only if the ffdf has row.names.
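
The chunking idea can be illustrated with a minimal base-R sketch on an ordinary data.frame. This is not ff's implementation: `write_in_chunks` and its `trans_fun` argument are hypothetical names, the latter standing in for the role that `transFUN` plays.

```r
# Hypothetical helper: write a data.frame to CSV in chunks of chunk_rows rows
write_in_chunks <- function(df, file, chunk_rows, trans_fun = NULL) {
  stopifnot(nrow(df) > 0L, chunk_rows > 0L)  # the sketch assumes non-empty input
  starts <- seq(1L, nrow(df), by = chunk_rows)
  for (i in seq_along(starts)) {
    rows  <- starts[i]:min(starts[i] + chunk_rows - 1L, nrow(df))
    chunk <- df[rows, , drop = FALSE]
    # Optional per-chunk filter or transformation before writing
    if (!is.null(trans_fun)) chunk <- trans_fun(chunk)
    write.table(chunk, file = file, sep = ",",
                append    = i > 1L,   # the first chunk creates the file
                col.names = i == 1L,  # the header is written only once
                row.names = FALSE)
  }
  invisible(NULL)
}

df <- data.frame(int = 1:26, fac = letters, stringsAsFactors = TRUE)
csvfile <- tempfile(fileext = ".csv")
write_in_chunks(df, csvfile, chunk_rows = 10L)
back <- read.csv(csvfile)
unlink(csvfile)
```

Only one chunk is held as a data.frame at a time, which is what makes this pattern suitable for data that does not fit in RAM.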

Value

invisible

Note

write.csv and write.csv2 have been fixed to suppress col.names when append=TRUE is passed. Note also that write.table.ffdf passes col.names=FALSE for all chunks after the first, except when FUN="write.csv" or FUN="write.csv2".
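
For context, the stock utils::write.csv ignores append and issues a warning, which is presumably why fixed versions are provided. The base-R sketch below shows the stock behaviour only. It calls utils::write.csv explicitly so that the result does not depend on whether ff is attached.

```r
df <- data.frame(int = 1:3)
csvfile <- tempfile(fileext = ".csv")

utils::write.csv(df, file = csvfile, row.names = FALSE)

# Capture the warning but let the call finish, so the file is rewritten
warn <- NULL
withCallingHandlers(
  utils::write.csv(df, file = csvfile, row.names = FALSE, append = TRUE),
  warning = function(w) {
    warn <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# The second call overwrote the file instead of appending to it
n_lines <- length(readLines(csvfile))
unlink(csvfile)
```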

Author(s)

Jens Oehlschlägel, Christophe Dutang

Examples

library(ff)

# Build a small data.frame covering several column types and convert it to ffdf
x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=1:26, dbl=1:26 + 0.1
              , fac=factor(letters), ord=ordered(LETTERS), dct=Sys.time()+1:26
              , dat=seq(as.Date("1910/1/1"), length.out=26, by=1), stringsAsFactors = TRUE)
ffx <- as.ffdf(x)

csvfile <- tempPathFile(path=getOption("fftempdir"), extension="csv")

# Write the file, then append the same rows a second time
write.csv.ffdf(ffx, file=csvfile)
write.csv.ffdf(ffx, file=csvfile, append=TRUE)

# Read the file back, stating the column classes that should not be guessed from text
ffy <- read.csv.ffdf(file=csvfile, header=TRUE
                   , colClasses=c(ord="ordered", dct="POSIXct", dat="Date"))

# Clean up
rm(ffx, ffy); gc()
unlink(csvfile)

 ## Not run: 
  # Attention, this takes a very long time
  vmodes <- c(log="boolean", int="byte", dbl="single"
            , fac="short", ord="short", dct="single", dat="single")

  message("create an ffdf with 7 columns and 78 million rows")
  system.time({
    x <- data.frame(log=rep(c(FALSE, TRUE), length.out=26), int=1:26, dbl=1:26 + 0.1
                  , fac=factor(letters), ord=ordered(LETTERS), dct=Sys.time()+1:26
                  , dat=seq(as.Date("1910/1/1"), length.out=26, by=1), stringsAsFactors = TRUE)
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    x <- do.call("rbind", rep(list(x), 10))
    ffx <- as.ffdf(x, vmode = vmodes)
    for (i in 2:300){
      message(i, "\n")
      last <- nrow(ffx) + nrow(x)
      first <- last - nrow(x) + 1L
      nrow(ffx) <- last
      ffx[first:last,] <- x
    }
  })


  csvfile <- tempPathFile(path=getOption("fftempdir"), extension="csv")

  write.csv.ffdf(ffx, file=csvfile, VERBOSE=TRUE)
  ffy <- read.csv.ffdf(file=csvfile, header=TRUE
                     , colClasses=c(ord="ordered", dct="POSIXct", dat="Date")
                     , asffdf_args=list(vmode = vmodes), VERBOSE=TRUE)

  rm(ffx, ffy); gc()
  unlink(csvfile)
 
## End(Not run)

ff

Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

v4.0.4

GPL-2 | GPL-3 | file LICENSE

Authors

Daniel Adler [aut], Christian Gläser [aut], Oleg Nenadic [aut], Jens Oehlschlägel [aut, cre], Martijn Schuemie [aut], Walter Zucchini [aut]

Initial release

2020-10-13