
ffdfdply

Performs a split-apply-combine on an ffdf


Description

Performs a split-apply-combine on an ffdf. Splits the ffdf x according to split, applies FUN to the data, and stores the results of FUN in an ffdf.
Note that this function does not physically split the data. When split has many levels, loading each level into RAM separately would be slow, so the function instead gathers groups of split levels that together fit into RAM according to BATCHBYTES and passes each group to FUN as one chunk. Make sure FUN handles the fact that one chunk of data can contain several split levels.
Note also that rows where split is NA do not belong to any split, so FUN is never applied to them.
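The batching described above can be imitated in base R on an ordinary data.frame. This is a minimal sketch, not the ffbase implementation: pack_levels and sketch_dply are hypothetical names, the packing rule (fill a batch with whole levels until floor(BATCHBYTES / RECORDBYTES) rows are reached) is an assumption, and the 36 bytes per row assumed for iris stand in for what RECORDBYTES would compute. It shows the two consequences stated in the text: one FUN call may see several levels, and NA keys are dropped.

```r
# Hypothetical helper (not part of ffbase): packs the non-NA levels of `key`
# into batches holding at most floor(batchbytes / recordbytes) rows, except
# that a single level larger than that budget gets a batch of its own.
pack_levels <- function(key, batchbytes, recordbytes) {
  counts <- table(key, useNA = "no")        # rows per level; NA keys dropped
  counts <- counts[counts > 0]
  max_rows <- max(1, floor(batchbytes / recordbytes))
  batch <- integer(length(counts))
  current <- 1L
  used <- 0
  for (i in seq_along(counts)) {
    n <- counts[[i]]
    if (used > 0 && used + n > max_rows) {  # batch is full: start a new one
      current <- current + 1L
      used <- 0
    }
    batch[i] <- current
    used <- used + n
  }
  split(names(counts), batch)               # list: batch id -> its levels
}

# Hypothetical in-RAM analogue of ffdfdply: FUN is called once per batch
# (not once per level) and the pieces are row-bound together.
sketch_dply <- function(x, key, FUN, batchbytes, recordbytes, ...) {
  batches <- pack_levels(key, batchbytes, recordbytes)
  pieces <- lapply(batches, function(levs) {
    rows <- which(as.character(key) %in% levs)  # NA keys never match
    FUN(x[rows, , drop = FALSE], ...)
  })
  do.call(rbind, unname(pieces))
}

# A FUN that stays correct when one chunk holds several species.
max_pw <- function(d) aggregate(Petal.Width ~ Species, data = d, FUN = max)

batches <- pack_levels(iris$Species, batchbytes = 5000, recordbytes = 36)
str(batches)  # setosa and versicolor share batch 1; virginica is in batch 2
res <- sketch_dply(iris, iris$Species, max_pw, batchbytes = 5000, recordbytes = 36)
res
```

Because max_pw groups by Species itself, the result has one row per species even though the first call received two species at once.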

Usage

ffdfdply(
  x,
  split,
  FUN,
  BATCHBYTES = getOption("ffbatchbytes"),
  RECORDBYTES = sum(.rambytes[vmode(x)]),
  trace = TRUE,
  ...
)
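The default RECORDBYTES adds up, over the columns of x, the number of bytes one value of each column's vmode occupies in RAM; dividing BATCHBYTES by that figure gives roughly the number of rows per chunk. The sketch below redoes this arithmetic in base R for iris. ram_bytes is a hypothetical stand-in for ff's internal .rambytes table, and both the byte sizes (4 per integer or logical, 8 per double) and the vmodes assumed for as.ffdf(iris) are assumptions, not values read from ff.

```r
# Hypothetical stand-in for ff's internal .rambytes table (assumed sizes):
# bytes one element of each vmode occupies once loaded into R.
ram_bytes <- c(logical = 4, integer = 4, double = 8)

# Assumed vmodes for as.ffdf(iris): numeric columns map to "double" and the
# Species factor to "integer", since factors are stored as integer codes.
iris_vmodes <- vapply(iris,
                      function(col) if (is.factor(col)) "integer" else typeof(col),
                      character(1))

# Analogue of the default RECORDBYTES = sum(.rambytes[vmode(x)])
record_bytes <- sum(ram_bytes[iris_vmodes])    # 4 * 8 + 1 * 4 = 36

# Rough number of rows that fit into one chunk with BATCHBYTES = 5000
rows_per_chunk <- floor(5000 / record_bytes)   # 138
```

Under these assumptions, BATCHBYTES = 5000 (as in the examples) lets about 138 of the 150 iris rows into one chunk, so the three 50-row species cannot all be processed in a single call.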

Arguments

x

an ffdf

split

an ff vector which is part of the ffdf x

FUN

the function to apply to each chunk of data, where one chunk can contain several split levels. FUN must return a data.frame

BATCHBYTES

integer scalar limiting the number of bytes to be processed in one chunk

RECORDBYTES

optional integer scalar representing the bytes needed to process one row of x

trace

logical indicating whether to show which split the function is currently processing

...

other parameters passed on to FUN
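Since FUN receives chunks that may hold several split levels, and extra arguments given through ... are forwarded to it, a FUN written for a single group can silently mislabel results. The base R sketch below uses a plain data.frame as a simulated chunk; naive_max, safe_max and the extra argument column are hypothetical names chosen for illustration, not part of ffbase.

```r
# Naive FUN: assumes the chunk holds exactly one split level.
naive_max <- function(d, column) {
  data.frame(Species = d$Species[1], value = max(d[[column]]))
}

# Chunk-safe FUN: groups inside the chunk, so it is correct whether the
# chunk holds one split level or several. `column` would arrive via `...`.
safe_max <- function(d, column) {
  out <- aggregate(d[[column]], by = list(Species = d$Species), FUN = max)
  names(out)[2] <- "value"
  out
}

# Simulate a chunk containing two split levels, as ffdfdply may produce.
chunk <- iris[iris$Species %in% c("setosa", "versicolor"), ]

naive_max(chunk, column = "Petal.Width")  # 1 row: setosa paired with 1.8, versicolor's max
safe_max(chunk, column = "Petal.Width")   # 2 rows: setosa 0.6, versicolor 1.8
```

With ffbase loaded, the chunk-safe version would presumably be used as `x <- as.ffdf(iris); ffdfdply(x, split = x$Species, FUN = safe_max, column = "Petal.Width")`, relying on ... to forward column; that call is not run here.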

Value

an ffdf holding the combined results of FUN


Examples

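library(ffbase)
data(iris)
ffiris <- as.ffdf(iris)

## A chunk passed to FUN can hold rows of several Species, so FUN works on
## Species/Petal.Width combinations rather than assuming a single group.
youraggregatorFUN <- function(x){
  asc <- x[order(x$Petal.Width), ]                       # ascending Petal.Width
  desc <- x[order(x$Petal.Width, decreasing = TRUE), ]   # descending Petal.Width
  ## call duplicated() on the reordered rows so the mask lines up with them
  lowest_pw <- asc[!duplicated(asc[c("Species", "Petal.Width")]), ]
  highest_pw <- desc[!duplicated(desc[c("Species", "Petal.Width")]), ]
  lowest_pw$group <- factor("lowest", levels = c("lowest", "highest"))
  highest_pw$group <- factor("highest", levels = c("lowest", "highest"))
  rbind(lowest_pw, highest_pw)
}
result <- ffdfdply(x = ffiris, split = ffiris$Species,
                   FUN = youraggregatorFUN,
                   BATCHBYTES = 5000, trace = TRUE)
dim(result)
dim(iris)
result[1:10,]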

ffiris$integerkey <- with(ffiris, as.integer(Sepal.Length))
result <- ffdfdply( x = ffiris, split = as.character(ffiris$integerkey)
                  , FUN = function(x) youraggregatorFUN(x), BATCHBYTES = 5000
                  , trace=TRUE
                  )

ffiris$datekey <- ff( as.Date(ffiris$Sepal.Length[], origin = "1970-01-01"),
                      vmode = "integer")
result <- ffdfdply( x = ffiris, split = as.character(ffiris$datekey) 
                  , FUN = function(x) youraggregatorFUN(x)
                  , BATCHBYTES = 5000, trace=TRUE
                  )

ffbase

Basic Statistical Functions for Package 'ff'

Version: 0.13.3
License: GPL-3
Authors: Edwin de Jonge, Jan Wijffels, Jan van der Laan
