map_batches

Apply a function to a stream of RecordBatches


Description

As an alternative to calling collect() on a Dataset query, you can use this function to access the stream of RecordBatches in the Dataset. This lets you aggregate each batch and pull the intermediate results into a data.frame for further aggregation, even when the full Dataset result would not fit in memory.

Usage

map_batches(X, FUN, ..., .data.frame = TRUE)

Arguments

X

A Dataset or arrow_dplyr_query object, as returned by the dplyr methods on Dataset.

FUN

A function or purrr-style lambda expression to apply to each batch

...

Additional arguments passed to FUN

.data.frame

Logical: should the results from all batches be collected into a single data.frame? Default is TRUE.

Details

This is experimental and not recommended for production use.
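
Examples

The following is a minimal sketch of the chunk-wise aggregation described above, not an official example from the package. It assumes the arrow package was built with Parquet support and that dplyr is installed. The demo data (mtcars written to a temporary directory) and the names path, partials, mpg_sum and overall_mean are hypothetical and chosen only for illustration.

```r
library(arrow)
library(dplyr)  # assumption: dplyr is installed; combining per-batch results may depend on it

# Hypothetical demo data: write mtcars to a temporary, partitioned Dataset
path <- tempfile()
write_dataset(mtcars, path, partitioning = "cyl")
ds <- open_dataset(path)

# Reduce each RecordBatch to a one-row data.frame of partial results.
# as.data.frame() around the call is a defensive step; with
# .data.frame = TRUE the result should already be a data.frame.
partials <- as.data.frame(
  map_batches(ds, function(batch) {
    df <- as.data.frame(batch)
    data.frame(n = nrow(df), mpg_sum = sum(df$mpg))
  })
)

# Aggregate the intermediate results into the overall mean of mpg
overall_mean <- sum(partials$mpg_sum) / sum(partials$n)
```

Each batch contributes only a row count and a partial sum, so the full data never has to be held in memory at once. Because a mean cannot be combined across batches directly, the sketch carries sums and counts and divides at the end.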


arrow

Integration to 'Apache' 'Arrow'

v4.0.0.1
Apache License (>= 2.0)
Authors
Neal Richardson [aut, cre], Ian Cook [aut], Jonathan Keane [aut], Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>), Jeroen Ooms [aut], Javier Luraschi [ctb], Jeffrey Wong [ctb], Apache Arrow [aut, cph]