Open a multi-file dataset
Arrow Datasets allow you to query against data that has been split across
multiple files. This sharding of data may indicate partitioning, which
can accelerate queries that only touch some partitions (files). Call
open_dataset() to point to a directory of data files and return a
Dataset, then use dplyr methods to query it.
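As a minimal sketch of the workflow described above, the following writes a small partitioned dataset and then queries it. It assumes an arrow version that provides write_dataset() and was built with Parquet and dataset support. The temporary directory, the use of mtcars, and the choice of cyl as the partition column are illustrative only.

```r
library(arrow)
library(dplyr)

# Illustrative setup: write mtcars as a multi-file dataset, one
# Hive-style directory per value of cyl (e.g. "cyl=4/").
tmp <- tempfile("mtcars_ds")
write_dataset(mtcars, tmp, partitioning = "cyl")

# Point open_dataset() at the directory. With the default
# partitioning argument, Hive-style directory names are autodetected.
ds <- open_dataset(tmp)

# Build the query with dplyr verbs; collect() pulls the result into R.
result <- ds %>%
  filter(cyl == 4) %>%
  select(mpg, hp, cyl) %>%
  collect()
```

Because the filter is on the partition column, only the files under the matching partition directory should need to be read.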
open_dataset(
  sources,
  schema = NULL,
  partitioning = hive_partition(),
  unify_schemas = NULL,
  ...
)
sources |
One of:
When |
schema |
Schema for the |
partitioning |
When
The default is to autodetect Hive-style partitions. When |
unify_schemas |
logical: should all data fragments (files, |
... |
additional arguments passed to |
A Dataset R6 object. Use dplyr methods on it to query the data,
or call $NewScan() to construct a query directly.
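The lower-level route through $NewScan() might look like the sketch below. It assumes that $NewScan() returns a scanner builder with a $Finish() method, and that the resulting scanner has $ToTable(). Method names on these R6 objects can differ between arrow versions, so treat this as an illustration rather than a reference. The dataset is a throwaway one written with write_dataset().

```r
library(arrow)

# Illustrative setup: a small unpartitioned dataset in a temp directory.
tmp <- tempfile("mtcars_scan")
write_dataset(mtcars, tmp)

ds <- open_dataset(tmp)

# Construct the scan directly instead of going through dplyr.
scanner <- ds$NewScan()$Finish()

# Materialize the scan as an Arrow Table, then convert to a data.frame.
tab <- scanner$ToTable()
df <- as.data.frame(tab)
```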
vignette("dataset", package = "arrow")