arrow: FileFormat – R documentation

FileFormat

Dataset file formats

Description

A FileFormat holds information about how to read and parse the files included in a Dataset. There are subclasses corresponding to the supported file formats (ParquetFileFormat and IpcFileFormat).

Factory

FileFormat$create() takes the following arguments:

format: A string identifier of the file format. Currently supported values:
- "parquet"
- "ipc"/"arrow"/"feather", all aliases for each other; for Feather, note that only version 2 files are supported
- "csv"/"text", aliases for the same thing (because comma is the default delimiter for text files)
- "tsv", equivalent to passing format = "text", delimiter = "\t"
...: Additional format-specific options

format = "parquet":
- dict_columns: Names of columns which should be read as dictionaries.
- Any Parquet options from FragmentScanOptions.
format = "text": see CsvParseOptions. Note that you can specify them either with the Arrow C++ library naming ("delimiter", "quoting", etc.) or the readr-style naming used in read_csv_arrow() ("delim", "quote", etc.). Not all readr options are currently supported; please file an issue if you encounter one that arrow should support. The following options are also supported.
From CsvReadOptions:
- skip_rows
- column_names
- autogenerate_column_names
From CsvFragmentScanOptions (these values can be overridden at scan time):
- convert_options: a CsvConvertOptions
- block_size
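
As a minimal sketch of the two naming styles described above, the following creates a pipe-delimited text format twice, once with the Arrow C++ option name and once with the readr-style name. It assumes the arrow package is installed with dataset support; the pipe delimiter and the variable names are illustrative only.

```r
library(arrow)

# Arrow C++ style option name
fmt_cpp <- FileFormat$create("text", delimiter = "|")

# readr-style option name, as used by read_csv_arrow()
fmt_readr <- FileFormat$create("text", delim = "|")

# Inspect the class of each result; both should be FileFormat subclasses
class(fmt_cpp)[1]
class(fmt_readr)[1]
```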

It returns the appropriate subclass of FileFormat (e.g. ParquetFileFormat).
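
To illustrate the return value, the sketch below calls the factory with several of the identifiers listed above and checks which subclass comes back. The column name "species" passed to dict_columns is a hypothetical example and is not checked against any file at creation time. The variable names are likewise illustrative.

```r
library(arrow)

# Parquet format that reads a (hypothetical) column as a dictionary
parquet_fmt <- FileFormat$create("parquet", dict_columns = "species")

# "feather" is an alias for the IPC format
feather_fmt <- FileFormat$create("feather")

# "tsv" is equivalent to format = "text" with a tab delimiter
tsv_fmt <- FileFormat$create("tsv")

# Each result is a FileFormat subclass matching the requested format
inherits(parquet_fmt, "ParquetFileFormat")
inherits(feather_fmt, "IpcFileFormat")
inherits(tsv_fmt, "FileFormat")
```

An object created this way can typically be supplied as the format argument of open_dataset() in place of the string identifier.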

arrow

Integration to 'Apache' 'Arrow'

v4.0.0.1

Apache License (>= 2.0)

Authors

Neal Richardson [aut, cre], Ian Cook [aut], Jonathan Keane [aut], Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>), Jeroen Ooms [aut], Javier Luraschi [ctb], Jeffrey Wong [ctb], Apache Arrow [aut, cph]
