arrow: write_parquet – R documentation

arrow

write_parquet

Write Parquet file to disk

Description

Parquet is a columnar storage file format. This function enables you to write Parquet files from R.

Usage

write_parquet(
  x,
  sink,
  chunk_size = NULL,
  version = NULL,
  compression = default_parquet_compression(),
  compression_level = NULL,
  use_dictionary = NULL,
  write_statistics = NULL,
  data_page_size = NULL,
  use_deprecated_int96_timestamps = FALSE,
  coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE,
  properties = NULL,
  arrow_properties = NULL
)

Arguments

`x`	`data.frame`, RecordBatch, or Table
`sink`	A string file path, URI, or OutputStream, or path in a file system (`SubTreeFileSystem`)
`chunk_size`	chunk size in number of rows. If NULL, the total number of rows is used.
`version`	parquet version, "1.0" or "2.0". Default "1.0". Numeric values are coerced to character.
`compression`	compression algorithm. Default "snappy". See details.
`compression_level`	compression level. The meaning depends on the compression algorithm.
`use_dictionary`	Specify if we should use dictionary encoding. Default `TRUE`
`write_statistics`	Specify if we should write statistics. Default `TRUE`
`data_page_size`	Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Default 1 MiB.
`use_deprecated_int96_timestamps`	Write timestamps to INT96 Parquet format. Default `FALSE`.
`coerce_timestamps`	Cast timestamps to a particular resolution. Can be `NULL`, "ms" or "us". Default `NULL` (no casting).
`allow_truncated_timestamps`	Allow loss of data when coercing timestamps to a particular resolution. For example, if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception.
`properties`	A `ParquetWriterProperties` object, used instead of the options enumerated in this function's signature. Providing `properties` as an argument is deprecated; if you need to assemble `ParquetWriterProperties` outside of `write_parquet()`, use `ParquetFileWriter` instead.
`arrow_properties`	A `ParquetArrowWriterProperties` object. Like `properties`, this argument is deprecated.
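
As a rough illustration of `chunk_size`, the sketch below writes a small data frame with an explicit chunk size and then inspects the resulting file. It assumes the `arrow` package is installed. The example data and the use of `ParquetFileReader` to count row groups are illustrative choices, not something this page prescribes.

```r
library(arrow)

# Hypothetical example data: 10 rows
df <- data.frame(id = 1:10, value = seq(0.5, 5, by = 0.5))

tf <- tempfile(fileext = ".parquet")

# Ask for at most 4 rows per chunk instead of one chunk holding all rows
write_parquet(df, tf, chunk_size = 4)

# Inspect how many row groups the file ended up with
reader <- ParquetFileReader$create(tf)
print(reader$num_row_groups)

# The data itself is unaffected by the chunking
back <- as.data.frame(read_parquet(tf))
print(nrow(back))
```

Smaller chunks generally mean more row groups in the file; the data read back should be the same either way.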

Details

Due to features of the format, Parquet files cannot be appended to. If you want to use the Parquet format but also want the ability to extend your dataset, you can write to additional Parquet files and then treat the whole directory of files as a Dataset you can query. See vignette("dataset", package = "arrow") for examples of this.
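
The directory-of-files approach described above can be sketched as follows. This is a minimal example under some assumptions: the directory and file names are made up, `arrow` must be built with dataset support, and `dplyr` must be installed for `collect()`.

```r
library(arrow)

# Hypothetical directory holding one Parquet file per batch of rows
ds_dir <- tempfile("parquet_dataset_")
dir.create(ds_dir)

# Initial batch
write_parquet(data.frame(x = 1:5), file.path(ds_dir, "part-1.parquet"))

# "Appending" later: write a new file next to the first instead of modifying it
write_parquet(data.frame(x = 6:10), file.path(ds_dir, "part-2.parquet"))

# Treat the whole directory as a single Dataset and pull it into R
ds <- open_dataset(ds_dir)
combined <- dplyr::collect(ds)
print(nrow(combined))
```

Row order across files is not something this sketch relies on; sort after collecting if order matters.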

The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:

The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column (defaults listed above)
A single, unnamed, value (e.g. a single string for compression) applies to all columns
An unnamed vector, of the same size as the number of columns, to specify a value for each column, in positional order
A named vector specifies the value for the named columns; columns that are not named use the default value for the setting
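
The patterns above might be combined in a single call as in the sketch below. The data frame and column names are hypothetical, and the codec is chosen defensively in case "snappy" is not available in a given installation.

```r
library(arrow)

df <- data.frame(id = 1:3, label = c("a", "b", "c"))  # hypothetical data
tf <- tempfile(fileext = ".parquet")

# Fall back to "uncompressed" if snappy was not built into this installation
codec <- if (codec_is_available("snappy")) "snappy" else "uncompressed"

write_parquet(
  df, tf,
  compression      = c(codec, "uncompressed"),  # unnamed vector: one value per column, in order
  use_dictionary   = c(label = FALSE),          # named vector: only `label` is overridden
  write_statistics = TRUE                       # single value: applies to all columns
)

back <- as.data.frame(read_parquet(tf))
print(back)
```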

The compression argument can be any of the following (case insensitive): "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2". Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip" are almost always included. See codec_is_available(). The default "snappy" is used if available, otherwise "uncompressed". To disable compression, set compression = "uncompressed". Note that "uncompressed" columns may still have dictionary encoding.
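
Because codec availability depends on how the library was built, one option is to pick the first available codec from a preference list. The sketch below assumes an arbitrary preference order (`preferred` is a hypothetical name) and falls back to "uncompressed", which is the only codec guaranteed to be present.

```r
library(arrow)

# Hypothetical order of preference; any of the codec names listed above could be used
preferred <- c("zstd", "gzip", "snappy")

# Keep only the codecs this arrow build supports, then take the first one
available <- Filter(codec_is_available, preferred)
codec <- if (length(available) > 0) available[[1]] else "uncompressed"

tf <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf, compression = codec)
print(codec)
```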

Value

The input `x`, invisibly.

Examples

## Not run: 
library(arrow)

tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)

# using compression
if (codec_is_available("gzip")) {
  tf2 <- tempfile(fileext = ".gz.parquet")
  write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
}

## End(Not run)

arrow

Integration to 'Apache' 'Arrow'

v4.0.0.1

Apache License (>= 2.0)

Authors

Neal Richardson [aut, cre], Ian Cook [aut], Jonathan Keane [aut], Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>), Jeroen Ooms [aut], Javier Luraschi [ctb], Jeffrey Wong [ctb], Apache Arrow [aut, cph]
