
sdf_bind

Bind multiple Spark DataFrames by row and column


Description

sdf_bind_rows() and sdf_bind_cols() implement the common pattern of do.call(rbind, sdfs) or do.call(cbind, sdfs) for binding many Spark DataFrames into one.
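For readers less familiar with the base R idiom mentioned here, the following is a minimal sketch of it on ordinary local data frames. The data frames and variable names are made up for illustration and do not come from sparklyr.

```r
# Hypothetical local data frames, used only to illustrate the base R pattern.
dfs <- list(
  data.frame(x = 1:2, y = c("a", "b")),
  data.frame(x = 3:4, y = c("c", "d"))
)

# Row-bind every element of the list into a single data frame.
by_row <- do.call(rbind, dfs)

# Column-bind two data frames that have the same number of rows.
by_col <- do.call(
  cbind,
  list(data.frame(x = 1:2), data.frame(z = c(TRUE, FALSE)))
)
```

sdf_bind_rows() and sdf_bind_cols() play the same role when the inputs are Spark DataFrames rather than local data frames.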

Usage

sdf_bind_rows(..., id = NULL)

sdf_bind_cols(...)

Arguments

...

Spark tbls to combine.

Each argument can be either a Spark DataFrame or a list of Spark DataFrames.

When row-binding, columns are matched by name, and any missing columns will be filled with NA.

When column-binding, rows are matched by position, so all data frames must have the same number of rows.
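The two matching rules can be sketched as follows. This assumes a local Spark installation reachable through spark_connect(master = "local"); the table names (a_tbl, b_tbl) and their contents are hypothetical and chosen only for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical example tables: both have column x, only a has y, only b has z.
a <- sdf_copy_to(sc, data.frame(x = 1:3, y = c("a", "b", "c")),
                 "a_tbl", overwrite = TRUE)
b <- sdf_copy_to(sc, data.frame(x = 4:6, z = c(1.5, 2.5, 3.5)),
                 "b_tbl", overwrite = TRUE)

# Row-binding: columns are matched by name, so y and z are filled with NA
# for the rows that come from the table lacking that column.
rows <- sdf_bind_rows(a, b)

# Column-binding: both inputs have three rows, so they are paired by position.
cols <- sdf_bind_cols(a, select(b, z))

# Bring the results back into R for inspection before disconnecting.
rows_local <- collect(rows)
cols_local <- collect(cols)

spark_disconnect(sc)
```

The select() call keeps only z from the second table so that the column-bound result does not contain two columns named x.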

id

Data frame identifier.

When id is supplied, a new column of identifiers is created to link each row to its original Spark DataFrame. The labels are taken from the named arguments to sdf_bind_rows(). When a list of Spark DataFrames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.
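A short sketch of the three labelling cases described for id, again assuming a local Spark installation. The table names, the month column name, and the data are hypothetical.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical example tables with the same single column.
jan <- sdf_copy_to(sc, data.frame(sales = c(10, 20)), "jan_tbl", overwrite = TRUE)
feb <- sdf_copy_to(sc, data.frame(sales = c(30, 40)), "feb_tbl", overwrite = TRUE)

# Labels taken from the named arguments.
by_args <- sdf_bind_rows(jan = jan, feb = feb, id = "month")

# Labels taken from the names of a list.
by_list <- sdf_bind_rows(list(jan = jan, feb = feb), id = "month")

# No names supplied: a numeric sequence is used as the label.
by_seq <- sdf_bind_rows(jan, feb, id = "month")

# Collect one result locally to inspect the identifier column.
by_args_local <- collect(by_args)

spark_disconnect(sc)
```

In each case the result has an additional month column recording which input each row came from.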

Details

The output of sdf_bind_rows() will contain a column if that column appears in any of the inputs.

Value

sdf_bind_rows() and sdf_bind_cols() each return a tbl_spark.


sparklyr

R Interface to Apache Spark

v1.6.2
Apache License 2.0 | file LICENSE
Authors
Javier Luraschi [aut], Kevin Kuo [aut] (<https://orcid.org/0000-0001-7803-7901>), Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb] (<https://orcid.org/0000-0003-1469-9396>), Wil Davis [ctb], RStudio [cph], The Apache Software Foundation [aut, cph]
