sparklyr: sdf_expand_grid – R documentation

sdf_expand_grid

Create a Spark dataframe containing all combinations of inputs

Description

Given one or more R vectors/factors or single-column Spark dataframes, compute every combination of their values (an expand.grid operation) and store the result in a Spark dataframe.
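
For readers unfamiliar with expand.grid, the following minimal base R sketch shows the operation that this function mirrors. It runs locally without Spark; the variable name `grid_df` and the small inputs are illustrative assumptions, not part of the package documentation.

```r
# Base R analogue: expand.grid() enumerates every combination of its inputs.
grid_df <- expand.grid(seq(2), c("a", "b", "c"), stringsAsFactors = FALSE)

# Unnamed inputs receive the default column names Var1, Var2, ...
print(names(grid_df))

# One row per combination: 2 * 3 = 6 rows.
print(nrow(grid_df))
```

The row count is the product of the input lengths, so the result can grow quickly as more inputs are added.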

Usage

sdf_expand_grid(
  sc,
  ...,
  broadcast_vars = NULL,
  memory = TRUE,
  repartition = NULL,
  partition_by = NULL
)

Arguments

`sc`	The associated Spark connection.
`...`	Each input variable can be either an R vector/factor or a single-column Spark dataframe. Unnamed inputs take the default names 'Var1', 'Var2', etc. in the result, as 'expand.grid' does for unnamed inputs.
`broadcast_vars`	Indicates which input(s) should be broadcast to all nodes of the Spark cluster during the join process (default: none).
`memory`	Boolean; whether the resulting Spark dataframe should be cached in memory (default: TRUE).
`repartition`	Number of partitions the resulting Spark dataframe should have.
`partition_by`	Vector of column names used for partitioning the resulting Spark dataframe. Only supported for Spark 2.0+.
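
As a hedged illustration of how these arguments might be combined, the sketch below wraps a call in a helper. The helper name `build_grid`, the column names `id` and `letter`, and the partition count of 4 are all hypothetical choices. It also assumes that named inputs keep their names in the result, as they do with expand.grid. Calling the helper requires a live Spark connection, so the call itself is left commented out.

```r
# Hypothetical helper (not part of sparklyr): builds a small grid on Spark.
build_grid <- function(sc) {
  sparklyr::sdf_expand_grid(
    sc,
    id = seq(3),            # named input; assumed to yield a column "id"
    letter = c("a", "b"),   # named input; assumed to yield a column "letter"
    memory = FALSE,         # do not cache the result in memory
    repartition = 4         # illustrative partition count
  )
}

# Usage, assuming a local Spark installation is available:
# sc <- sparklyr::spark_connect(master = "local")
# grid_sdf <- build_grid(sc)
# sparklyr::sdf_nrow(grid_sdf)
# sparklyr::spark_disconnect(sc)
```

Setting `memory = FALSE` here only avoids caching a throwaway result; whether caching or repartitioning helps in practice depends on the size of the grid and the cluster.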

Examples

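## Not run: 
library(sparklyr)

sc <- spark_connect(master = "local")
# Three unnamed inputs of lengths 5, 10 and 26 give 5 * 10 * 26 = 1300 combinations
grid_sdf <- sdf_expand_grid(sc, seq(5), rnorm(10), letters)

## End(Not run)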

sparklyr

R Interface to Apache Spark

v1.6.2

Apache License 2.0 | file LICENSE

Authors

Javier Luraschi [aut], Kevin Kuo [aut] (<https://orcid.org/0000-0001-7803-7901>), Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb] (<https://orcid.org/0000-0003-1469-9396>), Wil Davis [ctb], RStudio [cph], The Apache Software Foundation [aut, cph]

Initial release