
sdf_persist

Persist a Spark DataFrame


Description

Persist a Spark DataFrame, forcing any pending computations and (optionally) serializing the results to disk.

Usage

sdf_persist(x, storage.level = "MEMORY_AND_DISK", name = NULL)

Arguments

x

A spark_connection, ml_pipeline, or tbl_spark object.

storage.level

The storage level to use. See the Spark documentation on RDD persistence for the accepted storage levels (for example, "MEMORY_ONLY", "MEMORY_AND_DISK", "DISK_ONLY").

name

A name to assign this table. Passed to sdf_register().

Details

Spark DataFrames invoke their operations lazily: pending operations are deferred until their results are actually needed. Persisting a Spark DataFrame effectively forces any pending computations, then persists the generated Spark DataFrame as requested (to memory, to disk, or otherwise).
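As a sketch of how this plays out (assuming a local Spark installation; the table and column names here are illustrative, not part of the sparklyr API):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (connection details are illustrative)
sc <- spark_connect(master = "local")

# Copy a sample data set into Spark; the registered name is arbitrary
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")

# Build a lazy pipeline of transformations -- nothing is computed yet
transformed <- mtcars_tbl %>%
  filter(cyl > 4) %>%
  mutate(kpl = mpg * 0.425)

# Persisting forces the pending computations and caches the result,
# using the default "MEMORY_AND_DISK" storage level
cached <- sdf_persist(transformed)

spark_disconnect(sc)
```

Until `sdf_persist()` (or another action) runs, `transformed` is only a recipe for a computation; afterwards, the materialized result is reused by subsequent operations instead of being recomputed.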

Users of Spark should be careful to persist the results of any non-deterministic computations; otherwise, the values within a column may appear to change as new operations are performed on that data set.
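For example (a sketch assuming a live Spark connection; `rand()` is Spark SQL's random-number function, translated by sparklyr's dplyr backend), persisting immediately after a non-deterministic step pins its values down:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")

# rand() is evaluated lazily; without persistence its values could be
# recomputed -- and therefore differ -- each time downstream operations run
with_noise <- mtcars_tbl %>% mutate(noise = rand())

# Persisting materializes the column once, so every later read of
# `noise` sees the same snapshot of values
with_noise <- sdf_persist(with_noise, storage.level = "MEMORY_ONLY")

spark_disconnect(sc)
```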


sparklyr

R Interface to Apache Spark

v1.6.2
Apache License 2.0 | file LICENSE
Authors
Javier Luraschi [aut], Kevin Kuo [aut] (<https://orcid.org/0000-0001-7803-7901>), Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb] (<https://orcid.org/0000-0003-1469-9396>), Wil Davis [ctb], RStudio [cph], The Apache Software Foundation [aut, cph]
Initial release
