
sdf_quantile

Compute (Approximate) Quantiles with a Spark DataFrame


Description

Given a numeric column within a Spark DataFrame, compute approximate quantiles.

Usage

sdf_quantile(
  x,
  column,
  probabilities = c(0, 0.25, 0.5, 0.75, 1),
  relative.error = 1e-05,
  weight.column = NULL
)

Arguments

x

A spark_connection, ml_pipeline, or a tbl_spark.

column

The column(s) for which quantiles should be computed. Multiple columns are only supported in Spark 2.0+.

probabilities

A numeric vector of probabilities, each between 0 and 1, for which quantiles should be computed (0 is the minimum, 0.5 the median, 1 the maximum).

relative.error

The maximal possible difference between the actual percentile of a result and its expected percentile (e.g., if 'relative.error' is 0.01 and 'probabilities' is 0.95, then any value between the 94th and 96th percentile will be considered an acceptable approximation).

weight.column

If not NULL, then a generalized version of the Greenwald-Khanna algorithm will be run to compute weighted percentiles, with each sample from 'column' having a relative weight specified by the corresponding value in 'weight.column'. The weights can be considered as relative frequencies of sample data points.
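
Examples

The calls below are a minimal sketch of how the arguments above fit together, not an excerpt from the package's own examples. It assumes a local Spark installation is available (for instance one installed with spark_install()); the table name "mtcars_example" and the choice of 'cyl' as a weight column are purely illustrative.

```r
library(sparklyr)

# Assumption: Spark is installed locally and can be started in local mode.
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark.
# "mtcars_example" is an illustrative table name, not one sparklyr requires.
mtcars_tbl <- sdf_copy_to(sc, mtcars, name = "mtcars_example", overwrite = TRUE)

# Default probabilities: minimum, quartiles, and maximum of `mpg`.
sdf_quantile(mtcars_tbl, "mpg")

# Median and 95th percentile with a looser tolerance: with
# relative.error = 0.01, any value between the 94th and 96th percentile
# is an acceptable answer for probability 0.95.
sdf_quantile(
  mtcars_tbl,
  "mpg",
  probabilities = c(0.5, 0.95),
  relative.error = 0.01
)

# Weighted median: each `mpg` value is weighted by the value of `cyl`
# in the same row (an arbitrary choice made only for illustration).
sdf_quantile(
  mtcars_tbl,
  "mpg",
  probabilities = 0.5,
  weight.column = "cyl"
)

spark_disconnect(sc)
```

A smaller 'relative.error' narrows the range of acceptable answers, generally at the cost of more work on the Spark side, so a looser tolerance may be a reasonable choice on large tables.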


sparklyr

R Interface to Apache Spark

v1.6.2
Apache License 2.0 | file LICENSE
Authors
Javier Luraschi [aut], Kevin Kuo [aut] (<https://orcid.org/0000-0001-7803-7901>), Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb] (<https://orcid.org/0000-0003-1469-9396>), Wil Davis [ctb], RStudio [cph], The Apache Software Foundation [aut, cph]
