Curvewise point and interval summaries for tidy data frames of draws from distributions
Translates draws from distributions in a grouped data frame into a set of point and interval summaries using a curve boxplot-inspired approach.
curve_interval(
  .data,
  ...,
  .along = NULL,
  .width = 0.5,
  .interval = c("mhd", "mbd", "bd", "bd-mbd"),
  .simple_names = TRUE,
  na.rm = FALSE,
  .exclude = c(".chain", ".iteration", ".draw", ".row")
)
.data |
Data frame (or grouped data frame as returned by |
... |
Bare column names or expressions that, when evaluated in the context of
|
.along |
Which columns are the input values to the function describing the curve (e.g., the "x"
values). Supports tidyselect syntax, as in |
.width |
Vector of probabilities that determine the widths of the resulting intervals.
If multiple probabilities are provided, multiple rows per group are generated, each with
a different probability interval (and value of the corresponding |
.interval |
The method used to calculate the intervals. Currently, all methods rank the curves
using some measure of data depth, then create envelopes containing the
|
.simple_names |
When |
na.rm |
logical value indicating whether |
.exclude |
A character vector of names of columns to be excluded from summarization if no column names are specified to be summarized. Default ignores several meta-data column names used in tidybayes. |
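As a brief illustration of passing several probabilities to .width, the sketch below reuses the toy ensemble from the examples on this page (11 shifted normal density curves). The object names df and intervals are illustrative only, and the code assumes the dplyr, tidyr, and ggdist packages are installed.

```r
library(dplyr)
library(tidyr)
library(ggdist)

# Toy ensemble: 11 normal density curves with shifted means,
# one row per (.draw, x) pair
df <- tibble(
  .draw = 1:11,
  mean = seq(-5, 5, length.out = 11),
  x = list(seq(-15, 15, length.out = 201))
) %>%
  unnest(x) %>%
  mutate(y = dnorm(x, mean, 3))

# Request two curvewise intervals at once
intervals <- df %>%
  group_by(x) %>%
  curve_interval(y, .width = c(.5, .8))

# Each requested probability appears as a value of the .width column
unique(intervals$.width)
```

The resulting data frame can be passed to geom_lineribbon() in the same way as in the examples, which then draws one ribbon per value of .width.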
Intervals are calculated by ranking the curves using some measure of data depth, then creating envelopes around the "deepest" curves, keeping a proportion .width of all curves (for each value of .width). Thus, the intervals are guaranteed to contain at least a proportion .width of the full curves, but may be conservative (i.e. they may contain more than that proportion of the curves). See Mirzargar et al. (2014) or Juul et al. (2020) for an accessible introduction to the idea.
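The rank-then-envelope idea can be sketched in base R. This is a simplified illustration and not the implementation used by curve_interval(): it assumes a depth obtained by averaging a pointwise halfspace-style depth along x, and the names curves, pointwise_depth, depth, n_keep, and deepest are hypothetical.

```r
# Toy ensemble: 11 normal density curves with shifted means,
# stored as a matrix with one row per curve and one column per x value
x <- seq(-15, 15, length.out = 201)
means <- seq(-5, 5, length.out = 11)
curves <- t(sapply(means, function(m) dnorm(x, m, 3)))

# Pointwise depth (assumed definition for this sketch): at each x, a value
# is deep if many curves lie both at-or-below and at-or-above it
pointwise_depth <- apply(curves, 2, function(y) {
  n <- length(y)
  below <- rank(y, ties.method = "max") / n            # share of curves <= y
  above <- (n - rank(y, ties.method = "min") + 1) / n  # share of curves >= y
  pmin(below, above)
})

# Average the pointwise depth along x to get one depth per curve
depth <- rowMeans(pointwise_depth)

# Keep the deepest curves: enough to cover at least `width` of the ensemble
width <- 0.5
n_keep <- ceiling(width * nrow(curves))
deepest <- order(depth, decreasing = TRUE)[seq_len(n_keep)]

# The curvewise interval is the envelope of the kept curves
lower <- apply(curves[deepest, , drop = FALSE], 2, min)
upper <- apply(curves[deepest, , drop = FALSE], 2, max)

# Proportion of curves lying entirely inside the envelope
inside <- apply(curves, 1, function(y) all(y >= lower & y <= upper))
mean(inside)
```

Because every kept curve lies inside its own envelope, the proportion of fully contained curves is at least width. It can be larger when other curves also happen to fall inside, which is the sense in which such intervals may be conservative.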
A data frame containing point summaries and intervals, with at least one column corresponding to the point summary, one to the lower end of the interval, one to the upper end of the interval, the width of the interval (.width), the type of point summary (.point), and the type of interval (.interval).
Matthew Kay
Fraiman, Ricardo and Graciela Muniz. (2001). "Trimmed means for functional data". Test 10: 419–440. doi: 10.1007/BF02595706
Sun, Ying and Marc G. Genton. (2011). "Functional Boxplots". Journal of Computational and Graphical Statistics, 20(2): 316–334. doi: 10.1198/jcgs.2011.09224
Mirzargar, Mahsa, Ross T Whitaker, and Robert M Kirby. (2014). "Curve Boxplot: Generalization of Boxplot for Ensembles of Curves". IEEE Transactions on Visualization and Computer Graphics, 20(12): 2654–2663. doi: 10.1109/TVCG.2014.2346455
Juul, Jonas, Kaare Græsbøll, Lasse Engbo Christiansen, and Sune Lehmann. (2020). "Fixed-time descriptive statistics underestimate extremes of epidemic curve ensembles". arXiv e-print. arXiv:2007.05035
point_interval() for pointwise intervals. See vignette("lineribbon") for more examples and discussion of the differences between pointwise and curvewise intervals.
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggdist)  # provides median_qi(), curve_interval(), geom_lineribbon(), theme_ggdist()

# generate a set of curves
k = 11   # number of curves
n = 201  # number of x values per curve
df = tibble(
    .draw = 1:k,
    mean = seq(-5, 5, length.out = k),
    x = list(seq(-15, 15, length.out = n))
  ) %>%
  unnest(x) %>%
  mutate(y = dnorm(x, mean, 3))

# see pointwise intervals...
df %>%
  group_by(x) %>%
  median_qi(y, .width = c(.5)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_lineribbon(aes(ymin = .lower, ymax = .upper)) +
  geom_line(aes(group = .draw), alpha = 0.15, data = df) +
  scale_fill_brewer() +
  ggtitle("50% pointwise intervals with point_interval()") +
  theme_ggdist()

# ... compare them to curvewise intervals
df %>%
  group_by(x) %>%
  curve_interval(y, .width = c(.5)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_lineribbon(aes(ymin = .lower, ymax = .upper)) +
  geom_line(aes(group = .draw), alpha = 0.15, data = df) +
  scale_fill_brewer() +
  ggtitle("50% curvewise intervals with curve_interval()") +
  theme_ggdist()