
ggfreqScatter

Frequency Scatterplot


Description

Uses ggplot2 to plot a scatterplot or dot-like chart for the case where there is a very large number of overlapping values. This works for continuous and categorical x and y. For continuous variables it serves the same purpose as hexagonal binning. Counts for overlapping points are grouped into quantile groups, and levels of transparency and colors (by default the viridis palette; see fcolors) are used to convey count information.
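The binning and quantile grouping described above can be illustrated in base R. This is a minimal sketch under stated assumptions, not the Hmisc implementation: the helper `freq_grid` is hypothetical, each continuous variable is rounded onto equally spaced values across its range, and cell frequencies are split into at most `g` quantile groups with `quantile()` and `cut()`. Transparency and color assignment are left out.

```r
# Hypothetical helper, NOT part of Hmisc: bin two continuous variables by
# rounding onto a grid, count observations per cell, and group the counts
# into at most g quantile groups.
freq_grid <- function(x, y, bins = 50, g = 10) {
  # Round v onto equally spaced values spanning its range (nb + 1 levels)
  round_to_bins <- function(v, nb) {
    r <- range(v)
    step <- diff(r) / nb
    if (step == 0) return(v)          # constant variable: nothing to bin
    r[1] + step * round((v - r[1]) / step)
  }
  xb <- round_to_bins(x, bins)
  yb <- round_to_bins(y, bins)
  # Count observations falling in each occupied (x, y) cell
  cells <- aggregate(list(freq = rep(1L, length(xb))),
                     by = list(x = xb, y = yb), FUN = sum)
  # Quantile breaks of the cell frequencies; duplicates removed for cut()
  brks <- unique(quantile(cells$freq, probs = seq(0, 1, length.out = g + 1)))
  cells$fgroup <- if (length(brks) > 1) {
    cut(cells$freq, breaks = brks, include.lowest = TRUE)
  } else {
    factor(cells$freq)                # all cells have the same frequency
  }
  cells
}

# Data mimicking the Examples section: many exactly overlapping points
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)

cells <- freq_grid(x, y, bins = 50, g = 10)
head(cells)
table(cells$fgroup)   # number of cells in each frequency group
```

Each row of `cells` is one occupied bin; plotting one symbol per row, rather than one per observation, is what keeps the display readable when points overlap heavily.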

Alternatively, specify stick=TRUE to encode cell frequencies without color, using the height of a black vertical line centered vertically at the middle of each bin. Relative frequencies are not transformed, and the maximum cell frequency is shown in a caption. Every point with a frequency of at least one is also depicted by a full-height light gray vertical line, whose height corresponds to that overall maximum frequency. The relative frequency is thus the proportion of each light gray line that is black, and points whose frequencies are too low for the black line to be visible can still be seen.
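The arithmetic behind the stick display can be sketched in base R. The cell frequencies `freq`, bin midpoints `ymid`, and half-height `h` of a full stick are all assumed values for illustration; the sketch only computes line endpoints and is not Hmisc's drawing code. Each black line spans `freq / max(freq)` of its gray line, and both are centered at the bin middle.

```r
# Hypothetical cell frequencies for occupied bins (illustration only)
freq <- c(1, 3, 12, 40, 200)
ymid <- c(0, 0.5, 1, 1.5, 2)   # assumed bin midpoints on the y axis
h    <- 0.2                    # assumed half-height of a full (gray) stick

rel <- freq / max(freq)        # relative frequency, no transformation
sticks <- data.frame(
  ymid     = ymid,
  freq     = freq,
  gray_lo  = ymid - h,         # gray line always spans the full height
  gray_hi  = ymid + h,
  black_lo = ymid - h * rel,   # black line is centered at the bin middle
  black_hi = ymid + h * rel
)
print(sticks)
cat("Maximum cell frequency:", max(freq), "\n")  # the value shown in the caption
```

In ggplot2, endpoints like these could be drawn with `geom_segment()`, gray lines first and black lines on top; the cell with frequency 1 gets a black line only 1/200 of the gray one, which is why the gray lines matter.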

The result can also be passed to ggplotly. In that case, actual cell frequencies are added to the hover text via the ggplot2 label aesthetic.

Usage

ggfreqScatter(x, y, by=NULL, bins=50, g=10, cuts=NULL,
              xtrans = function(x) x,
              ytrans = function(y) y,
              xbreaks = pretty(x, 10),
              ybreaks = pretty(y, 10),
              xminor  = NULL, yminor = NULL,
              xlab = as.character(substitute(x)),
              ylab = as.character(substitute(y)),
              fcolors = viridis::viridis(10), nsize=FALSE,
              stick=FALSE, html=FALSE, prfreq=FALSE, ...)

Arguments

x

x-variable

y

y-variable

by

an optional vector used to make separate plots for each distinct value using facet_wrap()

bins

for continuous x or y, the number of bins to create by rounding. Ignored for categorical variables. If a 2-vector, the first element applies to x and the second to y.

g

number of quantile groups to make for frequency counts. Use g=0 to use frequencies continuously for color coding. This is recommended only when using plotly.

cuts

instead of using g, specify cuts to provide the vector of cuts for categorizing frequencies for assignment to colors

xtrans,ytrans

functions specifying transformations to be made before binning and plotting

xbreaks,ybreaks

vectors of values to label on axis, on original scale

xminor,yminor

values at which to put minor tick marks, on original scale

xlab,ylab

axis labels. If not specified and the variable has a label, that label will be used.

fcolors

colors argument to pass to scale_color_gradientn to color code frequencies. Use fcolors=gray.colors(10, 0.75, 0) to show gray scale, for example. Another good choice is fcolors=hcl.colors(10, 'Blue-Red').

nsize

set to TRUE to not vary color or transparency but instead to size the symbols in relation to the number of points. Best when both x and y are discrete. ggplot2 size is taken as the fourth root of the frequency. If there are 15 or fewer unique frequencies, all the unique frequencies are used; otherwise g quantile groups of frequencies are used.

stick

set to TRUE to not use colors but instead use varying-height black vertical lines to depict cell frequencies.

html

set to TRUE to use html in axis labels instead of plotmath

prfreq

set to TRUE to print the frequency distributions of the binned coordinate frequencies

...

arguments to pass to geom_point such as shape and size
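The nsize rule can be sketched as a small base-R function. The helper `size_from_freq` is hypothetical and is not Hmisc's code. It follows the description of nsize (size as the fourth root of frequency; unique frequencies used directly when there are 15 or fewer, otherwise g quantile groups), but representing each quantile group by its median frequency is an assumption of this sketch.

```r
# Hypothetical helper, NOT Hmisc's code: convert cell frequencies to symbol
# sizes following the nsize description (size = fourth root of frequency).
size_from_freq <- function(freq, g = 10) {
  if (length(unique(freq)) <= 15) {
    # 15 or fewer distinct frequencies: use each frequency directly
    return(freq^(1/4))
  }
  # Otherwise collapse frequencies into at most g quantile groups
  brks <- unique(quantile(freq, probs = seq(0, 1, length.out = g + 1)))
  grp  <- cut(freq, breaks = brks, include.lowest = TRUE)
  # Assumption: each group is represented by its median frequency
  freq_rep <- ave(freq, grp, FUN = median)
  freq_rep^(1/4)
}

set.seed(2)
few  <- sample(c(1, 5, 20), 30, TRUE)   # 3 distinct frequencies
many <- sample(1:500, 200, TRUE)        # many distinct frequencies
s_few  <- size_from_freq(few)
s_many <- size_from_freq(many, g = 10)
sort(unique(round(s_many, 2)))          # at most 10 distinct sizes
```

The fourth root compresses the range strongly: a cell with 16 times as many points gets a symbol only twice as large, which keeps the most frequent cells from swamping the plot.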
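The relationship between xtrans and xbreaks (binning on the transformed scale, labeling on the original scale) can be illustrated in base R. The log transformation, simulated data, and break values below are assumptions chosen for illustration; this is not Hmisc's code. Tick positions come from applying the same transformation to the original-scale breaks, while the labels keep the original values.

```r
# Assumed setup: a right-skewed variable displayed on a log scale
xtrans  <- function(x) log(x)             # applied before binning and plotting
xbreaks <- c(0.1, 0.5, 1, 2, 5, 10)       # specified on the ORIGINAL scale

set.seed(3)
x  <- rlnorm(500)
xt <- xtrans(x)                           # the values that would be binned

at     <- xtrans(xbreaks)                 # tick positions on the transformed axis
labels <- as.character(xbreaks)           # tick labels remain on the original scale
data.frame(at = round(at, 3), label = labels)
```

With ggplot2, positions and labels like these would typically be supplied through `scale_x_continuous(breaks = at, labels = labels)`.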

Value

a ggplot object

Author(s)

Frank Harrell

See Also

Examples

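library(Hmisc)     # provides ggfreqScatter
library(ggplot2)   # geom_smooth, aes, ggtitle, labs used below
set.seed(1)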
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)
# color=alpha=NULL below makes loess smooth over all points
g <- ggfreqScatter(x, y) +   # might add g=0 if using plotly
      geom_smooth(aes(color=NULL, alpha=NULL), se=FALSE) +
      ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label')  # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips

# Instead use varying-height sticks to depict frequencies
ggfreqScatter(x, y, stick=TRUE) +
 labs(subtitle='Relative height of black lines to gray lines
is proportional to cell frequency.
Note that points with even a tiny frequency are visible
(gray line with no visible black line).')


# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)

# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)

# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)
# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)

Hmisc

Harrell Miscellaneous

v4.5-0
GPL (>= 2)
Authors
Frank E Harrell Jr <fh@fharrell.com>, with contributions from Charles Dupont and many others.
Initial release
2021-02-27
