Frequency Scatterplot
Uses ggplot2 to plot a scatterplot or dot-like chart for the case
where there is a very large number of overlapping values. This works
for continuous and categorical x and y. For continuous variables it
serves the same purpose as hexagonal binning. Counts for overlapping
points are grouped into quantile groups, and level of transparency and
rainbow colors are used to provide count information.

Alternatively, you can specify stick=TRUE to not use color but instead
encode cell frequencies with the height of a black line y-centered at
the middle of the bins. Relative frequencies are not transformed, and
the maximum cell frequency is shown in a caption. Every point with a
frequency of at least one is depicted with a full-height light gray
vertical line, scaled to the above overall maximum frequency. In this
way the relative frequency is the proportion of each light gray line
that is black, and one can still see points whose frequencies are too
low for the black line to be visible.

The result can also be passed to ggplotly. Actual cell frequencies are
added to the hover text in that case using the label ggplot2 aesthetic.
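The binning and frequency-grouping idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside ggfreqScatter: the helper bin_mid, the equal-width binning rule, and the column names are hypothetical.

```r
# Illustrative sketch only; this is NOT the Hmisc implementation.
set.seed(1)
n    <- 5000
x    <- rnorm(n)
y    <- rnorm(n)
bins <- 50   # bins per axis, mirroring the default bins=50
g    <- 10   # number of quantile groups, mirroring the default g=10

# Hypothetical helper: snap each value to the midpoint of one of `bins`
# equal-width intervals spanning the observed range
bin_mid <- function(v, bins) {
  r   <- range(v)
  w   <- diff(r) / bins
  idx <- pmin(floor((v - r[1]) / w), bins - 1)  # keep the maximum in the last bin
  r[1] + (idx + 0.5) * w
}

# Count the points falling in each occupied (x, y) cell
cells <- as.data.frame(table(xb = bin_mid(x, bins), yb = bin_mid(y, bins)),
                       stringsAsFactors = FALSE)
cells <- cells[cells$Freq > 0, ]

# Group the cell frequencies into (at most) g quantile groups;
# a color or transparency level could then be assigned per group
brks <- unique(quantile(cells$Freq, probs = seq(0, 1, length.out = g + 1)))
cells$freq_group <- cut(cells$Freq, breaks = brks, include.lowest = TRUE)

# Stick-style encoding: untransformed frequency relative to the overall maximum
cells$rel_height <- cells$Freq / max(cells$Freq)

head(cells[order(-cells$Freq), ])
```

Because cell frequencies are often heavily tied at small values, several quantile breaks can coincide, which is why the sketch de-duplicates the breaks and may end up with fewer than g groups.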
ggfreqScatter(x, y, by=NULL, bins=50, g=10, cuts=NULL,
              xtrans = function(x) x,
              ytrans = function(y) y,
              xbreaks = pretty(x, 10),
              ybreaks = pretty(y, 10),
              xminor = NULL, yminor = NULL,
              xlab = as.character(substitute(x)),
              ylab = as.character(substitute(y)),
              fcolors = viridis::viridis(10), nsize=FALSE,
              stick=FALSE, html=FALSE, prfreq=FALSE, ...)
x |
x-variable |
y |
y-variable |
by |
an optional vector used to make separate plots for each
distinct value using |
bins |
for continuous |
g |
number of quantile groups to make for frequency counts. Use
|
cuts |
instead of using |
xtrans,ytrans |
functions specifying transformations to be made before binning and plotting |
xbreaks,ybreaks |
vectors of values to label on axis, on original scale |
xminor,yminor |
values at which to put minor tick marks, on original scale |
xlab,ylab |
axis labels. If not specified and variable has a
|
fcolors |
|
nsize |
set to |
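stick |
set to TRUE to not use color but instead to encode cell frequencies with the height of black vertical lines |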
html |
set to |
prfreq |
set to |
... |
arguments to pass to |
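As a hedged illustration of the transformation arguments, the sketch below bins and plots a right-skewed x on the log scale while labeling the axis on the original scale, as the xtrans and xbreaks entries describe. The simulated data, the break values, and the axis labels are assumptions chosen for the example; the call is skipped if Hmisc or ggplot2 is not installed.

```r
# Sketch under assumptions: simulated data, arbitrary break values
set.seed(2)
x <- rlnorm(20000, meanlog = 1, sdlog = 1)  # strictly positive, right-skewed
y <- rnorm(20000)

if (requireNamespace("Hmisc", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- Hmisc::ggfreqScatter(
    x, y,
    xtrans  = log,                 # transformation applied before binning and plotting
    xbreaks = c(0.1, 1, 10, 100),  # axis labels, given on the original scale
    bins    = 40,
    xlab    = "x (log-spaced axis)",
    ylab    = "y"
  )
  print(p)
}
```

Supplying xlab and ylab explicitly avoids the default labels taken from the argument expressions.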
a ggplot object
Frank Harrell
library(Hmisc)     # provides ggfreqScatter
library(ggplot2)   # provides geom_smooth, ggtitle, labs, aes

set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)
# color=alpha=NULL below makes loess smooth over all points
g <- ggfreqScatter(x, y) +   # might add g=0 if using plotly
  geom_smooth(aes(color=NULL, alpha=NULL), se=FALSE) +
  ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label')  # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips

# Instead use varying-height sticks to depict frequencies
ggfreqScatter(x, y, stick=TRUE) +
  labs(subtitle=paste('Relative height of black lines to gray lines',
                      'is proportional to cell frequency.',
                      'Note that points with even tiny frequency are visible',
                      '(gray line with no visible black line).', sep='\n'))

# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)

# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)

# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)
# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)