Histograms and frequency polygons.
Display a 1d distribution by dividing into bins and counting the number of observations in each bin. Histograms use bars; frequency polygons use lines.
geom_freqpoly( mapping = NULL, data = NULL, stat = "bin", position = "identity", ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE ) geom_histogram( mapping = NULL, data = NULL, stat = "bin", position = "stack", ..., binwidth = NULL, bins = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE ) stat_bin( mapping = NULL, data = NULL, geom = "bar", position = "stack", ..., binwidth = NULL, bins = NULL, center = NULL, boundary = NULL, closed = c("right", "left"), pad = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE )
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
other arguments passed on to |
na.rm |
If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
binwidth |
The width of the bins. The default is to use The bin width of a date variable is the number of days in each time; the bin width of a time variable is the number of seconds. |
bins |
Number of bins. Overridden by |
geom, stat |
Use to override the default connection between
|
center |
The center of one of the bins. Note that if center is above or
below the range of the data, things will be shifted by an appropriate
number of |
boundary |
A boundary between two bins. As with |
closed |
One of |
pad |
If |
By default, stat_bin
uses 30 bins - this is not a good default,
but the idea is to get you experimenting with different binwidths. You
may need to look at a few to uncover the full story behind your data.
geom_histogram
uses the same aesthetics as geom_bar
;
geom_freqpoly
uses the same aesthetics as geom_line
.
number of points in bin
density of points in bin, scaled to integrate to 1
count, scaled to maximum of 1
density, scaled to maximum of 1
stat_count
, which counts the number of cases at each x
posotion, without binning. It is suitable for both discrete and continuous
x data, whereas stat_bin is suitable only for continuous x data.
ggplot(diamonds, aes(carat)) + geom_histogram() ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.01) ggplot(diamonds, aes(carat)) + geom_histogram(bins = 200) # Rather than stacking histograms, it's easier to compare frequency # polygons ggplot(diamonds, aes(price, fill = cut)) + geom_histogram(binwidth = 500) ggplot(diamonds, aes(price, colour = cut)) + geom_freqpoly(binwidth = 500) # To make it easier to compare distributions with very different counts, # put density on the y axis instead of the default count ggplot(diamonds, aes(price, ..density.., colour = cut)) + geom_freqpoly(binwidth = 500) if (require("ggplot2movies")) { # Often we don't want the height of the bar to represent the # count of observations, but the sum of some other variable. # For example, the following plot shows the number of movies # in each rating. m <- ggplot(movies, aes(rating)) m + geom_histogram(binwidth = 0.1) # If, however, we want to see the number of votes cast in each # category, we need to weight by the votes variable m + geom_histogram(aes(weight = votes), binwidth = 0.1) + ylab("votes") # For transformed scales, binwidth applies to the transformed data. # The bins have constant width on the transformed scale. m + geom_histogram() + scale_x_log10() m + geom_histogram(binwidth = 0.05) + scale_x_log10() # For transformed coordinate systems, the binwidth applies to the # raw data. The bins have constant width on the original scale. # Using log scales does not work here, because the first # bar is anchored at zero, and so when transformed becomes negative # infinity. This is not a problem when transforming the scales, because # no observations have 0 ratings. m + geom_histogram(origin = 0) + coord_trans(x = "log10") # Use origin = 0, to make sure we don't take sqrt of negative values m + geom_histogram(origin = 0) + coord_trans(x = "sqrt") # You can also transform the y axis. Remember that the base of the bars # has value 0, so log transformations are not appropriate m <- ggplot(movies, aes(x = rating)) m + geom_histogram(binwidth = 0.5) + scale_y_sqrt() } rm(movies)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.