Plot Two Cumulative Distribution Functions
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdfs. These plots are used to graphically assess goodness of fit.
cdfCompare(x, y = NULL, discrete = FALSE,
    prob.method = ifelse(discrete, "emp.probs", "plot.pos"),
    plot.pos.con = NULL, distribution = "norm", param.list = NULL,
    estimate.params = is.null(param.list), est.arg.list = NULL,
    x.col = "blue", y.or.fitted.col = "black",
    x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"),
    x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ...,
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL,
    ylab = NULL, xlim = NULL, ylim = NULL)
x |
numeric vector of observations. Missing ( |
y |
a numeric vector (not necessarily of the same length as |
discrete |
logical scalar indicating whether the assumed parent distribution of |
prob.method |
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are
|
plot.pos.con |
numeric scalar between 0 and 1 containing the value of the plotting position constant.
When |
distribution |
when |
param.list |
when |
estimate.params |
when |
est.arg.list |
when |
x.col |
a numeric scalar or character string determining the color of the empirical cdf
(based on |
y.or.fitted.col |
a numeric scalar or character string determining the color of the empirical cdf
(based on |
x.lwd |
a numeric scalar determining the width of the empirical cdf (based on |
y.or.fitted.lwd |
a numeric scalar determining the width of the empirical cdf (based on |
x.lty |
a numeric scalar determining the line type of the empirical cdf
(based on |
y.or.fitted.lty |
a numeric scalar determining the line type of the empirical cdf
(based on |
digits |
when |
type, main, xlab, ylab, xlim, ylim, ... |
additional graphical parameters (see |
When both x and y are supplied, the function cdfCompare creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlot.
When y is supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list |
a list with components |
y.ecdf.list |
a list with components |
When y is not supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list |
a list with components |
fitted.cdf.list |
a list with components |
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
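The points above about quartiles, extremes, and local density can be checked numerically. The following is a minimal sketch that uses only base R (ecdf and quantile from the stats package) rather than cdfCompare itself; the simulated sample and the interval from 9 to 11 are assumptions chosen purely for illustration.

```r
# Simulated data for illustration only (not taken from any real data set)
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Fn(t) is the proportion of observations less than or equal to t
Fn <- ecdf(x)

# Quartiles as read off an ecdf plot: type = 1 inverts the ecdf,
# so the ecdf evaluated at these values is 0.25, 0.50 and 0.75
q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1)
Fn(q)

# The ecdf first rises above 0 at the minimum and reaches 1 at the maximum
Fn(range(x))

# Local density: the ecdf rises by 1/n at each observation, so its rise
# over an interval, multiplied by n, counts the observations in (9, 11]
n.in <- length(x) * (Fn(11) - Fn(9))
round(n.in)
```

A steep stretch of the plotted curve corresponds to a large rise of Fn over a short interval, which is the same quantity computed in the last step.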
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the y-axis vs. the ecdf on the x-axis and call this a quantile plot.
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompare) to graphically assess whether a sample of observations comes from a particular distribution. The Kolmogorov-Smirnov goodness-of-fit test (see gofTest) is the statistical companion of this kind of comparison; it is based on the maximum vertical distance between the empirical cdf plot and the theoretical cdf plot. More often, however, quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess departures from an assumed distribution (see qqPlot).
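The "maximum vertical distance" behind the Kolmogorov-Smirnov statistic can be computed by hand. The sketch below uses only base R (pnorm and ks.test from the stats package), not gofTest; it assumes a hypothesized normal distribution with mean 10 and sd 2 that is fully specified in advance, and the simulated sample is for illustration only.

```r
# Simulated data for illustration only
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
n <- length(x)
x.sorted <- sort(x)

# Hypothesized cdf evaluated at the order statistics
# (assumption: parameters are fixed beforehand, not estimated from x)
F0 <- pnorm(x.sorted, mean = 10, sd = 2)

# The ecdf jumps from (i - 1)/n to i/n at the i-th order statistic,
# so the largest gap must be checked on both sides of each jump
i <- seq_len(n)
D.plus  <- max(i / n - F0)        # ecdf above the theoretical cdf
D.minus <- max(F0 - (i - 1) / n)  # ecdf below the theoretical cdf
D <- max(D.plus, D.minus)

# Compare with the statistic reported by the base R test
ks <- ks.test(x, "pnorm", mean = 10, sd = 2)
all.equal(D, unname(ks$statistic))
```

When the parameters are instead estimated from the same sample, the distance is computed the same way, but the usual Kolmogorov-Smirnov p-value no longer applies exactly.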
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
# cdfCompare is provided by the EnvStats package
library(EnvStats)

# Generate 20 observations from a normal (Gaussian) distribution
# with mean=10 and sd=2 and compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)

set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
dev.new()
cdfCompare(x)

#----------

# Generate 30 observations from an exponential distribution with parameter
# rate=0.1 (see the R help file for Exponential) and compare the empirical
# cdf with the empirical cdf of the normal observations generated in the
# previous example:

set.seed(432)
y <- rexp(30, rate = 0.1)
dev.new()
cdfCompare(x, y)

#==========

# Generate 20 observations from a Poisson distribution with parameter lambda=10
# (see the R help file for Poisson) and compare the empirical cdf with a
# theoretical Poisson cdf based on estimating the distribution parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)

set.seed(250)
x <- rpois(20, lambda = 10)
dev.new()
cdfCompare(x, distribution = "pois")

#==========

# Clean up
#---------

rm(x, y)
graphics.off()