Empirical Cumulative Distribution Function Plot Based on Type I Censored Data
Produce an empirical cumulative distribution function plot for Type I left-censored or right-censored data.
ecdfPlotCensored(x, censored, censoring.side = "left", discrete = FALSE,
    prob.method = "michael-schucany", plot.pos.con = 0.375, plot.it = TRUE,
    add = FALSE, ecdf.col = 1, ecdf.lwd = 3 * par("cex"), ecdf.lty = 1,
    include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2),
    cen.cex = par("cex"), cen.col = 4, ...,
    type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL,
    ylab = NULL, xlim = NULL, ylim = NULL)
x |
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. |
censored |
numeric or logical vector indicating which values of x are censored. |
censoring.side |
character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right". |
discrete |
logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE, the default). |
prob.method |
character string indicating what method to use to compute the plotting positions (empirical probabilities).
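Possible values include "kaplan-meier", "nelson", "michael-schucany" (the default), and "hirsch-stedinger"; see ppointsCensored for details. |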
plot.pos.con |
numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is plot.pos.con=0.375. |
plot.it |
logical scalar indicating whether to produce a plot or add to the current plot (see add). The default value is plot.it=TRUE. |
add |
logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or generate a new plot (add=FALSE). The default value is add=FALSE. |
ecdf.col |
a numeric scalar or character string determining the color of the empirical cdf line or points.
The default value is ecdf.col=1. |
ecdf.lwd |
a numeric scalar determining the width of the empirical cdf line. The default value is
ecdf.lwd=3*par("cex"). |
ecdf.lty |
a numeric scalar determining the line type of the empirical cdf line. The default value is
ecdf.lty=1. |
include.cen |
logical scalar indicating whether to include censored values in the plot. The default value is
include.cen=FALSE. |
cen.pch |
numeric scalar or character string indicating the plotting character to use to plot censored values.
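The default value is cen.pch=6 (a downward-pointing triangle) when censoring.side="left" and cen.pch=2 (an upward-pointing triangle) otherwise. |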
cen.cex |
numeric scalar that determines the size of the plotting character used to plot censored values.
The default value is the current value of the cex graphics parameter. See the entry for cex in the help file for par. |
cen.col |
numeric scalar or character string that determines the color of the plotting character used to
plot censored values. The default value is cen.col=4. |
type, main, xlab, ylab, xlim, ylim, ... |
additional graphical parameters (see par). |
The function ecdfPlotCensored does exactly the same thing as ecdfPlot, except it calls the function ppointsCensored to compute the plotting positions (estimated cumulative probabilities) for the uncensored observations.
If plot.it=TRUE, the estimated cumulative probabilities for the uncensored observations are plotted against the uncensored observations. By default, the function ecdfPlotCensored plots a step function when discrete=TRUE, and plots a straight line between points when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).
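The effect of the type argument can be previewed with base R alone. The sketch below is only a complete-data analogue: it uses stats::ppoints with a = 0.375 (the same value as the default plot.pos.con) rather than ppointsCensored, so it does not handle censoring, and the simulated data are purely illustrative.

```r
# Complete-data analogue only: ppoints() knows nothing about censoring.
set.seed(47)
x <- sort(rnorm(15, mean = 20, sd = 5))   # ordered observations
p <- ppoints(length(x), a = 0.375)        # (i - 0.375) / (n + 0.25)

# Draw the same plotting positions with three values of 'type'.
op <- par(mfrow = c(1, 3))
for (tp in c("s", "l", "p")) {
  plot(x, p, type = tp, ylim = c(0, 1),
       xlab = "Order Statistics", ylab = "Cumulative Probability",
       main = paste0("type = \"", tp, "\""))
}
par(op)
```

The step version (type="s") is the natural choice for a discrete parent distribution, while the interpolated version (type="l") matches the default for discrete=FALSE.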
If include.cen=TRUE, censored observations are included on the plot as points. The arguments cen.pch, cen.cex, and cen.col control the appearance of these points.
In cases where x is a random sample, the empirical cdf will change from sample to sample, and the variability in these estimates can be dramatic for small sample sizes. Caution must be used in interpreting the empirical cdf when a large percentage of the observations are censored.
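A small simulation can give a feel for this sampling variability. The sketch below uses only base R and complete (uncensored) normal samples; the helper name ecdf_at, the evaluation point q = 20, and the number of replicates are all illustrative assumptions, not part of EnvStats.

```r
# Hypothetical helper: value of the empirical cdf at q for 'reps' simulated
# samples of size n from N(20, 5^2). No censoring is involved here.
ecdf_at <- function(n, q = 20, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = 20, sd = 5) <= q))
}

set.seed(101)
spread <- sapply(c(10, 100), function(n) sd(ecdf_at(n)))
names(spread) <- c("n=10", "n=100")

# Standard deviation of the ecdf estimate at q = 20 for each sample size;
# theory gives sqrt(0.5 * 0.5 / n) because the true cdf at 20 is 0.5.
print(round(spread, 3))
```

With censoring, fewer observations contribute directly to the estimate, so the variability is typically at least this large.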
ecdfPlotCensored returns a list with the following components:
Order.Statistics |
numeric vector of the “ordered” observations. |
Cumulative.Probabilities |
numeric vector of the associated plotting positions. |
Censored |
logical vector indicating which of the ordered observations are censored. |
Censoring.Side |
character string indicating whether the data are left- or right-censored.
This is the same value as the argument censoring.side. |
Prob.Method |
character string indicating what method was used to compute the plotting positions.
This is the same value as the argument prob.method. |
Optional Component (only present when prob.method="michael-schucany" or prob.method="hirsch-stedinger"):
Plot.Pos.Con |
numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument plot.pos.con. |
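The returned list can be inspected without drawing anything by setting plot.it=FALSE. The following is a minimal sketch that assumes the EnvStats package is installed (it is skipped otherwise); the simulated data mirror the first example in this help file.

```r
# Runs only if EnvStats is available; otherwise nothing is evaluated.
if (requireNamespace("EnvStats", quietly = TRUE)) {
  set.seed(333)
  x <- rnorm(20, mean = 20, sd = 5)
  censored <- x < 18
  x[censored] <- 18          # replace censored values by the censoring level

  # plot.it = FALSE: compute the plotting positions but draw no plot.
  out <- EnvStats::ecdfPlotCensored(x, censored, plot.it = FALSE)

  print(names(out))          # component names described above
  str(out[c("Order.Statistics", "Cumulative.Probabilities", "Censored")])
}
```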
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data.
Censored observations complicate the procedures used to graphically explore data. Techniques from survival analysis and life testing have been developed to generalize the procedures for constructing plotting positions, empirical cdf plots, and q-q plots to data sets with censored observations (see ppointsCensored).
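To illustrate how a survival-analysis technique carries over to left-censored data, here is a base R sketch of the textbook product-limit (reverse Kaplan-Meier) estimate of the cdf at each distinct uncensored value. The function name km_left and the toy data are hypothetical, and the handling of ties and of the largest observation is an assumption that may differ from what ppointsCensored actually does.

```r
# Hypothetical helper, not an EnvStats function.
# For left-censored data, estimate F(t_j) at each distinct uncensored value
# t_j as the product of (n_k - d_k) / n_k over all uncensored values t_k > t_j,
# where n_k = number of observations (censored or not) <= t_k and
#       d_k = number of uncensored observations equal to t_k.
km_left <- function(x, censored) {
  t <- sort(unique(x[!censored]))
  n <- sapply(t, function(v) sum(x <= v))
  d <- sapply(t, function(v) sum(x == v & !censored))
  ratio <- (n - d) / n
  # Cumulative product taken from the largest value downward; the estimate
  # at the largest uncensored value is 1 (empty product).
  Fhat <- rev(cumprod(c(1, rev(ratio)[-length(ratio)])))
  data.frame(value = t, n = n, d = d, Fhat = Fhat)
}

# Toy data: two nondetects reported at detection limits 2 and 5.
x        <- c(2, 2, 3, 5, 5, 7)
censored <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
res <- km_left(x, censored)
print(res)
```

When no observations are censored, this reduces to the ordinary empirical cdf i/n at the i-th distinct order statistic.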
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompareCensored) to graphically assess whether a sample of observations comes from a particular distribution. More often, however, quantile-quantile (Q-Q) plots are used instead (see qqPlot and qqPlotCensored).
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, pp.461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
# Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then generate an empirical cdf plot
# for the complete data set and the censored data set.  Note that the empirical
# cdf plot for the censored data set starts at the first ordered uncensored
# observation, and that for values of x > 18 the two empirical cdf plots are
# exactly the same.  This is because there is only one censoring level and
# no uncensored observations fall below the censored observations.
# (Note: the call to set.seed simply allows you to reproduce this example.)

set.seed(333)
x <- rnorm(20, mean=20, sd=5)
censored <- x < 18
sum(censored)
#[1] 7

new.x <- x
new.x[censored] <- 18

dev.new()
ecdfPlot(x, xlim = range(pretty(x)),
  main = "Empirical CDF Plot for\nComplete Data Set")

dev.new()
ecdfPlotCensored(new.x, censored, xlim = range(pretty(x)),
  main="Empirical CDF Plot for\nCensored Data Set")

# Clean up
#---------
rm(x, censored, new.x)

#------------------------------------------------------------------------------------

# Example 15-1 of USEPA (2009, page 15-10) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells.  The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df.  Here we will create an empirical
# CDF plot based on the Kaplan-Meier method.

EPA.09.Ex.15.1.manganese.df
#   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
#1       1 Well.1                 <5           5.0     TRUE
#2       2 Well.1               12.1          12.1    FALSE
#3       3 Well.1               16.9          16.9    FALSE
#4       4 Well.1               21.6          21.6    FALSE
#5       5 Well.1                 <2           2.0     TRUE
#...
#21      1 Well.5               17.9          17.9    FALSE
#22      2 Well.5               22.7          22.7    FALSE
#23      3 Well.5                3.3           3.3    FALSE
#24      4 Well.5                8.4           8.4    FALSE
#25      5 Well.5                 <2           2.0     TRUE

dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  ecdfPlotCensored(Manganese.ppb, Censored,
    prob.method = "kaplan-meier", ecdf.col = "blue",
    main = "Empirical CDF of Manganese Data\nBased on Kaplan-Meier"))

#==========

# Clean up
#---------
graphics.off()