Plotting Positions for Type I Censored Data
Returns a list of “ordered” observations and associated plotting positions based on Type I left-censored or right-censored data. These plotting positions may be used to construct empirical cumulative distribution plots or quantile-quantile plots, or to estimate distribution parameters.
ppointsCensored(x, censored, censoring.side = "left", prob.method = "michael-schucany", plot.pos.con = 0.375)
x |
numeric vector of observations. Missing ( |
censored |
numeric or logical vector indicating which values of |
censoring.side |
character string indicating on which side the censoring occurs. The possible values are
|
prob.method |
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are: The default value is The |
plot.pos.con |
numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is |
Methods for computing plotting positions for complete data sets
(no censored observations) are discussed in D'Agostino, R.B. (1986a) and
Cleveland (1993). For data sets with censored observations, these methods
must be modified. The function ppointsCensored
allows you to compute
plotting positions based on any of the following methods:
Product-limit method of Kaplan and Meier (1958) (prob.method="kaplan-meier"
).
Hazard plotting method of Nelson (1972) (prob.method="nelson"
).
Generalization of the product-limit method due to Michael and Schucany (1986)
(prob.method="michael-schucany"
) (the default).
Generalization of the product-limit method due to Hirsch and Stedinger (1987)
(prob.method="hirsch-stedinger"
).
Let \underline{x} denote a random sample of N observations from some distribution. Assume n (0 < n < N) of these observations are known and c (c=N-n) of these observations are all censored below (left-censored) or all censored above (right-censored) at k fixed censoring levels
T_1, T_2, …, T_K; \; K ≥ 1 \;\;\;\;\;\; (1)
For the case when K ≥ 2, the data are said to be Type I multiply censored. For the case when K=1, set T = T_1. If the data are left-censored and all n known observations are greater than or equal to T, or if the data are right-censored and all n known observations are less than or equal to T, then the data are said to be Type I singly censored (Nelson, 1982, p.7), otherwise they are considered to be Type I multiply censored.
Let c_j denote the number of observations censored below or above censoring level T_j for j = 1, 2, …, K, so that
∑_{i=1}^K c_j = c \;\;\;\;\;\; (2)
Let x_{(1)}, x_{(2)}, …, x_{(N)} denote the “ordered” observations, where now “observation” means either the actual observation (for uncensored observations) or the censoring level (for censored observations). For right-censored data, if a censored observation has the same value as an uncensored one, the uncensored observation should be placed first. For left-censored data, if a censored observation has the same value as an uncensored one, the censored observation should be placed first.
Note that in this case the quantity x_{(i)} does not necessarily represent the i'th “largest” observation from the (unknown) complete sample.
Finally, let Ω (omega) denote the set of n subscripts in the
“ordered” sample that correspond to uncensored observations.
Product-Limit Method of Kaplan and Meier (prob.method="kaplan-meier"
)
For complete data sets (no censored observations), the
empirical probabilities estimator of the cumulative distribution
function evaluated at the i'th ordered observation is given by
(D'Agostino, 1986a, p.8):
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{\#[x_j ≤ x_{(i)}]}{n} \;\;\;\;\;\; (3)
where \#[x_j ≤ x_{(i)}] denotes the number of observations less than
or equal to x_{(i)} (see the help file for ecdfPlot
).
Kaplan and Meier (1958) extended this method of computing the empirical cdf to
the case of right-censored data.
Right-Censored Data (censoring.side="right"
)
Let S(t) denote the survival function evaluated at t, that is:
S(t) = 1 - F(t) = Pr(X > t) \;\;\;\;\;\; (4)
Kaplan and Meier (1958) show that a nonparametric estimate of the survival function at the i'th ordered observation that is not censored (i.e., i \in Ω), is given by:
\hat{S}[x_{(i)}] | = | \widehat{Pr}[X > x_{(i)}] |
= | \widehat{Pr}[X > x_{(1)}] | |
\;\; \widehat{Pr}[X > x_{(2)} | X > x_{(1)}] \;\; \cdots | ||
\;\; \widehat{Pr}[X > x_{(i)} | X > x_{(i-1)}] | ||
= | ∏_{j \in Ω, j ≤ i} \frac{n_j - d_j}{n_j}, \;\; i \in Ω \;\;\;\;\; (5) |
where n_j is the number of observations (uncensored or censored) with values greater than or equal to x_{(j)}, and d_j denotes the number of uncensored observations exactly equal to x_{(j)} (if there are no tied uncensored observations then d_j will equal 1 for all values of j). (See also Lee and Wang, 2003, pp. 64–69; Michael and Schucany, 1986). By convention, the estimate of the survival function at a censored observation is set equal to the estimated value of the survival function at the largest uncensored observation less than or equal to that censoring level. If there are no uncensored observations less than or equal to a particular censoring level, the estimate of the survival function is set to 1 for that censoring level.
Thus the Kaplan-Meier plotting position at the i'th ordered observation that is not censored (i.e., i \in Ω), is given by:
\hat{p}_i = \hat{F}[x_{(i)}] = 1 - ∏_{j \in Ω, j ≤ i} \frac{n_j - d_j}{n_j} \;\;\;\;\;\; (6)
The plotting position for a censored observation is set equal to the plotting position associated with the largest uncensored observation less than or equal to that censoring level. If there are no uncensored observations less than or equal to a particular censoring level, the plotting position is set to 0 for that censoring level.
As an example, consider the following right-censored data set:
3, ≥4, ≥4, 5, 5, 6
The table below shows how the plotting positions are computed.
i | x_{(i)} | n_i | d_i | \frac{n_i-d_i}{n_i} | Plotting Position |
1 | 3 | 6 | 1 | 5/6 | 1 - (5/6) = 0.167 |
2 | ≥4 | ||||
3 | ≥4 | ||||
4 | 5 | 3 | 2 | 1/3 | 1 - (5/6)(1/3) = 0.722 |
5 | 5 | 0.722 | |||
6 | 6 | 1 | 1 | 0/1 | 1 - (5/6)(1/3)(0/1) = 1 |
Note that for complete data sets, Equation (6) reduces to Equation (3).
Left-Censored Data (censoring.side="left"
)
Gillespie et al. (2010) give formulas for the Kaplan-Meier estimator for the case of
left-cesoring (censoring.side="left"
). In this case, the plotting position
for the i'th ordered observation, assuming it is not censored, is computed as:
\hat{p}_i = \hat{F}[x_{(i)}] = ∏_{j \in Ω, j > i} \frac{n_j - d_j}{n_j} \;\;\;\;\;\; (7)
where n_j is the number of observations (uncensored or censored) with values less than or equal to x_{(j)}, and d_j denotes the number of uncensored observations exactly equal to x_{(j)} (if there are no tied uncensored observations then d_j will equal 1 for all values of j). The plotting position is equal to 1 for the largest uncensored order statistic.
As an example, consider the following left-censored data set:
3, <4, <4, 5, 5, 6
The table below shows how the plotting positions are computed.
i | x_{(i)} | n_i | d_i | \frac{n_i-d_i}{n_i} | Plotting Position |
1 | 3 | 1 | 1 | 0/1 | 1(5/6)(3/5) = 0.5 |
2 | <4 | ||||
3 | <4 | ||||
4 | 5 | 5 | 2 | 3/5 | 0.833 |
5 | 5 | 1(5/6) = 0.833 | |||
6 | 6 | 6 | 1 | 5/6 | 1 |
Note that for complete data sets, Equation (7) reduces to Equation (3).
Modified Kaplan-Meier Method (prob.method="modified kaplan-meier"
)
(Left-Censored Data Only.) For left-censored data, the modified Kaplan-Meier
method is the same as the Kaplan-Meier method, except that for the largest
uncensored order statistic, the plotting position is not set to 1 but rather is
set equal to the Blom plotting position: (N - 0.375)/(N + 0.25). This method
is useful, for example, when creating Quantile-Quantile plots.
Hazard Plotting Method of Nelson (prob.method="nelson"
)
(Right-Censored Data Only.) For right-censored data, Equation (5) can be
re-written as:
\hat{S}[x_{(i)}] = ∏_{j \in Ω, j ≤ i} \frac{N-j}{N-j+1}, \;\; i \in Ω \;\;\;\;\;\; (8)
Nelson (1972) proposed the following formula for plotting positions for the uncensored observations in the context of estimating the hazard function (see Michael and Schucany,1986, p.469):
\hat{p}_i = \hat{F}[x_{(i)}] = 1 - ∏_{j \in Ω, j ≤ i} exp(\frac{-1}{N-j+1}) \;\;\;\;\;\; (9)
See Lee and Wang (2003) for more information about the hazard function.
As for the Kaplan and Meier (1958) method, the plotting position for a censored
observation is set equal to the plotting position associated with the largest
uncensored observation less than or equal to that censoring level. If there are
no uncensored observations less than or equal to a particular censoring level,
the plotting position is set to 0 for that censoring level.
Generalization of Product-Limit Method, Michael and Schucany
(prob.method="michael-schucany"
)
For complete data sets, the disadvantage of using Equation (3) above to define
plotting positions is that it implies the largest observed value is the maximum
possible value of the distribution (the 100'th percentile). This may be
satisfactory if the underlying distribution is known to be discrete, but it is
usually not satisfactory if the underlying distribution is known to be continuous.
A more frequently used formula for plotting positions for complete data sets is given by:
\hat{F}[x_{(i)}] = \hat{p}_i = \frac{i - a}{N - 2a + 1} \;\;\;\;\;\; (10)
where 0 ≤ a ≤ 1 (Cleveland, 1993, p. 18; D'Agostino, 1986a, pp. 8,25).
The value of a is usually chosen so that the plotting positions are
approximately unbiased (i.e., approximate the mean of their distribution) or else
approximate the median value of their distribution (see the help file for
ecdfPlot
). Michael and Schucany (1986) extended this method for
both left- and right-censored data sets.
Right-Censored Data (censoring.side="right"
)
For right-censored data sets, the plotting positions for the uncensored
observations are computed as:
\hat{p}_i = 1 - \frac{N-a+1}{N-2a+1} ∏_{j \in Ω, j ≤ i} \frac{N-j-a+1}{N-j-a+2} \;\; i \in Ω \;\;\;\;\;\; (11)
Note that the plotting positions proposed by Herd (1960) and Johnson (1964) are a special case of Equation (11) with a=0. Equation (11) reduces to Equation (10) in the case of complete data sets. Note that unlike the Kaplan-Meier method, plotting positions associated with tied uncensored observations are not the same (just as in the case for complete data using Equation (10)).
As for the Kaplan and Meier (1958) method, for right-censored data the plotting
position for a censored observation is set equal to the plotting position associated
with the largest uncensored observation less than or equal to that censoring level.
If there are no uncensored observations less than or equal to a particular censoring
level, the plotting position is set to 0 for that censoring level.
Left-Censored Data (censoring.side="left"
)
For left-censored data sets the plotting positions are computed as:
\hat{p}_i = \frac{N-a+1}{N-2a+1} ∏_{j \in Ω, j ≥ i} \frac{j-a}{j-a+1} \;\; i \in Ω \;\;\;\;\;\; (12)
Equation (12) reduces to Equation (10) in the case of complete data sets. Note that unlike the Kaplan-Meier method, plotting positions associated with tied uncensored observations are not the same (just as in the case for complete data using Equation (10)).
For left-censored data, the plotting position for a censored observation is set
equal to the plotting position associated with the smallest uncensored observation
greater than or equal to that censoring level. If there are no uncensored
observations greater than or equal to a particular censoring level, the plotting
position is set to 1 for that censoring level.
Generalization of Product-Limit Method, Hirsch and Stedinger
(prob.method="hirsch-stedinger"
)
Hirsch and Stedinger (1987) use a slightly different approach than Kaplan and Meier
(1958) and Michael and Schucany (1986) to derive a nonparametric estimate of the
survival function (probability of exceedance) in the context of left-censored data.
First they estimate the value of the survival function at each of the censoring
levels. The value of the survival function for an uncensored observation between
two adjacent censoring levels is then computed by linear interpolation (in the form
of a plotting position). See also Helsel and Cohn (1988).
The discussion below presents an extension of the method of Hirsch and Stedinger (1987) to the case of right-censored data, and then presents the original derivation due to Hirsch and Stedinger (1987) for left-censored data.
Right-Censored Data (censoring.side="right"
)
For right-censored data, the survival function is estimated as follows.
For the j'th censoring level (j = 0, 1, …, K), write the
value of the survival function as:
S(T_j) | = | Pr[X > T_j] |
= | Pr[X > T_{j+1}] + Pr[T_j < X ≤ T_{j+1}] | |
= | S(T_{j+1}) + Pr[T_j < X ≤ T_{j+1} | X > T_j] Pr[X > T_j] | |
= | S(T_{j+1}) + Pr[T_j < X ≤ T_{j+1} | X > T_j] S(T_j) \;\;\;\;\;\; (13) |
where
T_0 = -∞, \;\;\;\;\;\; (14)
T_{K+1} = ∞ \;\;\;\;\;\; (15)
Now set
A_j | = | # uncensored observations in (T_j, T_{j+1}] \;\;\;\;\;\; (16) |
B_j | = | # observations in (T_{j+1}, ∞) \;\;\;\;\;\; (17) |
for j = 0, 1, …, K. Then the method of moments estimator of the conditional probability in Equation (13) is given by:
\widehat{Pr}[T_j < X ≤ T_{j+1} | X > T_j] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (18)
Hence, by equations (13) and (18) we have
\hat{S}(T_j) = \hat{S}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) \hat{S}(T_{j}) \;\;\;\;\;\; (19)
which can be rewritten as:
\hat{S}(T_{j+1}) = \hat{S}(T_j) [1 - (\frac{A_j}{A_j + B_j})] \;\;\;\;\;\; (20)
Equation (20) can be solved interatively for j = 1, 2, …, K. Note that
\hat{S}(T_0) = \hat{S}(-∞) = S(-∞) = 1 \;\;\;\;\;\; (21)
\hat{S}(T_{K+1}) = \hat{S}(∞) = S(∞) = 0 \;\;\;\;\;\; (22)
Once the values of the survival function at the censoring levels are computed, the plotting positions for the A_j uncensored observations in the interval (T_J, T_{j+1}] (j = 0, 1, …, K) are computed as
\hat{p}_i = [1 - \hat{S}(T_j)] + [\hat{S}(T_j) - \hat{S}(T_{j+1})] \frac{r-a}{A_j - 2a + 1} \;\;\;\;\;\; (23)
where a denotes the plotting position constant, 0 ≤ a ≤ 1, and r denotes the rank of the i'th observation among the A_j uncensored observations in the interval (T_J, T_{j+1}]. (Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j (j = 1, 2, …, K), the plotting positions are computed as:
\hat{p}_i = 1 - [\hat{S}(T_j) \frac{r-a}{c_j - 2a + 1}] \;\;\;\;\;\; (24)
where r denotes the rank of the i'th observation among the c_j
observations censored at censoring level T_j. Note that all the
observations censored at the same censoring level are given distinct ranks,
even though there is no way to distinguish between them.
Left-Censored Data (censoring.side="left"
)
For left-censored data, Hirsch and Stedinger (1987) modify the definition of the
survival function as follows:
S^*(t) = Pr[X ≥ t] \;\;\;\;\;\; (25)
For continuous distributions, the functions in Equations (4) and (25) are identical.
Hirsch and Stedinger (1987) show that for the j'th censoring level (j = 0, 1, …, K), the value of the survival function can be written as:
S(T_j) | = | Pr[X ≥ T_j] |
= | Pr[X ≥ T_{j+1}] + Pr[T_j ≤ X < T_{j+1}] | |
= | S^*(T_{j+1}) + Pr[T_j ≤ X < T_{j+1} | X < T_{j+1}] Pr[X < T_{j+1}] | |
= | S^*(T_{j+1}) + Pr[T_j ≤ X < T_{j+1} | X < T_j] [1 - S^*(T_j)] \;\;\;\;\;\; (26) |
where T_0 and T_{K+1} are defined in Equations (14) and (15).
Now set
A_j | = | # uncensored observations in [T_j, T_{j+1}) \;\;\;\;\;\; (27) |
B_j | = | # observations in (-∞, T_j) \;\;\;\;\;\; (28) |
for j = 0, 1, …, K. Then the method of moments estimator of the conditional probability in Equation (26) is given by:
Pr[T_j ≤ X < T_{j+1} | X < T_{j+1}] = \frac{A_j}{A_j + B_j} \;\;\;\;\;\; (29)
Hence, by Equations (26) and (29) we have
\hat{S}(T_j) = \hat{S}(T_{j+1}) + (\frac{A_j}{A_j + B_j}) \hat{S}(T_{j}) \;\;\;\;\;\; (30)
which can be solved interatively for j = 1, 2, …, K. Note that
\widehat{S^*}(T_{K+1}) = \widehat{S^*}(∞) = S^*(∞) = 0 \;\;\;\;\;\; (31)
\widehat{S^*}(T_0) = \widehat{S^*}(-∞) = S^*(-∞) = 1 \;\;\;\;\;\; (32)
Once the values of the survival function at the censoring levels are computed, the plotting positions for the A_j uncensored observations in the interval [T_J, T_{j+1}) (j = 0, 1, …, K) are computed as
\hat{p}_i = [1 - \widehat{S^*}(T_j)] + [\widehat{S^*}(T_j) - \widehat{S^*}(T_{j+1})] \frac{r-a}{A_j - 2a + 1} \;\;\;\;\;\; (33)
where a denotes the plotting position constant, 0 ≤ a ≤ 0.5, and r denotes the rank of the i'th observation among the A_j uncensored observations in the interval [T_J, T_{j+1}). (Tied observations are given distinct ranks.)
For the c_j observations censored at censoring level T_j (j = 1, 2, …, K), the plotting positions are computed as:
\hat{p}_i = [1 - \widehat{S^*}(T_j)] \frac{r-a}{c_j - 2a + 1} \;\;\;\;\;\; (34)
where r denotes the rank of the i'th observation among the c_j observations censored at censoring level T_j. Note that all the observations censored at the same censoring level are given distinct ranks, even though there is no way to distinguish between them.
ppointsCensored
returns a list with the following components:
Order.Statistics |
numeric vector of the “ordered” observations. |
Cumulative.Probabilities |
numeric vector of the associated plotting positions. |
Censored |
logical vector indicating which of the ordered observations are censored. |
Censoring.Side |
character string indicating whether the data are left- or right-censored.
This is same value as the argument |
Prob.Method |
character string indicating what method was used to compute the plotting positions.
This is the same value as the argument |
Optional Component (only present when prob.method="michael-schucany"
or prob.method="hirsch-stedinger"
):
Plot.Pos.Con |
numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument |
For censored data sets, plotting positions may be used to construct empirical cumulative distribution
plots (see ecdfPlotCensored
), construct quantile-quantile plots
(see qqPlotCensored
), or to estimate distribution parameters
(see FcnsByCatCensoredData
).
The function survfit
in the built-in R library
survival computes the survival function for right-censored, left-censored, or
interval-censored data. Calling survfit
with
type="kaplan-meier"
will produce similar results to calling
ppointsCensored
with prob.method="kaplan-meier"
. Also, calling
survfit
with type="fh2"
will produce similar results
to calling ppointsCensored
with prob.method="nelson"
.
Helsel and Cohn (1988, p.2001) found very little effect of changing the value of the plotting position constant when using the method of Hirsch and Stedinger (1987) to compute plotting positions for multiply left-censored data. In general, there will be very little difference between plotting positions computed by the different methods except in the case of very small samples and a large amount of censoring.
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley \& Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley and Sons, New York.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
# Generate 20 observations from a normal distribution with mean=20 and sd=5, # censor all observations less than 18, then compute plotting positions for # this data set. Compare the plotting positions to the plotting positions # for the uncensored data set. Note that the plotting positions for the # censored data set start at the first ordered uncensored observation and # that for values of x > 18 the plotting positions for the two data sets are # exactly the same. This is because there is only one censoring level and # no uncensored observations fall below the censored observations. # (Note: the call to set.seed simply allows you to reproduce this example.) set.seed(333) x <- rnorm(20, mean=20, sd=5) censored <- x < 18 censored # [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE #[13] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE sum(censored) #[1] 7 new.x <- x new.x[censored] <- 18 round(sort(new.x),1) # [1] 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.1 18.7 19.6 20.2 20.3 20.6 21.4 #[15] 21.8 21.8 23.2 26.2 26.8 29.7 p.list <- ppointsCensored(new.x, censored) p.list #$Order.Statistics # [1] 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.00000 18.09771 # [9] 18.65418 19.58594 20.21931 20.26851 20.55296 21.38869 21.76359 21.82364 #[17] 23.16804 26.16527 26.84336 29.67340 # #$Cumulative.Probabilities # [1] 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 0.3765432 # [8] 0.3765432 0.4259259 0.4753086 0.5246914 0.5740741 0.6234568 0.6728395 #[15] 0.7222222 0.7716049 0.8209877 0.8703704 0.9197531 0.9691358 # #$Censored # [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE #[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE # #$Censoring.Side #[1] "left" # #$Prob.Method #[1] "michael-schucany" # #$Plot.Pos.Con #[1] 0.375 #---------- # Round off plotting positions to two decimal places # and compare to plotting positions that ignore censoring #-------------------------------------------------------- round(p.list$Cum, 2) # [1] 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.43 0.48 0.52 0.57 0.62 0.67 #[15] 0.72 0.77 0.82 0.87 0.92 0.97 round(ppoints(x, a=0.375), 2) # [1] 0.03 0.08 0.13 0.18 0.23 0.28 0.33 0.38 0.43 0.48 0.52 0.57 0.62 0.67 #[15] 0.72 0.77 0.82 0.87 0.92 0.97 #---------- # Clean up #--------- rm(x, censored, new.x, p.list) #---------------------------------------------------------------------------- # Reproduce the example in Appendix B of Helsel and Cohn (1988). The data # are stored in Helsel.Cohn.88.appb.df. This data frame contains 18 # observations, of which 9 are censored below one of 2 distinct censoring # levels. Helsel.Cohn.88.app.b.df # Conc.orig Conc Censored #1 <1 1 TRUE #2 <1 1 TRUE #... #17 33 33 FALSE #18 50 50 FALSE p.list <- with(Helsel.Cohn.88.app.b.df, ppointsCensored(Conc, Censored, prob.method="hirsch-stedinger", plot.pos.con=0)) lapply(p.list[1:2], round, 3) #$Order.Statistics # [1] 1 1 1 1 1 1 3 7 9 10 10 10 12 15 20 27 33 50 # #$Cumulative.Probabilities # [1] 0.063 0.127 0.190 0.254 0.317 0.381 0.500 0.556 0.611 0.167 0.333 0.500 #[13] 0.714 0.762 0.810 0.857 0.905 0.952 # Clean up #--------- rm(p.list) #---------------------------------------------------------------------------- # Example 15-1 of USEPA (2009, page 15-10) gives an example of # computing plotting positions based on censored manganese # concentrations (ppb) in groundwater collected at 5 monitoring # wells. The data for this example are stored in # EPA.09.Ex.15.1.manganese.df. EPA.09.Ex.15.1.manganese.df # Sample Well Manganese.Orig.ppb Manganese.ppb Censored #1 1 Well.1 <5 5.0 TRUE #2 2 Well.1 12.1 12.1 FALSE #3 3 Well.1 16.9 16.9 FALSE #4 4 Well.1 21.6 21.6 FALSE #5 5 Well.1 <2 2.0 TRUE #... #21 1 Well.5 17.9 17.9 FALSE #22 2 Well.5 22.7 22.7 FALSE #23 3 Well.5 3.3 3.3 FALSE #24 4 Well.5 8.4 8.4 FALSE #25 5 Well.5 <2 2.0 TRUE p.list.EPA <- with(EPA.09.Ex.15.1.manganese.df, ppointsCensored(Manganese.ppb, Censored, prob.method = "kaplan-meier")) data.frame(Mn = p.list.EPA$Order.Statistics, Censored = p.list.EPA$Censored, CDF = p.list.EPA$Cumulative.Probabilities) # Mn Censored CDF #1 2.0 TRUE 0.21 #2 2.0 TRUE 0.21 #3 2.0 TRUE 0.21 #4 3.3 FALSE 0.28 #5 5.0 TRUE 0.28 #6 5.0 TRUE 0.28 #7 5.0 TRUE 0.28 #8 5.3 FALSE 0.32 #9 6.3 FALSE 0.36 #10 7.7 FALSE 0.40 #11 8.4 FALSE 0.44 #12 9.5 FALSE 0.48 #13 10.0 FALSE 0.52 #14 11.9 FALSE 0.56 #15 12.1 FALSE 0.60 #16 12.6 FALSE 0.64 #17 16.9 FALSE 0.68 #18 17.9 FALSE 0.72 #19 21.6 FALSE 0.76 #20 22.7 FALSE 0.80 #21 34.5 FALSE 0.84 #22 45.9 FALSE 0.88 #23 53.6 FALSE 0.92 #24 77.2 FALSE 0.96 #25 106.3 FALSE 1.00 #---------- # Clean up #--------- rm(p.list.EPA)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.