Estimate Quantiles of a Distribution Nonparametrically
Estimate quantiles of a distribution, and optionally create confidence intervals for them, without making any assumptions about the form of the distribution.
eqnpar(x, p = 0.5, type = 7, ci = FALSE, lcl.rank = NULL, ucl.rank = NULL, lb = -Inf, ub = Inf, ci.type = "two-sided", ci.method = "interpolate", digits = getOption("digits"), approx.conf.level = 0.95, min.coverage = TRUE, tol = 0)
x |
a numeric vector of observations. Missing ( |
p |
numeric vector of probabilities for which quantiles will be estimated.
All values of |
type |
an integer between 1 and 9 indicating which algorithm to use to estimate the
quantile. The default value is |
ci |
logical scalar indicating whether to compute a confidence interval for the quantile.
The default value is |
lcl.rank, ucl.rank |
positive integers indicating the ranks of the order statistics that are used
for the lower and upper bounds of the confidence interval for the specified
quantile. Both arguments must be integers between 1 and the number of non-missing
values in |
lb, ub |
scalars indicating lower and upper bounds on the distribution. By default, |
ci.type |
character string indicating what kind of confidence interval to compute.
The possible values are |
ci.method |
character string indicating the method to use to construct the confidence interval.
The possible values are |
digits |
an integer indicating the number of decimal places to round to when printing out
the value of |
approx.conf.level |
a scalar between 0 and 1 indicating the desired confidence level of the confidence
interval. The default value is |
min.coverage |
for the case when |
tol |
for the case when |
If x
contains any missing (NA
), undefined (NaN
) or
infinite (Inf
, -Inf
) values, they will be removed prior to
performing the estimation.
Estimation
The function eqnpar
calls the R function quantile
to estimate quantiles.
Confidence Intervals
Let x_1, x_2, …, x_n denote a sample of n independent and
identically distributed random variables from some arbitrary distribution.
Furthermore, let x_{(i)} denote the i'th order statistic for these
n random variables. That is,
x_{(1)} ≤ x_{(2)} ≤ … ≤ x_{(n)} \;\;\;\;\;\; (1)
Finally, let x_p denote the p'th quantile of the distribution, that is:
Pr(X < x_p) ≤ p \;\;\;\;\;\; (2)
Pr(X ≤ x_p) ≥ p \;\;\;\;\;\; (3)
It can be shown (e.g., Conover, 1980, pp. 114-116) that for the i'th order statistic:
Pr[x_p < x_{(i)}] = F_{B(n,p)}[i-1]; \; i = 1, 2, …, n \;\;\;\;\;\; (4)
for a continuous distribution and
Pr[x_p < x_{(i)}] ≤ F_{B(n,p)}[i-1]; \; i = 1, 2, …, n \;\;\;\;\;\; (5)
Pr[x_p ≤ x_{(i)}] ≥ F_{B(n,p)}[i-1]; \; i = 1, 2, …, n \;\;\;\;\;\; (6)
for a discrete distribution,
where F_{B(n,p)}[y] denotes the cumulative distribution function of a
binomial random variable with parameters size=
n
and prob=
p evaluated at y. These facts are used to construct
confidence intervals for quantiles (see below).
Two-Sided Confidence Interval (ci.type="two-sided"
)
A two-sided nonparametric confidence interval for the p'th quantile is
constructed as:
[x_{(r)}, x_{(s)}] \;\;\;\;\;\; (7)
where
1 ≤ r ≤ (n-1) \;\;\;\;\;\; (8)
2 ≤ s ≤ n \;\;\;\;\;\; (9)
r < s \;\;\;\;\;\; (10)
Note that the argument lcl.rank
corresponds to r, and the argument
ucl.rank
corresponds to s.
This confidence interval has an associated confidence level that is at least as large as:
F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (11)
for a discrete distribution and exactly equal to this value for a continuous distribution. This is because by Equations (4)-(6) above:
Pr[x_{(r)} ≤ x_p ≤ x_{(s)}]
= Pr[x_p ≤ x_{(s)}] - Pr[x_p < x_{(r)}]
≥ F_{B(n,p)}[s-1] - F_{B(n,p)}[r-1] \;\;\;\;\;\; (12)
with equality if the distribution is continuous.
Exact Method (ci.method="exact"
)
When lcl.rank
(r) and ucl.rank
(s) are not supplied by
the user, and ci.method="exact"
, r and s are initially chosen such that
r is the smallest integer satisfying equation (13) below, and s is
the largest integer satisfying equation (14) below:
F_{B(n,p)}[r-1] ≥ \frac{α}{2} \;\;\;\;\;\; (13)
F_{B(n,p)}[s-1] ≤ 1-\frac{α}{2} \;\;\;\;\;\; (14)
where α = 1 -approx.conf.level
.
The values of r and s are then each varied by \pm 2 (with the
restrictions r ≥ 1, s ≤ n, and r < s), and confidence levels
computed for each of these combinations. If min.coverage=TRUE
, the
combination of r and s is selected that provides the closest coverage
to approx.conf.level
, with coverage greater than or equal to
approx.conf.level
. If min.coverage=FALSE
, the
combination of r and s is selected that provides the closest coverage
to approx.conf.level
, with coverage less than or equal to
approx.conf.level + tol
.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx"
)
Here the term “Approximate” simply refers to the method of initially
choosing the ranks for the lower and upper bounds. As for ci.method="exact"
,
the confidence level associated with the confidence interval is exact if the
underlying distribution is continuous.
When lcl.rank
(r) and ucl.rank
(s) are not supplied by
the user and ci.method="normal.approx"
, r and s are initially chosen
such that:
r = np - h \;\;\;\;\;\; (15)
s = np + h \;\;\;\;\;\; (16)
where
h = t_{n-1, 1-α/2} √{np(1-p)} \;\;\;\;\;\; (17)
and t_{ν,q} denotes the q'th quantile of
Student's t-distribution with ν degrees of freedom,
and α = 1 -approx.conf.level
(Conover, 1980, p. 112).
With the restictions that r ≥ 1 and s ≤ n,
r is rounded down to the nearest integer r = r^* = floor(r) and
s is rounded up to the nearest integer s = s^* = ceiling(s).
Again, with the restictions that s ≤ n,
if the confidence level using s = s^* + 1 and r = r^* is less than or equal
to approx.conf.level
, then s is set to s = s^* + 1.
Once this has been checked, with the restriction that r ≥ 1,
if the confidence level using the current value of s
and r = r^* - 1 is less than or equal to approx.conf.level
, then
r is set to r = r^* - 1.
Interpolate Method (ci.method="interpolate"
)
Let γ denote the desired confidence level associated with the
confidence interval for the p'th quantile.
Based on the work of Hettmansperger and Sheather (1986), Nyblom (1992) showed that
if [-∞, x_{(w+1)}] is a one-sided upper confidence interval for the
p'th quantile with associated confidence level γ_{w+1}, and
γ_{w+1} ≥ γ ≥ γ_{w}, then the one-sided upper
confidence interval
[-∞, (1-λ) x_{(w)} + λ x_{(w+1)}] \;\;\;\;\;\; (18)
where
λ = λ(β, p, w, n)
= [ 1 + \frac{w(1-p)(π_{w+1}-β)}{(n-w)p(β-π_w)} ] ^ {-1} \;\;\;\;\;\; (19)
π_w = F_{B(n,p)}[w-1] \;\;\;\;\;\; (20)
β = γ \;\;\;\;\;\; (21)
has associated confidence level approximately equal to γ for a wide range of distributions.
Thus, to construct an approximate two-sided confidence interval for the p'th quantile with confidence level γ, if [x_{(r)}, x_{(s)}] has confidence level ≥ γ and [x_{(r+1)}, x_{(s-1)}] has confidence level ≤ γ, then the lower bound of the two-sided confidence interval is computed as:
(1-λ) x_{(r)} + λ x_{(r+1)} \;\;\;\;\;\; (21)
where
β = α/2; \;\;\;\; α = 1 - γ \;\;\;\;\;\; (22)
and the upper bound of the two-sided confidence interval is computed as:
(1-λ) x_{(s-1)} + λ x_{(s)} \;\;\;\;\;\; (23)
where
β = 1 - α/2 \;\;\;\;\;\; (24)
The values of r and s in Equations (21) and (23) are computed by
using ci.method="exact"
with the argument min.coverage=TRUE
.
One-Sided Lower Confidence Interval (ci.type="lower"
)
A one-sided lower nonparametric confidence interval for the p'th quantile is
constructed as:
[x_{(r)}, ub] \;\;\;\;\;\; (25)
where ub denotes the value of the ub
argument (the user-supplied
upper bound).
Exact Method (ci.method="exact"
)
When lcl.rank
(r) is not supplied by the user, and
ci.method="exact"
, r is initially chosen such that it is the
smallest integer satisfying the following equation:
F_{B(n,p)}[r-1] ≥ α \;\;\;\;\;\; (26)
where α = 1 - approx.conf.level
.
The value of r is varied by \pm 2 (with the
restrictions r ≥ 1 and r ≤ n), and confidence levels
computed for each of these combinations. If min.coverage=TRUE
, the
value of r is selected that provides the closest coverage
to approx.conf.level
, with coverage greater than or equal to
approx.conf.level
. If min.coverage=FALSE
, the
value of r is selected that provides the closest coverage
to approx.conf.level
, with coverage less than or equal to
approx.conf.level + tol
.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx"
)
When lcl.rank
(r) is not supplied by the user and
ci.method="normal.approx"
, r is initially chosen such that
r = np - t_{n-1, 1-α} √{np(1-p)} \;\;\;\;\;\; (27)
With the restriction that r ≥ 1 and r ≤ n,
if p
is less than 0.5 then r is rounded up to the nearest integer,
otherwise it is rounded down to the nearest integer. Denote this value by
r^*. With the restriction that r ≥ 1, if the confidence level using
r^* - 1 is less than or equal to approx.conf.level
, then
r is set to r = r^* - 1.
Interpolate Method (ci.method="interpolate"
)
Let γ denote the desired confidence level associated with the
confidence interval for the p'th quantile.
To construct an approximate one-sided lower confidence interval for the
p'th quantile with confidence level γ, if
[x_{(r)}, ub] has confidence level ≥ γ and
[x_{(r+1)}, ub] has confidence level ≤ γ, then
the lower bound of the confidence interval is computed as:
(1-λ) x_{(r)} + λ x_{(r+1)} \;\;\;\;\;\; (28)
where
β = α; \;\;\;\; α = 1 - γ \;\;\;\;\;\; (29)
The value of r in Equation (28) is computed by
using ci.method="exact"
with the arguments
ci.type="lower"
and min.coverage=TRUE
.
One-Sided Upper Confidence Interval (ci.type="upper"
)
A one-sided upper nonparametric confidence interval for the p'th quantile is
constructed as:
[lb, x_{(s)}] \;\;\;\;\;\; (30)
where lb denotes the value of the lb
argument (the user-supplied
lower bound).
Exact Method (ci.method="exact"
)
When ucl.rank
(s) is not supplied by the user, and
ci.method="exact"
, s is initially chosen such that it is the
largest integer satisfying the following equation:
F_{B(n,p)}[s-1] ≤ 1-α \;\;\;\;\;\; (31)
where α = 1 - approx.conf.level
.
The value of s is varied by \pm 2 (with the
restrictions s ≥ 1 and s ≤ n), and confidence levels
computed for each of these combinations. If min.coverage=TRUE
, the
value of s is selected that provides the closest coverage
to approx.conf.level
, with coverage greater than or equal to
approx.conf.level
. If min.coverage=FALSE
, the
value of s is selected that provides the closest coverage
to approx.conf.level
, with coverage less than or equal to
approx.conf.level + tol
.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Approximate Method (ci.method="approx"
)
When ucl.rank
(s) is not supplied by the user and
ci.method="normal.approx"
, s is initially chosen such that
s = np + t_{n-1, 1-α} √{np(1-p)} \;\;\;\;\;\; (31)
With the restriction that s ≥ 1 and s ≤ n,
if p
is greater than 0.5 then s is rounded down to the nearest integer,
otherwise it is rounded up to the nearest integer. Denote this value by
s^*. With the restriction that s ≤ n, if the confidence level using
s^* + 1 is less than or equal to approx.conf.level
, then
s is set to s = s^* + 1.
For this method, the confidence level associated with the confidence interval
is exact if the underlying distribution is continuous.
Interpolate Method (ci.method="interpolate"
)
Let γ denote the desired confidence level associated with the
confidence interval for the p'th quantile.
To construct an approximate one-sided upper confidence interval for the
p'th quantile with confidence level γ, if
[lb, x_{(s)}] has confidence level ≥ γ and
[lb, x_{(s-1)}] has confidence level ≤ γ, then
the upper bound of the confidence interval is computed as:
(1-λ) x_{(s-1)} + λ x_{(s)} \;\;\;\;\;\; (32)
where
β = γ \;\;\;\;\;\; (33)
The value of s in Equation (32) is computed by
using ci.method="exact"
with the arguments
ci.type = "upper"
, and min.coverage=TRUE
.
Note on Value of Confidence Level
Because of the discrete nature of order statistics, when ci.method="exact"
or ci.method="normal.approx"
, the value of the confidence level returned by
eqnpar
will usually differ from the desired confidence level indicated by
the value of the argument approx.conf.level
. When ci.method="interpolate"
,
eqnpar
returns for the confidence level the value of the argument
approx.conf.level
. Nyblom (1992) and Hettmasperger and Sheather (1986) have
shown that the Interpolate method produces confidence intervals with confidence levels
quite close to the assumed confidence level for a wide range of distributions.
a list of class "estimate"
containing the estimated quantile(s) and other
information. See estimate.object
for details.
Percentiles are sometimes used in environmental standards and regulations. For example, Berthouex and Brown (2002, p.71) note that England has water quality limits based on the 90th and 95th percentiles of monitoring data not exceeding specified levels. They also note that the U.S. EPA has specifications for air quality monitoring, aquatic standards on toxic chemicals, and maximum daily limits for industrial effluents that are all based on percentiles. Given the importance of these quantities, it is essential to characterize the amount of uncertainty associated with the estimates of these quantities. This is done with confidence intervals.
It can be shown (e.g., Conover, 1980, pp.119-121) that an upper confidence interval for the
p'th quantile with confidence level 100(1-α)\% is equivalent to an
upper β-content tolerance interval with coverage 100p\% and confidence level
100(1-α)\%. Also, a lower confidence interval for the p'th quantile with
confidence level 100(1-α)\% is equivalent to a lower β-content tolerance
interval with coverage 100(1-p)\% and confidence level 100(1-α)\%. See the
help file for tolIntNpar
for more information on nonparametric tolerance
intervals.
Steven P. Millard (EnvStats@ProbStatInfo.com)
The author is grateful to Michael H?hle,
Department of Mathematics, Stockholm University
(http://www2.math.su.se/~hoehle) for making me aware of the work of Nyblom (1992),
and for suggesting improvements to the algorithm that was used in EnvStats
Version 2.1.1 to construct a confidence interval when ci.method="exact"
.
Berthouex, P.M., and L.C. Brown. (2002). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, pp.132-136.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp.88-90.
Hettmansperger, T.P., and Sheather, S.J. (1986). Confidence Intervals Based on Interpolated Order Statistics. Statistics & Probability Letters, 4, 75–79.
Nyblom, J. (1992). Note on Interpolated Order Statistics. Statistics & Probability Letters, 14, 129–131.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
# The data frame ACE.13.TCE.df contains observations on # Trichloroethylene (TCE) concentrations (mg/L) at # 10 groundwater monitoring wells before and after remediation. # # Compute the median concentration for each period along with # a 95% confidence interval for the median. # # Before remediation: 20.3 [8.8, 35.9] # After remediation: 2.5 [0.8, 5.9] with(ACE.13.TCE.df, eqnpar(TCE.mg.per.L[Period=="Before"], ci = TRUE)) #Results of Distribution Parameter Estimation #-------------------------------------------- #Assumed Distribution: None #Estimated Quantile(s): Median = 20.3 #Quantile Estimation Method: Nonparametric #Data: TCE.mg.per.L[Period == "Before"] #Sample Size: 10 #Confidence Interval for: 50'th %ile #Confidence Interval Method: interpolate (Nyblom, 1992) #Confidence Interval Type: two-sided #Confidence Level: 95% #Confidence Limit Rank(s): 2 9 # 3 8 #Confidence Interval: LCL = 8.804775 # UCL = 35.874775 #---------- with(ACE.13.TCE.df, eqnpar(TCE.mg.per.L[Period=="After"], ci = TRUE)) #Results of Distribution Parameter Estimation #-------------------------------------------- #Assumed Distribution: None #Estimated Quantile(s): Median = 2.48 #Quantile Estimation Method: Nonparametric #Data: TCE.mg.per.L[Period == "After"] #Sample Size: 10 #Confidence Interval for: 50'th %ile #Confidence Interval Method: interpolate (Nyblom, 1992) #Confidence Interval Type: two-sided #Confidence Level: 95% #Confidence Limit Rank(s): 2 9 # 3 8 #Confidence Interval: LCL = 0.7810901 # UCL = 5.8763063 #========== # Generate 20 observations from a cauchy distribution with parameters # location=0, scale=1. The true 75th percentile of this distribution is 1. # Use eqnpar to estimate the 75th percentile and construct a 90% confidence interval. # (Note: the call to set.seed simply allows you to reproduce this example.) set.seed(250) dat <- rcauchy(20, location = 0, scale = 1) #------------------------------------------------------- # First, use the default method, ci.method="interpolate" #------------------------------------------------------- eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 75'th %ile = 1.524903 # #Quantile Estimation Method: Nonparametric # #Data: dat # #Sample Size: 20 # #Confidence Interval for: 75'th %ile # #Confidence Interval Method: interpolate (Nyblom, 1992) # #Confidence Interval Type: two-sided # #Confidence Level: 90% # #Confidence Limit Rank(s): 12 19 # 13 18 # #Confidence Interval: LCL = 0.8191423 # UCL = 2.1215570 #---------- #------------------------------------------------------------- # Now use ci.method="exact". # Note that the returned confidence level is greater than 90%. #------------------------------------------------------------- eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9, ci.method = "exact") #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 75'th %ile = 1.524903 # #Quantile Estimation Method: Nonparametric # #Data: dat # #Sample Size: 20 # #Confidence Interval for: 75'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: two-sided # #Confidence Level: 93.47622% # #Confidence Limit Rank(s): 12 19 # #Confidence Interval: LCL = 0.7494692 # UCL = 2.2156601 #---------- #---------------------------------------------------------- # Now use ci.method="exact" with min.coverage=FALSE. # Note that the returned confidence level is less than 90%. #---------------------------------------------------------- eqnpar(dat, p = 0.75, ci = TRUE, approx.conf.level = 0.9, ci.method = "exact", min.coverage = FALSE, ) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 75'th %ile = 1.524903 # #Quantile Estimation Method: Nonparametric # #Data: dat # #Sample Size: 20 # #Confidence Interval for: 75'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: two-sided # #Confidence Level: 89.50169% # #Confidence Limit Rank(s): 13 20 # #Confidence Interval: LCL = 1.018038 # UCL = 5.002399 #---------- #----------------------------------------------------------- # Now supply our own bounds for the confidence interval. # The first example above based on the Interpolate method # used lcl.rank=12, ucl.rank=19 and lcl.rank=13, ucl.rank=18 # and interpolated between these two confidence intervals. # Here we will specify lcl.rank=13 and ucl.rank=18. The # resulting confidence level is 81%. #----------------------------------------------------------- eqnpar(dat, p = 0.75, ci = TRUE, lcl.rank = 13, ucl.rank = 18) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 75'th %ile = 1.524903 # #Quantile Estimation Method: Nonparametric # #Data: dat # #Sample Size: 20 # #Confidence Interval for: 75'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: two-sided # #Confidence Level: 80.69277% # #Confidence Limit Rank(s): 13 18 # #Confidence Interval: LCL = 1.018038 # UCL = 2.071172 #---------- # Clean up rm(dat) #========== # Modify Example 17-4 on page 17-21 of USEPA (2009). This example uses # copper concentrations (ppb) from 3 background wells to set an upper # limit for 2 compliance wells. Here we will attempt to compute an upper # 95% confidence interval for the 95'th percentile of the distribution of # copper concentrations in the background wells. # # The data are stored in EPA.92c.copper2.df. # # Note that even though these data are Type I left singly censored, # it is still possible to compute an estimate of the 95'th percentile. EPA.92c.copper2.df # Copper.orig Copper Censored Month Well Well.type #1 <5 5.0 TRUE 1 1 Background #2 <5 5.0 TRUE 2 1 Background #3 7.5 7.5 FALSE 3 1 Background #... #9 9.2 9.2 FALSE 1 2 Background #10 <5 5.0 TRUE 2 2 Background #11 <5 5.0 TRUE 3 2 Background #... #17 <5 5.0 TRUE 1 3 Background #18 5.4 5.4 FALSE 2 3 Background #19 6.7 6.7 FALSE 3 3 Background #... #29 6.2 6.2 FALSE 5 4 Compliance #30 <5 5.0 TRUE 6 4 Compliance #31 7.8 7.8 FALSE 7 4 Compliance #... #38 <5 5.0 TRUE 6 5 Compliance #39 5.6 5.6 FALSE 7 5 Compliance #40 <5 5.0 TRUE 8 5 Compliance # Because of the small sample size of n=24 observations, it is not possible # to create a nonparametric confidence interval for the 95th percentile # that has an associated confidence level of 95%. If we tried to do this, # we would get an error message: # with(EPA.92c.copper2.df, # eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE, lb = 0, # ci.type = "upper", approx.conf.level = 0.95)) # #Error in ci.qnpar.interpolate(x = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, : # Minimum coverage of 0.95 is not possible with the given sample size. # So instead, we will use ci.method="exact" with min.coverage=FALSE # to construct the confidence interval. Note that the associated # confidence level is only 71%. with(EPA.92c.copper2.df, eqnpar(Copper[Well.type=="Background"], p = 0.95, ci = TRUE, ci.method = "exact", min.coverage = FALSE, ci.type = "upper", lb = 0, approx.conf.level = 0.95)) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 95'th %ile = 7.925 # #Quantile Estimation Method: Nonparametric # #Data: Copper[Well.type == "Background"] # #Sample Size: 24 # #Confidence Interval for: 95'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: upper # #Confidence Level: 70.8011% # #Confidence Limit Rank(s): NA 24 # #Confidence Interval: LCL = 0.0 # UCL = 9.2 #---------- # For the above example, the true confidence level is 71% instead of 95%. # This is a function of the small sample size. In fact, as Example 17-4 on # pages 17-21 of USEPA (2009) shows, the largest quantile for which you can # construct a nonparametric confidence interval that will have associated # confidence level of 95% is the 88'th percentile: with(EPA.92c.copper2.df, eqnpar(Copper[Well.type=="Background"], p = 0.88, ci = TRUE, ci.type = "upper", lb = 0, ucl.rank = 24, approx.conf.level = 0.95)) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 88'th %ile = 6.892 # #Quantile Estimation Method: Nonparametric # #Data: Copper[Well.type == "Background"] # #Sample Size: 24 # #Confidence Interval for: 88'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: upper # #Confidence Level: 95.3486% # #Confidence Limit Rank(s): NA 24 # #Confidence Interval: LCL = 0.0 # UCL = 9.2 #========== # Reproduce Example 21-6 on pages 21-21 to 21-22 of USEPA (2009). # Use 12 measurements of nitrate (mg/L) at a well used for drinking water # to determine with 95% confidence whether or not the infant-based, acute # risk standard of 10 mg/L has been violated. Assume that the risk # standard represents an upper 95'th percentile limit on nitrate # concentrations. So what we need to do is construct a one-sided # lower nonparametric confidence interval for the 95'th percentile # that has associated confidence level of no more than 95%, and we will # compare the lower confidence limit with the MCL of 10 mg/L. # # The data for this example are stored in EPA.09.Ex.21.6.nitrate.df. # Look at the data: #------------------ EPA.09.Ex.21.6.nitrate.df # Sampling.Date Date Nitrate.mg.per.l.orig Nitrate.mg.per.l Censored #1 7/28/1999 1999-07-28 <5.0 5.0 TRUE #2 9/3/1999 1999-09-03 12.3 12.3 FALSE #3 11/24/1999 1999-11-24 <5.0 5.0 TRUE #4 5/3/2000 2000-05-03 <5.0 5.0 TRUE #5 7/14/2000 2000-07-14 8.1 8.1 FALSE #6 10/31/2000 2000-10-31 <5.0 5.0 TRUE #7 12/14/2000 2000-12-14 11 11.0 FALSE #8 3/27/2001 2001-03-27 35.1 35.1 FALSE #9 6/13/2001 2001-06-13 <5.0 5.0 TRUE #10 9/16/2001 2001-09-16 <5.0 5.0 TRUE #11 11/26/2001 2001-11-26 9.3 9.3 FALSE #12 3/2/2002 2002-03-02 10.3 10.3 FALSE # Determine what order statistic to use for the lower confidence limit # in order to achieve no more than 95% confidence. #--------------------------------------------------------------------- conf.levels <- ciNparConfLevel(n = 12, p = 0.95, lcl.rank = 1:12, ci.type = "lower") names(conf.levels) <- 1:12 round(conf.levels, 2) # 1 2 3 4 5 6 7 8 9 10 11 12 #1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98 0.88 0.54 # Using the 11'th largest observation for the lower confidence limit # yields a confidence level of 88%. Using the 10'th largest # observation yields a confidence level of 98%. The example in # USEPA (2009) uses the 10'th largest observation. # # The 10'th largest observation is 11 mg/L which exceeds the # MCL of 10 mg/L, so there is evidence of contamination. #-------------------------------------------------------------------- with(EPA.09.Ex.21.6.nitrate.df, eqnpar(Nitrate.mg.per.l, p = 0.95, ci = TRUE, ci.type = "lower", lcl.rank = 10)) #Results of Distribution Parameter Estimation #-------------------------------------------- # #Assumed Distribution: None # #Estimated Quantile(s): 95'th %ile = 22.56 # #Quantile Estimation Method: Nonparametric # #Data: Nitrate.mg.per.l # #Sample Size: 12 # #Confidence Interval for: 95'th %ile # #Confidence Interval Method: exact # #Confidence Interval Type: lower # #Confidence Level: 98.04317% # #Confidence Limit Rank(s): 10 NA # #Confidence Interval: LCL = 11 # UCL = Inf #========== # Clean up #--------- rm(conf.levels)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.