Nonparametric Tolerance Interval for a Continuous Distribution
Construct a β-content or β-expectation tolerance interval nonparametrically without making any assumptions about the form of the distribution except that it is continuous.
tolIntNpar(x, coverage, conf.level, cov.type = "content", ltl.rank = ifelse(ti.type == "upper", 0, 1), n.plus.one.minus.utl.rank = ifelse(ti.type == "lower", 0, 1), lb = -Inf, ub = Inf, ti.type = "two-sided")
x |
numeric vector of observations. Missing ( |
coverage |
a scalar between 0 and 1 indicating the desired coverage of the β-content
tolerance interval.
The default value is |
conf.level |
a scalar between 0 and 1 indicating the confidence level associated with the β-content
tolerance interval. The default value is |
cov.type |
character string specifying the coverage type for the tolerance interval.
The possible values are "content" (the default) and "expectation".
ltl.rank |
positive integer indicating the rank of the order statistic to use for the lower bound
of the tolerance interval. If |
n.plus.one.minus.utl.rank |
positive integer related to the rank of the order statistic to use for
the upper bound of the tolerance interval. A value of
|
lb, ub |
scalars indicating lower and upper bounds on the distribution. By default, lb = -Inf and ub = Inf.
ti.type |
character string indicating what kind of tolerance interval to compute.
The possible values are "two-sided" (the default), "lower", and "upper".
A tolerance interval for some population is an interval on the real line constructed so as to contain 100 β \% of the population (i.e., 100 β \% of all future observations), where 0 < β < 1. The quantity 100 β \% is called the coverage.
There are two kinds of tolerance intervals (Guttman, 1970):
A β-content tolerance interval with confidence level 100(1-α)\% is constructed so that it contains at least 100 β \% of the population (i.e., the coverage is at least 100 β \%) with probability 100(1-α)\%, where 0 < α < 1. The quantity 100(1-α)\% is called the confidence level or confidence coefficient associated with the tolerance interval.
A β-expectation tolerance interval is constructed so that the average coverage of the interval is 100 β \%.
Note: A β-expectation tolerance interval with coverage 100 β \% is equivalent to a prediction interval for one future observation with associated confidence level 100 β \%. There is no explicit confidence level associated with a β-expectation tolerance interval. If a β-expectation tolerance interval is treated as a β-content tolerance interval, the confidence level associated with this tolerance interval is usually around 50\% (e.g., Guttman, 1970, Table 4.2, p. 76).
The Form of a Nonparametric Tolerance Interval
Let \underline{x} denote a random sample of n independent observations
from some continuous distribution and let x_{(i)} denote the i'th order
statistic in \underline{x}. A two-sided nonparametric tolerance interval is
constructed as:
[x_{(u)}, x_{(v)}] \;\;\;\;\;\; (1)
where u and v are positive integers between 1 and n, and u < v. That is, u denotes the rank of the lower tolerance limit, and v denotes the rank of the upper tolerance limit. To make it easier to write some equations later on, we can also write the tolerance interval (1) in a slightly different way as:
[x_{(u)}, x_{(n+1-w)}] \;\;\;\;\;\; (2)
where
w = n + 1 - v \;\;\;\;\;\; (3)
so that w is a positive integer between 1 and n-1, and u < n+1-w.
In terms of the arguments to the function tolIntNpar, the argument ltl.rank corresponds to u, and the argument n.plus.one.minus.utl.rank corresponds to w.
If we allow u=0 and w=0 and define lower and upper bounds as:
x_{(0)} = lb \;\;\;\;\;\; (4)
x_{(n+1)} = ub \;\;\;\;\;\; (5)
then equation (2) above can also represent a one-sided lower or one-sided upper tolerance interval. That is, a one-sided lower nonparametric tolerance interval is constructed as:
[x_{(u)}, x_{(n+1)}] = [x_{(u)}, ub] \;\;\;\;\;\; (6)
and a one-sided upper nonparametric tolerance interval is constructed as:
[x_{(0)}, x_{(v)}] = [lb, x_{(v)}] \;\;\;\;\;\; (7)
Usually, lb = -∞ or lb = 0 and ub = ∞.
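The rank conventions in equations (2) through (7) can be illustrated with a minimal base-R sketch. The helper name npar_limits and the small data vector are hypothetical, chosen only for illustration; this is not the internal code of tolIntNpar.

```r
# Hypothetical helper (not part of EnvStats): select tolerance limits by rank.
# u plays the role of ltl.rank, w the role of n.plus.one.minus.utl.rank.
npar_limits <- function(x, u = 1, w = 1, lb = -Inf, ub = Inf) {
  x <- sort(x)                      # order statistics x_(1) <= ... <= x_(n)
  n <- length(x)
  stopifnot(u >= 0, w >= 0, u < n + 1 - w)
  padded <- c(lb, x, ub)            # padded[i + 1] is x_(i) for i = 0, ..., n + 1
  c(LTL = padded[u + 1], UTL = padded[n + 1 - w + 1])
}

x <- c(7.5, 5.0, 9.2, 5.4, 6.7)     # made-up values for illustration
two_sided <- npar_limits(x)                  # [x_(1), x_(n)], equation (2)
upper     <- npar_limits(x, u = 0, lb = 0)   # [lb, x_(n)], equation (7)
lower     <- npar_limits(x, w = 0)           # [x_(1), ub], equation (6)
```

Padding the sorted sample with lb and ub mirrors definitions (4) and (5), so the same indexing covers the two-sided and one-sided cases.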
Let C be a random variable denoting the coverage of the above nonparametric tolerance intervals. Wilks (1941) showed that the distribution of C follows a beta distribution with parameters shape1 = v-u and shape2 = w+u when the unknown distribution is continuous.
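As an informal check of this result, the following simulation sketch compares simulated coverages with the beta distribution. It assumes Uniform(0, 1) data, for which the coverage of [x_(u), x_(v)] is simply x_(v) - x_(u); the sample size, ranks, seed, and number of replicates are arbitrary choices made for illustration.

```r
set.seed(1)                          # only so the sketch is reproducible
n <- 20; u <- 1; v <- 20
w <- n + 1 - v                       # equation (3)

# For Uniform(0, 1) data the coverage of [x_(u), x_(v)] equals x_(v) - x_(u).
cover <- replicate(20000, {
  x <- sort(runif(n))
  x[v] - x[u]
})

sim_mean  <- mean(cover)                     # simulated average coverage
beta_mean <- (v - u) / (v - u + w + u)       # mean of Beta(v - u, w + u)
sim_tail  <- mean(cover >= 0.9)              # simulated P(C >= 0.9)
beta_tail <- 1 - pbeta(0.9, v - u, w + u)    # same probability from the beta CDF
```

The simulated mean and tail probability should agree with the beta values up to Monte Carlo error.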
Computations for a β-Content Tolerance Interval
For a β-content tolerance interval, if the coverage C = β is specified,
then the associated confidence level (1-α)100\% is computed as:
1 - α = 1 - F(β, v-u, w+u) \;\;\;\;\;\; (8)
where F(y, δ, γ) denotes the cumulative distribution function of a beta random variable with parameters shape1 = δ and shape2 = γ evaluated at y.
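Equation (8) can be evaluated directly with the base-R function pbeta. The sketch below assumes n = 20 observations with the default ranks u = 1 and w = 1 and a specified coverage of 0.9; the variable names are illustrative only.

```r
n <- 20; u <- 1; w <- 1
v <- n + 1 - w                       # rank of the upper limit, from equation (3)
beta <- 0.9                          # specified coverage

# Equation (8): confidence level = 1 - F(beta, v - u, w + u)
conf_level <- 1 - pbeta(beta, shape1 = v - u, shape2 = w + u)
conf_level                           # about 0.608
```

This matches the confidence level of roughly 61% reported for a two-sided interval with n = 20 and 90% coverage.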
Similarly, if the confidence level associated with the tolerance interval is specified as (1-α)100\%, then the coverage C = β is computed as:
β = B(α, v-u, w+u) \;\;\;\;\;\; (9)
where B(p, δ, γ) denotes the p'th quantile of a beta distribution with parameters shape1 = δ and shape2 = γ.
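Equation (9) can be evaluated in the same way with the base-R function qbeta. This sketch assumes a one-sided upper interval based on the maximum of n = 24 observations (u = 0, w = 1) with a 95% confidence level; the variable names are illustrative only.

```r
n <- 24; u <- 0; w <- 1
v <- n + 1 - w                       # upper limit is the sample maximum
alpha <- 0.05                        # confidence level is 1 - alpha = 0.95

# Equation (9): coverage = alpha'th quantile of Beta(v - u, w + u)
coverage <- qbeta(alpha, shape1 = v - u, shape2 = w + u)
coverage                             # about 0.883

# Plugging this coverage back into equation (8) recovers the confidence level.
check <- 1 - pbeta(coverage, v - u, w + u)
```

With shape2 = 1 the quantile has the closed form alpha^(1/n), which gives a quick sanity check on the result.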
Computations for a β-Expectation Tolerance Interval
For a β-expectation tolerance interval, the expected coverage is simply the mean of a beta random variable with parameters shape1 = v-u and shape2 = w+u, which is given by:
E(C) = \frac{v-u}{n+1} \;\;\;\;\;\; (10)
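As a small worked instance of equation (10), the sketch below assumes an upper interval based on the maximum of n = 24 observations (u = 0, v = 24) and confirms that the beta mean and the closed form agree. The variable names are illustrative only.

```r
n <- 24; u <- 0; v <- 24
w <- n + 1 - v                       # equation (3)

shape1 <- v - u
shape2 <- w + u
expected_cov <- shape1 / (shape1 + shape2)   # mean of Beta(shape1, shape2)
closed_form  <- (v - u) / (n + 1)            # equation (10)
c(expected_cov, closed_form)                 # both equal 0.96
```

The two expressions coincide because shape1 + shape2 = v + w = n + 1.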
As stated above, a β-expectation tolerance interval with coverage β 100\% is equivalent to a prediction interval for one future observation with associated confidence level β 100\%. This is because the probability that any single future observation will fall into this interval is β 100\%, so the number of N future observations that fall into this interval is binomial with parameters size = N and prob = β. Hence the expected proportion of future observations that fall into this interval is β 100\% and is independent of the value of N.
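The link to a prediction interval can be checked informally by simulation. The sketch below assumes standard normal data, n = 19, and the interval [x_(1), x_(n)], whose expected coverage from equation (10) is 18/20 = 0.9; the seed and number of replicates are arbitrary.

```r
set.seed(42)                         # only so the sketch is reproducible
n <- 19; u <- 1; v <- n
expected_cov <- (v - u) / (n + 1)    # equation (10): 0.9

# Draw a sample, then one future observation, and record whether the
# future observation falls inside [x_(u), x_(v)].
hits <- replicate(20000, {
  x <- sort(rnorm(n))
  y <- rnorm(1)
  x[u] <= y && y <= x[v]
})
hit_rate <- mean(hits)               # should be close to expected_cov
```

Any continuous distribution could replace rnorm here, since the result is distribution-free.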
See the help file for predIntNpar for more information on constructing a nonparametric prediction interval.
A list of class "estimate" containing the estimated parameters, the tolerance interval, and other information. See estimate.object for details.
Tolerance intervals have long been applied to quality control and life testing problems (Hahn, 1970b,c; Hahn and Meeker, 1991; Krishnamoorthy and Mathew, 2009). References that discuss tolerance intervals in the context of environmental monitoring include: Berthouex and Brown (2002, Chapter 21), Gibbons et al. (2009), Millard and Neerchal (2001, Chapter 6), Singh et al. (2010b), and USEPA (2009).
Steven P. Millard (EnvStats@ProbStatInfo.com)
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York.
Danziger, L., and S. Davis. (1964). Tables of Distribution-Free Tolerance Limits. Annals of Mathematical Statistics 35(5), 1361–1365.
Davis, C.B. (1994). Environmental Regulatory Statistics. In Patil, G.P., and C.R. Rao, eds., Handbook of Statistics, Vol. 12: Environmental Statistics. North-Holland, Amsterdam, a division of Elsevier, New York, NY, Chapter 26, 817–865.
Davis, C.B., and R.J. McNichols. (1994a). Ground Water Monitoring Statistics Update: Part I: Progress Since 1988. Ground Water Monitoring and Remediation 14(4), 148–158.
Gibbons, R.D. (1991b). Statistical Tolerance Limits for Ground-Water Monitoring. Ground Water 29, 563–570.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.
Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Hafner Publishing Co., Darien, CT, Chapter 2.
Hahn, G.J., and W.Q. Meeker. (1991). Statistical Intervals: A Guide for Practitioners. John Wiley and Sons, New York, 392 pp.
Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, New York, NY, pp. 88–90.
Krishnamoorthy, K., and T. Mathew. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons, Hoboken.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
Wilks, S.S. (1941). Determination of Sample Sizes for Setting Tolerance Limits. Annals of Mathematical Statistics 12, 91–96.
# Generate 20 observations from a lognormal mixture distribution
# with parameters mean1=1, cv1=0.5, mean2=5, cv2=1, and p.mix=0.1.
# The exact two-sided interval that contains 90% of this distribution is given by:
# [0.682312, 13.32052]. Use tolIntNpar to construct a two-sided 90%
# beta-content tolerance interval. Note that the associated confidence level
# is only 61%. A larger sample size is required to obtain a larger confidence
# level (see the help file for tolIntNparN).
# (Note: the call to set.seed simply allows you to reproduce this example.)

set.seed(23)
dat <- rlnormMixAlt(20, 1, 0.5, 5, 1, 0.1)
tolIntNpar(dat, coverage = 0.9)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            None
#
#Data:                            dat
#
#Sample Size:                     20
#
#Tolerance Interval Coverage:     90%
#
#Coverage Type:                   content
#
#Tolerance Interval Method:       Exact
#
#Tolerance Interval Type:         two-sided
#
#Confidence Level:                60.8253%
#
#Tolerance Limit Rank(s):         1 20
#
#Tolerance Interval:              LTL = 0.5035035
#                                 UTL = 9.9504662

#----------

# Clean up
rm(dat)

#----------

# Reproduce Example 17-4 on page 17-21 of USEPA (2009). This example uses
# copper concentrations (ppb) from 3 background wells to set an upper
# limit for 2 compliance wells. The maximum value from the 3 wells is set
# to the 95% confidence upper tolerance limit, and we need to determine the
# coverage of this tolerance interval. The data are stored in EPA.92c.copper2.df.
# Note that even though these data are Type I left singly censored, it is still
# possible to compute an upper tolerance interval using any of the uncensored
# observations as the upper limit.

EPA.92c.copper2.df
#   Copper.orig Copper Censored Month Well  Well.type
#1           <5    5.0     TRUE     1    1 Background
#2           <5    5.0     TRUE     2    1 Background
#3          7.5    7.5    FALSE     3    1 Background
#...
#9          9.2    9.2    FALSE     1    2 Background
#10          <5    5.0     TRUE     2    2 Background
#11          <5    5.0     TRUE     3    2 Background
#...
#17          <5    5.0     TRUE     1    3 Background
#18         5.4    5.4    FALSE     2    3 Background
#19         6.7    6.7    FALSE     3    3 Background
#...
#29         6.2    6.2    FALSE     5    4 Compliance
#30          <5    5.0     TRUE     6    4 Compliance
#31         7.8    7.8    FALSE     7    4 Compliance
#...
#38          <5    5.0     TRUE     6    5 Compliance
#39         5.6    5.6    FALSE     7    5 Compliance
#40          <5    5.0     TRUE     8    5 Compliance

with(EPA.92c.copper2.df,
  tolIntNpar(Copper[Well.type=="Background"],
    conf.level = 0.95, lb = 0, ti.type = "upper"))

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            None
#
#Data:                            Copper[Well.type == "Background"]
#
#Sample Size:                     24
#
#Tolerance Interval Coverage:     88.26538%
#
#Coverage Type:                   content
#
#Tolerance Interval Method:       Exact
#
#Tolerance Interval Type:         upper
#
#Confidence Level:                95%
#
#Tolerance Limit Rank(s):         24
#
#Tolerance Interval:              LTL = 0.0
#                                 UTL = 9.2

#----------

# Repeat the last example, except compute an upper
# beta-expectation tolerance interval:

with(EPA.92c.copper2.df,
  tolIntNpar(Copper[Well.type=="Background"],
    cov.type = "expectation", lb = 0, ti.type = "upper"))

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            None
#
#Data:                            Copper[Well.type == "Background"]
#
#Sample Size:                     24
#
#Tolerance Interval Coverage:     96%
#
#Coverage Type:                   expectation
#
#Tolerance Interval Method:       Exact
#
#Tolerance Interval Type:         upper
#
#Tolerance Limit Rank(s):         24
#
#Tolerance Interval:              LTL = 0.0
#                                 UTL = 9.2