predIntLnormAltSimultaneousTestPower

Probability That at Least One Set of Future Observations Violates the Given Rule Based on a Simultaneous Prediction Interval for a Lognormal Distribution


Description

Compute the probability that at least one set of future observations violates the given rule based on a simultaneous prediction interval for the next r future sampling occasions for a lognormal distribution. The three possible rules are: k-of-m, California, or Modified California.

Usage

predIntLnormAltSimultaneousTestPower(n, df = n - 1, n.geomean = 1, k = 1, 
    m = 2, r = 1, rule = "k.of.m", ratio.of.means = 1, cv = 1, pi.type = "upper", 
    conf.level = 0.95, r.shifted = r, K.tol = .Machine$double.eps^0.5, 
    integrate.args.list = NULL)

Arguments

n

vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based.

df

vector of positive integers indicating the degrees of freedom associated with the sample size. The default value is df=n-1.

n.geomean

positive integer specifying the sample size associated with the future geometric means. The default value is n.geomean=1 (i.e., individual observations). Note that all future geometric means must be based on the same sample size.

k

for the k-of-m rule (rule="k.of.m"), vector of positive integers specifying the minimum number of observations (or averages) out of m observations (or averages) (all obtained on one future sampling “occasion”) the prediction interval should contain with confidence level conf.level. The default value is k=1. This argument is ignored when the argument rule is not equal to "k.of.m".

m

vector of positive integers specifying the maximum number of future observations (or averages) on one future sampling “occasion”. The default value is m=2, except when rule="Modified.CA", in which case this argument is ignored and m is automatically set equal to 4.

r

vector of positive integers specifying the number of future sampling “occasions”. The default value is r=1.

rule

character string specifying which rule to use. The possible values are "k.of.m" (k-of-m rule; the default), "CA" (California rule), and "Modified.CA" (modified California rule). See the DETAILS section below for more information.

ratio.of.means

numeric vector specifying the ratio of the mean of the population that will be sampled to produce the future observations vs. the mean of the population that was sampled to construct the prediction interval. See the DETAILS section below for more information. The default value is ratio.of.means=1.

cv

numeric vector of positive values specifying the coefficient of variation for both the population that was sampled to construct the prediction interval and the population that will be sampled to produce the future observations. The default value is cv=1.

pi.type

character string indicating what kind of prediction interval to compute. The possible values are pi.type="upper" (the default), and pi.type="lower".

conf.level

vector of values between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.

r.shifted

vector of positive integers specifying the number of future sampling occasions for which the mean is shifted. All values must be integers between 1 and the corresponding element of r. The default value is r.shifted=r.

K.tol

numeric scalar indicating the tolerance to use in the nonlinear search algorithm to compute K. The default value is K.tol=.Machine$double.eps^(1/2). For many applications, the value of K needs to be known only to the second decimal place, in which case setting K.tol=1e-4 will speed up computation a bit.

integrate.args.list

a list of arguments to supply to the integrate function. The default value is integrate.args.list=NULL which means that the default values of integrate are used.

Details

What is a Simultaneous Prediction Interval?
A prediction interval for some population is an interval on the real line constructed so that it will contain k future observations from that population with some specified probability (1-α)100%, where 0 < α < 1 and k is some pre-specified positive integer. The quantity (1-α)100% is called the confidence coefficient or confidence level associated with the prediction interval. The function predIntNorm computes a standard prediction interval based on a sample from a normal distribution.

The function predIntLnormAltSimultaneous computes a simultaneous prediction interval (assuming lognormal observations) that will contain a certain number of future observations with probability (1-α)100% for each of r future sampling “occasions”, where r is some pre-specified positive integer. The quantity r may refer to r distinct future sampling occasions in time, or it may, for example, refer to sampling at r distinct locations on one future sampling occasion, assuming that the population standard deviation is the same at all of the r distinct locations.

The function predIntLnormAltSimultaneous computes a simultaneous prediction interval based on one of three possible rules:

  • For the k-of-m rule (rule="k.of.m"), at least k of the next m future observations will fall in the prediction interval with probability (1-α)100% on each of the r future sampling occasions. If observations are being taken sequentially, for a particular sampling occasion, up to m observations may be taken, but once k of the observations fall within the prediction interval, sampling can stop. Note: When k=m and r=1, the results of predIntNormSimultaneous are equivalent to the results of predIntNorm.

  • For the California rule (rule="CA"), with probability (1-α)100%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else all of the next m-1 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, m-1 more observations must be taken.

  • For the Modified California rule (rule="Modified.CA"), with probability (1-α)100%, for each of the r future sampling occasions, either the first observation will fall in the prediction interval, or else at least 2 out of the next 3 observations will fall in the prediction interval. That is, if the first observation falls in the prediction interval then sampling can stop. Otherwise, up to 3 more observations must be taken.
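The three rules amount to simple pass/fail checks on the observations from one sampling occasion. The sketch below illustrates them for an upper prediction limit. The helper names (passes_k_of_m, passes_CA, passes_modified_CA), the observations, and the limit are all hypothetical and are not part of EnvStats; treating a value exactly equal to the limit as inside the interval is also an assumption.

```r
# Hypothetical helpers (NOT part of EnvStats): does one sampling occasion
# satisfy a rule, given an upper prediction limit?
# x holds the observations in the order they were taken.

# k-of-m rule: at least k of the m observations fall at or below the limit.
passes_k_of_m <- function(x, upper.limit, k = 1) {
  sum(x <= upper.limit) >= k
}

# California rule: the first observation is in the interval, or else all of
# the remaining m - 1 observations are.
passes_CA <- function(x, upper.limit) {
  x[1] <= upper.limit || all(x[-1] <= upper.limit)
}

# Modified California rule (m = 4): the first observation is in the
# interval, or else at least 2 of the next 3 are.
passes_modified_CA <- function(x, upper.limit) {
  stopifnot(length(x) == 4)
  x[1] <= upper.limit || sum(x[-1] <= upper.limit) >= 2
}

# Illustrative data: the first observation and one resample exceed the limit.
obs   <- c(12.1, 8.4, 10.9, 7.7)
limit <- 10

passes_k_of_m(obs, limit, k = 1)   # TRUE:  two observations are within the limit
passes_CA(obs, limit)              # FALSE: first fails and not all resamples pass
passes_modified_CA(obs, limit)     # TRUE:  first fails but 2 of the next 3 pass
```

The same occasion can pass one rule and violate another, which is why the power values for the three rules differ.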

Computing Power
The function predIntNormSimultaneousTestPower computes the probability that at least one set of future observations or averages will violate the given rule based on a simultaneous prediction interval for the next r future sampling occasions, assuming normally distributed observations. The population mean for the future observations is allowed to differ from the population mean for the observations used to construct the prediction interval.

The function predIntLnormAltSimultaneousTestPower assumes all observations are from a lognormal distribution. The observations used to construct the prediction interval are assumed to come from a lognormal distribution with mean θ_2 and coefficient of variation τ. The future observations are assumed to come from a lognormal distribution with mean θ_1 and coefficient of variation τ; that is, the means are allowed to differ between the two populations, but not the coefficient of variation.
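The coefficient of variation of a lognormal distribution depends only on the standard deviation on the log scale, so two populations with the same τ share the same log-scale standard deviation and differ only in their log-scale means. A minimal base-R sketch of that conversion follows; the helper name lnorm_alt_to_log and the means 10 and 40 are illustrative assumptions, not values from EnvStats.

```r
# Hypothetical helper (NOT part of EnvStats): convert the mean and coefficient
# of variation of a lognormal distribution to its log-scale parameters, using
#   mean = exp(meanlog + sdlog^2 / 2)   and   cv = sqrt(exp(sdlog^2) - 1).
lnorm_alt_to_log <- function(mean, cv) {
  sdlog   <- sqrt(log(cv^2 + 1))
  meanlog <- log(mean) - sdlog^2 / 2
  list(meanlog = meanlog, sdlog = sdlog)
}

baseline <- lnorm_alt_to_log(mean = 10, cv = 1)  # plays the role of theta_2
future   <- lnorm_alt_to_log(mean = 40, cv = 1)  # plays the role of theta_1

# Equal cv implies equal standard deviation on the log scale.
all.equal(baseline$sdlog, future$sdlog)

# The log-scale means differ by the log of the ratio of the original means.
all.equal(future$meanlog - baseline$meanlog, log(40 / 10))

# Round trip back to the original-scale mean and cv of the baseline population.
exp(baseline$meanlog + baseline$sdlog^2 / 2)
sqrt(exp(baseline$sdlog^2) - 1)
```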

The function predIntLnormAltSimultaneousTestPower calls the function
predIntNormSimultaneousTestPower, with the argument delta.over.sigma given by:

\frac{\delta}{\sigma} = \frac{\log(R)}{\sqrt{\log(\tau^2 + 1)}} \qquad (1)

where R is given by:

R = \frac{\theta_1}{\theta_2} \qquad (2)

and corresponds to the argument ratio.of.means for the function
predIntLnormAltSimultaneousTestPower, and τ corresponds to the argument cv.
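As a small worked illustration of Equation (1), the scaled shift can be computed directly in base R. The helper name delta_over_sigma is hypothetical; this is a sketch of the formula only, not the internal EnvStats code.

```r
# Hypothetical helper (NOT part of EnvStats): Equation (1), the scaled shift
# on the log scale implied by a ratio of means R and a coefficient of variation.
delta_over_sigma <- function(ratio.of.means, cv) {
  log(ratio.of.means) / sqrt(log(cv^2 + 1))
}

# With cv = 1, a ratio of 1 gives no shift; larger ratios give larger shifts.
shift <- delta_over_sigma(ratio.of.means = 1:3, cv = 1)
round(shift, 3)   # approximately 0.000 0.833 1.320

# For a fixed ratio, a larger cv shrinks the scaled shift.
delta_over_sigma(ratio.of.means = 2, cv = c(0.5, 1, 2))
```

A ratio of 1 corresponds to a shift of 0, which is consistent with the first example below, where the power at ratio.of.means=1 equals 1 - conf.level.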

Value

vector of values between 0 and 1 equal to the probability that the rule will be violated.

Note

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

See the help file for predIntLnormAltSimultaneous.

See Also

Examples

  library(EnvStats)

  # For the k-of-m rule with n=4, k=1, m=3, and r=1, show how the power increases 
  # as ratio.of.means increases.  Assume a 95% upper prediction interval.

  predIntLnormAltSimultaneousTestPower(n = 4, m = 3, ratio.of.means = 1:3) 
  #[1] 0.0500000 0.2356914 0.4236723

  #----------

  # Look at how the power increases with sample size for an upper one-sided 
  # prediction interval using the k-of-m rule with k=1, m=3, r=20, 
  # ratio.of.means=4, and a confidence level of 95%.

  predIntLnormAltSimultaneousTestPower(n = c(4, 8), m = 3, r = 20, ratio.of.means = 4) 
  #[1] 0.4915743 0.8218175

  #----------

  # Compare the power for the 1-of-3 rule with the power for the California and 
  # Modified California rules, based on a 95% upper prediction interval and 
  # ratio.of.means=4.  Assume a sample size of n=8.  Note that in this case the 
  # power for the Modified California rule is greater than the power for the 
  # 1-of-3 rule and California rule.

  predIntLnormAltSimultaneousTestPower(n = 8, k = 1, m = 3, ratio.of.means = 4) 
  #[1] 0.6594845 

  predIntLnormAltSimultaneousTestPower(n = 8, m = 3, rule = "CA", ratio.of.means = 4) 
  #[1] 0.5864311 

  predIntLnormAltSimultaneousTestPower(n = 8, rule = "Modified.CA", ratio.of.means = 4) 
  #[1] 0.691135

  #----------

  # Show how the power for an upper 95% simultaneous prediction limit increases 
  # as the number of future sampling occasions r increases.  Here, we'll use the 
  # 1-of-3 rule with n=8 and ratio.of.means=4.

  predIntLnormAltSimultaneousTestPower(n = 8, k = 1, m = 3, r = c(1, 2, 5, 10), 
    ratio.of.means = 4) 
  #[1] 0.6594845 0.7529576 0.8180814 0.8302302

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

v2.4.0
GPL (>= 3)
Authors
Steven P. Millard [aut], Alexander Kowarik [ctb, cre] (<https://orcid.org/0000-0001-8598-4130>)
Initial release
2020-10-20
