Probability That at Least One Future Observation Falls Outside a Prediction Interval for a Normal Distribution
Compute the probability that at least one out of k future observations (or means) falls outside a prediction interval for k future observations (or means) for a normal distribution.
predIntNormTestPower(n, df = n - 1, n.mean = 1, k = 1, delta.over.sigma = 0, pi.type = "upper", conf.level = 0.95)
n: vector of positive integers greater than 2 indicating the sample size upon which the prediction interval is based.
df: vector of positive integers indicating the degrees of freedom associated with the sample size. The default value is df=n-1.
n.mean: positive integer specifying the sample size associated with the future averages. The default value is n.mean=1.
k: vector of positive integers specifying the number of future observations that the prediction interval should contain with confidence level conf.level. The default value is k=1.
delta.over.sigma: vector of numbers indicating the ratio Δ/σ. The quantity Δ (delta) denotes the difference between the mean of the population that was sampled to construct the prediction interval and the mean of the population that will be sampled to produce the future observations. The quantity σ (sigma) denotes the population standard deviation for both populations. See the section Computing Power below for more information. The default value is delta.over.sigma=0.
pi.type: character string indicating what kind of prediction interval to compute. The possible values are "upper" (the default) and "lower".
conf.level: numeric vector of values between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.
What is a Prediction Interval?
A prediction interval for some population is an interval on the real line constructed so that it will contain k future observations or averages from that population with some specified probability (1-α)100%, where 0 < α < 1 and k is some pre-specified positive integer. The quantity (1-α)100% is called the confidence coefficient or confidence level associated with the prediction interval. The function predIntNorm computes a standard prediction interval based on a sample from a normal distribution. The function predIntNormTestPower computes the probability that at least one out of k future observations or averages will not be contained in the prediction interval, where the population mean for the future observations is allowed to differ from the population mean for the observations used to construct the prediction interval.
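The quantity computed by predIntNormTestPower can also be approximated by simulation, which makes the definition concrete. The sketch below is illustrative only: mc_outside_prob() is a hypothetical helper (not part of EnvStats), it assumes μ = 0 and σ = 1 (only Δ/σ matters), it uses a one-sided upper limit, and for k = 1 it assumes the standard closed-form constant K = t(1-α, n-1)·sqrt(1 + 1/n).

```r
# Monte Carlo sketch of the quantity predIntNormTestPower() computes, for a
# one-sided upper prediction limit xbar + K * s. mc_outside_prob() is a
# hypothetical helper; mu = 0 and sigma = 1 without loss of generality.
mc_outside_prob <- function(n, k, K, delta.over.sigma = 0, n.sim = 20000) {
  outside <- replicate(n.sim, {
    x <- rnorm(n)                             # sample used to build the interval
    upper.limit <- mean(x) + K * sd(x)        # upper prediction limit
    y <- rnorm(k, mean = delta.over.sigma)    # k future observations, mean shifted by Delta/sigma
    any(y > upper.limit)                      # does at least one fall outside?
  })
  mean(outside)                               # estimated probability
}

set.seed(1)
n <- 4
K1 <- qt(0.95, df = n - 1) * sqrt(1 + 1 / n)  # assumed exact K for k = 1, 95% upper limit
mc_outside_prob(n, k = 1, K = K1, delta.over.sigma = 0)  # near 0.05, up to Monte Carlo error
mc_outside_prob(n, k = 1, K = K1, delta.over.sigma = 1)  # larger: the shift is detected more often
```

With delta.over.sigma=0 the estimate should sit near α = 0.05; any shift upward makes it more likely that a future value lands above the limit.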
The Form of a Prediction Interval
Let \underline{x} = x_1, x_2, …, x_n denote a vector of n observations from a normal distribution with parameters mean=μ and sd=σ. Also, let m denote the sample size associated with the k future averages (i.e., n.mean=m).
When m=1, each average is really just a single observation, so in the rest of
this help file the term “averages” will replace the phrase
“observations or averages”.
For a normal distribution, the form of a two-sided (1-α)100\% prediction interval is:
[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)
where \bar{x} denotes the sample mean:
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \;\;\;\;\;\; (2)
s denotes the sample standard deviation:
s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)
and K denotes a constant that depends on the sample size n, the confidence level, the number of future averages k, and the sample size associated with the future averages, m. Do not confuse the constant K (uppercase K) with the number of future averages k (lowercase k). The symbol K is used here to be consistent with the notation used for tolerance intervals (see tolIntNorm).
Similarly, the form of a one-sided lower prediction interval is:
[\bar{x} - Ks, \infty] \;\;\;\;\;\; (4)
and the form of a one-sided upper prediction interval is:
[-\infty, \bar{x} + Ks] \;\;\;\;\;\; (5)
but K differs for one-sided versus two-sided prediction intervals.
The derivation of the constant K is explained in the help file for predIntNormK.
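For the special case k = 1 (one future average), K has a standard closed form, which makes Equations (1) through (5) easy to compute directly. The sketch below assumes that case only: K = t(1-α/2, n-1)·sqrt(1/m + 1/n) for the two-sided interval and K = t(1-α, n-1)·sqrt(1/m + 1/n) for either one-sided interval. pred_int_norm_k1() is a hypothetical helper; for k > 1 the constant has no such simple form (see predIntNormK).

```r
# Sketch (hypothetical helper): prediction interval for k = 1 future average of
# n.mean observations, using the closed-form K for a single future value:
#   two-sided: K = qt(1 - alpha/2, n - 1) * sqrt(1/n.mean + 1/n)
#   one-sided: K = qt(1 - alpha,   n - 1) * sqrt(1/n.mean + 1/n)
pred_int_norm_k1 <- function(x, n.mean = 1, pi.type = c("two-sided", "lower", "upper"),
                             conf.level = 0.95) {
  pi.type <- match.arg(pi.type)
  n <- length(x)
  x.bar <- mean(x)   # Equation (2)
  s <- sd(x)         # square root of Equation (3), divisor n - 1
  alpha <- 1 - conf.level
  prob <- if (pi.type == "two-sided") 1 - alpha / 2 else 1 - alpha
  K <- qt(prob, df = n - 1) * sqrt(1 / n.mean + 1 / n)
  switch(pi.type,
         "two-sided" = c(LPL = x.bar - K * s, UPL = x.bar + K * s),  # Equation (1)
         "lower"     = c(LPL = x.bar - K * s, UPL = Inf),            # Equation (4)
         "upper"     = c(LPL = -Inf, UPL = x.bar + K * s))           # Equation (5)
}

x <- c(10.2, 9.7, 11.4, 10.8, 9.9, 10.5)
pred_int_norm_k1(x)                      # two-sided 95% interval
pred_int_norm_k1(x, pi.type = "upper")   # one-sided upper 95% interval
```

The one-sided K uses the 1-α quantile rather than 1-α/2, so the upper limit of the one-sided interval is tighter than the upper limit of the two-sided one.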
Computing Power
The "power" of the prediction interval is defined as the probability that at least one out of the k future observations or averages will not be contained in the prediction interval, where the population mean for the future observations is allowed to differ from the population mean for the observations used to construct the prediction interval. The probability p that all k future observations will be contained in a one-sided upper prediction interval (pi.type="upper") is given by Equation (6) of the help file for predIntNormSimultaneousK, taking (in that help file's notation) k=m, so that all future observations must be contained in the interval, and r=1 future sampling occasion. In the notation of this help file:
p = \int_0^1 T\left(\sqrt{n}K;\, n-1,\, \sqrt{n}\left[\Phi^{-1}(v) + \frac{\Delta}{\sigma}\right]\right) \left[\frac{v^{k-1}}{B(k, 1)}\right] dv \;\;\;\;\;\; (6)
where T(x; ν, δ) denotes the cdf of the non-central Student's t-distribution with parameters df=ν and ncp=δ evaluated at x; Φ(x) denotes the cdf of the standard normal distribution evaluated at x; and B(ν, ω) denotes the value of the beta function with parameters a=ν and b=ω.
The quantity Δ (upper case delta) denotes the difference between the mean of the population that was sampled to construct the prediction interval, and the mean of the population that will be sampled to produce the future observations. The quantity σ (sigma) denotes the population standard deviation of both of these populations. Usually you assume Δ=0 unless you are interested in computing the power of the rule to detect a change in means between the populations, as we are here.
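Equation (6) is a one-dimensional integral that base R can evaluate with integrate() and the non-central t cdf pt(..., ncp = ). The sketch below is a hypothetical helper (prob_all_inside(), not an EnvStats function) for single future observations. As a sanity check it assumes the k = 1 closed-form K = t(0.95, n-1)·sqrt(1 + 1/n), for which p should come out numerically equal to 0.95 when Δ/σ = 0.

```r
# Sketch of Equation (6) for single future observations (n.mean = 1).
# prob_all_inside() is a hypothetical helper, not an EnvStats function.
prob_all_inside <- function(K, n, k = 1, delta.over.sigma = 0, df = n - 1) {
  integrand <- function(v) {
    # noncentrality parameter sqrt(n) * [Phi^{-1}(v) + Delta/sigma]
    ncp <- sqrt(n) * (qnorm(v) + delta.over.sigma)
    # T(sqrt(n) K; df, ncp) weighted by v^(k-1) / B(k, 1)
    pt(sqrt(n) * K, df = df, ncp = ncp) * v^(k - 1) / beta(k, 1)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

# Assumed exact 95% upper-limit constant for k = 1
n <- 4
K <- qt(0.95, df = n - 1) * sqrt(1 + 1 / n)
prob_all_inside(K, n = n, k = 1, delta.over.sigma = 0)  # numerically 0.95
prob_all_inside(K, n = n, k = 3, delta.over.sigma = 0)  # smaller: more values must fit
```

Holding K fixed and raising k lowers p, which is why the constant K must grow with k to keep the confidence level at (1-α)100%.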
If we are interested in using averages instead of single observations, where each average is based on w ≥ 1 observations (i.e., n.mean=w, the quantity denoted m above), the first term in the integral in Equation (6) that involves the cdf of the non-central Student's t-distribution becomes:
T\left(\sqrt{n}K;\, n-1,\, \frac{\sqrt{n}}{\sqrt{w}}\left[\Phi^{-1}(v) + \frac{\sqrt{w}\,\Delta}{\sigma}\right]\right) \;\;\;\;\;\; (7)
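Replacing the t-cdf term by Equation (7) only changes the noncentrality parameter. The sketch below (hypothetical helper prob_all_inside_avg()) checks it against a closed form that holds for k = 1: a future average of w observations satisfies Ȳ - x̄ ~ N(Δ, σ²(1/w + 1/n)), so with K = t(0.95, n-1)·sqrt(1/w + 1/n) the probability p reduces to a single non-central t cdf with ncp = (Δ/σ)/sqrt(1/w + 1/n). That reduction is our own derivation, used here only as a check.

```r
# Sketch of Equation (7): each of the k future values is an average of
# w = n.mean observations. prob_all_inside_avg() is a hypothetical helper.
prob_all_inside_avg <- function(K, n, k = 1, w = 1, delta.over.sigma = 0, df = n - 1) {
  integrand <- function(v) {
    # (sqrt(n) / sqrt(w)) * [Phi^{-1}(v) + sqrt(w) * Delta/sigma]
    ncp <- (sqrt(n) / sqrt(w)) * (qnorm(v) + sqrt(w) * delta.over.sigma)
    pt(sqrt(n) * K, df = df, ncp = ncp) * v^(k - 1) / beta(k, 1)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

# k = 1 check: integral versus the single non-central t cdf
n <- 6; w <- 3; d <- 0.5
K <- qt(0.95, df = n - 1) * sqrt(1 / w + 1 / n)
p.integral <- prob_all_inside_avg(K, n = n, k = 1, w = w, delta.over.sigma = d)
p.closed   <- pt(qt(0.95, n - 1), n - 1, ncp = d / sqrt(1 / w + 1 / n))
c(p.integral, p.closed)   # the two should agree to integration accuracy
```

With w = 1 the helper reduces to Equation (6).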
For a given confidence level (1-α)100%, the power of the rule to detect a change in means is simply given by:
\mathrm{Power} = 1 - p \;\;\;\;\;\; (8)
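where p is computed from Equation (6) (or from Equation (7) when averages are used) with the actual value of Δ/σ, but with the value of K that corresponds to Δ/σ = 0, that is, the K that gives a (1-α)100% prediction interval. Thus, when the argument delta.over.sigma=0, the value of p is 1-α and the power is simply α100%. As delta.over.sigma increases above 0, the power increases.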
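The two-step recipe in Equation (8) can be sketched in a few lines: first solve p(K) = 1-α with Δ/σ = 0 using uniroot(), then evaluate 1 - p at the requested shift with that same K. This is a hypothetical illustration (pred_int_power_sketch() and its helper are not the EnvStats source) and it solves for K numerically from the integral rather than calling predIntNormK. For k = 1 its output should agree closely with the first example at the end of this help file.

```r
# Sketch of Equation (8) (hypothetical helpers, not the EnvStats source):
#   1. solve for K so that p = conf.level when Delta/sigma = 0,
#   2. return Power = 1 - p at the requested Delta/sigma, using that same K.
prob_all_inside <- function(K, n, k, w, delta.over.sigma, df) {
  integrand <- function(v) {
    ncp <- (sqrt(n) / sqrt(w)) * (qnorm(v) + sqrt(w) * delta.over.sigma)
    pt(sqrt(n) * K, df = df, ncp = ncp) * v^(k - 1) / beta(k, 1)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

pred_int_power_sketch <- function(n, k = 1, n.mean = 1, delta.over.sigma = 0,
                                  conf.level = 0.95, df = n - 1) {
  # Step 1: K for an upper prediction interval when there is no shift
  f <- function(K) prob_all_inside(K, n, k, n.mean, 0, df) - conf.level
  K <- uniroot(f, lower = 0, upper = 50, tol = 1e-10)$root
  # Step 2: power at each requested shift, holding K fixed
  sapply(delta.over.sigma, function(d) 1 - prob_all_inside(K, n, k, n.mean, d, df))
}

pred_int_power_sketch(n = 4, delta.over.sigma = 0:2)
pred_int_power_sketch(n = 20, k = 3, delta.over.sigma = c(0, 1, 2))
```

Because K is fixed at its Δ/σ = 0 value, the power at delta.over.sigma=0 equals α by construction.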
When pi.type="lower", the same value of K is used as when pi.type="upper", but Equation (4) is used to construct the prediction interval. Thus, the power increases as delta.over.sigma decreases below 0.
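Because the same K is used for both one-sided intervals, the power of the lower interval at a shift Δ equals the power of the upper interval at -Δ. The sketch below illustrates this for k = 1 and n.mean = 1 only, where (assuming our own derivation) (y - x̄)/(s·sqrt(1 + 1/n)) follows a non-central t distribution with n-1 degrees of freedom and ncp = sqrt(n/(n+1))·Δ/σ. power_k1() is a hypothetical helper.

```r
# Exact power for k = 1 future observation and n.mean = 1 (hypothetical helper),
# using the noncentral t distribution of (y - xbar) / (s * sqrt(1 + 1/n)).
power_k1 <- function(n, delta.over.sigma, pi.type = c("upper", "lower"),
                     conf.level = 0.95) {
  pi.type <- match.arg(pi.type)
  df <- n - 1
  # Same K for both types; a lower limit detects downward shifts, so reflect:
  # power_lower(Delta) = power_upper(-Delta)
  d <- if (pi.type == "upper") delta.over.sigma else -delta.over.sigma
  ncp <- sqrt(n / (n + 1)) * d
  1 - pt(qt(conf.level, df), df, ncp = ncp)
}

power_k1(4, delta.over.sigma = 0:2, pi.type = "upper")    # increases with Delta
power_k1(4, delta.over.sigma = -(0:2), pi.type = "lower") # same values, shift reversed
```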
The function returns a vector of values between 0 and 1 equal to the probability that at least one of k future observations or averages will fall outside the prediction interval.
See the help files for predIntNorm and predIntNormSimultaneous.
In the course of designing a sampling program, an environmental scientist may wish to determine the relationship between sample size, significance level, power, and scaled difference if one of the objectives of the sampling program is to determine whether two distributions differ from each other. The functions predIntNormTestPower and plotPredIntNormTestPowerCurve can be used to investigate these relationships for the case of normally-distributed observations. In the case of a simple shift between the two means, the test based on a prediction interval is not as powerful as the two-sample t-test. However, the test based on a prediction interval is more efficient at detecting a shift in the tail.
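The comparison with the two-sample t-test can be explored numerically. The sketch below is a hypothetical illustration: it computes the power of the upper-prediction-interval rule (K solved from Equation (6), n.mean = 1) and the power of a one-sided pooled two-sample t-test comparing the n background values with the k future values (df = n + k - 2, ncp = (Δ/σ)/sqrt(1/n + 1/k)). For k = 1 the two rules coincide (the pooled variance is just s²), so the gap only appears for k > 1; the printed numbers are illustrative, not taken from the EnvStats documentation.

```r
# Hypothetical comparison sketch: upper prediction-interval rule versus a
# one-sided pooled two-sample t-test (n background values, k future values).
prob_all_inside <- function(K, n, k, delta.over.sigma, df = n - 1) {
  integrand <- function(v) {
    ncp <- sqrt(n) * (qnorm(v) + delta.over.sigma)
    pt(sqrt(n) * K, df = df, ncp = ncp) * v^(k - 1) / beta(k, 1)
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

pi_power <- function(n, k, delta.over.sigma, conf.level = 0.95) {
  # K chosen so that all k future values are inside with probability conf.level
  f <- function(K) prob_all_inside(K, n, k, 0) - conf.level
  K <- uniroot(f, lower = 0, upper = 50, tol = 1e-10)$root
  1 - prob_all_inside(K, n, k, delta.over.sigma)
}

t_test_power <- function(n, k, delta.over.sigma, conf.level = 0.95) {
  # pooled two-sample t statistic: df = n + k - 2, ncp = (Delta/sigma) / sqrt(1/n + 1/k)
  df <- n + k - 2
  ncp <- delta.over.sigma / sqrt(1 / n + 1 / k)
  1 - pt(qt(conf.level, df), df, ncp = ncp)
}

c(prediction.interval = pi_power(8, 3, 2), t.test = t_test_power(8, 3, 2))
c(prediction.interval = pi_power(4, 1, 1), t.test = t_test_power(4, 1, 1))
```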
Steven P. Millard (EnvStats@ProbStatInfo.com)
# Show how the power increases as delta.over.sigma increases.
# Assume a 95% upper prediction interval.

predIntNormTestPower(n = 4, delta.over.sigma = 0:2)
#[1] 0.0500000 0.1743014 0.3990892

#----------

# Look at how the power increases with sample size for a one-sided upper
# prediction interval with k=3, delta.over.sigma=2, and a confidence level
# of 95%.

predIntNormTestPower(n = c(4, 8), k = 3, delta.over.sigma = 2)
#[1] 0.3578250 0.5752113

#----------

# Show how the power for an upper 95% prediction limit increases as the
# number of future observations k increases. Here, we'll use n=20 and
# delta.over.sigma=1.

predIntNormTestPower(n = 20, k = 1:3, delta.over.sigma = 1)
#[1] 0.2408527 0.2751074 0.2936486