Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

predIntNormN

Sample Size for a Specified Half-Width of a Prediction Interval for the next k Observations from a Normal Distribution


Description

Compute the sample size necessary to achieve a specified half-width of a prediction interval for the next k observations from a normal distribution.

Usage

predIntNormN(half.width, n.mean = 1, k = 1, sigma.hat = 1, 
    method = "Bonferroni", conf.level = 0.95, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)

Arguments

half.width

numeric vector of (positive) half-widths. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.

n.mean

numeric vector of positive integers specifying the sample size associated with the k future averages. The default value is n.mean=1 (i.e., individual observations). Note that all future averages must be based on the same sample size.

k

numeric vector of positive integers specifying the number of future observations or averages the prediction interval should contain with confidence level conf.level. The default value is k=1.

sigma.hat

numeric vector specifying the value(s) of the estimated standard deviation(s). The default value is sigma.hat=1.

method

character string specifying the method to use if the number of future observations (k) is greater than 1. The possible values are method="Bonferroni" (approximate method based on Bonferonni inequality; the default), and
method="exact" (exact method due to Dunnett, 1955). This argument is ignored if k=1.

conf.level

numeric vector of values between 0 and 1 indicating the confidence level of the prediction interval. The default value is conf.level=0.95.

round.up

logical scalar indicating whether to round up the values of the computed sample size(s) to the next smallest integer. The default value is round.up=TRUE.

n.max

positive integer greater than 1 indicating the maximum possible sample size. The default value is n.max=5000.

tol

numeric scalar indicating the tolerance to use in the uniroot search algorithm. The default value is tol=1e-7.

maxiter

positive integer indicating the maximum number of iterations to use in the uniroot search algorithm. The default value is maxiter=1000.

Details

If the arguments half.width, k, n.mean, sigma.hat, and conf.level are not all the same length, they are replicated to be the same length as the length of the longest argument.

The help files for predIntNorm and predIntNormK give formulas for a two-sided prediction interval based on the sample size, the observed sample mean and sample standard deviation, and specified confidence level. Specifically, the two-sided prediction interval is given by:

[\bar{x} - Ks, \bar{x} + Ks] \;\;\;\;\;\; (1)

where \bar{x} denotes the sample mean:

\bar{x} = \frac{1}{n} ∑_{i=1}^n x_i \;\;\;\;\;\; (2)

s denotes the sample standard deviation:

s^2 = \frac{1}{n-1} ∑_{i=1}^n (x_i - \bar{x})^2 \;\;\;\;\;\; (3)

and K denotes a constant that depends on the sample size n, the confidence level, the number of future averages k, and the sample size associated with the future averages, m (see the help file for predIntNormK). Thus, the half-width of the prediction interval is given by:

HW = Ks \;\;\;\;\;\; (4)

The function predIntNormN uses the uniroot search algorithm to determine the sample size for specified values of the half-width, number of observations used to create a single future average, number of future observations or averages, the sample standard deviation, and the confidence level. Note that unlike a confidence interval, the half-width of a prediction interval does not approach 0 as the sample size increases.

Value

numeric vector of sample sizes.

Note

See the help file for predIntNorm.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

See the help file for predIntNorm.

See Also

Examples

# Look at how the required sample size for a prediction interval increases 
  # with increasing number of future observations:

  1:5 
  #[1] 1 2 3 4 5 

  predIntNormN(half.width = 3, k = 1:5) 
  #[1]  6  9 11 14 18

  #----------

  # Look at how the required sample size for a prediction interval decreases 
  # with increasing half-width:

  2:5 
  #[1] 2 3 4 5 

  predIntNormN(half.width = 2:5) 
  #[1] 86  6  4  3

  predIntNormN(2:5, round = FALSE) 
  #[1] 85.567387  5.122911  3.542393  2.987861

  #----------

  # Look at how the required sample size for a prediction interval increases 
  # with increasing estimated standard deviation for a fixed half-width:

  seq(0.5, 2, by = 0.5) 
  #[1] 0.5 1.0 1.5 2.0 

  predIntNormN(half.width = 4, sigma.hat = seq(0.5, 2, by = 0.5)) 
  #[1]  3  4  7 86

  #----------

  # Look at how the required sample size for a prediction interval increases 
  # with increasing confidence level for a fixed half-width:

  seq(0.5, 0.9, by = 0.1) 
  #[1] 0.5 0.6 0.7 0.8 0.9 

  predIntNormN(half.width = 2, conf.level = seq(0.5, 0.9, by = 0.1)) 
  #[1] 2 2 3 4 9

  #==========

  # The data frame EPA.92c.arsenic3.df contains arsenic concentrations (ppb) 
  # collected quarterly for 3 years at a background well and quarterly for 
  # 2 years at a compliance well.  Using the data from the background well, 
  # compute the required sample size in order to achieve a half-width of 
  # 2.25, 2.5, or 3 times the estimated standard deviation for a two-sided 
  # 90% prediction interval for k=4 future observations.
  #
  # For a half-width of 2.25 standard deviations, the required sample size is 526, 
  # or about 131 years of quarterly observations!  For a half-width of 2.5 
  # standard deviations, the required sample size is 20, or about 5 years of 
  # quarterly observations.  For a half-width of 3 standard deviations, the required 
  # sample size is 9, or about 2 years of quarterly observations.

  EPA.92c.arsenic3.df
  #   Arsenic Year  Well.type
  #1     12.6    1 Background
  #2     30.8    1 Background
  #3     52.0    1 Background
  #...
  #18     3.8    5 Compliance
  #19     2.6    5 Compliance
  #20    51.9    5 Compliance

  mu.hat <- with(EPA.92c.arsenic3.df, 
    mean(Arsenic[Well.type=="Background"])) 

  mu.hat 
  #[1] 27.51667 

  sigma.hat <- with(EPA.92c.arsenic3.df, 
    sd(Arsenic[Well.type=="Background"]))

  sigma.hat 
  #[1] 17.10119 

  predIntNormN(half.width=c(2.25, 2.5, 3) * sigma.hat, k = 4, 
    sigma.hat = sigma.hat, conf.level = 0.9) 
  #[1] 526  20   9 

  #==========

  # Clean up
  #---------
  rm(mu.hat, sigma.hat)

EnvStats

Package for Environmental Statistics, Including US EPA Guidance

v2.4.0
GPL (>= 3)
Authors
Steven P. Millard [aut], Alexander Kowarik [ctb, cre] (<https://orcid.org/0000-0001-8598-4130>)
Initial release
2020-10-20

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.