Estimate Parameter of a Binomial Distribution
Estimate p (the probability of “success”) for a binomial distribution, and optionally construct a confidence interval for p.
ebinom(x, size = NULL, method = "mle/mme/mvue", ci = FALSE, ci.type = "two-sided", ci.method = "score", correct = TRUE, var.denom = "n", conf.level = 0.95, warn = TRUE)
x |
numeric or logical vector of observations. When |
size |
positive integer indicating the number of trials; |
method |
character string specifying the method of estimation. The only possible value is "mle/mme/mvue".
ci |
logical scalar indicating whether to compute a confidence interval for p. The default value is ci=FALSE.
ci.type |
character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper".
ci.method |
character string indicating which method to use to construct the confidence interval. Possible values are "score" (the default), "exact", "adjusted Wald", and "Wald".
correct |
logical scalar indicating whether to use the continuity correction when ci.method="score" or ci.method="Wald". The default value is correct=TRUE.
var.denom |
character string indicating what value to use in the denominator of the variance estimator when
|
conf.level |
a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is conf.level=0.95.
warn |
a logical scalar indicating whether to issue a warning in the case when |
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
If x is a vector of n observations from a binomial distribution with parameters size=1 and prob=p, then the sum of all the values in x is an observation from a binomial distribution with parameters size=n and prob=p.
If x is an observation from a binomial distribution with parameters size=n and prob=p, the maximum likelihood estimator (mle), method of moments estimator (mme), and minimum variance unbiased estimator (mvue) of p is simply x/n.
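As a minimal sketch of the two data layouts described above, the following base R code (not a call to ebinom; the variable names are illustrative) shows that the estimate of p is the same whether the data are 0/1 outcomes or a single count. The 7 successes in 20 trials are made-up values chosen to match the estimate prob = 0.35 used in the examples.

```r
# Illustrative data: 20 Bernoulli (size = 1) outcomes with 7 successes
bern <- c(rep(1, 7), rep(0, 13))

n <- length(bern)   # number of trials
x <- sum(bern)      # a single observation from binomial(size = n, prob = p)

# The mle, mme, and mvue of p all reduce to x / n
p.hat <- x / n
p.hat

# The same estimate computed directly from the 0/1 vector
mean(bern)
```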
Confidence Intervals.
ci.method="score"
The confidence interval for p based on the score method was developed by Wilson (1927) and is discussed by Newcombe (1998a), Agresti and Coull (1998), and Agresti and Caffo (2000). When ci=TRUE and ci.method="score", the function ebinom calls the R function prop.test to compute the confidence interval. This method has been shown to provide the best performance (in terms of actual coverage matching assumed coverage) of all the methods provided here, although unlike the exact method, the actual coverage can fall below the assumed coverage.
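Because the score interval is obtained from prop.test, it can be reproduced with base R alone. The sketch below assumes x = 7 successes in n = 20 trials (the values behind prob = 0.35 in the examples) and also writes out the usual closed-form Wilson limits for the uncorrected case. The closed form is the standard textbook expression, shown for illustration rather than as the code ebinom runs.

```r
x <- 7; n <- 20; conf.level <- 0.95

# Score interval with and without the continuity correction
score.cc <- prop.test(x, n, conf.level = conf.level, correct = TRUE)$conf.int
score.nc <- prop.test(x, n, conf.level = conf.level, correct = FALSE)$conf.int

# Closed-form Wilson limits (no continuity correction)
z <- qnorm(1 - (1 - conf.level) / 2)
p.hat <- x / n
centre <- (p.hat + z^2 / (2 * n)) / (1 + z^2 / n)
half <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(centre - half, centre + half)

as.numeric(score.cc)
as.numeric(score.nc)
wilson
```

The uncorrected prop.test limits and the closed-form limits should agree, and the corrected interval is slightly wider.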
ci.method="exact"
The confidence interval for p based on the exact (Clopper-Pearson) method is discussed by Newcombe (1998a), Agresti and Coull (1998), and Zar (2010, pp.543-547). This is the method used in the R function binom.test. This method ensures the actual coverage is greater than or equal to the assumed coverage.
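A short sketch of the exact interval using base R only, again assuming x = 7 and n = 20. The beta-quantile form of the Clopper-Pearson limits is included as a cross-check. It is the standard representation of the two-sided exact interval and is shown here as an illustration.

```r
x <- 7; n <- 20; conf.level <- 0.95

# Exact (Clopper-Pearson) interval from binom.test
exact <- binom.test(x, n, conf.level = conf.level)$conf.int

# The same limits written as beta quantiles (valid for 0 < x < n)
alpha <- 1 - conf.level
cp <- c(qbeta(alpha / 2, x, n - x + 1),
        qbeta(1 - alpha / 2, x + 1, n - x))

as.numeric(exact)
cp
```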
ci.method="Wald"
The confidence interval for p based on the Wald method (with or without a correction for continuity) is the usual “normal approximation” method and is discussed by Newcombe (1998a), Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This method is never recommended but is included for historical purposes.
ci.method="adjusted Wald"
The confidence interval for p based on the adjusted Wald method is discussed by Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp.543-547). This is a simple modification of the Wald method and performs surprisingly well.
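The Wald and adjusted Wald intervals can be sketched by hand in base R. The formulas below are the usual textbook forms: a normal approximation with variance denominator n, an optional continuity correction of 1/(2n), and the Agresti-Coull adjustment that adds z^2/2 successes and z^2/2 failures. They are assumptions about the computation rather than a transcription of the ebinom source, with x = 7 and n = 20 as in the examples.

```r
x <- 7; n <- 20; conf.level <- 0.95
z <- qnorm(1 - (1 - conf.level) / 2)

# Wald interval (variance denominator n)
p.hat <- x / n
se <- sqrt(p.hat * (1 - p.hat) / n)
wald <- p.hat + c(-1, 1) * z * se

# Wald interval with continuity correction 1 / (2n)
wald.cc <- p.hat + c(-1, 1) * (z * se + 1 / (2 * n))

# Adjusted Wald (Agresti-Coull) interval
n.adj <- n + z^2
p.adj <- (x + z^2 / 2) / n.adj
adj.wald <- p.adj + c(-1, 1) * z * sqrt(p.adj * (1 - p.adj) / n.adj)

rbind(wald, wald.cc, adj.wald)
```

For these data the adjusted Wald limits lie close to the uncorrected score limits, since both intervals are centred at the same point.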
a list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure, Heads-Tails, etc.) outcomes. It is assumed that the outcome of any one trial is independent of any other trial, and that the probability of “success”, p, is the same on each trial. A binomial discrete random variable X is the number of “successes” in n independent trials. A special case of the binomial distribution occurs when n=1, in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model the proportion of times a chemical concentration exceeds a set standard in a given period of time (e.g., Gilbert, 1987, p.143). The binomial distribution is also used to compute an upper bound on the overall Type I error rate for deciding whether a facility or location is in compliance with some set standard. Assume the null hypothesis is that the facility is in compliance. If a test of hypothesis is conducted periodically over time to test compliance and/or several tests are performed during each time period, and the facility or location is always in compliance, and each single test has a Type I error rate of α, and the result of each test is independent of the result of any other test (usually not a reasonable assumption), then the number of times the facility is declared out of compliance when in fact it is in compliance is a binomial random variable with probability of “success” p=α being the probability of being declared out of compliance (see USEPA, 2009).
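The overall Type I error rate described above can be illustrated with a small base R sketch. The per-test rate alpha and the number of tests k are hypothetical values chosen for the illustration, and the calculation relies on the independence assumption noted in the text.

```r
alpha <- 0.01   # hypothetical Type I error rate of each single test
k <- 12         # hypothetical number of independent tests

# X = number of false "out of compliance" declarations ~ binomial(k, alpha).
# Overall Type I error rate = P(X >= 1) = 1 - P(X = 0)
overall <- 1 - pbinom(0, size = k, prob = alpha)

# Equivalent closed form
closed <- 1 - (1 - alpha)^k

# Expected number of false declarations over the k tests
expected <- k * alpha

c(overall = overall, closed = closed, expected = expected)
```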
Steven P. Millard (EnvStats@ProbStatInfo.com)
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
# Generate 20 observations from a binomial distribution with
# parameters size=1 and prob=0.2, then estimate the 'prob' parameter.
# (Note: the call to set.seed simply allows you to reproduce this
# example. Also, the only parameter estimated is 'prob'; 'size' is
# specified in the call to ebinom. The parameter 'size' is printed
# in order to show all of the parameters associated with the
# distribution.)

set.seed(251)
dat <- rbinom(20, size = 1, prob = 0.2)
ebinom(dat)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Binomial
#
#Estimated Parameter(s):          size = 20.0
#                                 prob =  0.1
#
#Estimation Method:               mle/mme/mvue for 'prob'
#
#Data:                            dat
#
#Sample Size:                     20

#----------------------------------------------------------------

# Generate one observation from a binomial distribution with
# parameters size=20 and prob=0.2, then estimate the "prob"
# parameter and compute a confidence interval:

set.seed(763)
dat <- rbinom(1, size = 20, prob = 0.2)
ebinom(dat, size = 20, ci = TRUE)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Binomial
#
#Estimated Parameter(s):          size = 20.00
#                                 prob =  0.35
#
#Estimation Method:               mle/mme/mvue for 'prob'
#
#Data:                            dat
#
#Sample Size:                     20
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.1630867
#                                 UCL = 0.5905104

#----------------------------------------------------------------

# Using the data from the last example, compare confidence
# intervals based on the various methods

ebinom(dat, size = 20, ci = TRUE, ci.method = "score", correct = TRUE)$interval$limits
#      LCL       UCL
#0.1630867 0.5905104

ebinom(dat, size = 20, ci = TRUE, ci.method = "score", correct = FALSE)$interval$limits
#      LCL       UCL
#0.1811918 0.5671457

ebinom(dat, size = 20, ci = TRUE, ci.method = "exact")$interval$limits
#      LCL       UCL
#0.1539092 0.5921885

ebinom(dat, size = 20, ci = TRUE, ci.method = "adjusted Wald")$interval$limits
#      LCL       UCL
#0.1799264 0.5684112

ebinom(dat, size = 20, ci = TRUE, ci.method = "Wald", correct = TRUE)$interval$limits
#      LCL       UCL
#0.1159627 0.5840373

ebinom(dat, size = 20, ci = TRUE, ci.method = "Wald", correct = FALSE)$interval$limits
#      LCL       UCL
#0.1409627 0.5590373

#----------------------------------------------------------------

# Use the cadmium data on page 8-6 of USEPA (1989b) to compute
# two-sided 95% confidence intervals for the probability of
# detection at background and compliance wells. The data are
# stored in EPA.89b.cadmium.df.

EPA.89b.cadmium.df
#   Cadmium.orig Cadmium Censored  Well.type
#1           0.1   0.100    FALSE Background
#2          0.12   0.120    FALSE Background
#3           BDL   0.000     TRUE Background
#...
#86          BDL   0.000     TRUE Compliance
#87          BDL   0.000     TRUE Compliance
#88          BDL   0.000     TRUE Compliance

attach(EPA.89b.cadmium.df)

# Probability of detection at Background well:
#--------------------------------------------

ebinom(!Censored[Well.type=="Background"], ci=TRUE)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Binomial
#
#Estimated Parameter(s):          size = 24.0000000
#                                 prob =  0.3333333
#
#Estimation Method:               mle/mme/mvue for 'prob'
#
#Data:                            !Censored[Well.type == "Background"]
#
#Sample Size:                     24
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.1642654
#                                 UCL = 0.5530745

# Probability of detection at Compliance well:
#--------------------------------------------

ebinom(!Censored[Well.type=="Compliance"], ci=TRUE)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Binomial
#
#Estimated Parameter(s):          size = 64.000
#                                 prob =  0.375
#
#Estimation Method:               mle/mme/mvue for 'prob'
#
#Data:                            !Censored[Well.type == "Compliance"]
#
#Sample Size:                     64
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.2597567
#                                 UCL = 0.5053034

#----------------------------------------------------------------

# Clean up
rm(dat)
detach("EPA.89b.cadmium.df")