
varbin

Mean, Variance and Confidence Interval of a Proportion


Description

This function computes the mean and variance of a proportion from clustered binomial data (n, y), using various methods. Confidence intervals are computed using a normal approximation, which might be inappropriate when the proportion is close to 0 or 1.
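As a minimal sketch of the normal approximation mentioned above, the interval can be written as the estimate plus or minus a normal quantile times the standard error. The values of `est` and `v` below are made up for illustration and are not output from `varbin`.

```r
# Hypothetical estimate and variance of a proportion (illustrative values)
est   <- 0.30
v     <- 0.004
alpha <- 0.05

# Two-sided normal quantile for a (1 - alpha) interval
z  <- qnorm(1 - alpha / 2)
ci <- c(lower = est - z * sqrt(v), upper = est + z * sqrt(v))
ci
```

Nothing in this construction keeps the limits inside [0, 1], which is one reason the approximation can be inappropriate for proportions near 0 or 1.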

Usage

varbin(n, y, data, alpha = 0.05, R = 5000)

Arguments

n

The denominator of the proportion.

y

The numerator of the proportion.

data

A data frame containing the data.

alpha

The significance level for the confidence intervals. Defaults to 0.05, giving 95% confidence intervals.

R

The number of bootstrap replicates to compute the bootstrap mean and variance.

Details

Five estimation methods are used. Consider N clusters of sizes n_1, …, n_N with observed responses (counts) y_1, …, y_N, and let p_i = y_i / n_i denote the observed proportions (i = 1, …, N). An underlying assumption is that the theoretical proportion is homogeneous across the clusters.

Binomial method: the proportion and its variance are estimated as p = sum(y_i) / sum(n_i) and p * (1 - p) / (sum(n_i) - 1), respectively.
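A minimal base-R sketch of the binomial estimate, using made-up data for four clusters. It reads the variance denominator as the total number of subjects minus one, which is an assumption about the notation rather than a statement about the package internals.

```r
# Made-up clustered data: 4 clusters (not taken from the aod package)
n <- c(10, 12, 8, 15)   # cluster sizes n_i
y <- c(2, 5, 1, 6)      # responses y_i

# Pooled proportion over all clusters
p_bin <- sum(y) / sum(n)

# Binomial variance; clustering is ignored by this method
var_bin <- p_bin * (1 - p_bin) / (sum(n) - 1)

c(mean = p_bin, variance = var_bin)
```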

Ratio method: the one-stage cluster sampling formula is used to estimate the variance of the ratio estimate (see Cochran, 1999, p. 32 and p. 66). The proportion is estimated as above (p).
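The variance formula is not spelled out here. The sketch below assumes the usual ratio-estimator variance for one-stage cluster sampling without a finite population correction, applied to made-up data; it may differ in detail from what `varbin` computes.

```r
# Made-up clustered data: 4 clusters (not taken from the aod package)
n <- c(10, 12, 8, 15)   # cluster sizes n_i
y <- c(2, 5, 1, 6)      # responses y_i
N <- length(n)          # number of clusters

# The point estimate is the same pooled proportion as in the binomial method
p <- sum(y) / sum(n)

# Cluster-level residuals around the ratio estimate
res <- y - p * n

# Assumed ratio-estimator variance (no finite population correction)
var_ratio <- N / (N - 1) * sum(res^2) / sum(n)^2

c(mean = p, variance = var_ratio)
```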

Arithmetic method: the proportion is estimated as p_A = sum(y_i / n_i) / N, with estimated variance [1/(N * (N - 1))] sum((p_i - p_A)^2).
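The arithmetic estimate can be sketched in a few lines of base R. The data are made up, and the variance uses the fact that `var()` divides by N - 1, so `var(p_i) / N` matches the formula given above.

```r
# Made-up clustered data: 4 clusters (not taken from the aod package)
n <- c(10, 12, 8, 15)   # cluster sizes n_i
y <- c(2, 5, 1, 6)      # responses y_i
N <- length(n)          # number of clusters

p_i <- y / n            # observed cluster proportions
p_A <- mean(p_i)        # unweighted mean of the proportions

# var() uses denominator N - 1, so this equals sum((p_i - p_A)^2) / (N * (N - 1))
var_A <- var(p_i) / N

c(mean = p_A, variance = var_A)
```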

Jackknife method: the proportion p_J is the arithmetic mean of the pseudovalues pv_i, with estimated variance [1/(N * (N - 1))] sum((pv_i - p_J)^2) (Gladen, 1977; Paul, 1982).
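The pseudovalues are not defined in this page. The sketch below assumes the standard delete-one jackknife applied to the pooled proportion, pv_i = N * p - (N - 1) * p_(-i), where p_(-i) is the pooled proportion with cluster i left out. The data are made up.

```r
# Made-up clustered data: 4 clusters (not taken from the aod package)
n <- c(10, 12, 8, 15)   # cluster sizes n_i
y <- c(2, 5, 1, 6)      # responses y_i
N <- length(n)          # number of clusters

p <- sum(y) / sum(n)                       # pooled proportion on all clusters
p_minus_i <- (sum(y) - y) / (sum(n) - n)   # pooled proportion without cluster i

# Assumed pseudovalue definition (standard delete-one jackknife)
pv <- N * p - (N - 1) * p_minus_i

p_J   <- mean(pv)       # jackknife estimate
var_J <- var(pv) / N    # equals sum((pv - p_J)^2) / (N * (N - 1))

c(mean = p_J, variance = var_J)
```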

Bootstrap method: R samples of size N are drawn with equal probability from the initial sample (p_1, … , p_N) (Efron and Tibshirani, 1993). The bootstrap estimate p_B and its estimated variance are the arithmetic mean and the empirical variance (computed with denominator R - 1) of the R binomial estimates, respectively.
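A rough sketch of the bootstrap step, under the assumption that whole clusters are resampled with replacement and the binomial (pooled) estimate is recomputed on each resample. The data, the seed and the value of `R` are illustrative choices.

```r
# Made-up clustered data: 4 clusters (not taken from the aod package)
n <- c(10, 12, 8, 15)   # cluster sizes n_i
y <- c(2, 5, 1, 6)      # responses y_i
N <- length(n)          # number of clusters
R <- 1000               # number of bootstrap replicates

set.seed(1)             # for reproducibility of the sketch
boot <- replicate(R, {
  idx <- sample.int(N, size = N, replace = TRUE)  # draw N clusters with replacement
  sum(y[idx]) / sum(n[idx])                       # binomial estimate on the resample
})

p_B   <- mean(boot)     # bootstrap estimate of the proportion
var_B <- var(boot)      # empirical variance, denominator R - 1

c(mean = p_B, variance = var_B)
```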

Value

An object of formal class “varbin”, with 5 slots:

CALL

The call of the function.

tab

A 4-column data frame giving, for each estimation method, the mean, the variance, and the lower and upper limits of the (1 - α) confidence interval.

boot

A numeric vector containing the R bootstrap replicates of the proportion. It can be used to compute other kinds of confidence intervals for the proportion.

alpha

The significance level used to compute the (1 - α) confidence intervals.

features

A numeric vector with 3 components summarizing the main features of the data: N = number of clusters, n = number of subjects, y = number of cases.

The “show” method displays the slot tab described above, replacing the variance with the standard error.
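As an example of reusing the bootstrap replicates, a percentile interval can be taken directly from their empirical quantiles. The sketch below uses a simulated stand-in vector `boot_reps` (hypothetical) so that it runs without the package; with a fitted object, the replicates would instead be read from the boot slot using the usual S4 accessor `@`.

```r
# Stand-in for the bootstrap replicates (simulated, not real varbin output)
set.seed(42)
boot_reps <- rbeta(5000, shape1 = 14, shape2 = 31)

alpha <- 0.05

# Percentile interval: empirical quantiles of the replicates
ci_pct <- quantile(boot_reps, probs = c(alpha / 2, 1 - alpha / 2))
ci_pct
```

Unlike the normal approximation, this interval always stays within the range of the replicates, and therefore within [0, 1].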

Author(s)

Matthieu Lesnoff matthieu.lesnoff@cirad.fr, Renaud Lancelot renaud.lancelot@cirad.fr

References

Cochran, W.G., 1999. Sampling techniques, 3rd ed. Wiley, New York.
Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. Chapman and Hall, London.
Gladen, B., 1977. The use of the jackknife to estimate proportions from toxicological data in the presence of litter effects. JASA 74(366), 278-283.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments. Biometrics 38, 361-370.

See Also

Examples

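library(aod)  # provides varbin() and the rabbits data set
data(rabbits)
# Estimates for a single group
varbin(n, y, rabbits[rabbits$group == "M", ])
# Estimates for each group, with fewer bootstrap replicates
by(rabbits,
   list(group = rabbits$group),
   function(x) varbin(n = n, y = y, data = x, R = 1000))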

aod

Analysis of Overdispersed Data

v1.3.1
GPL (>= 2)
Authors
Matthieu Lesnoff <matthieu.lesnoff@cirad.fr> and Renaud Lancelot <renaud.lancelot@cirad.fr>
Initial release
2012-04-10
