Estimate Parameters of a Zero-Modified Lognormal (Delta) Distribution
Estimate the parameters of a zero-modified lognormal distribution or a zero-modified lognormal distribution (alternative parameterization), and optionally construct a confidence interval for the mean.
ezmlnorm(x, method = "mvue", ci = FALSE, ci.type = "two-sided", ci.method = "normal.approx", conf.level = 0.95)
ezmlnormAlt(x, method = "mvue", ci = FALSE, ci.type = "two-sided", ci.method = "normal.approx", conf.level = 0.95)
x | numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed. |
method | character string specifying the method of estimation. The only possible value is "mvue" (minimum variance unbiased; the default). |
ci | logical scalar indicating whether to compute a confidence interval for the mean. The default value is FALSE. |
ci.type | character string indicating what kind of confidence interval to compute. The possible values are "two-sided" (the default), "lower", and "upper". |
ci.method | character string indicating what method to use to construct the confidence interval for the mean. The only possible value is "normal.approx" (the default). |
conf.level | a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default value is 0.95. |
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values, they will be removed prior to performing the estimation.
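The removal step can be illustrated in base R. This is only a sketch of the behaviour described above, not the package's internal code, and the vector x below is made-up illustration data.

```r
# Hypothetical input containing every kind of value that would be dropped.
x <- c(0, 1.2, NA, 3.4, NaN, Inf, -Inf, 0)

# is.finite() is FALSE for NA, NaN, Inf, and -Inf, so one filter covers all four.
# Zeros are finite and are kept; they carry the information about p.zero.
x_clean <- x[is.finite(x)]
x_clean
```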
Let \underline{x} = (x_1, x_2, …, x_n) be a vector of n observations from a zero-modified lognormal distribution with parameters meanlog=μ, sdlog=σ, and p.zero=p. Alternatively, let \underline{x} = (x_1, x_2, …, x_n) be a vector of n observations from a zero-modified lognormal distribution (alternative parameterization) with parameters mean=θ, cv=τ, and p.zero=p.
Let r denote the number of observations in \underline{x} that are equal to 0, and order the observations so that x_1, x_2, …, x_r denote the r zero observations and x_{r+1}, x_{r+2}, …, x_n denote the n-r non-zero observations.
Note that θ is not the mean of the zero-modified lognormal distribution; it is the mean of the lognormal part of the distribution. Similarly, τ is not the coefficient of variation of the zero-modified lognormal distribution; it is the coefficient of variation of the lognormal part of the distribution.
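The two parameterizations of the lognormal part are linked by the standard lognormal identities θ = exp(μ + σ^2/2) and τ = √{exp(σ^2) - 1}. These identities are not stated on this page; the sketch below assumes them, and the helper names lnorm_to_alt and alt_to_lnorm are hypothetical.

```r
# Convert (meanlog, sdlog) of the lognormal part to (mean, cv).
lnorm_to_alt <- function(meanlog, sdlog) {
  c(mean = exp(meanlog + sdlog^2 / 2),
    cv   = sqrt(exp(sdlog^2) - 1))
}

# Convert (mean, cv) of the lognormal part back to (meanlog, sdlog).
alt_to_lnorm <- function(mean, cv) {
  sdlog <- sqrt(log(1 + cv^2))
  c(meanlog = log(mean) - sdlog^2 / 2,
    sdlog   = sdlog)
}

# Round trip: the second call should recover meanlog = 0 and sdlog = 1.
alt  <- lnorm_to_alt(meanlog = 0, sdlog = 1)
back <- alt_to_lnorm(alt[["mean"]], alt[["cv"]])
```

Neither helper involves p.zero, which is consistent with θ and τ describing only the lognormal part of the distribution.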
Let γ, δ, and φ denote the mean, standard deviation, and coefficient of variation of the overall zero-modified lognormal (delta) distribution. Let η denote the standard deviation of the lognormal part of the distribution, so that η = θ τ. Aitchison (1955) shows that:
γ = (1 - p) θ \;\;\;\; (1)
δ^2 = (1 - p) η^2 + p (1 - p) θ^2 \;\;\;\; (2)
so that
φ = \frac{δ}{γ} = \frac{√{τ^2 + p}}{√{1-p}} \;\;\;\; (3)
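As a check on equations (1) to (3), the overall moments can be computed directly from θ, τ, and p. The function name zmlnorm_moments is hypothetical, and the code is a minimal sketch of the formulas rather than anything taken from the package.

```r
# Overall mean, standard deviation, and cv of the zero-modified lognormal
# (delta) distribution, from the parameters of the lognormal part.
zmlnorm_moments <- function(theta, tau, p) {
  eta      <- theta * tau                                      # sd of lognormal part
  mean_all <- (1 - p) * theta                                  # equation (1)
  sd_all   <- sqrt((1 - p) * eta^2 + p * (1 - p) * theta^2)    # equation (2)
  c(mean = mean_all, sd = sd_all, cv = sd_all / mean_all)
}

# With mean = 2, cv = 1, p.zero = 0.5 the overall mean is 1 and the
# overall cv is sqrt(3), in agreement with equation (3).
mom <- zmlnorm_moments(theta = 2, tau = 1, p = 0.5)
cv_eq3 <- sqrt(1^2 + 0.5) / sqrt(1 - 0.5)
```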
Estimation
Minimum Variance Unbiased Estimation (method="mvue")
Aitchison (1955) shows that the minimum variance unbiased estimators (mvue's) of γ and δ^2 are:
\hat{γ}_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{\bar{y}} g_{n-r-1}(\frac{s^2}{2}) & \text{if } r < n - 1, \\ x_n / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (4)
\hat{δ}^2_{mvue} = \begin{cases} (1-\frac{r}{n}) e^{2\bar{y}} \{g_{n-r-1}(2s^2) - \frac{n-r-1}{n-1} g_{n-r-1}[\frac{(n-r-2)s^2}{n-r-1}] \} & \text{if } r < n - 1, \\ x_n^2 / n & \text{if } r = n - 1, \\ 0 & \text{if } r = n \end{cases} \;\;\;\; (5)
where
y_i = log(x_i), \; i = r+1, r+2, …, n \;\;\;\; (6)
\bar{y} = \frac{1}{n-r} ∑_{i=r+1}^n y_i \;\;\;\; (7)
s^2 = \frac{1}{n-r-1} ∑_{i=r+1}^n (y_i - \bar{y})^2 \;\;\;\; (8)
g_m(z) = ∑_{i=0}^∞ \frac{m^i (m+2i)}{m(m+2) \cdots (m+2i)} (\frac{m}{m+1})^i (\frac{z^i}{i!}) \;\;\;\; (9)
Note that when r=n-1 or r=n, the estimator of γ is simply the sample mean for all observations (including zero values), and the estimator for δ^2 is simply the sample variance for all observations.
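Equations (4) to (9) can be sketched in base R as follows. This is an illustrative implementation under stated assumptions, not the code used by ezmlnorm: the names finney_g and zml_mvue are hypothetical, the infinite series in equation (9) is truncated once a term becomes negligible, and all non-zero observations are assumed to be positive.

```r
# Series g_m(z) from equation (9), summed term by term. Consecutive terms satisfy
#   term_i / term_(i-1) = m^2 * z / ((m + 2 * (i - 1)) * (m + 1) * i),
# which avoids forming large powers and factorials directly.
finney_g <- function(m, z, tol = 1e-12, max_terms = 1000) {
  stopifnot(m >= 1)
  term  <- 1   # the i = 0 term
  total <- 1
  for (i in seq_len(max_terms)) {
    term  <- term * m^2 * z / ((m + 2 * (i - 1)) * (m + 1) * i)
    total <- total + term
    if (abs(term) <= tol * abs(total)) break
  }
  total
}

# mvue's of the overall mean (equation 4) and variance (equation 5).
zml_mvue <- function(x) {
  x   <- x[is.finite(x)]
  n   <- length(x)
  pos <- x[x != 0]            # the n - r non-zero observations
  r   <- n - length(pos)      # number of zeros
  if (r == n)     return(c(mean = 0, var = 0))
  if (r == n - 1) return(c(mean = pos / n, var = pos^2 / n))
  m    <- n - r - 1
  y    <- log(pos)            # equation (6)
  ybar <- mean(y)             # equation (7)
  s2   <- var(y)              # equation (8), divisor n - r - 1
  mean_hat <- (1 - r / n) * exp(ybar) * finney_g(m, s2 / 2)
  var_hat  <- (1 - r / n) * exp(2 * ybar) *
    (finney_g(m, 2 * s2) - (m / (n - 1)) * finney_g(m, (m - 1) * s2 / m))
  c(mean = mean_hat, var = var_hat)
}

# One non-zero value: the estimates reduce to the sample mean and variance.
est_single <- zml_mvue(c(0, 0, 6))
```

When all non-zero observations are equal, s^2 = 0 and g_m(0) = 1, so the estimates again coincide with the ordinary sample mean and sample variance; for example, zml_mvue(c(0, 2, 2)) and c(mean(c(0, 2, 2)), var(c(0, 2, 2))) agree.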
The expected value and asymptotic variance of the mvue of γ are (Aitchison and Brown, 1957, p.99; Owen and DeRouen, 1980):
E(\hat{γ}_{mvue}) = γ \;\;\;\; (10)
AVar(\hat{γ}_{mvue}) = \frac{1}{n} exp(2μ + σ^2) (1-p) (p + \frac{2σ^2 + σ^4}{2}) \;\;\;\; (11)
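Equation (11) translates directly into a small function. The name avar_gamma_mvue and its argument names are hypothetical; the sketch only evaluates the formula and makes no claim about how the package estimates the inputs.

```r
# Asymptotic variance of the mvue of the overall mean, equation (11).
avar_gamma_mvue <- function(meanlog, sdlog, p.zero, n) {
  (1 / n) * exp(2 * meanlog + sdlog^2) * (1 - p.zero) *
    (p.zero + (2 * sdlog^2 + sdlog^4) / 2)
}

# Sanity check: with sdlog = 0 and meanlog = 0 every non-zero value equals 1,
# so the data are Bernoulli(1 - p) and the variance of the mean is p(1 - p)/n.
v <- avar_gamma_mvue(meanlog = 0, sdlog = 0, p.zero = 0.5, n = 100)

# Plugging in estimates and taking the square root gives a standard error.
se <- sqrt(v)
```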
Confidence Intervals
Based on Normal Approximation (ci.method="normal.approx")
An approximate (1-α)100\% confidence interval for γ is
constructed based on the assumption that the estimator of γ is
approximately normally distributed. Thus, an approximate two-sided
(1-α)100\% confidence interval for γ is constructed as:
[ \hat{γ}_{mvue} - t_{n-2, 1-α/2} \hat{σ}_{\hat{γ}}, \; \hat{γ}_{mvue} + t_{n-2, 1-α/2} \hat{σ}_{\hat{γ}} ] \;\;\;\; (12)
where t_{ν, p} is the p'th quantile of Student's t-distribution with ν degrees of freedom, and the quantity \hat{σ}_{\hat{γ}} is the estimated standard deviation of the mvue of γ, and is computed by replacing the values of μ, σ, and p in equation (11) above with their estimated values and taking the square root.
Note that there must be at least 3 non-missing observations (n ≥ 3) and at least one observation must be non-zero (r ≤ n-1) in order to construct a confidence interval.
One-sided confidence intervals are computed in a similar fashion.
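The interval in equation (12) can be sketched as below. The function gamma_ci and its arguments are hypothetical. The one-sided forms are an assumption, since the text only says they are computed in a similar fashion: here they use the 1-α quantile and leave the other limit unbounded, which may differ from what the package reports.

```r
# Confidence interval for the overall mean based on equation (12).
# gamma_hat: estimate of the overall mean; se: its estimated standard deviation.
gamma_ci <- function(gamma_hat, se, n, conf.level = 0.95,
                     ci.type = c("two-sided", "lower", "upper")) {
  ci.type <- match.arg(ci.type)
  stopifnot(n >= 3, se >= 0, conf.level > 0, conf.level < 1)
  alpha <- 1 - conf.level
  df    <- n - 2   # degrees of freedom used in equation (12)
  switch(ci.type,
    "two-sided" = {
      half <- qt(1 - alpha / 2, df) * se
      c(LCL = gamma_hat - half, UCL = gamma_hat + half)
    },
    # Assumed one-sided forms: all of alpha goes into a single tail.
    "lower" = c(LCL = gamma_hat - qt(1 - alpha, df) * se, UCL = Inf),
    "upper" = c(LCL = -Inf, UCL = gamma_hat + qt(1 - alpha, df) * se)
  )
}

ci_two   <- gamma_ci(gamma_hat = 1, se = 0.1, n = 100)
ci_lower <- gamma_ci(gamma_hat = 1, se = 0.1, n = 100, ci.type = "lower")
```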
A list of class "estimate" containing the estimated parameters and other information. See estimate.object for details.
For the function ezmlnorm, the component called parameters is a numeric vector with the following estimated parameters:
Parameter Name | Explanation |
meanlog | mean of the log of the lognormal part of the distribution. |
sdlog | standard deviation of the log of the lognormal part of the distribution. |
p.zero | probability that an observation will be 0. |
mean.zmlnorm | mean of the overall zero-modified lognormal (delta) distribution. |
sd.zmlnorm | standard deviation of the overall zero-modified lognormal (delta) distribution. |
For the function ezmlnormAlt, the component called parameters is a numeric vector with the following estimated parameters:
Parameter Name | Explanation |
mean | mean of the lognormal part of the distribution. |
cv | coefficient of variation of the lognormal part of the distribution. |
p.zero | probability that an observation will be 0. |
mean.zmlnorm | mean of the overall zero-modified lognormal (delta) distribution. |
cv.zmlnorm | coefficient of variation of the overall zero-modified lognormal (delta) distribution. |
The zero-modified lognormal (delta) distribution is sometimes used to model chemical concentrations for which some observations are reported as “Below Detection Limit” (the nondetects are assumed equal to 0). See, for example, Gilliom and Helsel (1986), Owen and DeRouen (1980), and Gibbons et al. (2009, Chapter 12). USEPA (2009, Chapter 15) recommends this strategy only in specific situations, and Helsel (2012, Chapter 1) strongly discourages this approach to dealing with nondetects.
A variation of the zero-modified lognormal (delta) distribution is the zero-modified normal distribution, in which a normal distribution is mixed with a positive probability mass at 0.
One way to try to assess whether a zero-modified lognormal (delta), zero-modified normal, censored normal, or censored lognormal is the best model for the data is to construct both censored and detects-only probability plots (see qqPlotCensored).
Steven P. Millard (EnvStats@ProbStatInfo.com)
Aitchison, J. (1955). On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 50, 901–908.
Aitchison, J., and J.A.C. Brown (1957). The Lognormal Distribution (with special reference to its uses in economics). Cambridge University Press, London. pp.94–99.
Crow, E.L., and K. Shimizu. (1988). Lognormal Distributions: Theory and Applications. Marcel Dekker, New York, pp.47–51.
Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring. Second Edition. John Wiley and Sons, Hoboken, NJ.
Gilliom, R.J., and D.R. Helsel. (1986). Estimation of Distributional Parameters for Censored Trace Level Water Quality Data: 1. Estimation Techniques. Water Resources Research 22, 135–146.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R. Second Edition. John Wiley and Sons, Hoboken, NJ, Chapter 1.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, p.312.
Owen, W., and T. DeRouen. (1980). Estimation of the Mean for Lognormal Data Containing Zeros and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 36, 707–719.
USEPA (1992c). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Addendum to Interim Final Guidance. Office of Solid Waste, Permits and State Programs Division, US Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.
# Generate 100 observations from a zero-modified lognormal (delta)
# distribution with mean=2, cv=1, and p.zero=0.5, then estimate the
# parameters. According to equations (1) and (3) above, the overall mean
# is mean.zmlnorm=1 and the overall cv is cv.zmlnorm=sqrt(3).
# (Note: the call to set.seed simply allows you to reproduce this example.)

set.seed(250)
dat <- rzmlnormAlt(100, mean = 2, cv = 1, p.zero = 0.5)
ezmlnormAlt(dat, ci = TRUE)

#Results of Distribution Parameter Estimation
#--------------------------------------------
#
#Assumed Distribution:            Zero-Modified Lognormal (Delta)
#
#Estimated Parameter(s):          mean         = 1.9604561
#                                 cv           = 0.9169411
#                                 p.zero       = 0.4500000
#                                 mean.zmlnorm = 1.0782508
#                                 cv.zmlnorm   = 1.5307175
#
#Estimation Method:               mvue
#
#Data:                            dat
#
#Sample Size:                     100
#
#Confidence Interval for:         mean.zmlnorm
#
#Confidence Interval Method:      Normal Approximation
#                                 (t Distribution)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.748134
#                                 UCL = 1.408368

#----------

# Clean up
rm(dat)