Two-Sample Rank Test to Detect a Shift in a Proportion of the "Treated" Population
Two-sample rank test to detect a positive shift in a proportion of one population (here called the “treated” population) compared to another (here called the “reference” population). This test is usually called the quantile test (Johnson et al., 1987).
quantileTest(x, y, alternative = "greater", target.quantile = 0.5, target.r = NULL, exact.p = TRUE)
x |
numeric vector of observations from the “treatment” group.
Missing ( |
y |
numeric vector of observations from the “reference” group.
Missing ( |
alternative |
character string indicating the kind of alternative hypothesis. The possible values
are |
target.quantile |
numeric scalar between 0 and 1 indicating the desired quantile to use as the
lower cut off point for the test. Because of the discrete nature of empirical
quantiles, the upper bound for the possible empirical quantiles will often differ
from the value of |
target.r |
integer indicating the rank of the observation to use as the lower cut off point
for the test. The value of |
exact.p |
logical scalar indicating whether to compute the p-value based on the exact
distribution of the test statistic ( |
Let X denote a random variable representing measurements from a “treatment” group with cumulative distribution function (cdf)
F_X(t) = Pr(X ≤ t) \;\;\;\;\;\; (1)
and let x_1, x_2, …, x_m denote m observations from this treatment group. Let Y denote a random variable from a “reference” group with cdf
F_Y(t) = Pr(Y ≤ t) \;\;\;\;\;\; (2)
and let y_1, y_2, …, y_n denote n observations from this reference group. Consider the null hypothesis:
H_0: F_X(t) = F_Y(t), \;\; -∞ < t < ∞ \;\;\;\;\;\; (3)
versus the alternative hypothesis
H_a: F_X(t) = (1 - ε) F_Y(t) + ε F_Z(t) \;\;\;\;\;\; (4)
where Z denotes some random variable with cdf
F_Z(t) = Pr(Z ≤ t) \;\;\;\;\;\; (5)
and 0 < ε ≤ 1, F_Z(t) ≤ F_Y(t) for all values of t, and F_Z(t) \ne F_Y(t) for at least one value of t.
In English, the alternative hypothesis (4) says that a portion ε of the distribution for the treatment group (the distribution of X) is shifted to the right of the distribution for the reference group (the distribution of Y). The alternative hypothesis (4) with ε = 1 is the alternative hypothesis associated with testing a location shift, for which the Wilcoxon rank sum test can be used.
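To make the mixture alternative (4) concrete, the following base R sketch simulates one treatment sample and one reference sample. All numeric settings (sample sizes, ε = 0.2, a shift of 3 standard deviations, normal distributions for Y and Z) are illustrative assumptions, not values taken from the references.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only

n   <- 50                   # reference sample size (illustrative)
m   <- 50                   # treatment sample size (illustrative)
eps <- 0.2                  # mixing proportion epsilon (illustrative)

# Reference group: Y ~ N(0, 1)
y <- rnorm(n)

# Treatment group: each observation comes from F_Z = N(3, 1) with
# probability eps, and from F_Y = N(0, 1) otherwise, so that
# F_X = (1 - eps) * F_Y + eps * F_Z as in Equation (4).
shifted <- rbinom(m, size = 1, prob = eps) == 1
x <- ifelse(shifted, rnorm(m, mean = 3), rnorm(m))
```

Because N(3, 1) is stochastically larger than N(0, 1), the condition F_Z(t) ≤ F_Y(t) holds for all t in this sketch.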
Johnson et al. (1987) investigated locally most powerful rank tests for the test of the null hypothesis (3) against the alternative hypothesis (4). They considered the case when Y and Z were normal random variables and the case when the densities of Y and Z assumed only two positive values. For the latter case, the locally most powerful rank test reduces to the following procedure, which Johnson et al. (1987) call the quantile test.
Combine the n observations from the reference group and the m observations from the treatment group and rank them from smallest to largest. Tied observations receive the average rank of all observations tied at that value.
Choose a quantile q and determine the smallest rank r such that
\frac{r}{m+n+1} > q \;\;\;\;\;\; (6)
Note that because of the discrete nature of ranks, any quantile q' such that
\frac{r}{m+n+1} > q' ≥ \frac{r-1}{m+n+1} \;\;\;\;\;\; (7)
will yield the same value for r as the quantile q does.
Alternatively, choose a value of r. The bounds on an associated quantile are then given in Equation (7). Note: the component called parameters in the list returned by quantileTest contains an element named quantile.ub. The value of this element is the left-hand side of Equation (7).
Set k equal to the number of observations from the treatment group (the number of X observations) with ranks bigger than or equal to r.
Under the null hypothesis (3), the probability that at least k out of the r largest observations come from the treatment group is given by:
p = ∑_{i=k}^r \frac{{m+n-r \choose m-i} {r \choose i}}{{m+n \choose n}} \;\;\;\;\;\; (8)
This probability may be approximated by:
p = 1 - Φ(\frac{k - μ_k - 1/2}{σ_k}) \;\;\;\;\;\; (9)
where
μ_k = \frac{mr}{m+n} \;\;\;\;\;\; (10)
σ_k^2 = \frac{mnr(m+n-r)}{(m+n)^2 (m+n-1)} \;\;\;\;\;\; (11)
and Φ denotes the cumulative distribution function of the standard normal distribution (USEPA, 1994, pp.7.16-7.17). (See quantileTestPValue.)
Reject the null hypothesis (3) in favor of the alternative hypothesis (4) at significance level α if p ≤ α.
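The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the EnvStats implementation: the helper name quantile_test_sketch is hypothetical, r is taken to be the number of largest observations in the pooled sample (matching the wording "the r largest observations" used with Equation (8)), and ties at the cut-off are broken arbitrarily rather than handled with average ranks.

```r
# Hypothetical helper (not part of EnvStats).
# x = treatment observations, y = reference observations,
# r = number of largest pooled observations to inspect (assumption).
quantile_test_sketch <- function(x, y, r) {
  m <- length(x)
  n <- length(y)
  stopifnot(r >= 1, r < m + n)

  # Pool the samples and find which of the r largest come from x.
  pooled <- c(x, y)
  is_x   <- c(rep(TRUE, m), rep(FALSE, n))
  top_r  <- order(pooled, decreasing = TRUE)[seq_len(r)]
  k      <- sum(is_x[top_r])

  # Equation (8): hypergeometric upper tail, P(K >= k).
  p_exact <- phyper(k - 1, m, n, r, lower.tail = FALSE)

  # Equations (9)-(11): normal approximation with continuity correction.
  mu       <- m * r / (m + n)
  sigma    <- sqrt(m * n * r * (m + n - r) / ((m + n)^2 * (m + n - 1)))
  p_approx <- 1 - pnorm((k - mu - 0.5) / sigma)

  list(k = k, r = r, p.exact = p_exact, p.approx = p_approx)
}

# Toy data: the three largest pooled values all come from x, so k = 3.
res <- quantile_test_sketch(x = c(5, 6, 7, 1), y = c(2, 3, 4, 0.5), r = 3)

# Equation (8) evaluated directly for m = 77, n = 47, r = 9, k = 9;
# compare with the P-value reported in the example at the end of this page.
p_tccb <- phyper(9 - 1, 77, 47, 9, lower.tail = FALSE)
```

Using phyper rather than summing binomial coefficients avoids overflow for large samples; the two forms are algebraically the same because Equation (8) is the upper tail of a hypergeometric distribution.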
Johnson et al. (1987) note that their quantile test is asymptotically equivalent to one proposed by Carrano and Moore (1982) in the context of a two-sided test. Also, when q=0.5, the quantile test reduces to Mood's median test for two groups (see Zar, 2010, p.172; Conover, 1980, pp.171-178).
The optimal choice of q or r in the second step of the procedure above (that is, the choice that yields the largest power) depends on the true underlying distributions of Y and Z and the mixing proportion ε. Johnson et al. (1987) performed a simulation study and showed that the quantile test performs better than the Wilcoxon rank sum test and the normal scores test under the alternative of a mixed normal distribution with a shift of at least 2 standard deviations in the Z distribution. USEPA (1994, pp.7.17-7.21) shows that when the mixing proportion ε is small and the shift is large, the quantile test is more powerful than the Wilcoxon rank sum test, and when ε is large and the shift is small the Wilcoxon rank sum test is more powerful than the quantile test.
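A small Monte Carlo experiment can be used to explore this trade-off. The sketch below uses only base R; the helper names quantile_p and estimate_power, the sample sizes, the choice r = 8, and the two (ε, shift) settings are all illustrative assumptions, and r is again taken as the number of largest pooled observations. It is not a reproduction of the simulations in Johnson et al. (1987) or USEPA (1994).

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: exact quantile-test p-value from Equation (8).
quantile_p <- function(x, y, r) {
  m <- length(x)
  n <- length(y)
  top_r <- order(c(x, y), decreasing = TRUE)[seq_len(r)]
  k <- sum(top_r <= m)           # pooled indices 1..m belong to x
  phyper(k - 1, m, n, r, lower.tail = FALSE)
}

# Hypothetical helper: estimated rejection rates of both tests when a
# fraction eps of the treatment group is shifted right by `shift`.
estimate_power <- function(eps, shift, m = 40, n = 40, r = 8,
                           alpha = 0.05, nsim = 500) {
  reject <- replicate(nsim, {
    y <- rnorm(n)
    x <- rnorm(m) + shift * rbinom(m, 1, eps)
    c(quantile = quantile_p(x, y, r) <= alpha,
      wilcoxon = wilcox.test(x, y, alternative = "greater")$p.value <= alpha)
  })
  rowMeans(reject)
}

power_small_eps <- estimate_power(eps = 0.1, shift = 4)    # small eps, large shift
power_large_eps <- estimate_power(eps = 0.9, shift = 0.5)  # large eps, small shift
```

Printing power_small_eps and power_large_eps gives the estimated power of each test under the two scenarios. The estimates depend on the settings chosen here and on simulation noise, so they should be read as a rough check of the pattern described above rather than as definitive results.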
A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details.
The EPA guidance document Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media (USEPA, 1994, pp.4.7-4.9) recommends three different statistical tests for determining whether a remediated Superfund site has attained compliance: the Wilcoxon rank sum test, the quantile test, and the “hot measurement” comparison test. The Wilcoxon rank sum test and quantile test are nonparametric tests that compare chemical concentrations in the cleanup area with those in the reference area. The hot-measurement comparison test compares concentrations in the cleanup area with a pre-specified upper limit value Hm (the value of Hm must be negotiated between the EPA and the Superfund-site owner or operator). The Wilcoxon rank sum test is appropriate for detecting uniform failure of remedial action throughout the cleanup area. The quantile test is appropriate for detecting failure in only a few areas within the cleanup area. The hot-measurement comparison test is appropriate for detecting hot spots that need to be remediated regardless of the outcomes of the other two tests.
USEPA (1994, pp.4.7-4.9) recommends applying all three tests to all cleanup units within a cleanup area. This leads to the usual multiple comparisons problem: the probability of at least one of the tests indicating non-compliance, when in fact the cleanup area is in compliance, is greater than the pre-set Type I error level for any of the individual tests. USEPA (1994, p.3.3) recommends against using multiple comparison procedures to control the overall Type I error and suggests instead a re-sampling scheme where additional samples are taken in cases where non-compliance is indicated.
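The size of this inflation can be illustrated with a short calculation. The per-test level of 0.05 and the range of test counts are illustrative assumptions. The independence formula is shown only for orientation, since tests applied to the same data are generally not independent; the Bonferroni bound holds regardless of dependence.

```r
alpha   <- 0.05   # illustrative per-test Type I error level
n_tests <- 1:6    # illustrative numbers of tests applied

# Upper bound on P(at least one false rejection), valid under any dependence.
bonferroni_bound <- pmin(1, n_tests * alpha)

# The same probability if the tests were independent (an assumption
# that need not hold in practice).
independent_rate <- 1 - (1 - alpha)^n_tests

round(rbind(n_tests, bonferroni_bound, independent_rate), 3)
```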
Steven P. Millard (EnvStats@ProbStatInfo.com)
Carrano, A., and D. Moore. (1982). The Rationale and Methodology for Quantifying Sister Chromatid Exchange in Humans. In Heddle, J.A., ed., Mutagenicity: New Horizons in Genetic Toxicology. Academic Press, New York, pp.268-304.
Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, Chapter 4.
Johnson, R.A., S. Verrill, and D.H. Moore. (1987). Two-Sample Rank Tests for Detecting Changes That Occur in a Small Proportion of the Treated Population. Biometrics 43, 641-655.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL, pp.435-439.
USEPA. (1994). Statistical Methods for Evaluating the Attainment of Cleanup Standards, Volume 3: Reference-Based Standards for Soils and Solid Media. EPA/230-R-94-004. Office of Policy, Planning, and Evaluation, U.S. Environmental Protection Agency, Washington, D.C.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
# Following Example 7.5 on pages 7.23-7.24 of USEPA (1994b), perform the
# quantile test for the TcCB data (the data are stored in EPA.94b.tccb.df).
# There are n=47 observations from the reference area and m=77 observations
# from the cleanup unit. The target rank is set to 9, resulting in a value
# of quantile.ub=0.928. Note that the p-value is 0.0114, not 0.0117.

EPA.94b.tccb.df
#    TcCB.orig   TcCB Censored      Area
#1        0.22   0.22    FALSE Reference
#2        0.23   0.23    FALSE Reference
#...
#46       1.20   1.20    FALSE Reference
#47       1.33   1.33    FALSE Reference
#48      <0.09   0.09     TRUE   Cleanup
#49       0.09   0.09    FALSE   Cleanup
#...
#123     51.97  51.97    FALSE   Cleanup
#124    168.64 168.64    FALSE   Cleanup

# Determine the values to use for r and k for
# a desired significance level of 0.01
#--------------------------------------------

p.vals <- quantileTestPValue(m = 77, n = 47,
  r = c(rep(8, 3), rep(9, 3), rep(10, 3)),
  k = c(6, 7, 8, 7, 8, 9, 8, 9, 10))

round(p.vals, 3)
#[1] 0.355 0.122 0.019 0.264 0.081 0.011 0.193 0.053 0.007

# Choose r=9, k=9 to get a significance level of 0.011
#-----------------------------------------------------

with(EPA.94b.tccb.df,
  quantileTest(TcCB[Area=="Cleanup"], TcCB[Area=="Reference"],
    target.r = 9))

#Results of Hypothesis Test
#--------------------------
#
#Null Hypothesis:                 e = 0
#
#Alternative Hypothesis:          Tail of Fx Shifted to Right of
#                                 Tail of Fy.
#                                 0 < e <= 1, where
#                                 Fx(t) = (1-e)*Fy(t) + e*Fz(t),
#                                 Fz(t) <= Fy(t) for all t,
#                                 and Fy != Fz
#
#Test Name:                       Quantile Test
#
#Data:                            x = TcCB[Area == "Cleanup"]
#                                 y = TcCB[Area == "Reference"]
#
#Sample Sizes:                    nx = 77
#                                 ny = 47
#
#Test Statistics:                 k (# x obs of r largest) = 9
#                                 r                        = 9
#
#Test Statistic Parameters:       m           = 77.000
#                                 n           = 47.000
#                                 quantile.ub =  0.928
#
#P-value:                         0.01136926

#==========

# Clean up
#---------

rm(p.vals)