Generalized Maximally Selected Statistics
Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)

## S3 method for class 'table'
maxstat_test(object, ...)

## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
formula |
a formula of the form |
data |
an optional data frame containing the variables in the model formula. |
subset |
an optional vector specifying a subset of observations to be used. Defaults
to |
weights |
an optional formula of the form |
object |
an object inheriting from classes |
teststat |
a character, the type of test statistic to be applied: either a maximum
statistic ( |
distribution |
a character, the conditional null distribution of the test statistic can be
approximated by its asymptotic distribution ( |
minprob |
a numeric, a fraction between 0 and 0.5 specifying that cutpoints only
greater than the |
maxprob |
a numeric, a fraction between 0.5 and 1 specifying that cutpoints only
smaller than the |
... |
further arguments to be passed to |
maxstat_test provides generalized maximally selected statistics. The
family of maximally selected statistics encompasses a large collection of
procedures used for the estimation of simple cutpoint models including, but
not limited to, maximally selected chi^2 statistics, maximally
selected Cochran-Armitage statistics, maximally selected rank statistics and
maximally selected statistics for multiple covariates. A general description
of these methods is given by Hothorn and Zeileis (2008).
The null hypothesis of independence, or conditional independence given
block, between y1, ..., yq and x1, ..., xp is tested against cutpoint
alternatives. All possible partitions into two groups are evaluated for each
unordered covariate x1, ..., xp, whereas only order-preserving binary
partitions are evaluated for ordered or numeric covariates. The cutpoint is
then a set of levels defining one of the two groups.
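The difference between the two kinds of partitions can be illustrated with a small base R sketch. It is not taken from coin; the helper names ordered_partitions and unordered_partitions are hypothetical and only enumerate the candidate groups described above. Each returned element is the set of levels forming one of the two groups.

```r
# Order-preserving binary partitions of an ordered covariate with k levels:
# one partition per cut between adjacent levels, so k - 1 in total.
ordered_partitions <- function(lev) {
  k <- length(lev)
  lapply(seq_len(k - 1), function(i) lev[seq_len(i)])
}

# All binary partitions of an unordered covariate with k levels.
# The first level is always kept in the returned group so that a partition
# and its complement are counted only once, giving 2^(k - 1) - 1 partitions.
unordered_partitions <- function(lev) {
  k <- length(lev)
  rest <- lev[-1]
  out <- list()
  for (m in seq_len(2^(k - 1) - 1) - 1) {
    # Decode m as a bit mask over the remaining levels.
    pick <- (m %/% 2^(seq_along(rest) - 1)) %% 2 == 1
    out[[length(out) + 1]] <- c(lev[1], rest[pick])
  }
  out
}

lev <- c("a", "b", "c", "d")
length(ordered_partitions(lev))    # k - 1 candidate cutpoints
length(unordered_partitions(lev))  # 2^(k - 1) - 1 candidate cutpoints
```

The number of candidate partitions grows linearly in the number of levels for ordered covariates but exponentially for unordered ones.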
If both response and covariate are univariable, say y1 and x1, this procedure
is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982)
when y1 is a binary factor and x1 is a numeric variable, and as maximally
selected rank statistics when y1 is a rank transformed numeric variable and
x1 is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004)
introduced maximally selected statistics for a univariable numeric response
and multiple numeric covariates x1, ..., xp.
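For the univariable case, the idea of selecting the maximum over cutpoints can be sketched in base R as follows. This is a simplified illustration, not coin's implementation: the helper max_selected is hypothetical, it standardizes the sum of the responses in the left group by its permutation mean and variance, and it assumes that minprob and maxprob restrict the candidate cutpoints to the corresponding sample quantiles of x.

```r
# Hypothetical helper: maximum absolute standardized statistic over cutpoints.
max_selected <- function(x, y, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  # Candidate cutpoints: observed x values inside the quantile range
  # (assumed reading of minprob / maxprob).
  lo <- quantile(x, minprob, type = 1)
  hi <- quantile(x, maxprob, type = 1)
  cuts <- sort(unique(x[x >= lo & x < hi]))
  s2 <- mean((y - mean(y))^2)          # population variance of the responses
  stats <- vapply(cuts, function(cp) {
    g <- x <= cp                       # indicator of the left group
    m <- sum(g)
    expect <- m * mean(y)              # permutation expectation of sum(y[g])
    # Permutation variance of a sum of m draws without replacement.
    variance <- m * (n - m) / (n - 1) * s2
    abs(sum(y[g]) - expect) / sqrt(variance)
  }, numeric(1))
  list(statistic = max(stats), cutpoint = cuts[which.max(stats)])
}

# Toy data: a binary response that switches once along x.
x <- 1:20
y <- c(rep(0, 10), rep(1, 10))
res <- max_selected(x, y)
res$cutpoint
res$statistic
```

Replacing y by rank(y) for a numeric response gives a rank-based variant in the spirit of maximally selected rank statistics.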
If, say, y1 and/or x1 are ordered factors, the default scores,
1:nlevels(y1) and 1:nlevels(x1) respectively, can be altered using the scores
argument (see independence_test); this argument can also be used to coerce
nominal factors to class "ordered". If both, say, y1 and x1 are ordered
factors, a linear-by-linear association test is computed and the direction of
the alternative hypothesis can be specified using the alternative argument.
The particular extension to the case of a univariable ordered response and a
univariable numeric covariate was given by Betensky and Rabinowitz (1999) and
is known as maximally selected Cochran-Armitage statistics.
The conditional null distribution of the test statistic is used to obtain
p-values and an asymptotic approximation of the exact distribution is used by
default (distribution = "asymptotic"). Alternatively, the distribution can be
approximated via Monte Carlo resampling by setting distribution to
"approximate". See asymptotic and approximate for details.
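The Monte Carlo approach can be sketched in base R by recomputing the maximum statistic on permuted responses. This is only an illustration of the resampling idea under simplifying assumptions, not the code used by coin; the helpers max_stat and mc_pvalue, the argument name nresample and the fixed cutpoint grid are all hypothetical.

```r
# Hypothetical helper: maximum absolute standardized statistic over the
# supplied cutpoints (each cutpoint must leave observations on both sides).
max_stat <- function(x, y, cuts) {
  n <- length(y)
  s2 <- mean((y - mean(y))^2)
  max(vapply(cuts, function(cp) {
    g <- x <= cp
    m <- sum(g)
    abs(sum(y[g]) - m * mean(y)) / sqrt(m * (n - m) / (n - 1) * s2)
  }, numeric(1)))
}

# Monte Carlo p-value: proportion of permutations whose maximum statistic is
# at least as large as the observed one.
mc_pvalue <- function(x, y, cuts, nresample = 1000) {
  observed <- max_stat(x, y, cuts)
  # Permuting y removes any association with x but keeps both margins fixed.
  perm <- replicate(nresample, max_stat(x, sample(y), cuts))
  mean(perm >= observed)
}

set.seed(290875)
x <- 1:20
y <- c(rep(0, 10), rep(1, 10))
p <- mc_pvalue(x, y, cuts = 2:17)
p
```

Because the maximum is recomputed for every permutation, the resulting p-value accounts for the search over cutpoints instead of treating the selected cutpoint as fixed in advance.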
An object inheriting from class "IndependenceTest".
Starting with coin version 1.1-0, maximum statistics and quadratic forms
can no longer be specified using teststat = "maxtype" and
teststat = "quadtype" respectively (as was used in versions prior to 0.4-5).
Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected chi^2 statistics for k x 2 tables. Biometrics 55(1), 317–320. doi: 10.1111/j.0006-341X.1999.00317.x
Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137. doi: 10.1016/S0167-9473(02)00225-6
Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269. doi: 10.1111/j.1541-0420.2008.00995.x
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of optimal selected prognostic factors. Biometrical Journal 46(3), 364–374. doi: 10.1002/bimj.200310030
Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85. doi: 10.2307/2532740
Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016. doi: 10.2307/2529881
Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228. doi: 10.1007/s10342-004-0035-5
## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))