Test Independence of Continuous Random Variables via Empirical Copula
Multivariate independence test based on the empirical copula process as proposed by Christian Genest and Bruno Rémillard. The test can be seen as composed of three steps: (i) a simulation step, which consists of simulating the distribution of the test statistics under independence for the sample size under consideration; (ii) the test itself, which consists of computing the approximate p-values of the test statistics with respect to the empirical distributions obtained in step (i); and (iii) the display of a graphic, called a dependogram, which helps to identify the type of departure from independence, if any. More details can be found in the articles cited in the reference section.
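To make the notion of an empirical copula concrete, the following base R sketch evaluates the empirical copula of Deheuvels at a single point and compares it with the independence copula. The helper emp_copula() is hypothetical and written only for illustration; it is not part of the package, whose own computations are done in C and work subset by subset.

```r
# Hypothetical helper (not a package function): empirical copula C_n(u)
# evaluated at one point u in [0, 1]^p.
emp_copula <- function(x, u) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Pseudo-observations: column-wise ranks scaled to (0, 1]
  U <- apply(x, 2, rank) / n
  # Proportion of observations lying componentwise at or below u
  mean(apply(U, 1, function(row) all(row <= u)))
}

set.seed(1)
x <- matrix(rnorm(200), 100, 2)   # two independent columns (toy data)
u <- c(0.5, 0.5)
Cn <- emp_copula(x, u)

# Under independence, C_n(u) should be close to prod(u); the independence
# empirical copula process rescales the difference by sqrt(n).
sqrt(nrow(x)) * (Cn - prod(u))
```

The test statistics of the package aggregate such differences over the whole unit cube rather than at one point; this sketch only shows the quantity being compared.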
indepTestSim(n, p, m = p, N = 1000, verbose = interactive())
indepTest(x, d, alpha = 0.05)
dependogram(test, pvalues = FALSE, print = FALSE)
n |
sample size when simulating the distribution of the test statistics under independence. |
p |
dimension of the data when simulating the distribution of the test statistics under independence. |
m |
maximum cardinality of the subsets of variables for which a
test statistic is to be computed. It makes sense to consider m << p especially when |
N |
number of repetitions when simulating under independence. |
verbose |
a logical specifying if progress
should be displayed via |
x |
data frame or data matrix containing realizations (one per line) of the random vector whose independence is to be tested. |
d |
object of class |
alpha |
significance level used in the computation of the critical values for the test statistics. |
test |
object of class |
pvalues |
logical indicating whether the dependogram should be drawn from test statistics or the corresponding p-values. |
print |
logical indicating whether details should be printed. |
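The effect of the argument m can be gauged by counting the subsets of {1,...,p} with cardinality between 2 and m, since one test statistic is computed per subset. The following base R sketch does this count; n_subsets() is a hypothetical helper introduced here for illustration only.

```r
# Hypothetical helper: number of subsets of {1, ..., p} whose
# cardinality lies between 2 and m.
n_subsets <- function(p, m = p) {
  stopifnot(m >= 2, m <= p)
  sum(choose(p, 2:m))
}

n_subsets(5)      # all subsets of size 2 to 5
n_subsets(5, 3)   # only subsets of size 2 and 3

# The subsets themselves can be enumerated with combn()
subsets <- unlist(lapply(2:3, function(k) combn(5, k, simplify = FALSE)),
                  recursive = FALSE)
length(subsets)
```

With m = p the count equals 2^p - p - 1, which grows quickly with p; restricting m keeps the number of statistics manageable.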
The current (C code) implementation of indepTestSim() uses (RAM) memory of size O(n^2 * p), and time O(N * n^2 * p). This renders it unfeasible when n is large.
See the references below for more details, especially Genest and Rémillard (2004).
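As a rough aid for judging whether a given sample size is practical, the stated orders O(n^2 * p) and O(N * n^2 * p) can be turned into back-of-the-envelope figures. The sketch below assumes 8-byte doubles and a constant factor of 1, neither of which is confirmed by the source, so the results indicate orders of magnitude only. Both helpers are hypothetical.

```r
# Hypothetical helper: rough RAM estimate (in GiB) for a working set of
# n^2 * p values. Assumes 8-byte doubles and a constant factor of 1.
est_mem_gib <- function(n, p, bytes_per_value = 8) {
  n^2 * p * bytes_per_value / 2^30
}

# Hypothetical helper: run time relative to a baseline setting,
# assuming time scales exactly as N * n^2 * p.
rel_time <- function(n, p, N, n0 = 100, p0 = 5, N0 = 1000) {
  (N * n^2 * p) / (N0 * n0^2 * p0)
}

est_mem_gib(100, 5)       # small sample: negligible memory
est_mem_gib(20000, 5)     # large sample: many GiB under these assumptions
rel_time(1000, 5, 1000)   # 100 times the baseline run time
```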
The former argument print.every is deprecated and not supported anymore; use verbose instead.
The function indepTestSim() returns an object of class "indepTestDist" whose attributes are: sample.size, data.dimension, max.card.subsets, number.repetitons, subsets (list of the subsets for which test statistics have been computed), subsets.binary (subsets in binary 'integer' notation), dist.statistics.independence (an N-row matrix containing the values of the test statistics for each subset and each repetition) and dist.global.statistic.independence (a vector of length N containing the values of the global Cramér-von Mises test statistic for each repetition; see Genest et al. (2007), p. 175).
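The simulated values are what the test compares an observed statistic against. The base R sketch below shows the general idea of an approximate p-value and critical value obtained from a simulated null distribution. The null sample is a placeholder drawn from a chi-square distribution purely for illustration, the helpers are hypothetical, and the exact formula used inside indepTest() may differ (for instance by a continuity correction).

```r
set.seed(42)
N <- 1000
# Placeholder for one column of simulated statistics under independence
null_stats <- rchisq(N, df = 1)

# Hypothetical helper: proportion of simulated values at least as
# large as the observed statistic.
emp_pvalue <- function(sim, obs) mean(sim >= obs)

# Hypothetical helper: empirical critical value at a given level.
emp_critical <- function(sim, level) {
  quantile(sim, probs = 1 - level, names = FALSE)
}

observed <- 5                      # made-up observed statistic
emp_pvalue(null_stats, observed)
emp_critical(null_stats, 0.05)
```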
The function indepTest() returns an object of class "indepTest" whose attributes are: subsets, statistics, critical.values, pvalues, fisher.pvalue (a p-value resulting from a combination à la Fisher of the subset statistic p-values), tippett.pvalue (a p-value resulting from a combination à la Tippett of the subset statistic p-values), alpha (global significance level of the test), beta (1 - beta is the significance level per statistic), global.statistic (value of the global Cramér-von Mises statistic derived directly from the independence empirical copula process; see Genest et al. (2007), p. 175) and global.statistic.pvalue (corresponding p-value).
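For readers unfamiliar with the two combination rules, the following base R sketch shows their textbook forms for independent p-values: Fisher's rule sums the log p-values and Tippett's rule uses the smallest one. The helpers and the p-values are made up for illustration. The reference distributions used here (chi-square and the minimum of uniforms) are the classical ones; how indepTest() calibrates fisher.pvalue and tippett.pvalue is not stated in this page and may differ.

```r
# Hypothetical helper: Fisher's combination of k independent p-values.
# Statistic: -2 * sum(log(p)), chi-square with 2k degrees of freedom
# under the null hypothesis.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  c(statistic = stat,
    p.value = pchisq(stat, df = 2 * length(p), lower.tail = FALSE))
}

# Hypothetical helper: Tippett's combination of k independent p-values.
# Statistic: min(p); under the null, P(min <= t) = 1 - (1 - t)^k.
tippett_combine <- function(p) {
  stat <- min(p)
  c(statistic = stat, p.value = 1 - (1 - stat)^length(p))
}

pv <- c(0.04, 0.20, 0.65, 0.01)   # made-up subset p-values
fisher_combine(pv)
tippett_combine(pv)
```

Fisher's rule is sensitive to several moderately small p-values at once, whereas Tippett's rule reacts to a single very small one.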
Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274–292.
Deheuvels, P. (1981). A non parametric test for independence. Publ. Inst. Statist. Univ. Paris 26, 29–50.
Genest, C. and Rémillard, B. (2004). Tests of independence and randomness based on the empirical copula process. Test 13, 335–369.
Genest, C., Quessy, J.-F., and Rémillard, B. (2006). Local efficiency of a Cramér-von Mises test of independence. Journal of Multivariate Analysis 97, 274–294.
Genest, C., Quessy, J.-F., and Rémillard, B. (2007). Asymptotic local efficiency of Cramér-von Mises tests for multivariate independence. The Annals of Statistics 35, 166–191.
## Consider the following example taken from
## Genest and Remillard (2004), p 352:
set.seed(2004)
x <- matrix(rnorm(500),100,5)
x[,1] <- abs(x[,1]) * sign(x[,2] * x[,3])
x[,5] <- x[,4]/2 + sqrt(3) * x[,5]/2

## In order to test for independence "within" x, the first step consists
## in simulating the distribution of the test statistics under
## independence for the same sample size and dimension,
## i.e. n=100 and p=5. As we are going to consider all the subsets of
## {1,...,5} whose cardinality is between 2 and 5, we set p=m=5.
## For a realistic N = 1000 (default), this takes a few seconds:
N. <- if(copula:::doExtras()) 1000 else 120
N.
system.time(d <- indepTestSim(100, 5, N = N.))
## For N=1000, 2 seconds (lynne 2015)
## You could save 'd' for future use, via saveRDS()

## The next step consists of performing the test itself (and print its results):
(iTst <- indepTest(x,d))

## Display the dependogram with the details:
dependogram(iTst, print=TRUE)

## We could have tested for a weaker form of independence, for instance,
## by only computing statistics for subsets whose cardinality is between 2
## and 3. Consider for instance the following data:
y <- matrix(runif(500),100,5)
## and perform the test:
system.time( d <- indepTestSim(100,5,3, N=N.) )
iTy <- indepTest(y,d)
iTy
dependogram(iTy, print=TRUE)
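The example mentions saving the simulated distribution for later use via saveRDS(). A minimal sketch of such a round trip in base R follows; d_stub is a placeholder list standing in for the object returned by indepTestSim(), and the temporary file path is an assumption made so that the snippet runs anywhere.

```r
# Placeholder standing in for the result of indepTestSim()
d_stub <- list(sample.size = 100L, data.dimension = 5L)

# Write the object to disk and read it back in a later session
path <- tempfile(fileext = ".rds")
saveRDS(d_stub, path)
d_again <- readRDS(path)

# The restored object should be identical to the saved one
identical(d_stub, d_again)
```

Since the simulated distribution depends only on n, p, m and N, a saved object can be reused for any data set with the same sample size and dimension.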