Meta-analysis of binary and continuous variables
This is a meta-analysis complement to the functions standardScreeningBinaryTrait and
standardScreeningNumericTrait. Given expression (or other) data from multiple independent
data sets, and the corresponding clinical traits or outcomes, the function calculates multiple screening
statistics in each data set, then calculates meta-analysis Z scores, p-values, and optionally q-values
(False Discovery Rates). Three different ways of calculating the meta-analysis Z scores are provided: the
Stouffer method, the weighted Stouffer method, and a method using user-specified weights.
metaAnalysis(multiExpr, multiTrait, binary = NULL, metaAnalysisWeights = NULL, corFnc = cor, corOptions = list(use = "p"), getQvalues = FALSE, getAreaUnderROC = FALSE, useRankPvalue = TRUE, rankPvalueOptions = list(), setNames = NULL, kruskalTest = FALSE, var.equal = FALSE, metaKruskal = kruskalTest, na.action = "na.exclude")
multiExpr |
Expression data (or other data) in multi-set format (see |
multiTrait |
Trait or outcome data in multi-set format. Only one trait is allowed; consequently, the |
binary |
Logical: is the trait binary ( |
metaAnalysisWeights |
Optional specification of set weights for meta-analysis. If given, must be a vector of non-negative
weights, one entry for each set contained in |
corFnc |
Correlation function to be used for screening. Should be either the default |
corOptions |
A named list giving extra arguments to be passed to the correlation function. |
getQvalues |
Logical: should q-values (FDRs) be calculated? |
getAreaUnderROC |
Logical: should area under the ROC be calculated? Caution, enabling the calculation will slow the function down considerably for large data sets. |
useRankPvalue |
Logical: should the |
rankPvalueOptions |
Additional options for function |
setNames |
Optional specification of set names (labels). These are used to label the corresponding components of the
output. If not given, will be taken from the |
kruskalTest |
Logical: should the Kruskal-Wallis test be performed in addition to the t-test? Only applies to binary traits. |
var.equal |
Logical: should the t-test assume equal variance in both groups? If |
metaKruskal |
Logical: should the meta-analysis be based on the results of Kruskal test ( |
na.action |
Specification of what should happen to missing values in |
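The following is a minimal sketch of how the arguments fit together. It assumes the multi-set convention in which each set is a list with a component named data (samples in rows, variables in columns). The simulated data, gene names, set names, and sizes are hypothetical, and the call to metaAnalysis is only attempted if the WGCNA package is installed.

```r
# Simulated example data; all names and sizes here are hypothetical.
set.seed(1)
nGenes <- 20
nSamples <- c(Set_1 = 30, Set_2 = 40)   # samples per data set
geneNames <- paste0("Gene", seq_len(nGenes))

# Multi-set format (assumed): a list with one element per set, each a list
# whose component `data` holds samples in rows and variables in columns.
multiExpr <- lapply(nSamples, function(n) {
  list(data = matrix(rnorm(n * nGenes), nrow = n,
                     dimnames = list(NULL, geneNames)))
})

# Exactly one (here continuous) trait per set, with matching sample counts.
multiTrait <- lapply(nSamples, function(n) {
  list(data = data.frame(trait = rnorm(n)))
})

# Run the meta-analysis only if WGCNA is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  meta <- WGCNA::metaAnalysis(multiExpr, multiTrait,
                              binary = FALSE,
                              getQvalues = FALSE,
                              useRankPvalue = FALSE)
  print(head(meta[, c("ID", "Z.equalWeights", "p.equalWeights")]))
}
```

Turning off getQvalues and useRankPvalue keeps the sketch free of optional dependencies and extra output columns; both can be enabled once the basic call works.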
The Stouffer method combines Z statistics by simply taking the mean of the input Z statistics and
multiplying it by sqrt(n), where n is the number of input data sets. We refer to this method as
Stouffer.equalWeights. In general, a better (i.e., more powerful) method of combining Z statistics is
to weight them by the number of degrees of freedom of each data set (which approximately equals the
number of samples in that data set). We refer to this method as weightedStouffer. Finally, the user can
also specify custom weights, for example if a data set needs to be downweighted due to technical
concerns; however, custom weights should be chosen carefully to avoid possible selection biases.
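The combination rule described above can be sketched in a few lines of base R. The helper name stoufferZ and the input values are hypothetical and only illustrate the formula; this is not the package's internal code.

```r
# Combine per-set Z statistics with Stouffer's method.
# `z` holds one Z statistic per data set; `w` holds non-negative set weights.
stoufferZ <- function(z, w = rep(1, length(z))) {
  stopifnot(length(z) == length(w), all(w >= 0), any(w > 0))
  sum(w * z) / sqrt(sum(w^2))
}

z <- c(2.1, 1.4, 0.3)   # hypothetical Z statistics from three data sets

# Equal weights: identical to mean(z) * sqrt(n), with n the number of sets.
zEqual <- stoufferZ(z)
isTRUE(all.equal(zEqual, mean(z) * sqrt(length(z))))

# Custom weights: the third set is downweighted.
zCustom <- stoufferZ(z, w = c(1, 1, 0.25))
```

Dividing by the square root of the sum of squared weights keeps the combined statistic on the standard normal scale under the null hypothesis, whatever weights are chosen.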
Data frame with the following components:
ID |
Identifier of the input genes (or other variables) |
Z.equalWeights |
Meta-analysis Z statistics obtained using Stouffer's method with equal weights |
p.equalWeights |
p-values corresponding to |
q.equalWeights |
q-values corresponding to |
Z.RootDoFWeights |
Meta-analysis Z statistics obtained using Stouffer's method with weights given by the square root of the number of (non-missing) samples in each data set |
p.RootDoFWeights |
p-values corresponding to |
q.RootDoFWeights |
q-values corresponding to |
Z.DoFWeights |
Meta-analysis Z statistics obtained using Stouffer's method with weights given by the number of (non-missing) samples in each data set |
p.DoFWeights |
p-values corresponding to |
q.DoFWeights |
q-values corresponding to |
Z.userWeights |
Meta-analysis Z statistics obtained using Stouffer's method with user-defined weights. Only present if input |
p.userWeights |
p-values corresponding to |
q.userWeights |
q-values corresponding to |
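To illustrate how the three built-in weighting schemes relate to each other, the sketch below builds the corresponding Z columns from hypothetical per-set Z statistics and sample counts. It assumes no missing values (so the weights are the same for every gene) and assumes two-sided normal p-values; the helper combineZ is hypothetical.

```r
# Hypothetical per-set Z statistics: rows are genes, columns are data sets.
Z <- rbind(geneA = c(2.5, 1.0, 0.5),
           geneB = c(-0.2, 0.4, -1.8))
nSamples <- c(100, 25, 16)   # hypothetical sample counts per set

# Weighted Stouffer combination applied to every row of Z.
combineZ <- function(Z, w) as.vector(Z %*% w) / sqrt(sum(w^2))

meta <- data.frame(
  ID = rownames(Z),
  Z.equalWeights   = combineZ(Z, rep(1, ncol(Z))),
  Z.RootDoFWeights = combineZ(Z, sqrt(nSamples)),
  Z.DoFWeights     = combineZ(Z, nSamples)
)

# Two-sided normal p-value for one of the combined statistics (assumed form).
meta$p.DoFWeights <- 2 * pnorm(abs(meta$Z.DoFWeights), lower.tail = FALSE)
```

The larger the spread in sample counts, the more the DoF-weighted statistic is dominated by the largest data set, which is why the three columns can rank genes differently.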
The next set of columns is present only if input useRankPvalue is TRUE and contains the output
of the function rankPvalue with the same column weights as the above meta-analysis. Depending
on the input options calculateQvalue and pValueMethod in rankPvalueOptions, some
columns may be missing. The following columns are calculated using equal weights for each data set.
pValueExtremeRank.equalWeights |
This is the minimum of pValueLowRank and pValueHighRank, i.e. min(pValueLowRank, pValueHighRank) |
pValueLowRank.equalWeights |
Asymptotic p-value for observing a consistently low value across the columns of datS based on the rank method. |
pValueHighRank.equalWeights |
Asymptotic p-value for observing a consistently high value across the columns of datS based on the rank method. |
pValueExtremeScale.equalWeights |
This is the minimum of pValueLowScale and pValueHighScale, i.e. min(pValueLowScale, pValueHighScale) |
pValueLowScale.equalWeights |
Asymptotic p-value for observing a consistently low value across the columns of datS based on the Scale method. |
pValueHighScale.equalWeights |
Asymptotic p-value for observing a consistently high value across the columns of datS based on the Scale method. |
qValueExtremeRank.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueExtremeRank |
qValueLowRank.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueLowRank |
qValueHighRank.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueHighRank |
qValueExtremeScale.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueExtremeScale |
qValueLowScale.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueLowScale |
qValueHighScale.equalWeights |
local false discovery rate (q-value) corresponding to the p-value pValueHighScale |
... |
Analogous columns calculated by weighting each input set using the square root of the number of
samples, number of samples, and user weights (if given). The corresponding column names carry the suffixes
|
The following columns contain results returned by standardScreeningBinaryTrait or
standardScreeningNumericTrait (depending on whether the input trait is binary or continuous).
For binary traits, the following information is returned for each set:
corPearson.Set_1, corPearson.Set_2,... |
Pearson correlation with a binary numeric version of the input variable. The numeric variable equals 1 for level 1 and 2 for level 2. The levels are given by levels(factor(y)). |
t.Student.Set_1, t.Student.Set_2, ... |
Student t-test statistic |
pvalueStudent.Set_1, pvalueStudent.Set_2, ... |
two-sided Student t-test p-value. |
qvalueStudent.Set_1, qvalueStudent.Set_2, ... |
(if input |
foldChange.Set_1, foldChange.Set_2, ... |
a (signed) ratio of mean values. If the mean in the first group (corresponding to level 1) is larger than that of the second group, it equals meanFirstGroup/meanSecondGroup. But if the mean of the second group is larger than that of the first group it equals -meanSecondGroup/meanFirstGroup (notice the minus sign). |
meanFirstGroup.Set_1, meanSecondGroup.Set_2, ... |
means of columns in input |
SE.FirstGroup.Set_1, SE.FirstGroup.Set_2, ... |
standard errors of columns in input |
SE.SecondGroup.Set_1, SE.SecondGroup.Set_2, ... |
standard errors of columns in input |
areaUnderROC.Set_1, areaUnderROC.Set_2, ... |
the area under the ROC, also known as the concordance
index or C.index. This is a measure of discriminatory power. The measure lies between 0 and 1 where 0.5
indicates no discriminatory power. 0 indicates that the "opposite" predictor has perfect discriminatory
power. To compute it we use the function rcorr.cens with |
nPresentSamples.Set_1, nPresentSamples.Set_2, ... |
number of samples with finite measurements for each gene. |
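The signed ratio described for the foldChange columns can be sketched as follows. The helper signedFoldChange is hypothetical, assumes positive group means, and treats equal means like the "second group larger" case, which the description above does not specify.

```r
# Signed fold change between two group means (hypothetical helper).
# Positive: first group has the larger mean; negative: second group does.
signedFoldChange <- function(meanFirstGroup, meanSecondGroup) {
  ifelse(meanFirstGroup > meanSecondGroup,
         meanFirstGroup / meanSecondGroup,
         -meanSecondGroup / meanFirstGroup)
}

fc <- signedFoldChange(c(8, 2), c(4, 6))
fc   # 2 for the first gene, -3 for the second
```

With this convention the magnitude is at least 1 whenever both means are positive, and the sign alone tells which group has the higher mean.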
If input kruskalTest is TRUE, the following columns further summarize the results of the
Kruskal-Wallis test:
stat.Kruskal.Set_1, stat.Kruskal.Set_2, ... |
Kruskal-Wallis test statistic. |
stat.Kruskal.signed.Set_1, stat.Kruskal.signed.Set_2,... |
(Warning: experimental) Kruskal-Wallis test statistic including a sign that indicates whether the average rank is higher in second group (positive) or first group (negative). |
pvaluekruskal.Set_1, pvaluekruskal.Set_2, ... |
Kruskal-Wallis test p-value. |
qkruskal.Set_1, qkruskal.Set_2, ... |
q-values corresponding to the Kruskal-Wallis test p-value (if
input |
Z.Set1, Z.Set2, ... |
Z statistics obtained from |
For numeric traits, the following columns are returned:
cor.Set_1, cor.Set_2, ... |
correlations of all genes with the trait |
Z.Set1, Z.Set2, ... |
Fisher Z statistics corresponding to the correlations |
pvalueStudent.Set_1, pvalueStudent.Set_2, ... |
Student p-values of the correlations |
qvalueStudent.Set_1, qvalueStudent.Set_2, ... |
(if input |
AreaUnderROC.Set_1, AreaUnderROC.Set_2, ... |
area under the ROC |
nPresentSamples.Set_1, nPresentSamples.Set_2, ... |
number of samples present for the calculation of each association. |
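For orientation, the relationship between a correlation, its Fisher Z statistic, and its Student p-value can be sketched in base R. The textbook scaling sqrt(n - 3) used for the Z statistic is an assumption here; the scaling used inside the package may differ. All input values are hypothetical.

```r
r <- c(0.60, -0.25, 0.05)   # hypothetical correlations with the trait
n <- 50                     # hypothetical number of present samples

# Fisher transform: atanh(r) equals 0.5 * log((1 + r) / (1 - r)).
# Scaling by sqrt(n - 3) gives an approximately standard normal statistic
# under the null hypothesis of zero correlation (textbook form, assumed).
fisherZ <- atanh(r) * sqrt(n - 3)

# Student t statistic and two-sided p-value for the same correlations.
tStat <- r * sqrt(n - 2) / sqrt(1 - r^2)
pStudent <- 2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)
```

The Z statistic keeps the sign of the correlation, which is what allows per-set results to be combined directionally in the meta-analysis.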
Peter Langfelder
For Stouffer's method, see
Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A. & Williams, R.M. Jr. 1949. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.
A discussion of weighted Stouffer's method can be found in
Whitlock, M. C., Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach, Journal of Evolutionary Biology 18:5 1368 (2005)
standardScreeningBinaryTrait, standardScreeningNumericTrait for screening
functions for individual data sets