WGCNA: metaAnalysis – R documentation

metaAnalysis

Meta-analysis of binary and continuous variables

Description

This is a meta-analysis complement to functions standardScreeningBinaryTrait and standardScreeningNumericTrait. Given expression (or other) data from multiple independent data sets, and the corresponding clinical traits or outcomes, the function calculates multiple screening statistics in each data set, then calculates meta-analysis Z scores, p-values, and optionally q-values (False Discovery Rates). Three different ways of calculating the meta-analysis Z scores are provided: the Stouffer method, weighted Stouffer method, and using user-specified weights.

Usage

metaAnalysis(multiExpr, multiTrait, 
             binary = NULL, 
             metaAnalysisWeights = NULL, 
             corFnc = cor, corOptions = list(use = "p"), 
             getQvalues = FALSE, 
             getAreaUnderROC = FALSE,
             useRankPvalue = TRUE,
             rankPvalueOptions = list(),
             setNames = NULL, 
             kruskalTest = FALSE, var.equal = FALSE, 
             metaKruskal = kruskalTest, na.action = "na.exclude")

Arguments

`multiExpr`	Expression data (or other data) in multi-set format (see `checkSets`). A vector of lists; in each list there must be a component named `data` whose content is a matrix or dataframe or array of dimension 2.
`multiTrait`	Trait or outcome data in multi-set format. Only one trait is allowed; consequently, the `data` component of each component list can be either a vector or a data frame (matrix, array of dimension 2).
`binary`	Logical: is the trait binary (`TRUE`) or continuous (`FALSE`)? If not given, the decision will be made based on the content of `multiTrait`.
`metaAnalysisWeights`	Optional specification of set weights for meta-analysis. If given, must be a vector of non-negative weights, one entry for each set contained in `multiExpr`.
`corFnc`	Correlation function to be used for screening. Should be either the default `cor` or its robust alternative, `bicor`.
`corOptions`	A named list giving extra arguments to be passed to the correlation function.
`getQvalues`	Logical: should q-values (FDRs) be calculated?
`getAreaUnderROC`	Logical: should area under the ROC be calculated? Caution, enabling the calculation will slow the function down considerably for large data sets.
`useRankPvalue`	Logical: should the `rankPvalue` function be used to obtain alternative meta-analysis statistics?
`rankPvalueOptions`	Additional options for function `rankPvalue`. These include `na.last` (default `"keep"`), `ties.method` (default `"average"`), `calculateQvalue` (default copied from input `getQvalues`), and `pValueMethod` (default `"all"`). See the help file for `rankPvalue` for full details.
`setNames`	Optional specification of set names (labels). These are used to label the corresponding components of the output. If not given, will be taken from the `names` attribute of `multiExpr`. If `names(multiExpr)` is `NULL`, generic names of the form `Set_1, Set_2, ...` will be used.
`kruskalTest`	Logical: should the Kruskal test be performed in addition to t-test? Only applies to binary traits.
`var.equal`	Logical: should the t-test assume equal variance in both groups? If `TRUE`, the function will warn the user that the returned test statistics will be different from the results of the standard `t.test` function.
`metaKruskal`	Logical: should the meta-analysis be based on the results of Kruskal test (`TRUE`) or Student t-test (`FALSE`)?
`na.action`	Specification of what should happen to missing values in `t.test`.
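
As an illustration of the argument formats described above, the following is a minimal sketch that builds `multiExpr` and `multiTrait` from simulated data. The helper `makeSet`, the set sizes, and the set and gene names are hypothetical and serve only to show the multi-set layout. The call to `metaAnalysis` runs only if the WGCNA package is installed.

```r
# Simulate three independent data sets (hypothetical sizes) with one continuous trait.
set.seed(1)
nGenes <- 20
makeSet <- function(nSamples) {
  trait <- rnorm(nSamples)
  expr <- matrix(rnorm(nSamples * nGenes), nrow = nSamples, ncol = nGenes)
  expr[, 1] <- expr[, 1] + trait          # make the first gene track the trait
  colnames(expr) <- paste0("Gene_", seq_len(nGenes))
  list(expr = expr, trait = trait)
}
sets <- lapply(c(30, 40, 50), makeSet)

# Multi-set format: one list component per set, each with a `data` entry.
multiExpr <- lapply(sets, function(s) list(data = s$expr))
multiTrait <- lapply(sets, function(s) list(data = data.frame(trait = s$trait)))
names(multiExpr) <- names(multiTrait) <- c("Study_A", "Study_B", "Study_C")

# Run the meta-analysis only when WGCNA is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  library(WGCNA)
  meta <- metaAnalysis(multiExpr, multiTrait, binary = FALSE,
                       getQvalues = FALSE, useRankPvalue = FALSE)
  # Show the genes with the smallest equal-weight meta-analysis p-values.
  print(head(meta[order(meta$p.equalWeights),
                  c("ID", "Z.equalWeights", "p.equalWeights")]))
}
```

Setting `binary = FALSE` explicitly avoids relying on the automatic detection of the trait type.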

Details

The Stouffer method combines Z statistics by simply taking the mean of the input Z statistics and multiplying it by sqrt(n), where n is the number of input data sets. We refer to this method as Stouffer.equalWeights. In general, a better (i.e., more powerful) method of combining Z statistics is to weigh them by the number of degrees of freedom in each data set (which approximately equals the number of samples in that set). We refer to this method as weightedStouffer. Finally, the user can also specify custom weights, for example if a data set needs to be downweighted due to technical concerns; however, custom weights should be chosen carefully to avoid possible selection biases.
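
The combination rules described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the function name `stoufferZ` and the simulated inputs are hypothetical, the weighted form sum(w * Z) / sqrt(sum(w^2)) is the usual weighted Stouffer formula, and missing values are assumed to be absent.

```r
# Combine per-set Z statistics (rows = genes, columns = data sets).
# Assumptions: Z has no missing values; `weights` has one non-negative entry per set.
stoufferZ <- function(Z, weights = rep(1, ncol(Z))) {
  stopifnot(length(weights) == ncol(Z), all(weights >= 0), sum(weights) > 0)
  as.vector(Z %*% weights) / sqrt(sum(weights^2))
}

set.seed(2)
nSamples <- c(30, 40, 50)                        # hypothetical set sizes
Z <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)    # hypothetical per-set Z statistics

Z.equal   <- stoufferZ(Z)                        # equal weights
Z.rootDoF <- stoufferZ(Z, sqrt(nSamples))        # weights = sqrt(number of samples)
Z.DoF     <- stoufferZ(Z, nSamples)              # weights = number of samples

# Two-sided p-values under the standard normal null distribution.
p.equal <- 2 * pnorm(abs(Z.equal), lower.tail = FALSE)
```

With equal weights the result reduces to the mean of the Z statistics multiplied by the square root of the number of sets, as stated in the text.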

Value

Data frame with the following components:

`ID`	Identifier of the input genes (or other variables)
`Z.equalWeights`	Meta-analysis Z statistics obtained using Stouffer's method with equal weights
`p.equalWeights`	p-values corresponding to `Z.equalWeights`
`q.equalWeights`	q-values corresponding to `p.equalWeights`, only present if `getQvalues` is `TRUE`.
`Z.RootDoFWeights`	Meta-analysis Z statistics obtained using Stouffer's method with weights given by the square root of the number of (non-missing) samples in each data set
`p.RootDoFWeights`	p-values corresponding to `Z.RootDoFWeights`
`q.RootDoFWeights`	q-values corresponding to `p.RootDoFWeights`, only present if `getQvalues` is `TRUE`.
`Z.DoFWeights`	Meta-analysis Z statistics obtained using Stouffer's method with weights given by the number of (non-missing) samples in each data set
`p.DoFWeights`	p-values corresponding to `Z.DoFWeights`
`q.DoFWeights`	q-values corresponding to `p.DoFWeights`, only present if `getQvalues` is `TRUE`.
`Z.userWeights`	Meta-analysis Z statistics obtained using Stouffer's method with user-defined weights. Only present if input `metaAnalysisWeights` are present.
`p.userWeights`	p-values corresponding to `Z.userWeights`
`q.userWeights`	q-values corresponding to `p.userWeights`, only present if `getQvalues` is `TRUE`.

The next set of columns is present only if input useRankPvalue is TRUE and contains the output of the function rankPvalue with the same column weights as the above meta-analysis. Depending on the input options calculateQvalue and pValueMethod in rankPvalueOptions, some columns may be missing. The following columns are calculated using equal weights for each data set.

`pValueExtremeRank.equalWeights`	This is the minimum of pValueLowRank and pValueHighRank, i.e. min(pValueLowRank, pValueHighRank)
`pValueLowRank.equalWeights`	Asymptotic p-value for observing a consistently low value across the columns of datS based on the rank method.
`pValueHighRank.equalWeights`	Asymptotic p-value for observing a consistently high value across the columns of datS based on the rank method.
`pValueExtremeScale.equalWeights`	This is the minimum of pValueLowScale and pValueHighScale, i.e. min(pValueLowScale, pValueHighScale)
`pValueLowScale.equalWeights`	Asymptotic p-value for observing a consistently low value across the columns of datS based on the scale method.
`pValueHighScale.equalWeights`	Asymptotic p-value for observing a consistently high value across the columns of datS based on the scale method.
`qValueExtremeRank.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueExtremeRank
`qValueLowRank.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueLowRank
`qValueHighRank.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueHighRank
`qValueExtremeScale.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueExtremeScale
`qValueLowScale.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueLowScale
`qValueHighScale.equalWeights`	local false discovery rate (q-value) corresponding to the p-value pValueHighScale
`...`	Analogous columns calculated by weighting each input set using the square root of the number of samples, number of samples, and user weights (if given). The corresponding column names carry the suffixes `RootDoFWeights`, `DoFWeights`, `userWeights`.

The following columns contain results returned by standardScreeningBinaryTrait or standardScreeningNumericTrait (depending on whether the input trait is binary or continuous).

For binary traits, the following information is returned for each set:

`corPearson.Set_1, corPearson.Set_2,...`	Pearson correlation with a binary numeric version of the input variable. The numeric variable equals 1 for level 1 and 2 for level 2. The levels are given by levels(factor(y)).
`t.Student.Set_1, t.Student.Set_2, ...`	Student t-test statistic
`pvalueStudent.Set_1, pvalueStudent.Set_2, ...`	two-sided Student t-test p-value.
`qvalueStudent.Set_1, qvalueStudent.Set_2, ...`	(if input `getQvalues` is `TRUE`) q-value (local false discovery rate) based on the Student t-test p-value (Storey et al 2004).
`foldChange.Set_1, foldChange.Set_2, ...`	a (signed) ratio of mean values. If the mean in the first group (corresponding to level 1) is larger than that of the second group, it equals meanFirstGroup/meanSecondGroup. But if the mean of the second group is larger than that of the first group it equals -meanSecondGroup/meanFirstGroup (notice the minus sign).
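`meanFirstGroup.Set_1, meanFirstGroup.Set_2, ...`	means of columns in input `datExpr` across samples in the first group.
`meanSecondGroup.Set_1, meanSecondGroup.Set_2, ...`	means of columns in input `datExpr` across samples in the second group.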
`SE.FirstGroup.Set_1, SE.FirstGroup.Set_2, ...`	standard errors of columns in input `datExpr` across samples in the first group. Recall that SE(x)=sqrt(var(x)/n) where n is the number of non-missing values of x.
`SE.SecondGroup.Set_1, SE.SecondGroup.Set_2, ...`	standard errors of columns in input `datExpr` across samples in the second group.
`areaUnderROC.Set_1, areaUnderROC.Set_2, ...`	the area under the ROC, also known as the concordance index or C.index. This is a measure of discriminatory power. The measure lies between 0 and 1, where 0.5 indicates no discriminatory power and 0 indicates that the "opposite" predictor has perfect discriminatory power. To compute it we use the function rcorr.cens with `outx=TRUE` (from Frank Harrell's package Hmisc).
`nPresentSamples.Set_1, nPresentSamples.Set_2, ...`	number of samples with finite measurements for each gene.
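
The definitions of the signed fold change and the standard error given above can be sketched as follows. The function names `signedFoldChange` and `standardError` are hypothetical and are not part of the package; the treatment of exactly equal means (which here falls into the negative branch) is an assumption.

```r
# Signed ratio of group means, following the description of `foldChange`.
# Assumption: when the two means are equal, the negative branch is used.
signedFoldChange <- function(meanFirst, meanSecond) {
  ifelse(meanFirst > meanSecond,
         meanFirst / meanSecond,
         -meanSecond / meanFirst)
}

# Standard error SE(x) = sqrt(var(x) / n), with n the number of non-missing values.
standardError <- function(x) {
  n <- sum(!is.na(x))
  sqrt(var(x, na.rm = TRUE) / n)
}

fc <- signedFoldChange(c(4, 2), c(2, 4))   # first mean larger, then second mean larger
se <- standardError(c(1, 2, 3, NA))        # the missing value is ignored
```

Because the ratio is signed, values of `foldChange` never fall strictly between -1 and 1 when both means are positive.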

If input kruskalTest is TRUE, the following columns further summarize results of Kruskal-Wallis test:

`stat.Kruskal.Set_1, stat.Kruskal.Set_2, ...`	Kruskal-Wallis test statistic.
`stat.Kruskal.signed.Set_1, stat.Kruskal.signed.Set_2,...`	(Warning: experimental) Kruskal-Wallis test statistic including a sign that indicates whether the average rank is higher in second group (positive) or first group (negative).
`pvaluekruskal.Set_1, pvaluekruskal.Set_2, ...`	Kruskal-Wallis test p-value.
`qkruskal.Set_1, qkruskal.Set_2, ...`	q-values corresponding to the Kruskal-Wallis test p-value (if input `getQvalues` is `TRUE`).
`Z.Set_1, Z.Set_2, ...`	Z statistics obtained from `pvalueStudent.Set_1, pvalueStudent.Set_2, ...` or from `pvaluekruskal.Set_1, pvaluekruskal.Set_2, ...`, depending on input `metaKruskal`.
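
One common way to turn a two-sided p-value into a signed Z statistic is to take the corresponding upper-tail normal quantile and attach the sign of the test statistic. The sketch below illustrates this conversion; whether `metaAnalysis` uses exactly this formula internally is an assumption, and the function name `pToSignedZ`, the example statistics, and the degrees of freedom are hypothetical.

```r
# Convert two-sided p-values to Z statistics carrying the sign of the test statistic.
pToSignedZ <- function(p, statistic) {
  sign(statistic) * qnorm(p / 2, lower.tail = FALSE)
}

tStat <- c(2.5, -1.2, 0.3)                               # hypothetical t statistics
pval  <- 2 * pt(abs(tStat), df = 28, lower.tail = FALSE) # two-sided p-values, df assumed
Z     <- pToSignedZ(pval, tStat)
```

A p-value of exactly 0 maps to an infinite Z statistic under this conversion, so very small p-values may need special handling.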

For numeric traits, the following columns are returned:

`cor.Set_1, cor.Set_2, ...`	correlations of all genes with the trait
`Z.Set_1, Z.Set_2, ...`	Fisher Z statistics corresponding to the correlations
`pvalueStudent.Set_1, pvalueStudent.Set_2, ...`	Student p-values of the correlations
`qvalueStudent.Set_1, qvalueStudent.Set_2, ...`	(if input `getQvalues` is `TRUE`) q-values of the correlations calculated from the p-values
`AreaUnderROC.Set_1, AreaUnderROC.Set_2, ...`	area under the ROC
`nPresentSamples.Set_1, nPresentSamples.Set_2, ...`	number of samples present for the calculation of each association.
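
For orientation, the textbook Fisher transformation of a correlation into an approximately standard normal Z statistic can be sketched as below. This is the standard form atanh(r) * sqrt(n - 3); the exact scaling used inside the package may differ, and the function name `fisherZ` and the example values are hypothetical.

```r
# Textbook Fisher Z statistic for a correlation r based on n samples.
# Assumption: the package's internal scaling factor may differ from sqrt(n - 3).
fisherZ <- function(r, n) {
  atanh(r) * sqrt(n - 3)
}

r <- c(-0.5, 0, 0.5)           # hypothetical correlations
n <- 28                        # hypothetical number of present samples
Z <- fisherZ(r, n)
p <- 2 * pnorm(abs(Z), lower.tail = FALSE)   # two-sided normal p-values
```

The transformation is antisymmetric in `r`, so correlations of equal magnitude and opposite sign give Z statistics of equal magnitude and opposite sign.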

Author(s)

Peter Langfelder

References

For Stouffer's method, see

Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A. & Williams, R.M. Jr. 1949. The American Soldier, Vol. 1: Adjustment during Army Life. Princeton University Press, Princeton.

A discussion of weighted Stouffer's method can be found in

Whitlock, M.C. 2005. Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. Journal of Evolutionary Biology 18(5):1368.