WGCNA: consensusRepresentatives – R documentation

consensusRepresentatives

Consensus selection of group representatives

Description

Given multiple data sets corresponding to the same variables and a grouping of variables into groups, the function selects a representative variable for each group using a variety of possible selection approaches. Typical uses include selecting a representative probe for each gene in microarray data.

Usage

consensusRepresentatives(
   mdx, 
   group, 
   colID, 
   consensusQuantile = 0, 
   method = "MaxMean", 
   useGroupHubs = TRUE, 
   calibration = c("none", "full quantile"), 
   selectionStatisticFnc = NULL, 
   connectivityPower = 1, 
   minProportionPresent = 1, 
   getRepresentativeData = TRUE, 
   statisticFncArguments = list(), 
   adjacencyArguments = list(), 
   verbose = 2, indent = 0)

Arguments

`mdx`	A `multiData` structure. All sets must have the same columns.
`group`	Character vector whose components contain the group label (e.g. a character string) for each entry of `colID`. This vector must be of the same length as the vector `colID`. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).
`colID`	Character vector of column identifiers. This must include all the column names from `mdx`, but can include other values as well. Its entries must be unique (no duplicates) and no missing values are permitted.
`consensusQuantile`	A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used.
`method`	Character string determining which method is used to choose the representative (when `useGroupHubs` is `TRUE`, this method is only used for groups with 2 variables). The following values can be used: "MaxMean" (default) or "MinMean" returns the variable with the highest or lowest mean value, respectively; "maxRowVariance" returns the variable with the highest variance; "absMaxMean" or "absMinMean" returns the variable with the highest or lowest mean absolute value; and "function" calls a user-supplied function (see the description of the argument `selectionStatisticFnc`). The built-in functions can be instructed to use robust analogs (median and median absolute deviation) by also specifying `statisticFncArguments=list(robust = TRUE)`.
`useGroupHubs`	Logical: if `TRUE`, groups with 3 or more variables will be represented by the variable with the highest connectivity according to a signed weighted correlation network adjacency matrix among the corresponding rows. The connectivity is defined as the row sum of the adjacency matrix. The signed weighted adjacency matrix is defined as A=(0.5+0.5*COR)^power where power is determined by the argument `connectivityPower` and COR denotes the matrix of pairwise correlation coefficients among the corresponding rows. Additional arguments to the underlying function `adjacency` can be specified using the argument `adjacencyArguments` below.
`calibration`	Character string describing the method of calibration of the selection statistic among the data sets. Recognized values are `"none"` (no calibration) and `"full quantile"` (quantile normalization).
`selectionStatisticFnc`	User-supplied function used to calculate the selection statistic when `method` above equals `"function"`. The function must take the argument `x` (a matrix) and possibly other arguments that can be specified using `statisticFncArguments` below. The return value must be a vector with one component per column of `x` giving the selection statistic for each column.
`connectivityPower`	Positive number (typically integer) for specifying the soft-thresholding power used to construct the signed weighted adjacency matrix, see the description of `useGroupHubs`. This option is only used if `useGroupHubs` is `TRUE`.
`minProportionPresent`	A number between 0 and 1 specifying a filter of candidate probes. Specifically, for each group, the variable with the maximum consensus proportion of present data is found. Only variables whose consensus proportion of present data is at least `minProportionPresent` times the maximum consensus proportion are retained as candidates for being a representative.
`getRepresentativeData`	Logical: should the representative data, i.e., `mdx` restricted to the representative variables, be returned?
`statisticFncArguments`	A list giving further arguments to the selection statistic function. Can be used to supply additional arguments to the user-specified `selectionStatisticFnc`; the value `list(robust = TRUE)` can be used with the built-in functions to use their robust variants.
`adjacencyArguments`	Further arguments to the function `adjacency`, e.g. `adjacencyArguments=list(corFnc = "bicor", corOptions = "use = 'p', maxPOutliers = 0.05")` will select the robust correlation `bicor` with a good set of options. Note that the `adjacency` arguments `type` and `power` cannot be changed.
`verbose`	Level of verbosity; 0 means silent, larger values will cause progress messages to be printed.
`indent`	Indent for the diagnostic messages; each unit equals two spaces.
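
As an illustration of the contract described for `selectionStatisticFnc`, the following base R sketch defines a function that takes a matrix `x` and returns one value per column. The name `colIQR` and the choice of the interquartile range are hypothetical and are not part of the package.

```r
# Hypothetical user-supplied selection statistic: the interquartile range of
# each column. `x` is a matrix with variables in columns; an extra argument
# such as `na.rm` is the kind of option statisticFncArguments could carry.
colIQR <- function(x, na.rm = TRUE) {
  apply(x, 2, IQR, na.rm = na.rm)
}

# Toy matrix with three variables (columns a, b, c).
x <- cbind(a = c(1, 2, 3, 4, 5),
           b = c(10, 10, 10, 10, 10),
           c = c(0, 5, NA, 15, 20))

stat <- colIQR(x)
print(stat)
```

Under these assumptions, such a function would be supplied together with `method = "function"`, with any extra arguments given in `statisticFncArguments`.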

Details

This function was inspired by collapseRows, but there are also important differences. This function focuses on selecting representatives; when summarization is more important, collapseRows provides more flexibility since it does not require that a single representative be selected.

This function and collapseRows use different input and output conventions; user-specified functions need to be tailored differently for collapseRows than for consensusRepresentatives.

Missing data are allowed and are treated as missing at random. If colID is NULL, it is replaced by the variable names in mdx.

All groups with a single variable are represented by that variable, unless the consensus proportion of present data in the variable is lower than minProportionPresent, in which case the variable and the group are excluded from the output.
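
The proportion-present filter controlled by `minProportionPresent` can be sketched in base R as follows. This is an illustration only, not the package's code: the toy matrices are made up, and it assumes that the consensus proportion of present data is the quantile of the per-set proportions with probability `consensusQuantile`.

```r
# Toy data: two sets with the same three variables, all in one group.
set1 <- cbind(p1 = c(1, 2, 3, 4), p2 = c(1, NA, 3, 4), p3 = c(NA, NA, NA, 4))
set2 <- cbind(p1 = c(1, 2, 3, NA), p2 = c(1, 2, 3, 4), p3 = c(NA, NA, 3, 4))

# Proportion of non-missing observations per variable, one column per set.
propPresent <- cbind(set1 = colMeans(!is.na(set1)),
                     set2 = colMeans(!is.na(set2)))

# Assumption: the consensus is the quantile across sets (0 gives the minimum).
consensusQuantile <- 0
consProp <- apply(propPresent, 1, quantile,
                  probs = consensusQuantile, names = FALSE)

# Keep variables whose consensus proportion reaches the required fraction
# of the best consensus proportion in the group.
minProportionPresent <- 0.9
keep <- consProp >= minProportionPresent * max(consProp)
print(keep)
```

Here `p3` is dropped as a candidate because its consensus proportion of present data (0.25) falls below 0.9 times the group maximum (0.75).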

For all variables belonging to groups with 2 variables (when useGroupHubs=TRUE) or with at least 2 variables (when useGroupHubs=FALSE), selection statistics are calculated in each set (for example, the mean or the variance). This results in a matrix of selection statistics (one entry per variable per data set). The selection statistics are next optionally calibrated (normalized) between sets to make them comparable; currently the only implemented calibration method is quantile normalization.
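
The matrix of selection statistics and its calibration can be sketched in base R. The helper `quantileNormalize` below is a hypothetical, simplified quantile normalization (no missing values, ties broken by order) and is not the routine the package calls; the simulated data are likewise only for illustration.

```r
set.seed(1)
varNames <- paste0("v", 1:4)
# Two toy data sets on different scales, variables in columns.
set1 <- matrix(rnorm(20, mean = 5), 5, 4, dimnames = list(NULL, varNames))
set2 <- matrix(rnorm(20, mean = 50, sd = 10), 5, 4,
               dimnames = list(NULL, varNames))

# One selection statistic (here the mean) per variable per set.
stats <- cbind(set1 = colMeans(set1), set2 = colMeans(set2))

# Simplified quantile normalization: within each column, replace the value
# of rank r by the mean, across columns, of the r-th smallest values.
quantileNormalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")
  sortedMeans <- rowMeans(apply(m, 2, sort))
  out <- apply(ranks, 2, function(r) sortedMeans[r])
  dimnames(out) <- dimnames(m)
  out
}

calibrated <- quantileNormalize(stats)
print(calibrated)
```

After calibration both columns contain the same set of values, so the statistics from the two sets are directly comparable while the within-set ordering of the variables is preserved.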

For each variable, the consensus selection statistic is calculated as the consensus of the (calibrated) selection statistics across the data sets. The 'consensus' of a vector (say 'x') is simply defined as the quantile with probability consensusQuantile of the vector x. Important exception: for the "MinMean" and "absMinMean" methods, the consensus is the quantile with probability 1-consensusQuantile, since the idea of the consensus is to select the worst (or close to worst) value across the data sets.
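
The consensus definition can be written out in a few lines of base R. The helper name `consensus` and its `minimizing` flag are hypothetical and stand in for the distinction between the maximizing methods and the "MinMean"/"absMinMean" methods.

```r
# Rows are variables, columns are data sets (calibrated statistics).
stats <- rbind(v1 = c(2.0, 3.0, 2.5),
               v2 = c(4.0, 1.0, 3.5),
               v3 = c(1.5, 1.8, 1.6))

# Consensus of a vector: its quantile with probability consensusQuantile,
# or 1 - consensusQuantile when low values are considered best.
consensus <- function(x, consensusQuantile = 0, minimizing = FALSE) {
  p <- if (minimizing) 1 - consensusQuantile else consensusQuantile
  quantile(x, probs = p, na.rm = TRUE, names = FALSE)
}

# Maximizing methods: with consensusQuantile = 0 this is the minimum.
consMax <- apply(stats, 1, consensus, consensusQuantile = 0)
# Minimizing methods: with consensusQuantile = 0 this is the maximum.
consMin <- apply(stats, 1, consensus, consensusQuantile = 0, minimizing = TRUE)

print(consMax)
print(consMin)
```

In both cases the consensus with `consensusQuantile = 0` is the least favorable value of the statistic across the data sets.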

For each group, the representative is selected as the variable with the best (typically highest, but for "MinMean" and "absMinMean" methods the lowest) consensus selection statistic.
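
Given a vector of consensus statistics and a grouping, the selection step can be sketched as below. The function `pickRepresentative`, its `lowestIsBest` flag, and the probe and gene labels are hypothetical; this illustrates the rule and is not the package's implementation.

```r
# Consensus selection statistic per variable, and the group of each variable.
consStat <- c(p1 = 2.0, p2 = 1.0, p3 = 1.5, p4 = 0.7, p5 = 0.9)
group <- c(p1 = "geneA", p2 = "geneA", p3 = "geneB", p4 = "geneB", p5 = "geneB")

# Within each group, return the name of the variable with the best statistic:
# the highest by default, the lowest when lowestIsBest = TRUE.
pickRepresentative <- function(stat, group, lowestIsBest = FALSE) {
  score <- if (lowestIsBest) -stat else stat
  sapply(split(score, group), function(s) names(s)[which.max(s)])
}

repsMax <- pickRepresentative(consStat, group)
repsMin <- pickRepresentative(consStat, group, lowestIsBest = TRUE)

print(repsMax)
print(repsMin)
```

The result is a character vector named by group, which mirrors the shape of the `representatives` component described under Value.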

If useGroupHubs=TRUE, the intra-group connectivity is calculated for all variables in each set. The intra-group connectivities are optionally calibrated (normalized) between sets, and consensus intra-group connectivity is calculated similarly to the consensus selection statistic above. In each group, the variable with the highest consensus intra-group connectivity is chosen as the representative.
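
For a single data set, the intra-group connectivity described for `useGroupHubs` can be sketched directly from the formula A=(0.5+0.5*COR)^power. The example uses base R `cor` instead of the package's `adjacency` function, and the toy data are made up.

```r
# Toy group of four variables (columns) measured on ten samples.
t <- 1:10
alt <- rep(c(1, -1), 5)
x <- cbind(v1 = t, v2 = t + alt, v3 = t^2, v4 = alt)

# Signed weighted adjacency among the variables of the group.
connectivityPower <- 1
adj <- (0.5 + 0.5 * cor(x, use = "pairwise.complete.obs"))^connectivityPower

# Connectivity is the row sum of the adjacency matrix; the hub is the
# variable with the highest connectivity.
connectivity <- rowSums(adj)
hub <- names(which.max(connectivity))

print(round(connectivity, 3))
print(hub)
```

With several data sets, this calculation would be repeated in each set, and the per-set connectivities would then be calibrated and combined into a consensus in the same way as the selection statistics.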

Value

`representatives`	A named vector giving, for each group, the selected representative (input `colID` or the variable (column) name in `mdx`). Names correspond to groups.
`varSelected`	A logical vector with one entry per variable (column) in input `mdx` (possibly after restriction to variables occurring in `colID`), `TRUE` if the column was selected as a representative.
`representativeData`	Only present if `getRepresentativeData` is `TRUE`; the input `mdx` restricted to the representative variables, with column names changed to the corresponding groups.

Author(s)

Peter Langfelder, based on code by Jeremy Miller