Consensus selection of group representatives
Given multiple data sets corresponding to the same variables and a grouping of variables into groups, the function selects a representative variable for each group using a variety of possible selection approaches. Typical uses include selecting a representative probe for each gene in microarray data.
consensusRepresentatives( mdx, group, colID, consensusQuantile = 0, method = "MaxMean", useGroupHubs = TRUE, calibration = c("none", "full quantile"), selectionStatisticFnc = NULL, connectivityPower = 1, minProportionPresent = 1, getRepresentativeData = TRUE, statisticFncArguments = list(), adjacencyArguments = list(), verbose = 2, indent = 0)
mdx |
A |
group |
Character vector whose components contain the group label (e.g. a character string) for
each entry of |
colID |
Character vector of column identifiers. This must include all the column names from
|
consensusQuantile |
A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used. |
method |
Character string for determining which method is used to choose the representative
(when |
useGroupHubs |
Logical: if |
calibration |
Character string describing the method of calibration of the selection statistic among
the data sets. Recognized values are |
selectionStatisticFnc |
User-supplied function used to calculate the selection statistic when
|
connectivityPower |
Positive number (typically integer) for specifying the soft-thresholding power used
to construct the signed weighted adjacency matrix, see the description of |
minProportionPresent |
A number between 0 and 1 specifying a filter of candidate probes. Specifically, for each group, the variable
with the maximum consensus proportion of present data is found. Only variables whose consensus proportion of
present data is at least |
getRepresentativeData |
Logical: should the representative data, i.e., |
statisticFncArguments |
A list giving further arguments to the selection statistic function. Can be
used to supply additional arguments to the user-specified |
adjacencyArguments |
Further arguments to the function |
verbose |
Level of verbosity; 0 means silent, larger values will cause progress messages to be printed. |
indent |
Indent for the diagnostic messages; each unit equals two spaces. |
This function was inspired by collapseRows, but there are also important differences. This function focuses on selecting representatives; when summarization is more important, collapseRows provides more flexibility since it does not require that a single representative be selected.
This function and collapseRows use different input and output conventions; user-specified functions need to be tailored differently for collapseRows than for consensusRepresentatives.
Missing data are allowed and are treated as missing at random. If colID is NULL, it is replaced by the variable names in mdx.
All groups with a single variable are represented by that variable, unless the consensus proportion of present data in the variable is lower than minProportionPresent, in which case the variable and the group are excluded from the output.
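The rule for single-variable groups can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: a plain list of matrices stands in for the multiData structure, the data are made up, and the names propPresent, consensusPP and keepSingles are hypothetical. The consensus uses consensusQuantile = 0, i.e. the minimum across sets.

```r
# Sketch (not package code): consensus proportion of present data and the
# rule for single-variable groups. A plain list of matrices stands in for multiData.
set.seed(1)
vars <- c("p1", "p2", "p3")
sets <- list(
  A = matrix(rnorm(12), nrow = 4, dimnames = list(NULL, vars)),
  B = matrix(rnorm(12), nrow = 4, dimnames = list(NULL, vars))
)
sets$A[1:2, "p3"] <- NA   # p3 is 50% present in set A
sets$B[1, "p3"] <- NA     # and 75% present in set B

# variables x sets matrix of the proportion of non-missing entries
propPresent <- sapply(sets, function(x) colMeans(!is.na(x)))

# consensus with consensusQuantile = 0, i.e. the minimum across sets
consensusPP <- apply(propPresent, 1, quantile, probs = 0, names = FALSE)

group <- c(p1 = "geneA", p2 = "geneA", p3 = "geneB")   # geneB has a single variable
minProportionPresent <- 1
groupSize <- table(group)
singles <- names(group)[groupSize[group] == 1]
keepSingles <- singles[consensusPP[singles] >= minProportionPresent]
consensusPP   # p1 = 1, p2 = 1, p3 = 0.5
keepSingles   # character(0): p3 and its group geneB are dropped
```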
For all variables belonging to groups with exactly 2 variables (when useGroupHubs=TRUE) or with at least 2 variables (when useGroupHubs=FALSE), selection statistics are calculated in each set (e.g., the selection statistic may be the mean, variance, etc.). This results in a matrix of selection statistics (one entry per variable per data set). The selection statistics are next optionally calibrated (normalized) between sets to make them comparable; currently the only implemented calibration method is quantile normalization.
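A sketch of this step, assuming the "MaxMean" method (column means as the selection statistic) and a textbook quantile normalization. The helpers selectionStats and quantileNormalize and the toy data are hypothetical; the package's own calibration may handle ties and missing values differently.

```r
# Sketch (not package code): per-set selection statistics ("MaxMean": column means)
# and quantile normalization of those statistics across sets.
# Toy data (hypothetical): two sets with the same three variables.
sets <- list(
  A = matrix(c(0, 2, 2, 4, 1, 3), nrow = 2, dimnames = list(NULL, c("p1", "p2", "p3"))),
  B = matrix(c(9, 11, 29, 31, 19, 21), nrow = 2, dimnames = list(NULL, c("p1", "p2", "p3")))
)

# variables x sets matrix of selection statistics
selectionStats <- function(sets) sapply(sets, function(x) colMeans(x, na.rm = TRUE))

# Quantile normalization: every column receives the same set of values (the
# across-column means of the sorted columns), assigned by each column's own ranks.
# Assumes the statistics contain no NA values.
quantileNormalize <- function(m) {
  sortedMeans <- rowMeans(apply(m, 2, sort))
  out <- apply(m, 2, function(col) sortedMeans[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(m)
  out
}

stats <- selectionStats(sets)   # A: 1, 3, 2; B: 10, 30, 20
calibrated <- quantileNormalize(stats)
calibrated                      # both columns become 5.5, 16.5, 11
```

After calibration the two sets share a common scale, so a variable's statistics can be compared across sets by rank rather than by raw magnitude.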
For each variable, the consensus selection statistic is defined as the consensus of the (calibrated) selection statistics across the data sets. The 'consensus' of a vector (say 'x') is simply defined as the quantile with probability consensusQuantile of the vector x. Important exception: for the "MinMean" and "absMinMean" methods, the consensus is the quantile with probability 1-consensusQuantile, since the idea of the consensus is to select the worst (or close to worst) value across the data sets.
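A minimal sketch of the consensus calculation, including the exception for "MinMean" and "absMinMean". Here consensusStat is a hypothetical helper that takes a variables-by-sets matrix of (already calibrated) statistics.

```r
# Sketch (not package code): consensus of per-set statistics for each variable.
consensusStat <- function(statMatrix, consensusQuantile = 0, method = "MaxMean") {
  # For "MinMean" and "absMinMean" lower is better, so the worst value is the
  # largest and the complementary quantile is used.
  q <- if (method %in% c("MinMean", "absMinMean")) 1 - consensusQuantile else consensusQuantile
  apply(statMatrix, 1, quantile, probs = q, na.rm = TRUE, names = FALSE)
}

stats <- cbind(A = c(p1 = 1, p2 = 3, p3 = 2), B = c(4, 0.5, 2.5))
consensusStat(stats)                       # minimum across sets: 1, 0.5, 2
consensusStat(stats, method = "MinMean")   # maximum across sets: 4, 3, 2.5
```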
For each group, the representative is selected as the variable with the best (typically highest, but for "MinMean" and "absMinMean" methods the lowest) consensus selection statistic.
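Selecting the representative of each group from the consensus statistics might look like the sketch below. The names pickRepresentatives and lowerIsBetter are hypothetical; lowerIsBetter = TRUE stands in for the "MinMean" and "absMinMean" methods.

```r
# Sketch (not package code): pick, for each group, the variable with the best
# consensus statistic.
pickRepresentatives <- function(consensus, group, lowerIsBetter = FALSE) {
  score <- if (lowerIsBetter) -consensus else consensus
  # split variable names by group label, keep the best-scoring variable per group
  sapply(split(names(consensus), group[names(consensus)]),
         function(vars) vars[which.max(score[vars])])
}

consensus <- c(p1 = 1, p2 = 0.5, p3 = 2, p4 = 7)
group <- c(p1 = "geneA", p2 = "geneA", p3 = "geneA", p4 = "geneB")
pickRepresentatives(consensus, group)                         # geneA = "p3", geneB = "p4"
pickRepresentatives(consensus, group, lowerIsBetter = TRUE)   # geneA = "p2", geneB = "p4"
```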
If useGroupHubs=TRUE, the intra-group connectivity is calculated for all variables in each set. The intra-group connectivities are optionally calibrated (normalized) between sets, and consensus intra-group connectivity is calculated similarly to the consensus selection statistic above. In each group, the variable with the highest consensus intra-group connectivity is chosen as the representative.
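The hub-based selection for one group can be sketched as below. The sketch assumes the signed adjacency ((1 + cor)/2)^connectivityPower that WGCNA uses for signed networks, excludes self-connections, skips calibration, and uses consensusQuantile = 0 (the minimum across sets). The helper intraGroupConnectivity and the toy data are hypothetical.

```r
# Sketch (not package code): intra-group connectivity in each set and the
# consensus hub of a single group.
intraGroupConnectivity <- function(x, power = 1) {
  # assumed signed weighted adjacency: ((1 + cor) / 2)^power
  adj <- ((1 + cor(x, use = "pairwise.complete.obs")) / 2)^power
  diag(adj) <- 0    # exclude self-connections
  rowSums(adj)      # connectivity of each variable within the group
}

# Toy data (hypothetical): one group of three variables measured in two sets
A <- cbind(p1 = c(1, 2, 3, 4), p2 = c(2, 1, 4, 3), p3 = c(1, 3, 2, 4))
B <- cbind(p1 = c(1, 2, 3, 4), p2 = c(1, 2, 3, 4), p3 = c(1, 3, 2, 4))

k <- sapply(list(A = A, B = B), intraGroupConnectivity)   # variables x sets
consensusK <- apply(k, 1, min)                            # consensusQuantile = 0
hub <- names(which.max(consensusK))
consensusK   # p1 = 1.7, p2 = 1.3, p3 = 1.4
hub          # "p1"
```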
representatives |
A named vector giving, for each group, the selected representative (input |
varSelected |
A logical vector with one entry per variable (column) in input |
representativeData |
Only present if |
Peter Langfelder, based on code by Jeremy Miller
multiData for a description of the multiData structures;
collapseRows that solves a related but different problem. Please note the differences in input and output!