Calculate individual correlation network matrices
This function calculates correlation network matrices (adjacencies or topological overlaps), after optionally first pre-clustering input data into blocks.
individualTOMs(
  multiExpr,
  multiWeights = NULL,
  multiExpr.imputed = NULL,

  # Data checking options
  checkMissingData = TRUE,

  # Blocking options
  blocks = NULL,
  maxBlockSize = 5000,
  blockSizePenaltyPower = 5,
  nPreclusteringCenters = NULL,
  randomSeed = 54321,

  # Network construction options
  networkOptions,

  # Save individual TOMs?
  saveTOMs = TRUE,
  individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",

  # Behaviour options
  collectGarbage = TRUE,
  verbose = 2,
  indent = 0)
multiExpr |
expression data in the multi-set format (see |
multiWeights |
optional observation weights in the same format (and dimensions) as |
multiExpr.imputed |
Optional version of |
checkMissingData |
logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details. |
blocks |
optional specification of blocks in which hierarchical clustering and module detection
should be performed. If given, must be a numeric vector with one entry per gene
of |
maxBlockSize |
integer giving maximum block size for module detection. Ignored if |
blockSizePenaltyPower |
number specifying how strongly blocks should be penalized for exceeding the
maximum size. Set to a large number or |
nPreclusteringCenters |
number of centers to be used in the preclustering. Defaults to smaller of
|
randomSeed |
integer to be used as seed for the random number generator before the function
starts. If a current seed exists, it is saved and restored upon exit. If |
networkOptions |
A single list of class |
saveTOMs |
logical: should individual TOMs be saved to disk ( |
individualTOMFileNames |
character string giving the file names to save individual TOMs into. The
following tags should be used to make the file names unique for each set and block: |
collectGarbage |
Logical: should garbage collection be called after each block calculation? This can be useful when the data are large, but could unnecessarily slow down calculation with small data. |
verbose |
Integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose. |
indent |
Indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces. |
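The default value of individualTOMFileNames contains the tags %s and %b. As an illustration only, the sketch below assumes that %s stands for the set index and %b for the block index, which is what the default template suggests; the helper expandTOMFileName is hypothetical and is not part of the package.

```r
# Hypothetical helper: expand a file name template for one set and one block.
# Assumption: "%s" is replaced by the set index and "%b" by the block index.
expandTOMFileName <- function(template, set, block) {
  name <- gsub("%s", as.character(set), template, fixed = TRUE)
  gsub("%b", as.character(block), name, fixed = TRUE)
}

template <- "individualTOM-Set%s-Block%b.RData"

# One file name per (set, block) combination, here for 2 sets and 2 blocks.
for (set in 1:2) {
  for (block in 1:2) {
    print(expandTOMFileName(template, set, block))
  }
}
```

Under this assumption every set and block combination maps to a distinct file, so saved matrices do not overwrite one another.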
The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are excluded from the network calculations.
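The gene filtering step can be pictured with a small base R sketch. It is not the package's own check: the toy data, the helper geneIsGood, and the 50% missingness threshold are all assumptions made for illustration, and sample filtering is left out.

```r
# Toy multi-set data: a list of sets, each with a samples-by-genes matrix in $data.
set.seed(1)
makeSet <- function(nSamples, nGenes) {
  matrix(rnorm(nSamples * nGenes), nSamples, nGenes,
         dimnames = list(NULL, paste0("gene", seq_len(nGenes))))
}
multiExpr <- list(A = list(data = makeSet(10, 6)),
                  B = list(data = makeSet(12, 6)))
multiExpr$A$data[, 2] <- 5        # gene2 has zero variance in set A
multiExpr$B$data[1:9, 4] <- NA    # gene4 is mostly missing in set B

# Hypothetical threshold: drop a gene if more than half of its entries are missing.
maxMissingFraction <- 0.5

geneIsGood <- function(x) {
  missingOK <- colMeans(is.na(x)) <= maxMissingFraction
  variances <- apply(x, 2, var, na.rm = TRUE)
  missingOK & !is.na(variances) & variances > 0
}

# A gene is kept only if it passes the checks in every set.
goodPerSet <- sapply(multiExpr, function(set) geneIsGood(set$data))
keepGenes <- apply(goodPerSet, 1, all)
filtered <- lapply(multiExpr, function(set) {
  list(data = set$data[, keepGenes, drop = FALSE])
})

names(keepGenes)[!keepGenes]   # genes excluded from the network calculation
```

Because a gene failing in any one set is removed from all sets, the filtered sets keep identical gene columns.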
If blocks is not given and the number of genes (columns) in multiExpr exceeds maxBlockSize, genes are pre-clustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block. Any missing data in multiExpr will be imputed; if imputed data are already available, they can be supplied separately.
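The blocking decision described above can be sketched in base R as follows. The function assignBlocks is hypothetical, and plain k-means from the stats package is used as a stand-in for consensusProjectiveKMeans; unlike the real pre-clustering, this stand-in works on a single set and does not penalize or limit block sizes.

```r
# Hypothetical sketch of the blocking decision for one samples-by-genes matrix.
assignBlocks <- function(exprData, blocks = NULL, maxBlockSize = 5000) {
  nGenes <- ncol(exprData)
  if (!is.null(blocks)) {
    # User-supplied blocks: one entry per gene, used as given.
    stopifnot(length(blocks) == nGenes)
    return(blocks)
  }
  if (nGenes <= maxBlockSize) {
    # Small enough: all genes go into a single block.
    return(rep(1, nGenes))
  }
  # Too many genes: pre-cluster. Stand-in only, NOT consensusProjectiveKMeans.
  nCenters <- ceiling(nGenes / maxBlockSize)
  kmeans(t(exprData), centers = nCenters)$cluster
}

set.seed(2)
expr <- matrix(rnorm(20 * 30), nrow = 20, ncol = 30)   # 20 samples, 30 genes
singleBlock <- assignBlocks(expr, maxBlockSize = 100)  # fits in one block
manyBlocks  <- assignBlocks(expr, maxBlockSize = 10)   # pre-clustered
table(manyBlocks)
```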
For each block of genes, the network adjacency is constructed and (if requested) topological overlap is calculated in each set. The topological overlaps can be saved to disk as RData files, or returned directly within the return value (see below). Note that the matrices can be big and returning them within the return value can quickly exhaust the system's memory. In particular, if the block-wise calculation is necessary, it is usually impossible to return all matrices in the return value.
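A rough calculation shows why returning the matrices in memory becomes impractical. The sketch assumes each matrix is stored as a full, dense, double-precision gene-by-gene matrix (8 bytes per entry); the actual storage used by the package may be more compact, so treat the numbers as an upper-end estimate. The helper matrixMemoryGB is hypothetical.

```r
# Approximate memory (in GiB) for dense double-precision gene-by-gene matrices.
# Assumption: full square storage, 8 bytes per entry, one matrix per set.
matrixMemoryGB <- function(nGenes, nSets = 1, bytesPerEntry = 8) {
  nSets * nGenes^2 * bytesPerEntry / 1024^3
}

matrixMemoryGB(5000)               # one block at the default maxBlockSize: about 0.19
matrixMemoryGB(20000, nSets = 3)   # 20000 genes in 3 sets: about 8.9
```

Memory grows with the square of the number of genes, which is why saving block-wise results to disk is usually the safer choice for large data.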
A list with the following components:
blockwiseAdjacencies |
A |
setNames |
A copy of |
nSets |
Number of sets in |
blockInfo |
A list of class |
networkOptions |
The input |
Peter Langfelder
Input arguments and output components of this function use multiData, NetworkOptions, BlockwiseData, and BlockInformation.
Underlying functions of interest include consensusProjectiveKMeans and TOMsimilarityFromExpr.