WGCNA: hierarchicalConsensusTOM – R documentation

hierarchicalConsensusTOM

Calculation of hierarchical consensus topological overlap matrix

Description

This function calculates consensus topological overlap in a hierarchical manner.
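The hierarchical idea can be illustrated with a small base-R sketch. It is not WGCNA code. Several things here are assumptions made only for illustration:

- the nested-list tree, in which leaves are similarity matrices and internal nodes are `list(inputs = ...)`;
- the hypothetical helper name `consensusOfTree`;
- the use of an element-wise minimum as the consensus operation.

Network calibration is omitted. Each internal node takes the consensus of its already-computed children, so the consensus is built bottom-up through the tree.

```r
# Hypothetical sketch, not WGCNA's implementation.
# A leaf is a numeric similarity matrix; an internal node is list(inputs = list(...)).
# The element-wise minimum stands in for the consensus operation (an assumption);
# calibration of the inputs is omitted.
consensusOfTree <- function(node) {
  if (is.matrix(node)) return(node)            # leaf: an input TOM
  # Recursively compute the consensus of each child, then combine them
  children <- lapply(node$inputs, consensusOfTree)
  Reduce(pmin, children)                       # pmin keeps the dim of its first argument
}

tomA <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
tomB <- matrix(c(1, 0.4, 0.4, 1), 2, 2)
tomC <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

# The consensus of (A, B) is formed first, then combined with C
tree <- list(inputs = list(list(inputs = list(tomA, tomB)), tomC))
cons <- consensusOfTree(tree)
```

Because a plain minimum is associative, this particular tree gives the same result as a flat minimum over all three inputs. The tree structure would only matter if different levels used different calibration or consensus rules.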

Usage

hierarchicalConsensusTOM(
      # ... information needed to calculate individual TOMs
      multiExpr,
      multiWeights = NULL,

      # Data checking options
      checkMissingData = TRUE,

      # Blocking options
      blocks = NULL,
      maxBlockSize = 20000,
      blockSizePenaltyPower = 5,
      nPreclusteringCenters = NULL,
      randomSeed = 12345,

      # Network construction options
      networkOptions,

      # Save individual TOMs?

      keepIndividualTOMs = TRUE,
      individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",

      # ... or information about individual (more precisely, input) TOMs
      individualTOMInfo = NULL,

      # Consensus calculation options 
      consensusTree,

      useBlocks = NULL,

      # Save calibrated TOMs?
      saveCalibratedIndividualTOMs = FALSE,
      calibratedIndividualTOMFilePattern = "calibratedIndividualTOM-Set%s-Block%b.RData",

      # Return options
      saveConsensusTOM = TRUE,
      consensusTOMFilePattern = "consensusTOM-%a-Block%b.RData",
      getCalibrationSamples = FALSE,

      # Return the intermediate results as well?  
      keepIntermediateResults = saveConsensusTOM,

      # Internal handling of TOMs
      useDiskCache = NULL, 
      chunkSize = NULL,
      cacheDir = ".",
      cacheBase = ".blockConsModsCache",

      # Behavior
      collectGarbage = TRUE,
      verbose = 1,
      indent = 0)
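The `multiExpr` argument expects the multi-set format described under Arguments: a list with one component per data set, each holding a `data` matrix with samples in rows and genes in columns. The base-R sketch below builds such a structure from random numbers. The helper `checkMultiSetLayout` is hypothetical (WGCNA's own checking is done by functions such as `checkSets`). It only verifies that all sets have the same number of gene columns, since `blocks` is specified per gene of `multiExpr`.

```r
# Sketch of the multi-set layout: one list component per set, each with a
# `data` matrix (samples in rows, genes in columns). Sample counts may differ.
set.seed(1)
nGenes <- 5
multiExprSketch <- list(
  Study1 = list(data = matrix(rnorm(8 * nGenes), nrow = 8)),
  Study2 = list(data = matrix(rnorm(12 * nGenes), nrow = 12))
)

# Hypothetical helper: TRUE if every set has the same number of gene columns
checkMultiSetLayout <- function(multiExpr) {
  nCols <- vapply(multiExpr, function(set) ncol(set$data), integer(1))
  all(nCols == nCols[1])
}
```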

Arguments

`multiExpr`	Expression data in the multi-set format (see `checkSets`). A vector of lists, one per set. Each set must contain a component `data` that contains the expression data, with rows corresponding to samples and columns to genes or probes.
`multiWeights`	optional observation weights in the same format (and dimensions) as `multiExpr`. These weights are used for correlation calculations with data in `multiExpr`.
`checkMissingData`	Logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.
`blocks`	Optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of `multiExpr` giving the number of the block to which the corresponding gene belongs.
`maxBlockSize`	Integer giving maximum block size for module detection. Ignored if `blocks` above is non-NULL. Otherwise, if the number of genes in `multiExpr` exceeds `maxBlockSize`, genes will be pre-clustered into blocks whose size should not exceed `maxBlockSize`.
`blockSizePenaltyPower`	Number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or `Inf` if not exceeding the maximum block size is very important.
`nPreclusteringCenters`	Number of centers to be used in the preclustering. Defaults to the smaller of `nGenes/20` and `100*nGenes/maxBlockSize`, where `nGenes` is the number of genes (variables) in `multiExpr`.
`randomSeed`	Integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If `NULL` is given, the function will not save and restore the seed.
`networkOptions`	A single list of class `NetworkOptions` giving options for network calculation for all of the networks, or a `multiData` structure containing one such list for each input data set.
`keepIndividualTOMs`	Logical: should individual TOMs be retained after the calculation is finished?
`individualTOMFileNames`	Character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: `%s` will be replaced by the set number; `%N` will be replaced by the set name (taken from `names(multiExpr)`) if it exists, otherwise by set number; `%b` will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
`individualTOMInfo`	A list, typically returned by `individualTOMs`, containing information about the topological overlap matrices in the individual data sets in `multiExpr`. See the output of `individualTOMs` for details on the content of the list.
`consensusTree`	A list specifying the consensus calculation. See details.
`useBlocks`	Optional vector giving the blocks that should be used for the calculations. If `NULL`, all blocks will be used.
`saveCalibratedIndividualTOMs`	Logical: should the calibrated individual TOMs be saved?
`calibratedIndividualTOMFilePattern`	Specification of file names in which calibrated individual TOMs should be saved.
`saveConsensusTOM`	Logical: should the consensus TOM be saved to disk?
`consensusTOMFilePattern`	Character string giving the file names to save consensus TOMs into. The following tags should be used to make the file names unique for each set and block: `%s` will be replaced by the set number; `%N` will be replaced by the set name (taken from `names(multiExpr)`) if it exists, otherwise by set number; `%b` will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
`getCalibrationSamples`	Logical: should the sampled values used for network calibration be returned?
`keepIntermediateResults`	Logical: should intermediate consensus TOMs be saved as well?
`useDiskCache`	Logical: should disk cache be used for consensus calculations? The disk cache can be used to store chunks of calibrated data that are small enough to fit one chunk from each set into memory (blocks may be small enough to fit one block of one set into memory, but not small enough to fit one block from all sets in a consensus calculation into memory at the same time). Using disk cache is slower but lessens the memory footprint of the calculation. As a general guide, if individual data are split into blocks, we recommend setting this argument to `TRUE`. If this argument is `NULL`, the function will decide whether to use disk cache based on the number of sets and block sizes.
`chunkSize`	Network similarities are saved in smaller chunks of size `chunkSize`. If `NULL`, an appropriate chunk size will be determined from an estimate of available memory. Note that if the chunk size is greater than the memory required for storing intermediate results, disk cache use will automatically be disabled.
`cacheDir`	character string containing the directory into which cache files should be written. The user should make sure that the filesystem has enough free space to hold the cache files which can get quite large.
`cacheBase`	character string containing the desired name for the cache files. The actual file names will consist of `cacheBase` and a suffix that makes the file names unique.
`collectGarbage`	Logical: should garbage be collected after memory-intensive operations?
`verbose`	integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.
`indent`	indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
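The file-name tags used by `individualTOMFileNames` and the other file-name arguments can be illustrated with a small base-R sketch. `expandFilePattern` is a hypothetical helper, not the function WGCNA uses internally. It substitutes `%s`, `%N` and `%b` as described above. When no set name is given, it falls back to the set number. It stops with an error when the resulting names are not unique.

```r
# Hypothetical helper (not WGCNA's internal code): expand %s, %N and %b
# for every set/block combination and reject non-unique results.
expandFilePattern <- function(pattern, nSets, nBlocks, setLabels = NULL) {
  grid <- expand.grid(set = seq_len(nSets), block = seq_len(nBlocks))
  files <- vapply(seq_len(nrow(grid)), function(i) {
    s <- grid$set[i]
    # %N falls back to the set number when no set name is available
    label <- if (is.null(setLabels) || !nzchar(setLabels[s])) as.character(s) else setLabels[s]
    f <- gsub("%s", as.character(s), pattern, fixed = TRUE)
    f <- gsub("%N", label, f, fixed = TRUE)
    gsub("%b", as.character(grid$block[i]), f, fixed = TRUE)
  }, character(1))
  # Mirror the documented behaviour: non-unique file names are an error
  if (anyDuplicated(files)) stop("file name pattern does not produce unique names")
  files
}

files <- expandFilePattern("individualTOM-Set%s-Block%b.RData", nSets = 2, nBlocks = 2)
```

A pattern without set or block tags, such as `"tom.RData"`, maps every set and block to the same file name and therefore fails. This is why the default patterns include both `%s` and `%b`.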
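The documented `randomSeed` behaviour has four parts: save any existing seed, set the requested one, restore the saved seed on exit, and leave the RNG alone when `NULL` is given. This follows a common R idiom built on `.Random.seed` and `on.exit()`. The sketch below illustrates the behaviour under that assumption using a hypothetical wrapper `withSavedSeed`. It is not taken from the WGCNA source.

```r
# Hypothetical wrapper illustrating the randomSeed semantics described above.
withSavedSeed <- function(seed, expr) {
  # seed = NULL: evaluate expr without saving, setting or restoring the seed
  if (is.null(seed)) return(expr)
  env <- globalenv()
  hadSeed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (hadSeed) {
    savedSeed <- get(".Random.seed", envir = env, inherits = FALSE)
    # Restore the caller's RNG state however the function exits
    on.exit(assign(".Random.seed", savedSeed, envir = env))
  }
  set.seed(seed)
  expr  # lazily evaluated here, i.e. after set.seed()
}
```

With this wrapper, random numbers drawn inside the call are reproducible for a given `seed`. The caller's own random stream continues as if the call had never happened.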

Details

This function is essentially a wrapper for `hierarchicalConsensusCalculation`, with a few additional operations specific to calculations of topological overlaps.

Value

A list that contains the output of `hierarchicalConsensusCalculation` and two extra components:

`individualTOMInfo`	A copy of the input `individualTOMInfo` if it was non-`NULL`, or the result of `individualTOMs`.
`consensusTree`	A copy of the input `consensusTree`.

Author(s)

Peter Langfelder