Calculate weighted adjacency matrices based on mutual information
The function calculates different types of weighted adjacency matrices based on the mutual information between vectors (corresponding to the columns of the input data frame datE). The mutual information between pairs of vectors is divided by an upper bound so that the resulting normalized measure lies between 0 and 1.
mutualInfoAdjacency( datE, discretizeColumns = TRUE, entropyEstimationMethod = "MM", numberBins = NULL)
datE
    a data frame whose columns correspond to the variables (vectors) between which the mutual information is computed and whose rows correspond to observations.

discretizeColumns
    is a logical variable. If it is set to TRUE then the columns of datE are discretized into numberBins bins before the entropy and mutual information are estimated.

entropyEstimationMethod
    takes a text string for specifying the entropy and mutual information estimation method. The default "MM" corresponds to the Miller-Madow bias-corrected entropy estimator.

numberBins
    is an integer larger than 0 which specifies how many bins are used for the discretization step. This argument is only relevant if discretizeColumns is set to TRUE.
The function inputs a data frame datE and outputs a list whose components correspond to different weighted network adjacency measures defined between the columns of datE. Make sure to install the R packages entropy, minet, and infotheo, since the function mutualInfoAdjacency makes use of the entropy function from the R package entropy (Hausser and Strimmer 2008) and of functions from the minet and infotheo packages (Meyer et al 2008).
A weighted network adjacency matrix is a symmetric matrix whose entries take on values between 0 and 1. Each weighted adjacency matrix contains scaled versions of the mutual information between the columns of the input data frame datE. We assume that datE contains numeric values, which will be discretized unless the user chooses the option discretizeColumns=FALSE.
The raw (unscaled) mutual information and entropy measures are reported in units of "nat", i.e. natural logarithms (base e ≈ 2.718) are used in their definition.
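As a small illustration of the "nat" unit (a sketch, assuming the entropy package is installed): the maximum-likelihood ("ML") entropy estimate of a balanced two-outcome count vector equals log(2) nats.

```r
library(entropy)

# Empirical ("ML", maximum likelihood) entropy of a fair two-outcome variable.
# With equal counts the estimate is exactly log(2), about 0.693 nats,
# because the entropy package uses natural logarithms by default.
counts <- c(500, 500)
entropy(counts, method = "ML")
```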
Several mutual information estimation methods have been proposed in the literature (reviewed in Hausser and Strimmer 2008, Meyer et al 2008).
While mutual information networks allow one to detect non-linear relationships between the columns of datE, they may overfit the data if relatively few observations are available. Thus, if the number of rows of datE is smaller than, say, 200, it may be better to fit a correlation-based network using the function adjacency.
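For example, with few observations one might fall back on the correlation-based adjacency (a sketch; adjacency and its power argument belong to WGCNA, and the soft-thresholding power of 6 is only an illustrative choice):

```r
library(WGCNA)

m <- 100                                  # fewer than ~200 observations
datE <- data.frame(x1 = rnorm(m), x2 = rnorm(m), x3 = rnorm(m))

# Correlation-based adjacency A = |cor|^power; less prone to overfitting
# than mutual information when samples are scarce.
A <- adjacency(datE, power = 6)
```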
The function outputs a list with the following components:
Entropy
    is a vector whose components report entropy estimates of each column of datE.

MutualInformation
    is a symmetric matrix whose entries contain the pairwise mutual information measures between the columns of datE.

AdjacencySymmetricUncertainty
    is a weighted adjacency matrix whose entries are based on the mutual information. Using the notation from the Wikipedia entry on mutual information, this matrix contains the symmetric uncertainty estimates 2*I(X,Y)/(H(X)+H(Y)), where I denotes the mutual information and H the entropy. Since I(X,X)=H(X), the diagonal elements equal 1.

AdjacencyUniversalVersion1
    is a weighted adjacency matrix that is a simple function of the mutual information and the joint entropy: AdjacencyUniversalVersion1 = I(X,Y)/H(X,Y). Its complement, 1 - I(X,Y)/H(X,Y), is a universal distance measure (Kraskov et al 2003).

AdjacencyUniversalVersion2
    is a weighted adjacency matrix for which dissUAversion2 = 1 - AdjacencyUniversalVersion2 = 1 - I(X,Y)/max(H(X),H(Y)) is also a universal distance measure (Kraskov et al 2003).
Steve Horvath, Lin Song, Peter Langfelder
Hausser J, Strimmer K (2008) Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. See http://arxiv.org/abs/0811.3579
Patrick E. Meyer, Frederic Lafitte, and Gianluca Bontempi. minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information. BMC Bioinformatics, Vol 9, 2008
Kraskov A, Stoegbauer H, Andrzejak RG, Grassberger P (2003) Hierarchical Clustering Based on Mutual Information. ArXiv q-bio/0311039
# Load requisite packages. These packages are considered "optional",
# so WGCNA does not load them automatically.
if (require(infotheo, quietly = TRUE) &&
    require(minet, quietly = TRUE) &&
    require(entropy, quietly = TRUE))
{
  # Example can be executed.
  # Simulate a data frame datE which contains 5 columns and 50 observations
  m = 50
  x1 = rnorm(m)
  r = .5; x2 = r*x1 + sqrt(1-r^2)*rnorm(m)
  r = .3; x3 = r*(x1-.5)^2 + sqrt(1-r^2)*rnorm(m)
  x4 = rnorm(m)
  r = .3; x5 = r*x4 + sqrt(1-r^2)*rnorm(m)
  datE = data.frame(x1, x2, x3, x4, x5)
  # Calculate the entropy, the mutual information matrix, and weighted
  # adjacency matrices based on mutual information.
  MIadj = mutualInfoAdjacency(datE = datE)
} else
  printFlush(paste("Please install packages infotheo, minet and entropy",
                   "before running this example."));