Explore the Mean-Variance Relationship for DGE Data
Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. Here are functions to compute gene means and variances, as well at looking at these quantities when data is binned based on overall expression level.
plotMeanVar(object, meanvar=NULL, show.raw.vars=FALSE, show.tagwise.vars=FALSE, show.binned.common.disp.vars=FALSE, show.ave.raw.vars=TRUE, scalar=NULL, NBline=FALSE, nbins=100, log.axes="xy", xlab=NULL, ylab=NULL, ...) binMeanVar(x, group, nbins=100, common.dispersion=FALSE, object=NULL)
object |
|
meanvar |
list (optional) containing the output from |
show.raw.vars |
logical, whether or not to display the raw (pooled) genewise variances on the mean-variance plot. Default is |
show.tagwise.vars |
logical, whether or not to display the estimated genewise variances on the mean-variance plot (note that ‘tag’ and ‘gene’ are synonymous). Default is |
show.binned.common.disp.vars |
logical, whether or not to compute the common dispersion for each bin of genes and show the variances computed from those binned common dispersions and the mean expression level of the respective bin of genes. Default is |
show.ave.raw.vars |
logical, whether or not to show the average of the raw variances for each bin of genes plotted against the average expression level of the genes in the bin. Averages are taken on the square root scale as regular arithmetic means are likely to be upwardly biased for count data, whereas averaging on the square scale gives a better summary of the mean-variance relationship in the data. The default is |
scalar |
vector (optional) of scaling values to divide counts by. Would expect to have this the same length as the number of columns in the count matrix (i.e. the number of libraries). |
NBline |
logical, whether or not to add a line on the graph showing the mean-variance relationship for a NB model with common dispersion. |
nbins |
scalar giving the number of bins (formed by using the quantiles of the genewise mean expression levels) for which to compute average means and variances for exploring the mean-variance relationship. Default is |
log.axes |
character vector indicating if any of the axes should use a log scale. Default is |
xlab |
character string giving the label for the x-axis. Standard graphical parameter. If left as the default |
ylab |
character string giving the label for the y-axis. Standard graphical parameter. If left as the default |
... |
further arguments passed on to |
x |
matrix of count data, with rows representing genes and columns representing samples |
group |
factor giving the experimental group or condition to which each sample (i.e. column of |
common.dispersion |
logical, whether or not to compute the common dispersion for each bin of genes. |
This function is useful for exploring the mean-variance relationship in the data. Raw variances are, for each gene, the pooled variance of the counts from each sample, divided by a scaling factor (by default the effective library size). The function will plot the average raw variance for genes split into nbins
bins by overall expression level. The averages are taken on the square-root scale as for count data the arithmetic mean is upwardly biased. Taking averages on the square-root scale provides a useful summary of how the variance of the gene counts change with respect to expression level (abundance). A line showing the Poisson mean-variance relationship (mean equals variance) is always shown to illustrate how the genewise variances may differ from a Poisson mean-variance relationship. Optionally, the raw variances and estimated genewise variances can also be plotted. Estimated genewise variances can be calculated using either qCML estimates of the genewise dispersions (estimateTagwiseDisp
) or Cox-Reid conditional inference estimates (CRDisp
). A log-log scale is used for the plot.
plotMeanVar
produces a mean-variance plot for the DGE data using the options described above. plotMeanVar
and binMeanVar
both return a list with the following components:
avemeans |
vector of the average expression level within each bin of genes, with the average taken on the square-root scale |
avevars |
vector of the average raw pooled gene-wise variance within each bin of genes, with the average taken on the square-root scale |
bin.means |
list containing the average (mean) expression level for genes divided into bins based on amount of expression |
bin.vars |
list containing the pooled variance for genes divided into bins based on amount of expression |
means |
vector giving the mean expression level for each gene |
vars |
vector giving the pooled variance for each gene |
bins |
list giving the indices of the genes in each bin, ordered from lowest expression bin to highest |
Davis McCarthy
plotMeanVar2
is related in purpose but works from raw counts and uses standardized residuals.
plotMDS.DGEList
, plotSmear
and
maPlot
provide more ways of visualizing DGE data.
y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4) d <- DGEList(counts=y,group=c(1,1,2,2),lib.size=c(1000:1003)) plotMeanVar(d) # Produce a straight-forward mean-variance plot # Produce a mean-variance plot with the raw variances shown and save the means # and variances for later use meanvar <- plotMeanVar(d, show.raw.vars=TRUE) ## If we want to show estimated genewise variances on the plot, we must first estimate them! d <- estimateCommonDisp(d) # Obtain an estimate of the dispersion parameter d <- estimateTagwiseDisp(d) # Obtain genewise dispersion estimates # Use previously saved object to speed up plotting plotMeanVar(d, meanvar=meanvar, show.tagwise.vars=TRUE, NBline=TRUE) ## We could also estimate common/genewise dispersions using the Cox-Reid methods with an ## appropriate design matrix
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.