Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

voomLmFit

Apply voom-lmFit Pipeline While Accounting for Loss of Residual DF Due to Exact Zeros


Description

Transform count data to log2-counts per million (logCPM), estimate voom precision weights and fit limma linear models while allowing for loss of residual degrees of freedom due to exact zeros.

Usage

voomLmFit(counts, design = NULL, block = NULL, prior.weights = NULL,
     sample.weights = FALSE, var.design = NULL, var.group = NULL, 
     lib.size = NULL, normalize.method = "none",
     span = 0.5, plot = FALSE, save.plot = FALSE)

Arguments

counts

a numeric matrix containing raw counts, or an ExpressionSet containing raw counts, or a DGEList object. Counts must be non-negative and NAs are not permitted.

design

design matrix with rows corresponding to samples and columns to coefficients to be estimated. Defaults to the unit vector meaning that samples are treated as replicates.

block

vector or factor specifying a blocking variable on the samples. Has length equal to the number of ncol(counts). Samples within each block are assumed to be correlated.

prior.weights

prior weights. Can be a numeric matrix of individual weights of same dimensions as the counts, or a numeric vector of sample weights with length equal to ncol(counts), or a numeric vector of gene weights with length equal to nrow(counts).

sample.weights

logical value, if TRUE then empirical sample quality weights will be estimated.

var.design

optional design matrix for the sample weights. Defaults to the sample-specific model whereby each sample has a distinct variance.

var.group

optional vector or factor indicating groups to have different array weights. This is another way to specify var.design for groupwise sample weights.

lib.size

numeric vector containing total library sizes for each sample. Defaults to the normalized (effective) library sizes in counts if counts is a DGEList or to the columnwise count totals if counts is a matrix.

normalize.method

the microarray-style normalization method to be applied to the logCPM values (if any). Choices are as for the method argument of normalizeBetweenArrays when the data is single-channel. Any normalization factors found in counts will still be used even if normalize.method="none".

span

width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1.

plot

logical, should a plot of the mean-variance trend be displayed?

save.plot

logical, should the coordinates and line of the plot be saved in the output?

Details

This function adapts the limma voom method (Law et al, 2014) to allow for loss of residual degrees of freedom due to exact zero counts (Lun and Smyth, 2017). The loss residual df occurs when all the counts in a group are zero or when there are blocking factors that can fit zero counts exactly. The function transforms the counts to the log2-CPM scale, computes voom precision weights and fits limma linear models. Residual df are computed similarly as far glmQLFit.

The function is analogous to calling voom followed by duplicateCorrelation and lmFit except for the modified residual df values and residual standard deviation sigma values. This function returns df.residual values that are less than or equal to those from lmFit and sigma values that are greater than or equal to those from lmFit. voomLmFit is more robust to zero counts than calling voom, duplicateCorrelation and lmFit separately and provides more rigorous error rate control.

If block is specified, then the intra-block correlation is estimated using duplicateCorrelation In that case, the voom weights and the intra-block correlation are each estimated twice to achieve effective convergence.

Empirical sample quality weights will be estimated if sample.weights=TRUE or if var.design or var.group are non-NULL. In that case, voomLmFit is analogous to running voomWithQualityWeights followed by lmFit.

voomLmFit is usually followed by running eBayes on the fitted model object.

Value

An MArrayLM object containing linear model fits for each row of data. The object includes a targets data.frame component containing sample annotation. Columns of targets include lib.size and sample.weight (if sample.weights=TRUE).

Author(s)

Gordon Smyth

References

Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. See also the Preprint Version at http://www.statsci.org/smyth/pubs/VoomPreprint.pdf incorporating some notational corrections.

Lun, ATL, and Smyth, GK (2017). No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data. Statistical Applications in Genetics and Molecular Biology 16(2), 83-93. https://doi.org/10.1515/sagmb-2017-0010

See Also


edgeR

Empirical Analysis of Digital Gene Expression Data in R

v3.32.1
GPL (>=2)
Authors
Yunshun Chen, Aaron TL Lun, Davis J McCarthy, Matthew E Ritchie, Belinda Phipson, Yifang Hu, Xiaobei Zhou, Mark D Robinson, Gordon K Smyth
Initial release
2021-01-14

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.