Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

glmQLFTest

Genewise Negative Binomial Generalized Linear Models with Quasi-likelihood Tests


Description

Fit a quasi-likelihood negative binomial generalized log-linear model to count data. Conduct genewise statistical tests for a given coefficient or contrast.

Usage

## S3 method for class 'DGEList'
glmQLFit(y, design=NULL, dispersion=NULL, abundance.trend=TRUE,
        robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...)
## S3 method for class 'SummarizedExperiment'
glmQLFit(y, design=NULL, dispersion=NULL, abundance.trend=TRUE,
        robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...)
## Default S3 method:
glmQLFit(y, design=NULL, dispersion=NULL, offset=NULL, lib.size=NULL,
        weights=NULL, abundance.trend=TRUE, AveLogCPM=NULL, robust=FALSE,
        winsor.tail.p=c(0.05, 0.1), ...)
glmQLFTest(glmfit, coef=ncol(glmfit$design), contrast=NULL, poisson.bound=TRUE)

Arguments

y

a matrix of counts, or a DGEList object with (at least) elements counts (table of unadjusted counts) and samples (data frame containing information about experimental group, library size and normalization factor for the library size), or a SummarizedExperiment object with (at least) an element counts in its assays.

design

numeric matrix giving the design matrix for the genewise linear models.

dispersion

numeric scalar, vector or matrix of negative binomial dispersions. If NULL, then will be extracted from the DGEList object y, with order of precedence: trended dispersions, common dispersion, a constant value of 0.05.

offset

numeric matrix of same size as y giving offsets for the log-linear models. Can be a scalor or a vector of length ncol(y), in which case it is expanded out to a matrix. If NULL will be computed by getOffset(y).

lib.size

numeric vector of length ncol(y) giving library sizes. Only used if offset=NULL, in which case offset is set to log(lib.size). Defaults to colSums(y).

weights

numeric matrix of same size as y giving weights for the log-linear models. If NULL, will be set to unity for all observations.

abundance.trend

logical, whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter.

AveLogCPM

average log2-counts per million, the average taken over all libraries in y. If NULL will be computed by aveLogCPM(y).

robust

logical, whether to estimate the prior QL dispersion distribution robustly.

winsor.tail.p

numeric vector of length 2 giving proportion to trim (Winsorize) from lower and upper tail of the distribution of genewise deviances when estimating the hyperparameters. Positive values produce robust empirical Bayes ignoring outlier small or large deviances. Only used when robust=TRUE.

...

other arguments are passed to glmFit.

glmfit

a DGEGLM object, usually output from glmQLFit.

coef

integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero. Ignored if contrast is not NULL.

contrast

numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero.

poisson.bound

logical, if TRUE then the p-value returned will never be less than would be obtained for a likelihood ratio test with NB dispersion equal to zero.

Details

glmQLFit and glmQLFTest implement the quasi-likelihood (QL) methods of Lund et al (2012), with some enhancements and with slightly different glm, trend and FDR methods. See Lun et al (2016) or Chen et al (2016) for tutorials describing the use of glmQLFit and glmQLFit as part of a complete analysis pipeline. Another case study using glmQLFit and glmQLFTest is given in Section 4.7 of the edgeR User's Guide.

glmQLFit is similar to glmFit except that it also estimates QL dispersion values. It calls the limma function squeezeVar to conduct empirical Bayes moderation of the genewise QL dispersions. If robust=TRUE, then the robust hyperparameter estimation features of squeezeVar are used (Phipson et al, 2013). If abundance.trend=TRUE, then a prior trend is estimated based on the average logCPMs.

glmQLFit gives special attention to handling of zero counts, and in particular to situations when fitted values of zero provide no useful residual degrees of freedom for estimating the QL dispersion (Lun and Smyth, 2017). The usual residual degrees of freedom are returned as df.residual while the adjusted residual degrees of freedom are returned as df.residuals.zeros.

glmQLFTest is similar to glmLRT except that it replaces likelihood ratio tests with empirical Bayes quasi-likelihood F-tests. The p-values from glmQLFTest are always greater than or equal to those that would be obtained from glmLRT using the same negative binomial dispersions.

Value

glmQLFit produces an object of class DGEGLM with the same components as produced by glmFit, plus:

df.residual.zeros

a numeric vector containing the number of effective residual degrees of freedom for each gene, taking into account any treatment groups with all zero counts.

df.prior

a numeric vector or scalar, giving the prior degrees of freedom for the QL dispersions.

var.prior

a numeric vector of scalar, giving the location of the prior distribution for the QL dispersions.

var.post

a numeric vector containing the posterior empirical Bayes QL dispersions.

df.prior is a vector of length nrow(y) if robust=TRUE, otherwise it has length 1. var.prior is a vector of length nrow(y) if abundance.trend=TRUE, otherwise it has length 1.

glmQFTest produce an object of class DGELRT with the same components as produced by glmLRT, except that the table$LR column becomes table$F and contains quasi-likelihood F-statistics. It also stores df.total, a numeric vector containing the denominator degrees of freedom for the F-test, equal to df.prior + df.residual.zeros.

Note

The negative binomial dispersions dispersion supplied to glmQLFit and glmQLFTest must be based on a global model, that is, they must be either trended or common dispersions. It is not correct to supply genewise dispersions because glmQLFTest estimates genewise variability using the QL dispersion.

Author(s)

Yunshun Chen, Aaron Lun, Davis McCarthy and Gordon Smyth

References

Chen Y, Lun ATL, and Smyth, GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438. http://f1000research.com/articles/5-1438

Lun, ATL, Chen, Y, and Smyth, GK (2016). It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Methods in Molecular Biology 1418, 391-416. http://www.statsci.org/smyth/pubs/QLedgeRPreprint.pdf (Preprint 8 April 2015)

Lund, SP, Nettleton, D, McCarthy, DJ, and Smyth, GK (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 5, Article 8. http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf

Lun, ATL, and Smyth, GK (2017). No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data. Statistical Applications in Genetics and Molecular Biology 16(2), 83-93. https://doi.org/10.1515/sagmb-2017-0010

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. https://projecteuclid.org/euclid.aoas/1469199900

See Also

topTags displays results from glmQLFTest.

plotQLDisp can be used to visualize the distribution of QL dispersions after EB shrinkage from glmQLFit.

The QuasiSeq package gives an alternative implementation of the Lund et al (2012) methods.

Examples

nlibs <- 4
ngenes <- 1000
dispersion.true <- 1/rchisq(ngenes, df=10)
design <- model.matrix(~factor(c(1,1,2,2)))

# Generate count data
y <- rnbinom(ngenes*nlibs,mu=20,size=1/dispersion.true)
y <- matrix(y,ngenes,nlibs)
d <- DGEList(y)
d <- calcNormFactors(d)

# Fit the NB GLMs with QL methods
d <- estimateDisp(d, design)
fit <- glmQLFit(d, design)
results <- glmQLFTest(fit)
topTags(results)
fit <- glmQLFit(d, design, robust=TRUE)
results <- glmQLFTest(fit)
topTags(results)
fit <- glmQLFit(d, design, abundance.trend=FALSE)
results <- glmQLFTest(fit)
topTags(results)

edgeR

Empirical Analysis of Digital Gene Expression Data in R

v3.32.1
GPL (>=2)
Authors
Yunshun Chen, Aaron TL Lun, Davis J McCarthy, Matthew E Ritchie, Belinda Phipson, Yifang Hu, Xiaobei Zhou, Mark D Robinson, Gordon K Smyth
Initial release
2021-01-14

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.