Test for Differential Exon Usage
Given a negative binomial generalized log-linear model fit at the exon level, test for differential exon usage between experimental conditions.
diffSpliceDGE(glmfit, coef=ncol(glmfit$design), contrast=NULL, geneid, exonid=NULL, prior.count=0.125, verbose=TRUE)
glmfit |
an |
coef |
integer indicating which coefficient of the generalized linear model is to be tested for differential exon usage. Defaults to the last coefficient. |
contrast |
numeric vector specifying the contrast of the linear model coefficients to be tested for differential exon usage. Length must equal the number of columns of |
geneid |
gene identifiers. Either a vector of length |
exonid |
exon identifiers. Either a vector of length |
prior.count |
average prior count to be added to each observation to shrink the estimated log-fold-changes towards zero. |
verbose |
logical, if |
This function tests for differential exon usage within each gene for a given coefficient of the generalized linear model.
Testing for differential exon usage is equivalent to testing whether the exons in each gene have the same log-fold-changes as the other exons in the same gene. At the exon level, the log-fold-change of each exon is compared to the log-fold-change of the entire gene that contains that exon. At the gene level, two different tests are provided. One converts exon-level p-values to gene-level p-values by the Simes method. The other uses the exon-level test statistics to conduct gene-level tests.
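To illustrate the idea behind the Simes-based gene-level test, the following is a minimal base-R sketch of the textbook Simes combination applied to exon-level p-values grouped by gene. The helper name simes_p and the toy p-values are hypothetical and introduced only for illustration; the package's internal computation may differ in detail from this sketch.

```r
# Textbook Simes combination: for n sorted p-values p(1) <= ... <= p(n),
# the combined p-value is min over i of n * p(i) / i.
# simes_p is a hypothetical helper, not an edgeR function.
simes_p <- function(p) {
  p <- sort(p)
  n <- length(p)
  min(n * p / seq_len(n))
}

# Toy exon-level p-values for two genes (made-up numbers)
exon_p <- c(0.001, 0.20, 0.45, 0.80,   # GeneA: one exon stands out
            0.30, 0.50, 0.70)          # GeneB: nothing stands out
gene   <- c(rep("GeneA", 4), rep("GeneB", 3))

# One combined p-value per gene
gene_simes <- tapply(exon_p, gene, simes_p)
print(gene_simes)
```

A gene with a single strongly significant exon receives a small combined p-value even when its other exons show no evidence, which is why a Simes-type test is sensitive to genes in which only a few exons are differentially used.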
diffSpliceDGE produces an object of class DGELRT containing the component design from glmfit plus the following new components:
comparison |
character string describing the coefficient being tested. |
coefficients |
numeric vector of coefficients on the natural log scale. Each coefficient is the difference between the log-fold-change for that exon and the log-fold-change for the entire gene that contains that exon. |
genes |
data.frame of exon annotation. |
genecolname |
character string giving the name of the column of |
exoncolname |
character string giving the name of the column of |
exon.df.test |
numeric vector of testing degrees of freedom for exons. |
exon.p.value |
numeric vector of p-values for exons. |
gene.df.test |
numeric vector of testing degrees of freedom for genes. |
gene.p.value |
numeric vector of gene-level testing p-values. |
gene.Simes.p.value |
numeric vector of Simes' p-values for genes. |
gene.genes |
data.frame of gene annotation. |
Some components of the output depend on whether glmfit is produced by glmFit or glmQLFit.
If glmfit is produced by glmFit, then the following components are returned in the output object:
exon.LR |
numeric vector of LR-statistics for exons. |
gene.LR |
numeric vector of LR-statistics for gene-level test. |
If glmfit is produced by glmQLFit, then the following components are returned in the output object:
exon.F |
numeric vector of F-statistics for exons. |
gene.df.prior |
numeric vector of prior degrees of freedom for genes. |
gene.df.residual |
numeric vector of residual degrees of freedom for genes. |
gene.F |
numeric vector of F-statistics for gene-level test. |
The information and testing results for both exons and genes are sorted by geneid and by exonid within gene.
Yunshun Chen and Gordon Smyth
library(edgeR)

# Gene exon annotation
Gene <- paste("Gene", 1:100, sep="")
Gene <- rep(Gene, each=10)
Exon <- paste("Ex", 1:10, sep="")
Gene.Exon <- paste(Gene, Exon, sep=".")
genes <- data.frame(GeneID=Gene, Gene.Exon=Gene.Exon)

# Two groups of three samples each
group <- factor(rep(1:2, each=3))
design <- model.matrix(~group)

# Expected counts: 100 for every exon in every sample
mu <- matrix(100, nrow=1000, ncol=6)

# knock-out the first exon of Gene1 by 90%
mu[1,4:6] <- 10

# generate exon counts
counts <- matrix(rnbinom(6000,mu=mu,size=20),1000,6)
y <- DGEList(counts=counts, lib.size=rep(1e6,6), genes=genes)

# Fit the exon-level model and test for differential exon usage
gfit <- glmFit(y, design, dispersion=0.05)
ds <- diffSpliceDGE(gfit, geneid="GeneID")
topSpliceDGE(ds)
plotSpliceDGE(ds)
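As a variation, the same simulated data can be run through a quasi-likelihood fit to see the F-statistic components described above. This is a sketch under stated assumptions: the seed, the object names qfit and ds_ql, and the explicit exonid and coef arguments are choices made here for illustration, and the fixed dispersion of 0.05 is simply carried over from the simulation setup rather than estimated.

```r
library(edgeR)
set.seed(1)  # assumption: seed added only to make the sketch reproducible

# Same simulated annotation and counts as in the example above
Gene <- rep(paste0("Gene", 1:100), each = 10)
Exon <- paste0("Ex", 1:10)
genes <- data.frame(GeneID = Gene, Gene.Exon = paste(Gene, Exon, sep = "."))
group <- factor(rep(1:2, each = 3))
design <- model.matrix(~group)
mu <- matrix(100, nrow = 1000, ncol = 6)
mu[1, 4:6] <- 10  # knock down the first exon of Gene1 by 90%
counts <- matrix(rnbinom(6000, mu = mu, size = 20), 1000, 6)
y <- DGEList(counts = counts, lib.size = rep(1e6, 6), genes = genes)

# Quasi-likelihood fit instead of glmFit
qfit <- glmQLFit(y, design, dispersion = 0.05)

# Test the group effect (second design column), naming both ID columns
ds_ql <- diffSpliceDGE(qfit, coef = 2, geneid = "GeneID", exonid = "Gene.Exon")

# With a glmQLFit input the result should carry F-statistics rather than LR-statistics
names(ds_ql)

# Rank genes by the Simes gene-level p-value
topSpliceDGE(ds_ql, test = "Simes", number = 5)
```

Calling names(ds_ql) is a quick way to check which of the fit-dependent components (exon.F and gene.F here, exon.LR and gene.LR for a glmFit input) are present in a given result.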