Estimate the dispersions for a DESeqDataSet
This function obtains dispersion estimates for Negative Binomial distributed data.
## S4 method for signature 'DESeqDataSet'
estimateDispersions(
  object,
  fitType = c("parametric", "local", "mean", "glmGamPoi"),
  maxit = 100,
  useCR = TRUE,
  weightThreshold = 0.01,
  quiet = FALSE,
  modelMatrix = NULL,
  minmu = if (fitType == "glmGamPoi") 1e-06 else 0.5
)
object | a DESeqDataSet
fitType | either "parametric", "local", "mean", or "glmGamPoi" for the type of fitting of dispersions to the mean intensity
maxit | control parameter: maximum number of iterations to allow for convergence
useCR | whether to use Cox-Reid correction; see McCarthy et al. (2012)
weightThreshold | threshold for subsetting the design matrix and GLM weights for calculating the Cox-Reid correction
quiet | whether to print messages at each step
modelMatrix | an optional matrix which will be used for fitting the expected counts. By default, the model matrix is constructed from
minmu | lower bound on the estimated count for fitting gene-wise dispersion
Typically the function is called with the idiom:
dds <- estimateDispersions(dds)
The fitting proceeds as follows: for each gene, an estimate of the dispersion
is found which maximizes the Cox-Reid adjusted profile likelihood
(the methods of Cox-Reid adjusted profile likelihood maximization for
estimation of dispersion in RNA-Seq data were developed by McCarthy
et al. (2012) and first implemented in the edgeR package in 2010);
a trend line capturing the dispersion-mean relationship is fit to the maximum likelihood estimates;
a normal prior is determined for the log dispersion estimates, centered
on the predicted value from the trended fit,
with variance equal to the difference between the observed variance of the
log dispersion estimates and the expected sampling variance;
finally, maximum a posteriori dispersion estimates are returned.
These final dispersion estimates are used in subsequent tests.
The final dispersion estimates can be accessed from an object using dispersions.
The fitted dispersion-mean relationship is also used in varianceStabilizingTransformation.
All of the intermediate values (gene-wise dispersion estimates, fitted dispersion
estimates from the trended fit, etc.) are stored in mcols(dds), with
information about these columns in mcols(mcols(dds)).
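As a minimal sketch of how the stored values might be inspected, the following uses the simulated data set from makeExampleDESeqDataSet (an assumption made purely for illustration; any DESeqDataSet with size factors would do). It lists the metadata column names rather than assuming which ones are present.

```r
library(DESeq2)

# Simulated example data; a real analysis would start from its own DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Intermediate and final values are stored as per-gene metadata columns.
names(mcols(dds))

# Each metadata column carries its own type and description.
mcols(mcols(dds))

# Final dispersion estimates, one per gene.
head(dispersions(dds))
```

Printing mcols(mcols(dds)) is usually the quickest way to find out what a given column means before using it downstream.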
The log-normal prior on the dispersion parameter was proposed by Wu et al. (2012) and is also implemented in the DSS package.
In DESeq2, the dispersion estimation procedure described above replaces the different methods of dispersion estimation from the previous version of the DESeq package.
Since version 1.29, DESeq2 can call the glmGamPoi package, which can speed up the inference
and is optimized for fitting many samples with very small counts (for example single cell
RNA-seq data). To call functions from the glmGamPoi package, make sure that it is installed
and set fitType = "glmGamPoi". In addition to the gene-wise estimates, the trend and the MAP,
the glmGamPoi package calculates the corresponding quasi-likelihood estimates. Those can be
used with the nbinomLRT() test to get more precise p-value estimates.
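A hedged sketch of the glmGamPoi route is shown below. The simulated data set, the reduced formula ~ 1, and the requireNamespace guard are illustrative assumptions; the sketch simply skips the glmGamPoi steps if that package is not installed.

```r
library(DESeq2)

# Simulated example data with design ~ condition (for illustration only).
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)

if (requireNamespace("glmGamPoi", quietly = TRUE)) {
  # Dispersion estimation delegated to glmGamPoi.
  dds <- estimateDispersions(dds, fitType = "glmGamPoi")
  # Likelihood ratio test against an intercept-only model,
  # also using the glmGamPoi fitting routines.
  dds <- nbinomLRT(dds, reduced = ~ 1, type = "glmGamPoi")
  res <- results(dds)
  print(head(res$pvalue))
} else {
  message("glmGamPoi is not installed; skipping this example.")
}
```

The reduced formula should be chosen to match the hypothesis being tested; ~ 1 is used here only because the example design has a single factor.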
The lower-level functions called by estimateDispersions are:
estimateDispersionsGeneEst, estimateDispersionsFit, and estimateDispersionsMAP.
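The three steps can also be run one at a time, which may help when inspecting or customizing an intermediate stage. This is a sketch on simulated data with all arguments left at their defaults, not a recommendation to replace the single estimateDispersions call.

```r
library(DESeq2)

# Simulated example data (for illustration only).
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)

# Step 1: gene-wise dispersion estimates.
dds <- estimateDispersionsGeneEst(dds)
# Step 2: fit the dispersion-mean trend to the gene-wise estimates.
dds <- estimateDispersionsFit(dds)
# Step 3: maximum a posteriori estimates, shrunk toward the trend.
dds <- estimateDispersionsMAP(dds)

head(dispersions(dds))
```

Running the steps separately makes it possible to examine mcols(dds) after each stage.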
The DESeqDataSet passed as a parameter, with the dispersion information
filled in as metadata columns, accessible via mcols, or the final dispersions
accessible via dispersions.
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 11 (2010) R106, http://dx.doi.org/10.1186/gb-2010-11-10-r106
McCarthy, DJ, Chen, Y, Smyth, GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40 (2012), 4288-4297, http://dx.doi.org/10.1093/nar/gks042
Wu, H., Wang, C. & Wu, Z. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics (2012). http://dx.doi.org/10.1093/biostatistics/kxs033
Ahlmann-Eltze, C., Huber, W. glmGamPoi: Fitting Gamma-Poisson Generalized Linear Models on Single Cell Count Data. bioRxiv (2020). https://doi.org/10.1101/2020.08.13.249623
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
head(dispersions(dds))