Estimate the size factors for a DESeqDataSet
This function estimates the size factors using the
"median ratio method" described by Equation 5 in Anders and Huber (2010).
The estimated size factors can be accessed using the accessor function sizeFactors.
Alternative library size estimators can also be supplied
using the assignment function sizeFactors<-.
## S4 method for signature 'DESeqDataSet'
estimateSizeFactors(
  object,
  type = c("ratio", "poscounts", "iterate"),
  locfunc = stats::median,
  geoMeans,
  controlGenes,
  normMatrix,
  quiet = FALSE
)
object |
a DESeqDataSet |
type |
Method for estimation: either "ratio", "poscounts", or "iterate". "ratio" uses the standard median ratio method introduced in DESeq. The size factor is the median ratio of the sample over a "pseudosample": for each gene, the geometric mean of all samples. "poscounts" and "iterate" offer alternative estimators, which can be used even when every gene has a zero in at least one sample (a problem for the default method, since the geometric mean becomes zero and the ratio undefined). The "poscounts" estimator handles genes with some zeros by calculating a modified geometric mean: the n-th root of the product of the non-zero counts. This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples. The "iterate" estimator alternates between estimating the dispersion with a design of ~1 and finding a size factor vector by numerically optimizing the likelihood of the ~1 model. |
locfunc |
a function to compute a location for a sample. By default, the
median is used. However, especially for low counts, the
|
geoMeans |
By default this is not provided, and the
geometric means of the counts are calculated within the function.
A vector of geometric means from another count matrix can be provided
for a "frozen" size factor calculation. The size factors will be
scaled to have a geometric mean of 1 when supplying geoMeans. |
controlGenes |
optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes) |
normMatrix |
optional, a matrix of normalization factors which do not yet
control for library size. Note that this argument should not be used (and
will be ignored) if the |
quiet |
whether to print messages |
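The modified geometric mean described for type = "poscounts" can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helper name geo_mean_nonzero is hypothetical, and it assumes that n in "n-th root" is the total number of samples (including those with zero counts).

```r
# Hypothetical helper for illustration; not a DESeq2 function.
# Assumption: the root is taken over ALL samples, zeros included.
geo_mean_nonzero <- function(x) {
  if (all(x == 0)) return(0)            # no positive counts at all
  exp(sum(log(x[x > 0])) / length(x))   # n-th root of the product of non-zero counts
}

gene_counts <- c(0, 4, 16)              # one gene across three samples (made-up values)

plain_geo_mean <- exp(mean(log(gene_counts)))   # 0, because log(0) is -Inf
modified_geo_mean <- geo_mean_nonzero(gene_counts)
```

A single zero drives the ordinary geometric mean to zero, which is why the default "ratio" estimator cannot use such a gene, whereas the modified mean stays positive.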
Typically, the function is called with the idiom:
dds <- estimateSizeFactors(dds)
See DESeq for a description of the use of size factors in the GLM.
One should call this function after DESeqDataSet
unless size factors are manually specified with sizeFactors.
Alternatively, gene-specific normalization factors for each sample can be provided using
normalizationFactors, which will always preempt sizeFactors in calculations.
Internally, the function calls estimateSizeFactorsForMatrix,
which provides more details on the calculation.
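As a rough illustration of the median ratio idea, the following base R sketch computes, for each sample, the median of its ratios to the per-gene geometric mean. The helper median_ratio_size_factors and the toy count matrix are assumptions made for this example; the actual calculation in estimateSizeFactorsForMatrix may differ in details such as the handling of control genes or supplied geometric means.

```r
# Toy count matrix: rows are genes, columns are samples (made-up values).
# Each sample is an exact multiple of sample1, so the answer is easy to check.
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                     5,  10,  20,
                    50, 100, 200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Hypothetical helper for illustration; not a DESeq2 function.
median_ratio_size_factors <- function(counts, locfunc = stats::median) {
  # Log of the per-gene geometric mean (the "pseudosample");
  # this is -Inf for any gene with a zero count in some sample.
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    # Skip genes whose geometric mean is zero or whose count is zero here.
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(locfunc((log(cnts) - log_geo_means)[keep]))
  })
}

sf <- median_ratio_size_factors(counts)
```

In this toy matrix sample2 and sample3 are two and four times sample1, so the size factors come out close to 0.5, 1 and 2. Passing a different locfunc swaps the median for another location estimate, mirroring the locfunc argument described above.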
The DESeqDataSet passed as a parameter, with the size factors filled in.
Simon Anders
Reference for the median ratio method:
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106. http://dx.doi.org/10.1186/gb-2010-11-10-r106
dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

dds <- estimateSizeFactors(dds, controlGenes=1:200)

m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]

geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds, geoMeans=geoMeans)
sizeFactors(dds)