Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

estimateSizeFactors

Estimate the size factors for a DESeqDataSet


Description

This function estimates the size factors using the "median ratio method" described by Equation 5 in Anders and Huber (2010). The estimated size factors can be accessed using the accessor function sizeFactors. Alternative library size estimators can also be supplied using the assignment function sizeFactors<-.

Usage

## S4 method for signature 'DESeqDataSet'
estimateSizeFactors(
  object,
  type = c("ratio", "poscounts", "iterate"),
  locfunc = stats::median,
  geoMeans,
  controlGenes,
  normMatrix,
  quiet = FALSE
)

Arguments

object

a DESeqDataSet

type

Method for estimation: either "ratio", "poscounts", or "iterate". "ratio" uses the standard median ratio method introduced in DESeq. The size factor is the median ratio of the sample over a "pseudosample": for each gene, the geometric mean of all samples. "poscounts" and "iterate" offer alternative estimators, which can be used even when all genes contain a sample with a zero (a problem for the default method, as the geometric mean becomes zero, and the ratio undefined). The "poscounts" estimator deals with a gene with some zeros, by calculating a modified geometric mean by taking the n-th root of the product of the non-zero counts. This evolved out of use cases with Paul McMurdie's phyloseq package for metagenomic samples. The "iterate" estimator iterates between estimating the dispersion with a design of ~1, and finding a size factor vector by numerically optimizing the likelihood of the ~1 model.

locfunc

a function to compute a location for a sample. By default, the median is used. However, especially for low counts, the shorth function from the genefilter package may give better results.

geoMeans

by default this is not provided and the geometric means of the counts are calculated within the function. A vector of geometric means from another count matrix can be provided for a "frozen" size factor calculation. The size factors will be scaled to have a geometric mean of 1 when supplying geoMeans.

controlGenes

optional, numeric or logical index vector specifying those genes to use for size factor estimation (e.g. housekeeping or spike-in genes)

normMatrix

optional, a matrix of normalization factors which do not yet control for library size. Note that this argument should not be used (and will be ignored) if the dds object was created using tximport. In this case, the information in assays(dds)[["avgTxLength"]] is automatically used to create appropriate normalization factors. Providing normMatrix will estimate size factors on the count matrix divided by normMatrix and store the product of the size factors and normMatrix as normalizationFactors. It is recommended to divide out the row-wise geometric mean of normMatrix so the rows roughly are centered on 1.

quiet

whether to print messages

Details

Typically, the function is called with the idiom:

dds <- estimateSizeFactors(dds)

See DESeq for a description of the use of size factors in the GLM. One should call this function after DESeqDataSet unless size factors are manually specified with sizeFactors. Alternatively, gene-specific normalization factors for each sample can be provided using normalizationFactors which will always preempt sizeFactors in calculations.

Internally, the function calls estimateSizeFactorsForMatrix, which provides more details on the calculation.

Value

The DESeqDataSet passed as parameters, with the size factors filled in.

Author(s)

Simon Anders

References

Reference for the median ratio method:

Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 2010, 11:106. http://dx.doi.org/10.1186/gb-2010-11-10-r106

See Also

Examples

dds <- makeExampleDESeqDataSet(n=1000, m=4)
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

dds <- estimateSizeFactors(dds, controlGenes=1:200)

m <- matrix(runif(1000 * 4, .5, 1.5), ncol=4)
dds <- estimateSizeFactors(dds, normMatrix=m)
normalizationFactors(dds)[1:3,]

geoMeans <- exp(rowMeans(log(counts(dds))))
dds <- estimateSizeFactors(dds,geoMeans=geoMeans)
sizeFactors(dds)

DESeq2

Differential gene expression analysis based on the negative binomial distribution

v1.30.1
LGPL (>= 3)
Authors
Michael Love [aut, cre], Constantin Ahlmann-Eltze [ctb], Kwame Forbes [ctb], Simon Anders [aut, ctb], Wolfgang Huber [aut, ctb]
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.