FPKM: fragments per kilobase per million mapped fragments
The following function returns fragment counts normalized
per kilobase of feature length per million mapped fragments
(by default using a robust estimate of the library size,
as in estimateSizeFactors).
fpkm(object, robust = TRUE)
object | a
robust | whether to use size factors to normalize rather than taking the column sums of the raw counts, using the
The length of the features (e.g. genes) is calculated in one of two ways:
(1) If there is a matrix named "avgTxLength" in assays(dds),
this will take precedence in the length normalization.
This occurs when using the tximport-DESeq2 pipeline.
(2) Otherwise, feature length is calculated
from the rowRanges of the dds object,
if a column basepairs is not present in mcols(dds).
The calculated length is the number of basepairs in the union of all GRanges
assigned to a given row of object, e.g.,
the union of all basepairs of exons of a given gene.
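To make the "union of basepairs" idea concrete, here is a minimal base R sketch. The helper name union_width is hypothetical and is not part of DESeq2; it assumes inclusive, 1-based integer coordinates and simply counts each covered position once. It illustrates the concept only and is not how the package computes lengths internally.

```r
# Hypothetical helper (not a DESeq2 function): number of distinct basepairs
# covered by a set of inclusive, 1-based intervals, e.g. the exons of a gene.
union_width <- function(starts, ends) {
  covered <- unlist(Map(seq, starts, ends))  # every covered position
  length(unique(covered))                    # count each position only once
}

disjoint    <- union_width(c(1, 1001), c(500, 1500))  # 1000: two separate 500 bp exons
overlapping <- union_width(c(1, 401),  c(500, 900))   # 900: positions 1..900, overlap counted once
```

With GenomicRanges objects, the analogous quantity would typically be obtained with something like sum(width(reduce(gr))), which scales far better than enumerating positions.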
Note that the second approach over-estimates the gene length
(average transcript length, weighted by abundance, is a more appropriate
normalization for gene counts), and so the FPKM will be an underestimate of the true value.
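The size of this underestimate can be illustrated with a small base R sketch. All numbers below are made up for illustration: a hypothetical gene has two transcripts, the shorter one nested inside the longer one, so the union length equals the longer transcript's length.

```r
# Hypothetical gene with two transcripts; values are invented for illustration.
tx_length <- c(short = 1000, long = 3000)  # transcript lengths in bp
abundance <- c(short = 0.9,  long = 0.1)   # assumed relative abundances
union_length <- 3000                       # assumes 'short' is nested in 'long'

# Abundance-weighted average transcript length
avg_tx_length <- weighted.mean(tx_length, abundance)  # 1200

# FPKM is inversely proportional to the length used, so normalizing by the
# union length scales the value by avg_tx_length / union_length.
fpkm_ratio <- avg_tx_length / union_length  # 0.4
```

In this toy case the union-based FPKM would be 40% of the value obtained with the abundance-weighted length.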
Note that, when the read/fragment counting has inter-feature dependencies, a strict normalization would not incorporate the basepairs of a feature which overlap another feature. This inter-feature dependence is not taken into consideration in the internal union basepair calculation.
a matrix which is normalized per kilobase of the
union of basepairs in the GRangesList or GRanges
of rowRanges(object), and per million of mapped fragments,
either using the robust median ratio method (robust=TRUE, default)
or using raw counts (robust=FALSE).
Defining a column mcols(object)$basepairs takes
precedence over internal calculation of the kilobases for each row.
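The arithmetic behind the returned matrix can be sketched in base R for the simpler robust=FALSE case, where the library size is taken to be the column sum of the raw counts. The names counts, basepairs, fpm_simple and fpkm_simple are assumptions made for this sketch; it does not reproduce the robust size-factor scaling and is not the package's implementation.

```r
# Toy counts: 4 features (rows) x 4 samples (columns).
counts <- matrix(1e6 * rep(c(.125, .25, .25, .5), each = 4), ncol = 4)
# Assumed feature lengths in basepairs, one per row.
basepairs <- c(1000, 1000, 2000, 500)

# Per million: divide each column by its column sum, then scale by 1e6.
fpm_simple <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Per kilobase: divide each row by its feature length in kilobases.
fpkm_simple <- sweep(fpm_simple, 1, basepairs / 1e3, "/")

fpkm_simple[, 1]  # 250000 250000 125000 500000
```

Because every feature has the same count within a sample here, all FPM values are 250000 and the FPKM values differ only through the feature lengths.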
library(DESeq2)

# create a matrix with 1 million counts for the
# 2nd and 3rd column, the 1st and 4th have
# half and double the counts, respectively.
m <- matrix(1e6 * rep(c(.125, .25, .25, .5), each=4),
            ncol=4, dimnames=list(1:4,1:4))
mode(m) <- "integer"
se <- SummarizedExperiment(list(counts=m), colData=DataFrame(sample=1:4))
dds <- DESeqDataSet(se, ~ 1)

# create 4 GRanges with lengths: 1, 1, 2, 0.5 Kb
gr1 <- GRanges("chr1",IRanges(1,1000)) # 1kb
gr2 <- GRanges("chr1",IRanges(c(1,1001),c( 500,1500))) # 1kb
gr3 <- GRanges("chr1",IRanges(c(1,1001),c(1000,2000))) # 2kb
gr4 <- GRanges("chr1",IRanges(c(1,1001),c(200,1300))) # 500bp
rowRanges(dds) <- GRangesList(gr1,gr2,gr3,gr4)

# the raw counts
counts(dds)

# the FPM values
fpm(dds)

# the FPKM values
fpkm(dds)