Resample an OTU table such that all samples have the same library size.
Please note that the authors of phyloseq do not advocate using this
as a normalization procedure, despite its recent popularity.
Our justifications for using alternative approaches to address
disparities in library sizes have been made available as
an article in PLoS Computational Biology.
See phyloseq_to_deseq2
for a recommended alternative to rarefying
directly supported in the phyloseq package, as well as
the supplemental materials for the PLoS-CB article
and the phyloseq extensions repository on GitHub.
Nevertheless, for comparison and demonstration, the rarefying procedure is implemented
here in good faith and with options we hope are useful.
This function uses the standard R sample function to
resample from the abundance values in the otu_table component
of the first argument, physeq.
Often one of the major goals of this procedure is to achieve parity in
total number of counts between samples, as an alternative to other formal
normalization procedures, which is why a single value for
sample.size is expected.
This kind of resampling can be performed with or without replacement;
sampling with replacement is the more computationally efficient option
and the default setting.
See the replace parameter documentation for more details.
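The two modes can be illustrated on a single sample with base R alone. The following is a minimal sketch, not the package's internal code: the helper name subsample_counts and the toy count vector are hypothetical. With replacement, taxa are drawn with probability proportional to their counts. Without replacement, every read is expanded into its own element and can be drawn at most once, which is one plausible reason that mode is the slower of the two.

```r
# Hypothetical helper: resample one sample's counts to a fixed depth.
subsample_counts <- function(counts, depth, replace = TRUE) {
  taxa <- seq_along(counts)
  if (replace) {
    # Draw taxon indices with probability proportional to abundance.
    draws <- sample(taxa, size = depth, replace = TRUE, prob = counts)
  } else {
    # Expand to one element per read, then draw reads at most once each.
    # sample.int() on positions avoids the length-one pitfall of sample().
    reads <- rep(taxa, times = counts)
    draws <- reads[sample.int(length(reads), size = depth, replace = FALSE)]
  }
  # Tabulate the draws back into a count vector of the original length.
  out <- tabulate(draws, nbins = length(counts))
  names(out) <- names(counts)
  out
}

set.seed(1)
counts <- c(otu1 = 50, otu2 = 30, otu3 = 0, otu4 = 20)
with_repl    <- subsample_counts(counts, depth = 40, replace = TRUE)
without_repl <- subsample_counts(counts, depth = 40, replace = FALSE)

sum(with_repl)     # 40
sum(without_repl)  # 40
```

In this sketch a taxon can end up with more reads than it started with only when replace = TRUE; without replacement each resampled count is bounded by the original count.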
We recommend that you explicitly set a random number generator seed
before invoking this function or, alternatively, that you
provide a single positive integer as the rngseed argument.
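Why the seed matters can be seen with base R's sample alone. This is a small sketch on hypothetical toy data, and the seed value 711 is an arbitrary choice: resetting the seed before each draw reproduces the draw exactly, while a draw made without reseeding continues from the advanced generator state.

```r
# Toy abundance vector (hypothetical data).
counts <- c(otu1 = 50, otu2 = 30, otu3 = 20)

# One random subsample of 10 reads, drawn proportionally to abundance.
draw <- function() {
  sample(names(counts), size = 10, replace = TRUE, prob = counts)
}

set.seed(711)
first <- draw()

set.seed(711)
second <- draw()   # same seed, so the same draw
third  <- draw()   # no reseed: the generator state has moved on

identical(first, second)  # TRUE
```

With phyloseq loaded, the two options described above would correspond to calling set.seed(711) before rarefy_even_depth(physeq), or calling rarefy_even_depth(physeq, rngseed = 711).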
rarefy_even_depth(physeq, sample.size = min(sample_sums(physeq)), rngseed = FALSE, replace = TRUE, trimOTUs = TRUE, verbose = TRUE)
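physeq | (Required). An object of class phyloseq whose otu_table component will be resampled.
sample.size | (Optional). A single integer value equal to the number of reads being simulated, also known as the depth, and also equal to the library size of every sample in the returned object. Default is min(sample_sums(physeq)).
rngseed | (Optional). A single integer value passed to set.seed. Default is FALSE.
replace | (Optional). Logical. Whether to sample with replacement (TRUE) or without replacement (FALSE). Default is TRUE.
trimOTUs | (Optional). Default is TRUE.
verbose | (Optional). Logical. Default is TRUE.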
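How the arguments fit together can be sketched on a plain matrix in base R. This is an illustration under stated assumptions, not the package implementation: the toy table is hypothetical, column sums stand in for sample_sums, and the final trimming step assumes that trimOTUs drops taxa left with zero counts in every sample after resampling.

```r
# Toy OTU table: taxa in rows, samples in columns (hypothetical data).
otu <- matrix(
  c(40, 10,  0, 1,    # sample s1, total 51
    25, 60,  5, 0,    # sample s2, total 90
    35, 30, 15, 0),   # sample s3, total 80
  nrow = 4,
  dimnames = list(paste0("otu", 1:4), c("s1", "s2", "s3"))
)

# Mirrors the default sample.size = min(sample_sums(physeq)).
depth <- min(colSums(otu))

set.seed(711)
# Resample every sample (column) to the same depth, with replacement.
rarefied <- apply(otu, 2, function(x) {
  draws <- sample(seq_along(x), size = depth, replace = TRUE, prob = x)
  tabulate(draws, nbins = length(x))
})
rownames(rarefied) <- rownames(otu)

colSums(rarefied)  # every sample now sums to depth

# Assumption: trimming removes taxa no longer observed in any sample.
trimmed <- rarefied[rowSums(rarefied) > 0, , drop = FALSE]
```

Because a rare taxon such as otu4 may not be drawn at all, the trimmed table can have fewer rows than the input.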
This approach is sometimes mistakenly called “rarefaction”, which
in physics refers to a form of wave decompression;
but in this context, ecology, the term refers to a
repeated sampling procedure to assess species richness,
first proposed in 1968 by Howard Sanders.
In contrast, the procedure implemented here is used as an ad hoc means to
normalize microbiome counts that have
resulted from libraries of widely-differing sizes.
Here we have intentionally adopted an alternative name, rarefy,
that has also been used recently to describe this process
and, to our knowledge, has not previously been used in ecology.
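The distinction can be made concrete with a small base R sketch of rarefaction in the ecological sense, in contrast to the single even-depth subsample produced by rarefying. The helper name mean_richness, the toy counts, and the choice of 100 repeats are all hypothetical: for each depth, the sample is subsampled repeatedly and the number of taxa observed is averaged, giving an estimate of richness as a function of sampling depth.

```r
# Hypothetical counts for one community sample (230 reads, 8 taxa).
counts <- c(120, 60, 30, 10, 5, 3, 1, 1)

# Mean number of taxa observed when `depth` reads are drawn without
# replacement, averaged over `n_rep` repeated subsamples.
mean_richness <- function(counts, depth, n_rep = 100) {
  reads <- rep(seq_along(counts), times = counts)
  richness <- replicate(n_rep, {
    picked <- reads[sample.int(length(reads), size = depth)]
    length(unique(picked))
  })
  mean(richness)
}

set.seed(1968)
depths <- c(10, 50, 100, 230)
richness_curve <- vapply(depths, function(d) mean_richness(counts, d), numeric(1))
names(richness_curve) <- depths
richness_curve
```

At the full depth of 230 reads every read is drawn, so the estimate equals the observed richness of 8; at smaller depths it reflects how many taxa a shallower sample would be expected to reveal.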
Make sure to use set.seed for exactly reproducible results
of the random subsampling.
An object of class phyloseq.
Only the otu_table component is modified.
# Test with esophagus dataset
data("esophagus")
esorepT = rarefy_even_depth(esophagus, replace=TRUE)
esorepF = rarefy_even_depth(esophagus, replace=FALSE)
sample_sums(esophagus)
sample_sums(esorepT)
sample_sums(esorepF)
## NRun Manually: Too slow!
# data("GlobalPatterns")
# GPrepT = rarefy_even_depth(GlobalPatterns, 1E5, replace=TRUE)
## Actually just this one is slow
# system.time(GPrepF <- rarefy_even_depth(GlobalPatterns, 1E5, replace=FALSE))