Calculate weighted or unweighted (Fast) UniFrac distance for all sample pairs.
This function calculates the (Fast) UniFrac distance for all sample-pairs in a phyloseq-class object.
UniFrac(physeq, weighted=FALSE, normalized=TRUE, parallel=FALSE, fast=TRUE)

## S4 method for signature 'phyloseq'
UniFrac(physeq, weighted = FALSE, normalized = TRUE, parallel = FALSE, fast = TRUE)
physeq |
(Required). |
weighted |
(Optional). Logical. Should the weighted-UniFrac calculation be used?
Weighted-UniFrac takes into account the relative abundance of species/taxa
shared between samples, whereas unweighted-UniFrac only considers
presence/absence. Default is FALSE.
normalized |
(Optional). Logical. Should the output be normalized such that values
range from 0 to 1 independent of branch length values? Default is TRUE.
parallel |
(Optional). Logical. Should the calculation be executed in parallel,
using multiple CPU cores simultaneously? This can dramatically reduce the
computation time for this function. However, it also requires that the user
has registered a parallel “backend” prior to calling this function.
Default is FALSE.
fast |
(Optional). Logical. DEPRECATED.
Do you want to use the “Fast UniFrac”
algorithm? Implemented natively in the |
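To make the weighted, unweighted, and normalized options concrete, the following base-R sketch works through the published UniFrac definitions by hand on a toy tree. The tree, the counts, and every variable name here are hypothetical, chosen only for illustration. This is not the Fast UniFrac implementation used by UniFrac(); it only shows what each option measures.

```r
# Hypothetical toy tree: the root has two children, an internal branch
# ("inner", length 1) leading to tips t1 and t2 (length 1 each), and
# tip t3 (length 2).
# desc[i, j] == 1 means tip j descends from branch i.
desc <- rbind(
  inner = c(1, 1, 0),
  t1    = c(1, 0, 0),
  t2    = c(0, 1, 0),
  t3    = c(0, 0, 1)
)
colnames(desc) <- c("t1", "t2", "t3")
branch_len <- c(inner = 1, t1 = 1, t2 = 1, t3 = 2)

# Hypothetical abundance counts for two samples
counts_a <- c(t1 = 3, t2 = 1, t3 = 0)
counts_b <- c(t1 = 0, t2 = 2, t3 = 2)
rel_a <- counts_a / sum(counts_a)
rel_b <- counts_b / sum(counts_b)

# Fraction of each sample's counts that sits below each branch
prop_a <- as.vector(desc %*% rel_a)
prop_b <- as.vector(desc %*% rel_b)

# Unweighted: branch length unique to one sample, divided by the
# branch length observed in either sample (presence/absence only)
in_a <- prop_a > 0
in_b <- prop_b > 0
unweighted <- sum(branch_len[xor(in_a, in_b)]) / sum(branch_len[in_a | in_b])

# Weighted (raw): branch lengths weighted by the difference in
# relative abundance below each branch
weighted_raw <- sum(branch_len * abs(prop_a - prop_b))

# Weighted (normalized): divide by the abundance-weighted root-to-tip
# distances so that the result falls between 0 and 1
root_to_tip <- as.vector(branch_len %*% desc)
weighted_norm <- weighted_raw / sum(root_to_tip * (rel_a + rel_b))

unweighted     # 0.6
weighted_raw   # 2.5
weighted_norm  # 0.625
```

Note that the raw weighted value (2.5) depends on the scale of the branch lengths, whereas the normalized value (0.625) does not, which is the behavior the normalized argument describes.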
UniFrac() accesses the abundance (otu_table-class) and phylogenetic tree (phylo-class) data within an experiment-level (phyloseq-class) object.
If the tree and contingency table are separate objects, the suggested solution
is to combine them into an experiment-level class using the phyloseq function.
For example, the following code
phyloseq(myotu_table, myTree)
returns a phyloseq-class object that has been pruned and comprises
the minimum arguments necessary for UniFrac().
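As a slightly fuller sketch of the combination step, the code below pulls the two components out of the esophagus example dataset with the otu_table() and phy_tree() accessors, then recombines them. The variable names are hypothetical; in practice the separate objects would come from your own import step.

```r
library(phyloseq)
data("esophagus")

# Stand-ins for a separately stored abundance table and tree
my_otu_table <- otu_table(esophagus)
my_tree <- phy_tree(esophagus)

# Combine into one experiment-level object, then compute distances
physeq <- phyloseq(my_otu_table, my_tree)
d <- UniFrac(physeq, weighted = FALSE)
```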
Parallelization is possible for UniFrac calculated with the phyloseq-package,
and is encouraged in the instances of large trees, many samples, or both.
Parallelization has been implemented via the foreach-package.
This means that parallel calls need to be preceded by 2 or more commands
that register the parallel “backend”. This is achieved via your choice of
helper packages. One of the simplest seems to be the doParallel package.
For more information, see the following links on registering the “backend”:
foreach package manual:
Notes on parallel computing in R. Skip to the section describing
the foreach Framework. It gives off-the-shelf examples for registering
a parallel backend using the doMC, doSNOW, or doMPI packages:
Furthermore, as of R version 2.14.0, the parallel package (parallel-package)
is included as part of the core installation, and it can be used as the
parallel backend with the foreach-package using the adaptor package “doParallel”.
http://cran.r-project.org/web/packages/doParallel/index.html
See the vignette for some simple examples for using doParallel. http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
UniFrac-specific examples for doParallel are provided in the example code below.
Returns a sample-by-sample distance matrix, suitable for NMDS, etc.
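As a minimal sketch of using the returned distances downstream, the code below views the result as a full square matrix and passes it to an ordination. Classical MDS (stats::cmdscale) is swapped in here instead of NMDS only because it ships with base R and works with the three samples in esophagus; the variable names are hypothetical.

```r
library(phyloseq)
data("esophagus")

d <- UniFrac(esophagus, weighted = FALSE)

# Full symmetric sample-by-sample matrix, with zeros on the diagonal
m <- as.matrix(d)

# Classical (metric) MDS on the UniFrac distances; one row per sample
coords <- cmdscale(d, k = 2)
```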
The main implementation (Fast UniFrac) is adapted from the algorithm's description in:
Hamady, Lozupone, and Knight, “Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data.” The ISME Journal (2010) 4, 17–27.
See also additional descriptions of UniFrac in the following articles:
Lozupone, Hamady and Knight, “UniFrac - An Online Tool for Comparing Microbial Community Diversity in a Phylogenetic Context.” BMC Bioinformatics 2006, 7:371
Lozupone, Hamady, Kelley and Knight, “Quantitative and qualitative (beta) diversity measures lead to different insights into factors that structure microbial communities.” Appl Environ Microbiol. 2007
Lozupone C, Knight R. “UniFrac: a new phylogenetic method for comparing microbial communities.” Appl Environ Microbiol. 2005 71 (12):8228-35.
See also: unifrac in the picante package.
################################################################################
# Perform UniFrac on esophagus data
################################################################################
data("esophagus")
(y <- UniFrac(esophagus, TRUE))
UniFrac(esophagus, TRUE, FALSE)
UniFrac(esophagus, FALSE)
# ################################################################################
# # Now try a parallel implementation using doParallel, which leverages the
# # new 'parallel' core package in R 2.14.0+
# # Note that simply loading the 'doParallel' package is not enough, you must
# # call a function that registers the backend. In general, this is pretty easy
# # with the 'doParallel' package (or one of the alternative 'do*' packages)
# #
# # Also note that the esophagus example has only 3 samples, and a relatively small
# # tree. This is fast to calculate even sequentially and does not warrant
# # parallelized computation, but provides a good quick example for using UniFrac()
# # in a parallel fashion. The number of cores you should specify during the
# # backend registration, using registerDoParallel(), depends on your system and
# # needs. 3 is chosen here for convenience. If your system has only 2 cores, this
# # will probably fault or run slower than necessary.
# ################################################################################
# library(doParallel)
# data(esophagus)
# # For SNOW-like functionality (works on Windows):
# cl <- makeCluster(3)
# registerDoParallel(cl)
# # parallel defaults to FALSE, so it must be requested explicitly
# UniFrac(esophagus, weighted=TRUE, parallel=TRUE)
# # Force to sequential backend:
# registerDoSEQ()
# # For multicore-like functionality (will probably not work on Windows),
# # register the backend like this:
# registerDoParallel(cores=3)
# UniFrac(esophagus, weighted=TRUE, parallel=TRUE)
################################################################################