phyloseq: import_usearch_uc – R documentation

import_usearch_uc

Import usearch table format (.uc) to OTU table

Description

UPARSE is an algorithm for OTU clustering implemented within usearch. At last check, the UPARSE algorithm was accessed via the -cluster_otus option flag. For details about installing and running usearch, please refer to the usearch website. For details about the output format, please refer to the uc format definition. This importer is intended to read a particular table format generated by usearch, its so-called “cluster format”, which is often given the .uc extension in usearch documentation.

Usage

import_usearch_uc(ucfile, colRead = 9, colOTU = 10,
  readDelimiter = "_", verbose = TRUE)

Arguments

`ucfile`	(Required). A file location character string or `connection` corresponding to the file that contains the usearch output table. This is passed directly to `read.table`. Please see its `file` argument documentation for further links and details.
`colRead`	(Optional). Numeric. The column index in the uc-table file that holds the read IDs. The default column index is `9`.
`colOTU`	(Optional). Numeric. The column index in the uc-table file that holds OTU IDs. The default column index is `10`.
`readDelimiter`	(Optional). An R `regex` as a character string. This should be the delimiter that separates the sample ID from the original ID in the demultiplexed read ID of your sequence file. The default is plain underscore, which in this `regex` context is `"_"`.
`verbose`	(Optional). A `logical`. Default is `TRUE`. Should progress messages be `cat`ted to standard out?
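
Because `readDelimiter` is interpreted as a regular expression, a delimiter that is also a regex metacharacter (such as `.`) needs to be escaped. The base-R sketch below illustrates the difference. It assumes, for illustration only, that the sample ID is obtained by removing the delimiter and everything after it from the read ID. The `read_ids` values and the `strip_read_suffix()` helper are hypothetical and are not part of phyloseq.

```r
# Hypothetical demultiplexed read IDs of the form <sample><delimiter><read>
read_ids <- c("sampleA.1", "sampleB.27")

# Hypothetical helper (not a phyloseq function): drop the delimiter and
# everything after it, treating `delimiter` as a regular expression.
strip_read_suffix <- function(ids, delimiter) {
  sub(paste0(delimiter, ".+$"), "", ids)
}

# Unescaped "." matches any character, so the whole ID is consumed
unescaped <- strip_read_suffix(read_ids, ".")

# Escaped "\\." matches a literal dot, leaving only the sample ID
escaped <- strip_read_suffix(read_ids, "\\.")

print(unescaped)
print(escaped)
```

Under that assumption, a file whose read IDs use a dot as the separator would be imported with `readDelimiter = "\\."` rather than `readDelimiter = "."`.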

Details

Because usearch is an external (non-R) application, there is no direct way to continuously check that these suggested arguments and file formats remain valid. If you encounter a problem, please verify your version of usearch, create a small reproducible example, and post it as an issue on the phyloseq issues tracker. This import function was developed against usearch version 7.0.109. Later versions of usearch will hopefully keep the same output format, but the phyloseq team cannot guarantee this, so any feedback will help maintain future functionality. For instance, it is currently assumed that the 9th and 10th columns of the .uc table hold the read label and OTU ID, respectively, and that the delimiter between the sample name and the read in each read-name entry is a single "_". If either assumption does not hold for your file, you may have to change the `colRead`, `colOTU`, or `readDelimiter` arguments, or even modify the current implementation of this function.
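
To check these assumptions against a file before importing it, the relevant columns can be inspected with base R. The sketch below is not the package's implementation. It uses made-up, tab-separated rows in a 10-column layout, assumes that unassigned reads carry `*` in the OTU column, and assumes that the sample ID is everything before the first delimiter. It then shows how read labels and OTU IDs in columns 9 and 10 could be turned into a sample-by-OTU count table.

```r
# Hypothetical .uc-style rows (tab-separated, 10 columns); values are invented
uc_lines <- c(
  "H\t0\t250\t99.0\t+\t0\t0\t250M\tsampleA_1\tOTU_1",
  "H\t1\t250\t98.0\t+\t0\t0\t250M\tsampleA_2\tOTU_2",
  "H\t0\t250\t99.5\t+\t0\t0\t250M\tsampleB_1\tOTU_1",
  "N\t*\t250\t*\t.\t*\t*\t*\tsampleB_2\t*"
)

# Read every column as character so that IDs are kept verbatim
uc <- read.table(text = uc_lines, sep = "\t", header = FALSE,
                 colClasses = "character", quote = "", comment.char = "")

# Same defaults as documented for import_usearch_uc()
colRead <- 9
colOTU <- 10
readDelimiter <- "_"

reads <- uc[[colRead]]
otus <- uc[[colOTU]]

# Assumption: "*" marks a read without an OTU assignment; drop those rows
keep <- otus != "*"

# Assumption: the sample ID is the part of the read label before the delimiter
samples <- sub(paste0(readDelimiter, ".+$"), "", reads[keep])

# Cross-tabulate into a sample-by-OTU count table
otu_counts <- table(sample = samples, OTU = otus[keep])
print(otu_counts)
```

If the printed sample names or OTU IDs look wrong for a real file, that is a hint that `colRead`, `colOTU`, or `readDelimiter` need to be adjusted for that file.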

Also note that there is now a UPARSE-specific output file format, uparseout, and it might make more sense to create and import that file for use in phyloseq. If so, you'll want to import using the import_uparse() function.

Examples

library("phyloseq")
# Locate the example .uc file that ships with the phyloseq package
usearchfile <- system.file("extdata", "usearch.uc", package="phyloseq")
import_usearch_uc(usearchfile)

phyloseq

Handling and analysis of high-throughput microbiome census data

v1.34.0

AGPL-3

Authors

Paul J. McMurdie <joey711@gmail.com>, Susan Holmes <susan@stat.stanford.edu>, with contributions from Gregory Jordan and Scott Chamberlain

Initial release

2019-04-23