Import an RDP cluster file and return an otu_table (abundance table).
The RDP cluster pipeline (specifically, the output of the complete linkage clustering step) has no formal documentation for the ".clust" file or its apparent sequence naming convention.
import_RDP_cluster(RDP_cluster_file)
RDP_cluster_file: A character string. The name of the ".clust" file produced by the RDP pipeline (http://pyro.cme.msu.edu/index.jsp).
The cluster file itself contains the names of all sequences in the input alignment. If the upstream barcode and alignment processing steps were also done with the RDP pipeline, the sequence names follow a predictable naming convention in which each sequence is named by its sample and sequence ID, separated by a "_" delimiter:
"sampleName_sequenceIDnumber"
This import function assumes that the sequence names in the cluster file follow this convention, and that the sample name does not contain any "_". It is unlikely to work if this is not the case. It is likely to work if you used the upstream steps in the RDP pipeline to process your raw (barcoded, untrimmed) fasta/fastq data.
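To illustrate the naming convention, here is a minimal sketch in R. It is not phyloseq's actual code; the helper sample_from_seqname() and the example sequence names are hypothetical. It treats everything before the first "_" as the sample name, which also shows why a sample name that itself contains "_" breaks the convention.

```r
# Hypothetical helper (not part of phyloseq): extract the sample name from
# sequence names of the form "sampleName_sequenceIDnumber".
# Everything from the first "_" onward is dropped, so this is only correct
# when sample names contain no "_".
sample_from_seqname <- function(seqnames) {
  sub("_.*$", "", seqnames)
}

sample_from_seqname(c("soilA_1021", "soilA_1022", "rootB_77"))
# returns c("soilA", "soilA", "rootB")

# A sample name containing "_" is silently truncated at its first "_":
sample_from_seqname("soil_A_1021")
# returns "soil"
```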
This function first loops through the ".clust" file and collects all of the sample names that appear. It then loops through each OTU ("cluster"; each row of the cluster file) and sums the number of sequences (reads) from each sample. The resulting OTU-by-sample abundance table is trivially coerced to an otu_table object and returned.
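The two passes described above can be sketched in base R as follows. This is a hypothetical illustration, not the function's actual implementation: the name rdp_clusters_to_matrix(), the "OTU1", "OTU2", ... row names, and the input format are all assumptions. In particular, the input is assumed to be a character vector with one element per cluster, each holding that cluster's whitespace-separated sequence names; the real ".clust" layout (header lines, extra columns) is undocumented and is not parsed here.

```r
# Hypothetical sketch of the two-pass OTU-by-sample tabulation.
# ASSUMPTION: `clusters` has one string per OTU containing that cluster's
# whitespace-separated sequence names ("sampleName_sequenceIDnumber").
rdp_clusters_to_matrix <- function(clusters) {
  # Split each cluster row into its individual sequence names
  seqs <- strsplit(trimws(clusters), "[[:space:]]+")
  # Sample name = text before the first "_" (assumes no "_" in sample names)
  samples <- lapply(seqs, function(s) sub("_.*$", "", s))

  # Pass 1: collect every sample name that appears anywhere in the file
  all_samples <- sort(unique(unlist(samples)))

  # Pass 2: for each OTU, count how many reads come from each sample
  counts <- matrix(0L, nrow = length(seqs), ncol = length(all_samples),
                   dimnames = list(paste0("OTU", seq_along(seqs)), all_samples))
  for (i in seq_along(samples)) {
    counts[i, ] <- tabulate(match(samples[[i]], all_samples),
                            nbins = length(all_samples))
  }
  counts
}

# Usage with three toy clusters (hypothetical data)
clusters <- c("soilA_1 soilA_2 rootB_7",
              "rootB_8",
              "soilA_3 rootB_9 rootB_10")
m <- rdp_clusters_to_matrix(clusters)
# OTU1 holds 2 reads from soilA and 1 read from rootB
```

Because OTUs are the rows of this matrix, a phyloseq user could then coerce it with otu_table(m, taxa_are_rows = TRUE).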
An otu_table object parsed from the ".clust" file.