ape: write.dna – R documentation


write.dna

Write DNA Sequences in a File

Description

These functions write a list of DNA sequences to a file in sequential, interleaved, or FASTA format. write.FASTA can write either DNA or AA sequences.

Usage

write.dna(x, file, format = "interleaved", append = FALSE,
          nbcol = 6, colsep = " ", colw = 10, indent = NULL,
          blocksep = 1)
write.FASTA(x, file, header = NULL, append = FALSE)

Arguments

`x`	a list or a matrix of DNA sequences, or of AA sequences for `write.FASTA`.
`file`	a file name specified by either a variable of mode character, or a double-quoted string.
`format`	a character string specifying the format of the DNA sequences. Three choices are possible: `"interleaved"`, `"sequential"`, or `"fasta"`, or any unambiguous abbreviation of these.
`append`	a logical; if `TRUE`, the data are appended to the file without erasing any data already in it; otherwise the file (if it exists) is overwritten (`FALSE`, the default).
`nbcol`	a numeric specifying the number of columns per row (6 by default); may be negative, in which case the nucleotides are printed on a single line.
`colsep`	a character used to separate the columns (a single space by default).
`colw`	a numeric specifying the number of nucleotides per column (10 by default).
`indent`	a numeric or a character specifying how the blocks of nucleotides are indented (see details).
`blocksep`	a numeric specifying the number of lines between the blocks of nucleotides (this has an effect only if `format = "interleaved"`).
`header`	a vector of mode character giving the header to be written in the FASTA file before the sequences. By default, there is no header.

Details

Three formats are supported in the present function: see the help page of read.dna and the references below for a description.

If the sequences have no names, then they are given "1", "2", ... as labels in the file.
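The default labelling can be checked with a short sketch, assuming the ape package is installed. The object names and the temporary file path are illustrative only.

```r
library(ape)

# Two unnamed sequences: the list passed to as.DNAbin() carries no names
unnamed_seqs <- as.DNAbin(list(c("a", "c", "g", "t"),
                               c("a", "c", "g", "a")))

out_file <- tempfile(fileext = ".fas")  # illustrative output path
write.dna(unnamed_seqs, out_file, format = "fasta")

# FASTA header lines start with ">"; strip that marker to get the labels
header_lines <- grep("^>", readLines(out_file), value = TRUE)
written_labels <- trimws(sub("^>", "", header_lines))
print(written_labels)
```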

With the interleaved and sequential formats, the sequences must be all of the same length. The names of the sequences are not truncated.
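As an illustration of the equal-length requirement, the sketch below (assuming ape is installed; all names are illustrative) tries to write two sequences of different lengths in sequential format and catches the resulting error. The same data can still be written in FASTA format.

```r
library(ape)

# Two sequences of different lengths (4 and 6 sites)
unaligned <- as.DNAbin(list(seq_short = c("a", "c", "g", "t"),
                            seq_long  = c("a", "c", "g", "t", "a", "c")))

phylip_file <- tempfile()
fasta_file  <- tempfile(fileext = ".fas")

# Sequential (and interleaved) output requires sequences of equal length,
# so this call is expected to fail; the error is caught rather than raised
phylip_result <- tryCatch(
  write.dna(unaligned, phylip_file, format = "sequential"),
  error = function(e) e
)
failed_as_expected <- inherits(phylip_result, "error")

# FASTA has no such constraint
write.dna(unaligned, fasta_file, format = "fasta")
n_fasta_records <- sum(startsWith(readLines(fasta_file), ">"))
```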

The argument indent specifies how the rows of nucleotides are indented. In the interleaved and sequential formats, the rows with the taxon names are never indented; the subsequent rows are indented with 10 spaces by default (i.e., if indent = NULL). In the FASTA format, the rows are not indented by default. This default behaviour can be modified by specifying a value to indent: the rows are then indented with the string given in indent (if it is a character) or with indent spaces (if it is a numeric). For example, specifying indent = "   " (three spaces) or indent = 3 will have the same effect (use indent = "\t" for a tabulation).
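The equivalence between a character and a numeric indent can be verified with a small sketch, assuming ape is installed. The alignment is made up for the example and spans two blocks, so that at least one row without a taxon name is indented.

```r
library(ape)

# A toy alignment: 2 sequences x 80 sites, i.e. two blocks with the
# default layout (6 columns of 10 nucleotides per row)
aln <- as.DNAbin(matrix(
  rep(c("a", "c", "g", "t"), times = 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("seq_A", "seq_B"), NULL)
))

file_chr <- tempfile()
file_num <- tempfile()

write.dna(aln, file_chr, format = "interleaved", indent = "   ")  # three spaces
write.dna(aln, file_num, format = "interleaved", indent = 3)

# Both files should contain exactly the same lines
same_output <- identical(readLines(file_chr), readLines(file_num))
print(same_output)
```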

The different options are intended to give flexibility in formatting the sequences. For instance, if the sequences are very long, it may be judicious to remove all the spaces between columns (colsep = ""), in the margins (indent = 0), and between the blocks (blocksep = 0) to produce a smaller file.
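The effect of these options on file size can be seen with a sketch along the following lines, assuming ape is installed. The random alignment and the file paths are illustrative.

```r
library(ape)

set.seed(42)  # reproducible toy data
# 3 sequences x 600 sites, enough to span several interleaved blocks
long_aln <- as.DNAbin(matrix(
  sample(c("a", "c", "g", "t"), 3 * 600, replace = TRUE),
  nrow = 3,
  dimnames = list(paste0("taxon_", 1:3), NULL)
))

default_file <- tempfile()
compact_file <- tempfile()

# Default layout: columns separated by spaces, indented rows, blank lines
write.dna(long_aln, default_file)

# Compact layout: no column separator, no margin, no blank lines
write.dna(long_aln, compact_file, colsep = "", indent = 0, blocksep = 0)

sizes <- c(default = file.size(default_file),
           compact = file.size(compact_file))
print(sizes)
```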

write.dna(, format = "fasta") can be very slow if the sequences are long (> 10 kb). write.FASTA is much faster in this situation, but the formatting is not flexible: each sequence is printed on a single line, which is acceptable for big files that are not intended to be opened with a text editor.
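A minimal sketch of the write.FASTA route for long sequences, assuming ape is installed. The simulated sequences, their names, and the file path are illustrative; the file is read back with read.FASTA only to check the round trip.

```r
library(ape)

set.seed(1)  # reproducible toy data
bases <- c("a", "c", "g", "t")

# Two long sequences of different lengths, stored as a DNAbin list
long_seqs <- as.DNAbin(list(
  seq_1 = sample(bases, 20000, replace = TRUE),
  seq_2 = sample(bases, 15000, replace = TRUE)
))

fasta_file <- tempfile(fileext = ".fas")
write.FASTA(long_seqs, fasta_file)

# Count the header lines and read the file back
n_headers <- sum(startsWith(readLines(fasta_file), ">"))
reread <- read.FASTA(fasta_file)
print(names(reread))
```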

Value

None (invisible ‘NULL’).

Note

Specifying a negative value for ‘nbcol’ (meaning that the nucleotides are printed on a single line) gives the same output for the interleaved and sequential formats.
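This can be checked directly with a sketch like the one below, assuming ape is installed. The toy alignment and file paths are illustrative.

```r
library(ape)

# Toy alignment: 2 sequences x 80 sites
aln <- as.DNAbin(matrix(
  rep(c("a", "c", "g", "t"), times = 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("seq_A", "seq_B"), NULL)
))

file_inter <- tempfile()
file_seq   <- tempfile()

# With a negative nbcol, each sequence is written on a single line
write.dna(aln, file_inter, format = "interleaved", nbcol = -1)
write.dna(aln, file_seq,   format = "sequential",  nbcol = -1)

same_layout <- identical(readLines(file_inter), readLines(file_seq))
print(same_layout)
```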

The names of the sequences can be truncated with the function makeLabel. In particular, Clustal is limited to 30 characters, and PHYML seems limited to 99 characters.
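A possible way to truncate labels before writing is sketched below, assuming ape is installed. The sequence names are invented for the example, and the limit of 30 characters is the Clustal figure quoted above.

```r
library(ape)

# Invented, deliberately long sequence names
long_names <- c(
  "Homo_sapiens_isolate_A_mitochondrion_complete_genome",
  "Pan_troglodytes_isolate_B_mitochondrion_complete_genome"
)

seqs <- as.DNAbin(stats::setNames(
  list(c("a", "c", "g", "t"), c("a", "c", "g", "a")),
  long_names
))

# Truncate the labels to at most 30 characters, then write the file
names(seqs) <- makeLabel(names(seqs), len = 30)

out_file <- tempfile(fileext = ".fas")  # illustrative output path
write.dna(seqs, out_file, format = "fasta")

label_lengths <- nchar(names(seqs))
print(names(seqs))
```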

Author(s)

Emmanuel Paradis

References

Anonymous. FASTA format. https://en.wikipedia.org/wiki/FASTA_format

Felsenstein, J. (1993) Phylip (Phylogeny Inference Package) version 3.5c. Department of Genetics, University of Washington. http://evolution.genetics.washington.edu/phylip/phylip.html

Emmanuel Paradis [aut, cre, cph] (<https://orcid.org/0000-0003-3092-2199>), Simon Blomberg [aut, cph] (<https://orcid.org/0000-0003-1062-0839>), Ben Bolker [aut, cph] (<https://orcid.org/0000-0002-2127-0443>), Joseph Brown [aut, cph] (<https://orcid.org/0000-0002-3835-8062>), Santiago Claramunt [aut, cph] (<https://orcid.org/0000-0002-8926-5974>), Julien Claude [aut, cph] (<https://orcid.org/0000-0002-9267-1228>), Hoa Sien Cuong [aut, cph], Richard Desper [aut, cph], Gilles Didier [aut, cph] (<https://orcid.org/0000-0003-0596-9112>), Benoit Durand [aut, cph], Julien Dutheil [aut, cph] (<https://orcid.org/0000-0001-7753-4121>), RJ Ewing [aut, cph], Olivier Gascuel [aut, cph], Thomas Guillerme [aut, cph] (<https://orcid.org/0000-0003-4325-1275>), Christoph Heibl [aut, cph] (<https://orcid.org/0000-0002-7655-3299>), Anthony Ives [aut, cph] (<https://orcid.org/0000-0001-9375-9523>), Bradley Jones [aut, cph] (<https://orcid.org/0000-0003-4498-1069>), Franz Krah [aut, cph] (<https://orcid.org/0000-0001-7866-7508>), Daniel Lawson [aut, cph] (<https://orcid.org/0000-0002-5311-6213>), Vincent Lefort [aut, cph], Pierre Legendre [aut, cph] (<https://orcid.org/0000-0002-3838-3305>), Jim Lemon [aut, cph], Guillaume Louvel [aut, cph] (<https://orcid.org/0000-0002-7745-0785>), Eric Marcon [aut, cph] (<https://orcid.org/0000-0002-5249-321X>), Rosemary McCloskey [aut, cph] (<https://orcid.org/0000-0002-9772-8553>), Johan Nylander [aut, cph], Rainer Opgen-Rhein [aut, cph], Andrei-Alin Popescu [aut, cph], Manuela Royer-Carenzi [aut, cph], Klaus Schliep [aut, cph] (<https://orcid.org/0000-0003-2941-0161>), Korbinian Strimmer [aut, cph] (<https://orcid.org/0000-0001-7917-2056>), Damien de Vienne [aut, cph] (<https://orcid.org/0000-0001-9532-5251>)

Initial release

2021-04-24