Read Character Data In NEXUS Format
read.nexus.data reads a file with sequences in the NEXUS format. nexus2DNAbin is a helper function to convert the output from the previous function into the class "DNAbin".
For the moment, only sequence data (DNA or protein) are supported.
read.nexus.data(file)
nexus2DNAbin(x)
file    a file name specified by either a variable of mode character, or a double-quoted string.
x       an object output by read.nexus.data.
This parser tries to read data from a file written in a restricted NEXUS format (see examples below).
Please see files ‘data.nex’ and ‘taxacharacters.nex’ for examples of formats that will work.
Some notable exceptions from the NEXUS standard (non-exhaustive list):
I. Comments must be either on separate lines or at the end of lines. Examples:
    OK:      [Comment]
    OK:      Taxon ACGTACG [Comment]
    NOT OK:  [Comment line 1
             Comment line 2]
    NOT OK:  Tax[Comment]on ACG[Comment]T
II. No spaces (or comments) are allowed in the sequences. Examples:
    OK:      name ACGT
    NOT OK:  name AC GT
III. No spaces are allowed in taxon names, not even if names are in single quotes. That is, single-quoted names are not treated as such by the parser. Examples:
    OK:      Genus_species
    OK:      'Genus_species'
    NOT OK:  'Genus species'
IV. The trailing end that closes the matrix must be on a separate line. Examples:
    OK:      taxon AACCGGT
             end;
    OK:      taxon AACCGGT;
             end;
    NOT OK:  taxon AACCCGT; end;
V. Multistate characters are not allowed. That is, NEXUS allows you to specify multiple character states at a character position either as an uncertainty, (XY), or as an actual appearance of multiple states, {XY}. This information is not handled by the parser. Examples:
    OK:      taxon 0011?110
    NOT OK:  taxon 0011{01}110
    NOT OK:  taxon 0011(01)110
VI. The number of taxa must be on the same line as ntax. The same applies to nchar. Examples:
    OK:      ntax = 12
    NOT OK:  ntax =
             12
VII. The word “matrix” cannot occur anywhere in the file before the actual matrix command, unless it is in a comment. Examples:
    OK:
        BEGIN CHARACTERS;
        TITLE 'Data in file "03a-cytochromeB.nex"';
        DIMENSIONS NCHAR=382;
        FORMAT DATATYPE=Protein GAP=- MISSING=?;
        ["This is The Matrix"]
        MATRIX
    NOT OK:
        BEGIN CHARACTERS;
        TITLE 'Matrix in file "03a-cytochromeB.nex"';
        DIMENSIONS NCHAR=382;
        FORMAT DATATYPE=Protein GAP=- MISSING=?;
        MATRIX
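The restrictions listed above can be illustrated with a small file that respects all of them. The following is only a sketch: the taxon names, sequences, and the temporary file are invented for illustration, and it assumes the ape package is installed.

```r
library(ape)

# Hypothetical NEXUS content that follows the restrictions above:
# comments on their own line or at the end of a line, no spaces in
# names or sequences, ntax and nchar on the same line as their values,
# no multistate characters, and the closing "END;" on its own line.
nex <- c(
  "#NEXUS",
  "[Comment on its own line]",
  "BEGIN DATA;",
  "DIMENSIONS NTAX=3 NCHAR=8;",
  "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "MATRIX",
  "Taxon_A ACGTACGT [comment at end of line]",
  "Taxon_B ACGTAC-T",
  "Taxon_C ACGTA?GT;",
  "END;"
)

# Write the lines to a temporary file and parse it.
path <- tempfile(fileext = ".nex")
writeLines(nex, path)
x <- read.nexus.data(path)

names(x)     # taxon names found in the matrix
lengths(x)   # number of character states read per taxon
```

If any of the restrictions is violated (for instance, a space inside a sequence), the parser is expected to stop with an error rather than return a partial result.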
A list of sequences each made of a single vector of mode character where each element is a (phylogenetic) character state.
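As a rough illustration of the returned value, the sketch below reads a tiny, made-up DNA file, inspects the resulting list, and passes it to nexus2DNAbin. The file content and taxon names are assumptions chosen for the example, not data shipped with the package.

```r
library(ape)

# Hypothetical two-taxon DNA matrix written to a temporary file.
path <- tempfile(fileext = ".nex")
writeLines(c(
  "#NEXUS",
  "BEGIN DATA;",
  "DIMENSIONS NTAX=2 NCHAR=6;",
  "FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "MATRIX",
  "Taxon_A ACGTAC",
  "Taxon_B ACG-AC;",
  "END;"
), path)

x <- read.nexus.data(path)

str(x)        # a named list with one character vector per taxon
lengths(x)    # each element holds one character state per position

# Convert the list to an object of class "DNAbin".
d <- nexus2DNAbin(x)
class(d)
```

The conversion step only makes sense for DNA data; for protein or other character data the list returned by read.nexus.data would be used directly.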
Johan Nylander, Thomas Guillerme, and Klaus Schliep
Maddison, D. R., Swofford, D. L. and Maddison, W. P. (1997) NEXUS: an extensible file format for systematic information. Systematic Biology, 46, 590–621.
## Use read.nexus.data to read a file in NEXUS format into object x
## Not run:
x <- read.nexus.data("file.nex")