
read.GenBank

Read DNA Sequences from GenBank via Internet


Description

This function connects to the GenBank database and reads nucleotide sequences using the accession numbers given as arguments.

Usage

read.GenBank(access.nb, seq.names = access.nb, species.names = TRUE,
             as.character = FALSE, chunk.size = 400, quiet = TRUE)

Arguments

access.nb

a vector of mode character giving the accession numbers.

seq.names

the names to give to each sequence; by default the accession numbers are used.

species.names

a logical indicating whether to attribute the species names to the returned object.

as.character

a logical controlling whether to return the sequences as vectors of single characters (if TRUE) instead of an object of class "DNAbin" (the default, FALSE).

chunk.size

the number of sequences downloaded together (see details).

quiet

a logical value indicating whether to hide the progress of the downloads (the default, TRUE, hides it). If FALSE, the (full) name of the FASTA file containing the downloaded sequences is also printed.

Details

The sequences are retrieved from the NCBI site https://www.ncbi.nlm.nih.gov/.
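As a minimal sketch of the kind of request involved, the base R code below builds a URL for NCBI's E-utilities efetch service asking for plain-text FASTA records. The endpoint and its db, id, rettype and retmode parameters are an assumption taken from NCBI's E-utilities documentation, not from this page, and efetch_fasta_url() is a hypothetical helper; read.GenBank may build its request differently.

```r
# Hypothetical helper (not part of ape): build an NCBI E-utilities efetch URL
# that requests the given accession numbers as plain-text FASTA.
# Assumption: endpoint and parameter names follow NCBI's E-utilities docs.
efetch_fasta_url <- function(accessions) {
  stopifnot(is.character(accessions), length(accessions) > 0)
  paste0(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
    "?db=nucleotide",                           # nucleotide database
    "&id=", paste(accessions, collapse = ","),  # comma-separated accessions
    "&rettype=fasta&retmode=text"               # FASTA records as plain text
  )
}

url <- efetch_fasta_url(c("U15717", "U15718"))
print(url)
# Fetching requires an Internet connection, e.g.:
# fasta_lines <- readLines(url)
```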

If species.names = TRUE, the returned list has an attribute "species" containing the names of the species taken from the field “ORGANISM” in GenBank.

Since ape 3.6, this function retrieves the sequences in FASTA format: this is more efficient and more flexible (scaffolds and contigs can be read) than what was done in previous versions. The option gene.names was removed in ape 5.4; this information is still present in the FASTA descriptions (attribute "description").

Setting species.names = FALSE is much faster (this could be useful if you read a series of scaffolds or contigs, or if you already have the species names).

The argument chunk.size is set by default to 400, which is likely to work in many cases. If an error such as “Cannot open file ...” occurs, showing the list of the accession numbers, you may try decreasing chunk.size to 200 or 300.
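The advice to decrease chunk.size after such an error can be automated. This is a sketch only: with_chunk_fallback() is a hypothetical helper, not part of ape, that calls a user-supplied function with decreasing chunk sizes and returns the first result that does not raise an error. A stand-in fetcher that simulates the failure is used so the code runs offline.

```r
# Hypothetical helper (not part of ape): try fetch(n) for each chunk size in
# turn and return the first successful result; stop if every size fails.
with_chunk_fallback <- function(fetch, sizes = c(400L, 300L, 200L)) {
  stopifnot(is.function(fetch), length(sizes) > 0)
  last_error <- NULL
  for (n in sizes) {
    result <- tryCatch(fetch(n), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message("chunk.size = ", n, " failed: ", conditionMessage(result))
  }
  stop("all chunk sizes failed; last error: ", conditionMessage(last_error))
}

# Offline stand-in: simulates a download error for chunks larger than 300.
fake_fetch <- function(n) {
  if (n > 300) stop("simulated download error")
  paste("downloaded with chunk.size =", n)
}

res <- with_chunk_fallback(fake_fetch)
print(res)
```

With ape loaded and an Internet connection, the same helper could wrap the real call, for instance with_chunk_fallback(function(n) read.GenBank(ref, chunk.size = n)), where ref is a vector of accession numbers.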

If quiet = FALSE, progress is displayed chunk by chunk, so the message “Downloading sequences: 400 / 400 ...” means that sequences 1 to 400 are being downloaded (a more precise message is not possible because the download method depends on the platform).
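To make the chunk-by-chunk display concrete, here is a base R sketch of how N accession numbers could be split into consecutive chunks of at most chunk.size, with one progress line per chunk. The helper chunk_bounds() is hypothetical and only mimics the message format quoted above; it does not download anything.

```r
# Hypothetical helper: first and last index of each chunk of at most
# chunk_size items among n items.
chunk_bounds <- function(n, chunk_size = 400L) {
  stopifnot(n >= 1, chunk_size >= 1)
  starts <- seq(1L, n, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1L, n)
  data.frame(start = starts, end = ends)
}

n <- 1000L
b <- chunk_bounds(n)
# One line per chunk: the last index reached so far, out of n.
for (i in seq_len(nrow(b))) {
  cat("Downloading sequences:", b$end[i], "/", n, "...\n")
}
```

With N = 400 and the default chunk size there is a single chunk, so the only message would read “Downloading sequences: 400 / 400 ...”.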

Value

A list of DNA sequences, either of class "DNAbin" or, if as.character = TRUE, made of vectors of single characters, with two attributes: "species" (present only if species.names = TRUE) and "description".
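The shape of the returned object can be illustrated offline with a toy list. The data below are hypothetical (made-up names and sequences, not GenBank records) and only mimic the as.character = TRUE form described above, so that the attributes can be combined with the sequence names and lengths in base R.

```r
# Toy stand-in (hypothetical data): a named list of vectors of single
# characters carrying "species" and "description" attributes.
x <- structure(
  list(
    X1 = c("a", "c", "g", "t", "a"),
    X2 = c("g", "g", "c", "a")
  ),
  species = c("Species_one", "Species_two"),
  description = c("X1 hypothetical record one", "X2 hypothetical record two")
)

# One row per sequence: name, species attribute and sequence length.
summary_df <- data.frame(
  accession = names(x),
  species = attr(x, "species"),
  length = lengths(x)
)
print(summary_df)
```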

Author(s)

Emmanuel Paradis

Examples

## This won't work if your computer is not connected
## to the Internet

## Get the 8 sequences of tanagers (Ramphocelus)
## as used in Paradis (1997)
ref <- c("U15717", "U15718", "U15719", "U15720",
         "U15721", "U15722", "U15723", "U15724")
## Copy/paste or type the following commands if you
## want to try them.
## Not run: 
Rampho <- read.GenBank(ref)
## get the species names:
attr(Rampho, "species")
## build a matrix with the species names and the accession numbers:
cbind(attr(Rampho, "species"), names(Rampho))
## print the first sequence
## (can be done with `Rampho$U15717` as well)
Rampho[[1]]
## the description from each FASTA sequence:
attr(Rampho, "description")

## End(Not run)

ape

Analyses of Phylogenetics and Evolution

v5.5
GPL-2 | GPL-3
Authors
Emmanuel Paradis [aut, cre, cph] (<https://orcid.org/0000-0003-3092-2199>), Simon Blomberg [aut, cph] (<https://orcid.org/0000-0003-1062-0839>), Ben Bolker [aut, cph] (<https://orcid.org/0000-0002-2127-0443>), Joseph Brown [aut, cph] (<https://orcid.org/0000-0002-3835-8062>), Santiago Claramunt [aut, cph] (<https://orcid.org/0000-0002-8926-5974>), Julien Claude [aut, cph] (<https://orcid.org/0000-0002-9267-1228>), Hoa Sien Cuong [aut, cph], Richard Desper [aut, cph], Gilles Didier [aut, cph] (<https://orcid.org/0000-0003-0596-9112>), Benoit Durand [aut, cph], Julien Dutheil [aut, cph] (<https://orcid.org/0000-0001-7753-4121>), RJ Ewing [aut, cph], Olivier Gascuel [aut, cph], Thomas Guillerme [aut, cph] (<https://orcid.org/0000-0003-4325-1275>), Christoph Heibl [aut, cph] (<https://orcid.org/0000-0002-7655-3299>), Anthony Ives [aut, cph] (<https://orcid.org/0000-0001-9375-9523>), Bradley Jones [aut, cph] (<https://orcid.org/0000-0003-4498-1069>), Franz Krah [aut, cph] (<https://orcid.org/0000-0001-7866-7508>), Daniel Lawson [aut, cph] (<https://orcid.org/0000-0002-5311-6213>), Vincent Lefort [aut, cph], Pierre Legendre [aut, cph] (<https://orcid.org/0000-0002-3838-3305>), Jim Lemon [aut, cph], Guillaume Louvel [aut, cph] (<https://orcid.org/0000-0002-7745-0785>), Eric Marcon [aut, cph] (<https://orcid.org/0000-0002-5249-321X>), Rosemary McCloskey [aut, cph] (<https://orcid.org/0000-0002-9772-8553>), Johan Nylander [aut, cph], Rainer Opgen-Rhein [aut, cph], Andrei-Alin Popescu [aut, cph], Manuela Royer-Carenzi [aut, cph], Klaus Schliep [aut, cph] (<https://orcid.org/0000-0003-2941-0161>), Korbinian Strimmer [aut, cph] (<https://orcid.org/0000-0001-7917-2056>), Damien de Vienne [aut, cph] (<https://orcid.org/0000-0001-9532-5251>)
Initial release
2021-04-24