
fasta2DNAbin

Read large DNA alignments into R


Description

The function fasta2DNAbin reads alignments in FASTA format (extensions ".fasta", ".fas", or ".fa") and outputs a DNAbin object (the efficient DNA representation from the ape package). The output contains either the full alignment or only the SNPs. This implementation is designed for memory efficiency and can read larger datasets than ape's read.dna.

The function reads data in chunks of a few genomes at a time (minimum 1, no maximum), which allows one to read massive datasets with negligible RAM requirements, at the cost of longer computation. The argument chunkSize sets the number of genomes read at a time. Increasing this value shortens the time needed to read the data but raises memory requirements.
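The chunked-reading idea can be illustrated with a minimal base-R sketch. This is not adegenet's implementation: the helper find_snp_positions is hypothetical, and it assumes every sequence sits on a single line and all sequences have the same length. It reads a few records at a time from a connection and keeps only a small running summary (which sites vary), so memory use depends on chunkSize rather than on the file size.

```r
## Hypothetical helper (not part of adegenet): scan a FASTA file a few
## records at a time and return the positions of polymorphic sites.
## Assumes one line per sequence and equal-length sequences.
find_snp_positions <- function(path, chunkSize = 10) {
  con <- file(path, open = "r")
  on.exit(close(con))
  ref <- NULL          # first sequence, used as the comparison baseline
  polymorphic <- NULL  # TRUE where at least one sequence differs from ref
  repeat {
    ## each record is one header line plus one sequence line
    lines <- readLines(con, n = 2 * chunkSize)
    if (length(lines) == 0) break
    ## drop headers and empty lines, keep the sequences of this chunk
    seqs <- lines[nzchar(lines) & !startsWith(lines, ">")]
    for (s in seqs) {
      chars <- strsplit(s, "", fixed = TRUE)[[1]]
      if (is.null(ref)) {
        ref <- chars
        polymorphic <- logical(length(chars))
      } else {
        polymorphic <- polymorphic | (chars != ref)
      }
    }
  }
  which(polymorphic)
}

## toy alignment written to a temporary file
tmp <- tempfile(fileext = ".fa")
writeLines(c(">a", "ACGTAC", ">b", "ACGTTC", ">c", "AGGTAC"), tmp)

## the result does not depend on the chunk size, only time and memory do
snp_small <- find_snp_positions(tmp, chunkSize = 1)
snp_large <- find_snp_positions(tmp, chunkSize = 10)
```

With a small chunkSize the loop makes many short reads; with a large one it makes fewer, bigger reads, which mirrors the time and memory trade-off described above.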

Usage

fasta2DNAbin(file, quiet=FALSE, chunkSize=10, snpOnly=FALSE)

Arguments

file

a character string giving the path to the file to convert, with the extension ".fa", ".fas", or ".fasta".

Can also be a connection (which will be opened for reading if necessary, and if so closed (and hence destroyed) at the end of the function call).

quiet

a logical stating whether conversion messages should be printed (FALSE, default) or not (TRUE).

chunkSize

an integer indicating the number of genomes to be read at a time; larger values require more RAM but decrease the time needed to read the data.

snpOnly

a logical indicating whether only SNPs should be returned (TRUE) or the full alignment (FALSE, default).

Value

an object of the class DNAbin
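As a usage sketch of the arguments above, the example file shipped with adegenet can be read once as a full alignment and once with snpOnly=TRUE. The object names full and snps are illustrative only, and the snippet assumes that the adegenet and ape packages are installed.

```r
library(adegenet)

## path to the example alignment shipped with adegenet
myPath <- system.file("files/usflu.fasta", package = "adegenet")

## full alignment, conversion messages suppressed
full <- fasta2DNAbin(myPath, quiet = TRUE, chunkSize = 10)

## same file, keeping SNPs only
snps <- fasta2DNAbin(myPath, quiet = TRUE, chunkSize = 10, snpOnly = TRUE)

## compare the dimensions (sequences x sites) of the two objects
dim(full)
dim(snps)
```

Both calls return a DNAbin object; the second is expected to have the same number of sequences and no more sites than the first.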

Author(s)

Thibaut Jombart t.jombart@imperial.ac.uk

See Also

- ?DNAbin for a description of the class DNAbin.

- read.snp: read SNPs in adegenet's '.snp' format.

- read.PLINK: read SNPs in PLINK's '.raw' format.

- df2genind: convert any multiallelic markers into adegenet genind.

- import2genind: read multiallelic markers from various software into adegenet.

Examples

## Not run: 
## show the example file ##
## this is the path to the file:
myPath <- system.file("files/usflu.fasta",package="adegenet")
myPath

## read the file
obj <- fasta2DNAbin(myPath, chunkSize=10) # process 10 sequences at a time
obj

## End(Not run)

adegenet

Exploratory Analysis of Genetic and Genomic Data

v2.1.3
GPL (>= 2)
Authors
Thibaut Jombart [aut] (<https://orcid.org/0000-0003-2226-8692>), Zhian N. Kamvar [aut, cre] (<https://orcid.org/0000-0003-1458-7108>), Caitlin Collins [ctb], Roman Lustrik [ctb], Marie-Pauline Beugin [ctb], Brian J. Knaus [ctb], Peter Solymos [ctb], Vladimir Mikryukov [ctb], Klaus Schliep [ctb], Tiago Maié [ctb], Libor Morkovsky [ctb], Ismail Ahmed [ctb], Anne Cori [ctb], Federico Calboli [ctb], RJ Ewing [ctb], Frédéric Michaud [ctb], Rebecca DeCamp [ctb], Alexandre Courtiol [ctb] (<https://orcid.org/0000-0003-0637-2959>)
