
translate

Translate nucleic acid sequences into proteins


Description

This function translates a nucleic acid sequence into the corresponding peptide sequence. It can translate in any of the three forward or three reverse sense frames. For the reverse sense, the reverse complement of the sequence is translated. Translation can use the standard (universal) genetic code or a non-standard code, and ambiguous bases can also be handled.
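As a rough illustration of the frame and sense behaviour described above, the sketch below checks two relationships on the toy sequence from the Examples section. It assumes that seqinr is installed and that `comp()` and `rev()` together give the same reverse complement that `translate()` builds internally; the variable names are illustrative only.

```r
library(seqinr)

# Toy coding sequence from the Examples section, as a vector of single characters
toycds <- s2c("tctgagcaaataaatcgg")

# Frame 1 skips the first base, so it should match translating the
# sequence with its first base removed
frame1  <- translate(toycds, frame = 1)
shifted <- translate(toycds[-1])
identical(frame1, shifted)

# Reverse sense: translate() works on the reverse complement, which is
# built by hand here with comp() and rev() for comparison
rev_pep   <- translate(toycds, sens = "R")
manual_rc <- translate(rev(comp(toycds)))
identical(rev_pep, manual_rc)
```

Both comparisons are expected to agree if the assumptions above hold; trailing bases that do not fill a complete codon are not translated.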

Usage

translate(seq, frame = 0, sens = "F", numcode = 1, NAstring = "X", ambiguous = FALSE)

Arguments

seq

the sequence to translate as a vector of single characters in lower case letters.

frame

Frame(s) (0,1,2) to translate. By default the frame 0 is used.

sens

Sense to translate: F for forward sense and R for reverse sense.

numcode

The NCBI genetic code number for translation. By default the standard genetic code is used.

NAstring

The string used for an amino acid when its codon contains ambiguous bases and cannot be resolved.

ambiguous

If TRUE, ambiguous bases are taken into account so that for instance GGN is translated to Gly in the standard genetic code.
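The interaction between NAstring and ambiguous can be sketched as follows, reusing the ambiguous toy sequence from the Examples section. The replacement string "?" is an arbitrary choice for illustration, and the sketch assumes seqinr is installed.

```r
library(seqinr)

# Toy sequence with IUPAC ambiguity codes (n, r, h, y) in every codon
toycds2 <- s2c("tcngarcarathaaycgn")

# With ambiguous = FALSE (the default), each unresolved codon is
# reported using NAstring
unresolved <- translate(toycds2, NAstring = "?")

# With ambiguous = TRUE, codons whose possible readings all encode the
# same amino acid are translated normally
resolved <- translate(toycds2, ambiguous = TRUE)

unresolved
resolved
```

Codons that remain ambiguous even with ambiguous = TRUE are still reported using NAstring.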

Details

The following genetic codes are described here. The number preceding each code corresponds to numcode.

1   standard
2   vertebrate.mitochondrial
3   yeast.mitochondrial
4   protozoan.mitochondrial+mycoplasma
5   invertebrate.mitochondrial
6   ciliate+dasycladaceal
9   echinoderm+flatworm.mitochondrial
10  euplotid
11  bacterial+plantplastid
12  alternativeyeast
13  ascidian.mitochondrial
14  alternativeflatworm.mitochondrial
15  blepharism
16  chlorophycean.mitochondrial
21  trematode.mitochondrial
22  scenedesmus.mitochondrial
23  thraustochytrium.mitochondria
24  Pterobranchia.mitochondrial
25  CandidateDivision.SR1+Gracilibacteria
26  Pachysolen.tannophilus
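To make the effect of numcode concrete, the sketch below translates three codons that are read differently by the standard code (1) and the vertebrate mitochondrial code (2). The three-codon sequence is made up for illustration and assumes seqinr is installed.

```r
library(seqinr)

# Three codons whose meaning differs between codes 1 and 2: ata, tga, aga
codons <- s2c("atatgaaga")

# Standard code: Ile, stop, Arg
std <- translate(codons, numcode = 1)

# Vertebrate mitochondrial code: Met, Trp, stop
mito <- translate(codons, numcode = 2)

std
mito
```

Choosing the wrong code can therefore introduce or hide in-frame stop codons, so it is worth checking the source organism and organelle before translating.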

Value

translate returns a vector of single characters containing the peptide sequence in the standard one-letter IUPAC code. Termination (STOP) codons are translated by the character '*'.
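Because the result is a character vector rather than a single string, a little post-processing is often useful. The sketch below, which assumes seqinr is installed, collapses the peptide into one string with c2s, converts it to three-letter codes with aaa, and locates stop codons.

```r
library(seqinr)

# Toy coding sequence from the Examples section
toycds <- s2c("tctgagcaaataaatcgg")
pep <- translate(toycds)

# Collapse the vector of one-letter codes into a single string
pep_string <- c2s(pep)

# Three-letter codes for the same peptide
pep3 <- aaa(pep)

# Positions of stop codons, if any (none in this toy example)
stops <- which(pep == "*")

pep_string
pep3
stops
```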

Author(s)

D. Charif, J.R. Lobry

References

The genetic codes have been taken from the NCBI taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi. Last update October 05, 2000.
The IUPAC one-letter code for amino acids is described at: https://www.bioinformatics.org/sms/iupac.html

citation("seqinr")

See Also

Use tolower to change upper case letters into lower case letters. For coding sequences obtained from an ACNUC server with query, it is better to use the function getTrans so that the relevant genetic code and the relevant frame are used automatically. The genetic codes are given in the object SEQINR.UTIL; a more human-readable form is given by the function tablecode. Use aaa to get the three-letter code for amino acids.
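As a small sketch of the tolower advice above, an upper case sequence can be converted before translation. The upper case string is simply the toy sequence from the Examples section, and seqinr is assumed to be installed.

```r
library(seqinr)

# Upper case input, e.g. as pasted from another tool
upper_seq <- s2c("TCTGAGCAAATAAATCGG")

# Convert to lower case, as required by the seq argument, then translate
lower_seq <- tolower(upper_seq)
pep <- translate(lower_seq)
pep
```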

Examples

##
## Toy CDS example invented by Leonor Palmeira:
##
toycds <- s2c("tctgagcaaataaatcgg")
translate(seq = toycds) # should be c("S", "E", "Q", "I", "N", "R")
##
## Toy CDS example with ambiguous bases:
##
toycds2 <- s2c("tcngarcarathaaycgn")
translate(toycds2) # should be c("X", "X", "X", "X", "X", "X")
translate(toycds2, ambiguous = TRUE) # should be c("S", "E", "Q", "I", "N", "R")
translate(toycds2, ambiguous = TRUE, numcode = 2) # should be c("S", "E", "Q", "X", "N", "R")
##
## Real CDS example:
##
realcds <- read.fasta(file = system.file("sequences/malM.fasta", package = "seqinr"))[[1]]
translate(seq = realcds)
# Biologically correct, only one stop codon at the end
translate(seq = realcds, frame = 3, sens = "R", numcode = 6)
# Biologically meaningless, note the in-frame stop codons

# Read from an alignment as suggested by Dr. H. Suzuki
fasta.res <- read.alignment(file = system.file("sequences/Anouk.fasta", package = "seqinr"),
                            format = "fasta")

AA1 <- seqinr::getTrans(s2c(fasta.res$seq[[1]]))
AA2 <- seqinr::translate(s2c(fasta.res$seq[[1]]))
identical(AA1, AA2)

AA1 <- lapply(fasta.res$seq, function(x) seqinr::getTrans(s2c(x)))
AA2 <- lapply(fasta.res$seq, function(x) seqinr::translate(s2c(x)))
identical(AA1, AA2)

## Not run: 
## Need internet connection.
## Translation of the following EMBL entry:
##
## FT   CDS             join(complement(153944..154157),complement(153727..153866),
## FT                   complement(152185..153037),138523..138735,138795..138955)
## FT                   /codon_start=1
## FT                   /db_xref="FLYBASE:FBgn0002781"
## FT                   /db_xref="GOA:Q86B86"
## FT                   /db_xref="TrEMBL:Q86B86"
## FT                   /note="mod(mdg4) gene product from transcript CG32491-RZ;
## FT                   trans splicing"
## FT                   /gene="mod(mdg4)"
## FT                   /product="CG32491-PZ"
## FT                   /locus_tag="CG32491"
## FT                   /protein_id="AAO41581.1"
## FT                   /translation="MADDEQFSLCWNNFNTNLSAGFHESLCRGDLVDVSLAAEGQIVKA
## FT                   HRLVLSVCSPFFRKMFTQMPSNTHAIVFLNNVSHSALKDLIQFMYCGEVNVKQDALPAF
## FT                   ISTAESLQIKGLTDNDPAPQPPQESSPPPAAPHVQQQQIPAQRVQRQQPRASARYKIET
## FT                   VDDGLGDEKQSTTQIVIQTTAAPQATIVQQQQPQQAAQQIQSQQLQTGTTTTATLVSTN
## FT                   KRSAQRSSLTPASSSAGVKRSKTSTSANVMDPLDSTTETGATTTAQLVPQQITVQTSVV
## FT                   SAAEAKLHQQSPQQVRQEEAEYIDLPMELPTKSEPDYSEDHGDAAGDAEGTYVEDDTYG
## FT                   DMRYDDSYFTENEDAGNQTAANTSGGGVTATTSKAVVKQQSQNYSESSFVDTSGDQGNT
## FT                   EAQVTQHVRNCGPQMFLISRKGGTLLTINNFVYRSNLKFFGKSNNILYWECVQNRSVKC
## FT                   RSRLKTIGDDLYVTNDVHNHMGDNKRIEAAKAAGMLIHKKLSSLTAADKIQGSWKMDTE
## FT                   GNPDHLPKM"
choosebank("emblTP")
trans <- query("trans", "N=AE003734.PE35")
trans1 <- getTrans(trans$req[[1]])
## Complex transsplicing operations, the correct frame and the correct
## genetic code are automatically used for translation into protein.
seq <- getSequence(trans$req[[1]])
identical(translate(seq), trans1)
# default frame and genetic code are correct
trans <- query("trans", "N=AB004237")
trans1 <- getTrans(trans$req[[1]])
## Complex transsplicing operations, the correct frame and the correct
## genetic code are automatically used for translation into protein.
seq <- getSequence(trans$req[[1]])
identical(translate(seq), trans1)
# default genetic code is not correct
identical(translate(seq, numcode = 2), trans1)
# genetic code is 2

## End(Not run)

seqinr

Biological Sequences Retrieval and Analysis

v4.2-16
GPL (>= 2)
Authors
Delphine Charif [aut], Olivier Clerc [ctb], Carolin Frank [ctb], Jean R. Lobry [aut, cph], Anamaria Necşulea [ctb], Leonor Palmeira [ctb], Simon Penel [cre], Guy Perrière [ctb]
Initial release
2022-05-19
