Translate nucleic acid sequences into proteins
This function translates nucleic acid sequences into the corresponding peptide sequence. It can translate in any of the three forward or three reverse sense frames. For the reverse sense, the reverse complement of the sequence is translated. Translation can use the standard (universal) genetic code or a non-standard code, and ambiguous bases can also be handled.
translate(seq, frame = 0, sens = "F", numcode = 1, NAstring = "X", ambiguous = FALSE)
seq |
the sequence to translate as a vector of single characters in lower case letters. |
frame |
Frame(s) (0,1,2) to translate. By default the frame 0 is used. |
sens |
Sense to translate: "F" for forward sense and "R" for reverse sense. |
numcode |
The NCBI genetic code number for translation. By default the standard genetic code is used. |
NAstring |
The string used for an amino acid that cannot be determined because its codon contains ambiguous bases. |
ambiguous |
If TRUE, ambiguous bases are taken into account so that for instance GGN is translated to Gly in the standard genetic code. |
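The effect of the frame and sens arguments can be illustrated with a minimal base R sketch. It is not the seqinr implementation: the helper name sketch_translate is hypothetical, only the standard genetic code is hard-coded, and any codon containing a base other than a, c, g or t is simply mapped to NAstring.

```r
# Standard genetic code (NCBI table 1), one letter per codon, with the bases
# at each codon position ordered t, c, a, g (ttt, ttc, tta, ttg, tct, ...).
std_code <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]

# Hypothetical helper for illustration only; not part of seqinr.
sketch_translate <- function(seq, frame = 0, sens = "F", NAstring = "X") {
  seq <- tolower(seq)
  if (sens == "R") {
    # Reverse complement: reverse the vector, then swap a<->t and c<->g.
    seq <- chartr("acgt", "tgca", rev(seq))
  }
  n_codons <- (length(seq) - frame) %/% 3
  if (n_codons < 1) return(character(0))
  # Position of the first base of each codon in the requested frame.
  first <- frame + 3 * (seq_len(n_codons) - 1) + 1
  # Map t, c, a, g to 0..3; any other character becomes NA.
  idx <- match(seq, c("t", "c", "a", "g")) - 1
  pos <- 16 * idx[first] + 4 * idx[first + 1] + idx[first + 2] + 1
  aa <- std_code[pos]
  aa[is.na(aa)] <- NAstring
  aa
}

toycds <- strsplit("tctgagcaaataaatcgg", "")[[1]]
sketch_translate(toycds)               # frame 0, forward sense
sketch_translate(toycds, frame = 1)    # reading frame shifted by one base
sketch_translate(toycds, sens = "R")   # frame 0 of the reverse complement
```

In frame 1 the trailing bases that do not fill a complete codon are dropped, so the result is one residue shorter than in frame 0.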
The following genetic codes are described here. The number preceding each code corresponds to numcode.
1 standard
2 vertebrate.mitochondrial
3 yeast.mitochondrial
4 protozoan.mitochondrial+mycoplasma
5 invertebrate.mitochondrial
6 ciliate+dasycladaceal
9 echinoderm+flatworm.mitochondrial
10 euplotid
11 bacterial+plantplastid
12 alternativeyeast
13 ascidian.mitochondrial
14 alternativeflatworm.mitochondrial
15 blepharism
16 chlorophycean.mitochondrial
21 trematode.mitochondrial
22 scenedesmus.mitochondrial
23 thraustochytrium.mitochondria
24 Pterobranchia.mitochondrial
25 CandidateDivision.SR1+Gracilibacteria
26 Pachysolen.tannophilus
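To show why the choice of numcode matters, the following base R sketch compares the standard code with the vertebrate mitochondrial code. The two 64-letter strings are the NCBI tables 1 and 2 written with bases ordered t, c, a, g; the variable names are illustrative only, and the sketch does not read the tables from seqinr itself.

```r
bases <- c("t", "c", "a", "g")

# All 64 codons with the third base varying fastest: ttt, ttc, tta, ttg, tct, ...
grid <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)

# NCBI table 1 (standard) and table 2 (vertebrate mitochondrial).
code1 <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
code2 <- strsplit(
  "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG", "")[[1]]

# Codons whose meaning changes between the two codes.
differs <- code1 != code2
data.frame(codon = codons[differs],
           standard = code1[differs],
           vertebrate.mitochondrial = code2[differs])
```

Four codons differ: tga (stop versus W), ata (I versus M), and aga and agg (R versus stop). The ata difference means that a codon such as ath, whose possible codons are ata, atc and att, no longer corresponds to a single amino acid under the vertebrate mitochondrial code.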
translate returns a vector of single characters containing the peptide sequence in the standard one-letter IUPAC code. Termination (STOP) codons are translated by the character '*'.
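One way to picture how a codon with ambiguous bases can still yield a single letter in the returned vector is sketched below in base R: each IUPAC ambiguity symbol is expanded into the bases it stands for, every resulting codon is translated, and the amino acid is kept only if all of them agree. This is an assumption-laden illustration with hypothetical helper names, restricted to the standard code, and is not claimed to be how seqinr implements the ambiguous argument.

```r
std_code <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
tcag <- c("t", "c", "a", "g")

# IUPAC nucleotide symbols and the bases each one can stand for.
iupac <- list(a = "a", c = "c", g = "g", t = "t",
              r = c("a", "g"), y = c("c", "t"), s = c("c", "g"),
              w = c("a", "t"), k = c("g", "t"), m = c("a", "c"),
              b = c("c", "g", "t"), d = c("a", "g", "t"),
              h = c("a", "c", "t"), v = c("a", "c", "g"),
              n = c("a", "c", "g", "t"))

# Hypothetical helper: translate one codon given as three single characters.
translate_codon <- function(codon, NAstring = "X") {
  choices <- unname(iupac[codon])
  # Unknown symbols cannot be expanded.
  if (any(vapply(choices, is.null, logical(1)))) return(NAstring)
  # Every concrete codon compatible with the ambiguous one.
  combos <- expand.grid(choices, stringsAsFactors = FALSE)
  pos <- 16 * (match(combos[[1]], tcag) - 1) +
          4 * (match(combos[[2]], tcag) - 1) +
              (match(combos[[3]], tcag) - 1) + 1
  aa <- unique(std_code[pos])
  # Keep the amino acid only when all compatible codons agree.
  if (length(aa) == 1) aa else NAstring
}

# Hypothetical helper: translate a whole sequence in frame 0, forward sense.
sketch_translate_ambiguous <- function(seq, NAstring = "X") {
  seq <- tolower(seq)
  n_codons <- length(seq) %/% 3
  vapply(seq_len(n_codons),
         function(i) translate_codon(seq[(3 * i - 2):(3 * i)], NAstring),
         character(1))
}

toycds2 <- strsplit("tcngarcarathaaycgn", "")[[1]]
sketch_translate_ambiguous(toycds2)
```

With this approach ggn resolves to G because all four codons starting with gg encode glycine in the standard code, while nnn falls back to NAstring.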
D. Charif, J.R. Lobry
The genetic codes have been taken from the NCBI taxonomy database:
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi.
Last update October 05, 2000.
The IUPAC one-letter code for amino acids is described at:
https://www.bioinformatics.org/sms/iupac.html
citation("seqinr")
Use tolower to change upper case letters into lower case letters.
For coding sequences obtained from an ACNUC server with query, it is better to use the function getTrans so that the relevant genetic code and the relevant frame are automatically used.
The genetic codes are given in the object SEQINR.UTIL; a more human-readable form is given by the function tablecode.
Use aaa to get the three-letter code for amino acids.
##
## Toy CDS example invented by Leonor Palmeira:
##
toycds <- s2c("tctgagcaaataaatcgg")
translate(seq = toycds) # should be c("S", "E", "Q", "I", "N", "R")
##
## Toy CDS example with ambiguous bases:
##
toycds2 <- s2c("tcngarcarathaaycgn")
translate(toycds2) # should be c("X", "X", "X", "X", "X", "X")
translate(toycds2, ambiguous = TRUE) # should be c("S", "E", "Q", "I", "N", "R")
translate(toycds2, ambiguous = TRUE, numcode = 2) # should be c("S", "E", "Q", "X", "N", "R")
##
## Real CDS example:
##
realcds <- read.fasta(file = system.file("sequences/malM.fasta", package = "seqinr"))[[1]]
translate(seq = realcds)
# Biologically correct, only one stop codon at the end
translate(seq = realcds, frame = 3, sens = "R", numcode = 6)
# Biologically meaningless, note the in-frame stop codons

# Read from an alignment as suggested by Dr. H. Suzuki
fasta.res <- read.alignment(file = system.file("sequences/Anouk.fasta", package = "seqinr"),
                            format = "fasta")
AA1 <- seqinr::getTrans(s2c(fasta.res$seq[[1]]))
AA2 <- seqinr::translate(s2c(fasta.res$seq[[1]]))
identical(AA1, AA2)

AA1 <- lapply(fasta.res$seq, function(x) seqinr::getTrans(s2c(x)))
AA2 <- lapply(fasta.res$seq, function(x) seqinr::translate(s2c(x)))
identical(AA1, AA2)

## Not run:
## Need internet connection.
## Translation of the following EMBL entry:
##
## FT CDS join(complement(153944..154157),complement(153727..153866),
## FT complement(152185..153037),138523..138735,138795..138955)
## FT /codon_start=1
## FT /db_xref="FLYBASE:FBgn0002781"
## FT /db_xref="GOA:Q86B86"
## FT /db_xref="TrEMBL:Q86B86"
## FT /note="mod(mdg4) gene product from transcript CG32491-RZ;
## FT trans splicing"
## FT /gene="mod(mdg4)"
## FT /product="CG32491-PZ"
## FT /locus_tag="CG32491"
## FT /protein_id="AAO41581.1"
## FT /translation="MADDEQFSLCWNNFNTNLSAGFHESLCRGDLVDVSLAAEGQIVKA
## FT HRLVLSVCSPFFRKMFTQMPSNTHAIVFLNNVSHSALKDLIQFMYCGEVNVKQDALPAF
## FT ISTAESLQIKGLTDNDPAPQPPQESSPPPAAPHVQQQQIPAQRVQRQQPRASARYKIET
## FT VDDGLGDEKQSTTQIVIQTTAAPQATIVQQQQPQQAAQQIQSQQLQTGTTTTATLVSTN
## FT KRSAQRSSLTPASSSAGVKRSKTSTSANVMDPLDSTTETGATTTAQLVPQQITVQTSVV
## FT SAAEAKLHQQSPQQVRQEEAEYIDLPMELPTKSEPDYSEDHGDAAGDAEGTYVEDDTYG
## FT DMRYDDSYFTENEDAGNQTAANTSGGGVTATTSKAVVKQQSQNYSESSFVDTSGDQGNT
## FT EAQVTQHVRNCGPQMFLISRKGGTLLTINNFVYRSNLKFFGKSNNILYWECVQNRSVKC
## FT RSRLKTIGDDLYVTNDVHNHMGDNKRIEAAKAAGMLIHKKLSSLTAADKIQGSWKMDTE
## FT GNPDHLPKM"
choosebank("emblTP")
trans <- query("trans", "N=AE003734.PE35")
trans1 <- getTrans(trans$req[[1]])
## Complex transsplicing operations, the correct frame and the correct
## genetic code are automatically used for translation into protein.
seq <- getSequence(trans$req[[1]])
identical(translate(seq), trans1)
# default frame and genetic code are correct
trans <- query("trans", "N=AB004237")
trans1 <- getTrans(trans$req[[1]])
## Complex transsplicing operations, the correct frame and the correct
## genetic code are automatically used for translation into protein.
seq <- getSequence(trans$req[[1]])
identical(translate(seq), trans1)
# default genetic code is not correct
identical(translate(seq, numcode = 2), trans1)
# genetic code is 2
## End(Not run)