Translating DNA/RNA sequences
Functions for translating DNA or RNA sequences into amino acid sequences.
## Translating DNA/RNA:
translate(x, genetic.code=GENETIC_CODE, no.init.codon=FALSE, if.fuzzy.codon="error")

## Extracting codons without translating them:
codons(x)
x |
A DNAStringSet, RNAStringSet, DNAString, RNAString, MaskedDNAString or MaskedRNAString object for translate.
A DNAString, RNAString, MaskedDNAString or MaskedRNAString object for codons. |
genetic.code |
The genetic code to use for the translation of codons into Amino Acid letters. It must be represented as a named character vector of length 64 similar to the predefined constant GENETIC_CODE.
The default value for genetic.code is GENETIC_CODE. |
no.init.codon |
By default, |
if.fuzzy.codon |
How fuzzy codons (i.e. codons with IUPAC ambiguities) should be handled. Accepted values are:
Alternatively
The accepted values for the 2nd string are:
All the 6 possible combinations of 1st and 2nd strings are supported.
Note that |
translate reproduces the biological process of RNA translation that occurs in the cell. The input of the function can be either RNA or coding DNA. By default, The Standard Genetic Code (see ?GENETIC_CODE) is used to translate codons into amino acids, but the user can supply a different genetic code via the genetic.code argument.
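As a small illustration of the paragraph above, the sketch below translates the same made-up coding sequence given as DNA and as RNA, then repeats the translation with the Vertebrate Mitochondrial code obtained from getGeneticCode("SGC1"). The toy sequence and the variable names are assumptions chosen for the illustration; they do not come from the package.

```r
library(Biostrings)

# Toy coding sequence (an assumption for illustration): ATG AGA TGA
dna <- DNAString("ATGAGATGA")
rna <- RNAString("AUGAGAUGA")

# RNA and coding DNA inputs give the same amino acid sequence.
aa_dna <- translate(dna)
aa_rna <- translate(rna)
identical(as.character(aa_dna), as.character(aa_rna))

# The default genetic code is a named character vector of length 64.
length(GENETIC_CODE)

# Supplying a different genetic code changes how some codons are read:
# AGA and TGA are translated differently by the two codes.
mito <- getGeneticCode("SGC1")  # Vertebrate Mitochondrial code
aa_std  <- as.character(translate(dna))
aa_mito <- as.character(translate(dna, genetic.code=mito))
aa_std
aa_mito
```

Comparing aa_std with aa_mito shows which codons the two codes disagree on for this sequence.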
codons is a utility for extracting the codons involved in this translation without translating them.
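A minimal sketch of how codons relates to translate, using a made-up 12-letter sequence (an assumption for illustration): each view returned by codons corresponds to one letter in the translated sequence.

```r
library(Biostrings)

# Toy coding sequence (an assumption for illustration).
dna <- DNAString("ATGGCCCTTTAA")

# codons() returns views on the input, one view per codon.
cds_codons <- codons(dna)
codon_strings <- as.character(cds_codons)
codon_strings

# translate() yields one amino acid letter (or * for stop) per codon.
aa <- translate(dna)
length(cds_codons) == length(aa)
```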
For translate: An AAString object when x is a DNAString, RNAString, MaskedDNAString, or MaskedRNAString object. An AAStringSet object parallel to x (i.e. with 1 amino acid sequence per DNA or RNA sequence in x) when x is a DNAStringSet or RNAStringSet object. If x has names on it, they're propagated to the returned object.
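The following sketch illustrates the two return types and the propagation of names. The sequences and the names seqA and seqB are made up for the illustration.

```r
library(Biostrings)

# A single sequence gives an AAString.
single <- translate(DNAString("ATGGCC"))
is(single, "AAString")

# A named DNAStringSet gives a parallel AAStringSet with the same names.
x <- DNAStringSet(c(seqA="ATGGCC", seqB="ATGTAA"))
aa <- translate(x)
is(aa, "AAStringSet")
length(aa) == length(x)
names(aa)
```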
For codons: An XStringViews object with 1 view per codon. When x is a MaskedDNAString or MaskedRNAString object, its masked parts are interpreted as introns and filled with the + letter in the returned object. Therefore codons that span across masked regions are represented by views that have a width > 3 and contain the + letter. Note that each view is guaranteed to contain exactly 3 base letters.
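The behavior described for masked input can be sketched with a toy sequence in which a 6-letter "intron" interrupts the second codon. The sequence and the mask coordinates are assumptions chosen for the illustration; Mask and masks<- are used the same way as in the Examples section.

```r
library(Biostrings)

# Toy gene (an assumption): exon "ATGGC", intron "GTAAGT", exon "CCTTTAA".
y <- DNAString("ATGGCGTAAGTCCTTTAA")

# Mask the intron at positions 6..11 so it is skipped during translation.
masks(y) <- Mask(length(y), start=6, end=11)

# The spliced sequence ATG GCC CTT TAA is what gets translated.
aa <- translate(y)
aa

# The codon interrupted by the intron is returned as a view wider than 3.
codon_widths <- width(codons(y))
codon_widths
which(codon_widths != 3)
```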
AA_ALPHABET for the Amino Acid alphabet.
GENETIC_CODE for The Standard Genetic Code and its known variants.
The examples for extractTranscriptSeqs in the GenomicFeatures package for computing the full proteome of a given organism.
The reverseComplement function.
The DNAStringSet and AAStringSet classes.
The XStringViews and MaskedXString classes.
## ---------------------------------------------------------------------
## 1. BASIC EXAMPLES
## ---------------------------------------------------------------------

dna1 <- DNAString("TTGATATGGCCCTTATAA")
translate(dna1)
## TTG is an alternative initiation codon in the Standard Genetic Code:
translate(dna1, no.init.codon=TRUE)

SGC1 <- getGeneticCode("SGC1")  # Vertebrate Mitochondrial code
translate(dna1, genetic.code=SGC1)
## TTG is NOT an alternative initiation codon in the Vertebrate
## Mitochondrial code:
translate(dna1, genetic.code=SGC1, no.init.codon=TRUE)

## All 6 codons except 4th (CCC) are fuzzy:
dna2 <- DNAString("HTGATHTGRCCCYTRTRA")

## Not run:
translate(dna2)  # error because of fuzzy codons
## End(Not run)

## Translate all fuzzy codons to X:
translate(dna2, if.fuzzy.codon="X")

## Or solve the non-ambiguous ones (3rd codon is ambiguous so cannot be
## solved):
translate(dna2, if.fuzzy.codon="solve")

## Fuzzy codons that are non-ambiguous with a given genetic code can
## become ambiguous with another genetic code, and vice versa:
translate(dna2, genetic.code=SGC1, if.fuzzy.codon="solve")

## ---------------------------------------------------------------------
## 2. TRANSLATING AN OPEN READING FRAME
## ---------------------------------------------------------------------

file <- system.file("extdata", "someORF.fa", package="Biostrings")
x <- readDNAStringSet(file)
x

## The first and last 1000 nucleotides are not part of the ORFs:
x <- DNAStringSet(x, start=1001, end=-1001)

## Before calling translate() on an ORF, we need to mask the introns
## if any. We can get this information from the SGD database
## (http://www.yeastgenome.org/).
## According to SGD, the 1st ORF (YAL001C) has an intron at 71..160
## (see http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YAL001C)
y1 <- x[[1]]
mask1 <- Mask(length(y1), start=71, end=160)
masks(y1) <- mask1
y1
translate(y1)

## Codons:
codons(y1)
which(width(codons(y1)) != 3)
codons(y1)[20:28]

## ---------------------------------------------------------------------
## 3. AN ADVANCED EXAMPLE
## ---------------------------------------------------------------------

## Translation on the '-' strand:
dna3 <- DNAStringSet(c("ATC", "GCTG", "CGACT"))
translate(reverseComplement(dna3))

## Translate sequences on both '+' and '-' strand across all
## possible reading frames (i.e., codon position 1, 2 or 3):
## First create a DNAStringSet of '+' and '-' strand sequences,
## removing the nucleotides prior to the reading frame start position.
dna3_subseqs <- lapply(1:3, function(pos)
    subseq(c(dna3, reverseComplement(dna3)), start=pos))

## Translation of 'dna3_subseqs' produces a list of length 3, each with
## 6 elements (3 '+' strand results followed by 3 '-' strand results).
lapply(dna3_subseqs, translate)

## Note that translate() throws a warning when the length of the sequence
## is not divisible by 3. To avoid this warning wrap the function in
## suppressWarnings().
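The closing note on suppressWarnings() can be sketched as follows, reusing the dna3 sequences from the advanced example. The second variant, which trims each sequence to a multiple of 3 with subseq() before translating, is an alternative suggested here and is not part of the original examples.

```r
library(Biostrings)

dna3 <- DNAStringSet(c("ATC", "GCTG", "CGACT"))

# Option 1: silence the warning about sequence lengths not divisible by 3.
aa_quiet <- suppressWarnings(translate(dna3))
as.character(aa_quiet)

# Option 2 (an alternative, not from the original examples): drop the
# trailing 1 or 2 bases explicitly so that no warning is raised at all.
trimmed <- subseq(dna3, start=1, end=width(dna3) - width(dna3) %% 3L)
aa_trimmed <- translate(trimmed)
identical(as.character(aa_quiet), as.character(aa_trimmed))
```

Trimming makes the discarded bases explicit in the code, whereas suppressWarnings() also hides any other warning raised by the call.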