Codon usage indices
uco calculates some codon usage indices: the codon counts eff, the relative frequencies freq, or the Relative Synonymous Codon Usage rscu.
uco(seq, frame = 0, index = c("eff", "freq", "rscu"), as.data.frame = FALSE, NA.rscu = NA)
seq | a coding sequence as a vector of chars
frame | an integer (0, 1, 2) giving the frame of the coding sequence
index | codon usage index choice; partial matching is allowed. "eff", "freq", and "rscu" correspond to "R0", "R1", and "R3", respectively, in Suzuki et al. (2005) "2.2 Normalization of codon usage data". "eff" and "rscu" correspond to "AF" and "RSCU", respectively, in Suzuki et al. (2008) "2.2. Definitions of codon usage data".
as.data.frame | logical. If TRUE, a data frame with all indices is returned.
NA.rscu | when an amino-acid is missing, RSCU values are no longer defined and are reported as missing values (NA). NA.rscu sets the value reported in their place.
Codons with ambiguous bases are ignored.
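The behaviour described for frame and for ambiguous bases can be sketched in base R. This is only an illustration, not the seqinR implementation: count_codons is a hypothetical helper, and unlike uco it reports only the codons actually observed rather than all 64.

```r
# Hypothetical helper (not part of seqinR): count codons in a vector of
# single characters, skipping codons that contain anything but a, c, g, t.
count_codons <- function(bases, frame = 0) {
  bases <- tolower(bases)
  # Drop the first `frame` characters to reach the start of the reading frame
  if (frame > 0) bases <- bases[-seq_len(frame)]
  # Keep only complete codons
  n <- (length(bases) %/% 3) * 3
  if (n == 0) return(table(character(0)))
  # Each column of the 3-row matrix is one codon
  codons <- apply(matrix(bases[seq_len(n)], nrow = 3), 2, paste, collapse = "")
  # Ignore codons with ambiguous bases
  unambiguous <- grepl("^[acgt]{3}$", codons)
  table(codons[unambiguous])
}

counts <- count_codons(strsplit("aaannnttt", "")[[1]])
print(counts)
```

Here the middle codon nnn is dropped, so only aaa and ttt are counted, once each.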
RSCU is a simple measure of non-uniform usage of synonymous codons in a coding sequence
(Sharp et al. 1986).
RSCU values are the number of times a particular codon is observed, relative to the number
of times that the codon would be observed for a uniform synonymous codon usage (i.e. all the
codons for a given amino-acid have the same probability).
In the absence of any codon usage bias, the RSCU values would be 1.00 (this is the case
for the sequence cds in the example below). A codon that is used
less frequently than expected will have an RSCU value of less than 1.00 and vice versa for a codon
that is used more frequently than expected.
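The definition above amounts to dividing each codon count by the mean count of its synonymous family. The following base R sketch works through that arithmetic on made-up counts; rscu_from_counts and the toy data are assumptions for illustration and do not reproduce the internals of uco.

```r
# Hypothetical helper (not part of seqinR): RSCU from codon counts and a
# codon-to-amino-acid mapping given as parallel vectors.
rscu_from_counts <- function(counts, aa) {
  # Expected count under uniform synonymous usage:
  # the mean count within each amino-acid family
  expected <- ave(counts, aa, FUN = mean)
  counts / expected
}

# Made-up counts: the four alanine codons and the two lysine codons
counts <- c(gca = 10, gcc = 30, gcg = 40, gct = 20, aaa = 5, aag = 5)
aa <- c("Ala", "Ala", "Ala", "Ala", "Lys", "Lys")

rscu <- rscu_from_counts(counts, aa)
print(rscu)
```

With these counts the alanine codons get 0.4, 1.2, 1.6 and 0.8, while the evenly used lysine codons both get 1. In this sketch a family with no observed codon yields 0/0, which is the undefined situation that NA.rscu addresses in uco.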
Do not use correspondence analysis on RSCU tables as this is a source of artifacts
(Perrière and Thioulouse 2002, Suzuki et al. 2008). Within-aminoacid correspondence analysis is a
simple way to study synonymous codon usage (Charif et al. 2005). For an introduction
to correspondence analysis and within-aminoacid correspondence analysis see the
chapter titled Multivariate analyses in the seqinR manual that ships with the
seqinR package in the doc folder. You can also use internal correspondence
analysis if you want to analyze simultaneously a row-block structure such as the
within and between species variability (Lobry and Chessel 2003).
If as.data.frame is FALSE, uco returns one of these, depending on index:
for eff, a table of codon counts
for freq, a table of codon relative frequencies
for rscu, a numeric vector of relative synonymous codon usage values
If as.data.frame is TRUE, uco returns a data frame with five columns:
a vector containing the name of the amino-acid
a vector containing the corresponding codon
a numeric vector of codon counts
a numeric vector of codon relative frequencies
a numeric vector of RSCU values
If as.data.frame is FALSE, the default, a table is returned for eff and freq, and a numeric vector for rscu. If as.data.frame is TRUE, a data frame with all indices is returned.
D. Charif, J.R. Lobry, G. Perrière
citation("seqinr")
Sharp, P.M., Tuohy, T.M.F., Mosurski, K.R. (1986) Codon usage in yeast: cluster
analysis clearly differentiates highly and lowly expressed genes.
Nucl. Acids. Res., 14:5125-5143.
Perrière, G., Thioulouse, J. (2002) Use and misuse of correspondence analysis in
codon usage studies. Nucl. Acids. Res., 30:4548-4555.
Lobry, J.R., Chessel, D. (2003) Internal correspondence analysis of codon and
amino-acid usage in thermophilic bacteria.
Journal of Applied Genetics, 44:235-261. http://jag.igr.poznan.pl/2003-Volume-44/2/pdf/2003_Volume_44_2-235-261.pdf.
Charif, D., Thioulouse, J., Lobry, J.R., Perrière, G. (2005) Online
Synonymous Codon Usage Analyses with the ade4 and seqinR packages.
Bioinformatics, 21:545-547. https://pbil.univ-lyon1.fr/members/lobry/repro/bioinfo04/.
Suzuki, H., Saito, R., Tomita, M. (2005)
A problem in multivariate analysis of codon usage data and a possible solution.
FEBS Lett., 579:6499-6504. https://febs.onlinelibrary.wiley.com/doi/full/10.1016/j.febslet.2005.10.032.
Suzuki, H., Brown, C.J., Forney, L.J., Top, E. (2008) Comparison of Correspondence Analysis Methods for Synonymous Codon Usage in Bacteria. DNA Research, 15:357-365. https://academic.oup.com/dnaresearch/article/15/6/357/513030.
## Show all possible codons:
words()
## Make a coding sequence from this:
(cds <- s2c(paste(words(), collapse = "")))
## Get codon counts:
uco(cds, index = "eff")
## Get codon relative frequencies:
uco(cds, index = "freq")
## Get RSCU values:
uco(cds, index = "rscu")
## Show what happens with ambiguous bases:
uco(s2c("aaannnttt"))
## Use a real coding sequence:
rcds <- read.fasta(file = system.file("sequences/malM.fasta", package = "seqinr"))[[1]]
uco(rcds, index = "freq")
uco(rcds, index = "eff")
uco(rcds, index = "rscu")
uco(rcds, as.data.frame = TRUE)
## Show what happens with RSCU when an amino-acid is missing:
ecolicgpe5 <- read.fasta(file = system.file("sequences/ecolicgpe5.fasta", package = "seqinr"))[[1]]
uco(ecolicgpe5, index = "rscu")
## Force NA to zero:
uco(ecolicgpe5, index = "rscu", NA.rscu = 0)