Dot Plot Comparison of two sequences
Dot plots are most likely the oldest visual representation used to compare two sequences (see Maizel and Lenk 1981 and references therein). In its simplest form, a dot is produced at position (i,j) if and only if character number i in the first sequence is the same as character number j in the second sequence. More elaborate forms use sliding windows and a threshold value for two windows to be considered as matched.
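The simplest form described above can be sketched in base R as a logical matrix. This is only an illustration under stated assumptions: `dot_matrix` is a hypothetical helper name, not a seqinr function, and it is not claimed to be how dotPlot is implemented.

```r
# Hypothetical helper (not part of seqinr): entry [i, j] is TRUE
# if and only if character i of seq1 equals character j of seq2.
dot_matrix <- function(seq1, seq2) {
  outer(seq1, seq2, "==")
}

s1 <- c("a", "c", "g", "t")
s2 <- c("a", "g", "g", "t")
m <- dot_matrix(s1, s2)

# Rows index the first sequence, columns index the second one.
print(m)
```

Each TRUE entry corresponds to one dot in the plot.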
dotPlot(seq1, seq2, wsize = 1, wstep = 1, nmatch = 1, col = c("white", "black"), xlab = deparse(substitute(seq1)), ylab = deparse(substitute(seq2)), ...)
seq1: the first sequence (x-axis) as a vector of single chars.
seq2: the second sequence (y-axis) as a vector of single chars.
wsize: the size in chars of the moving window.
wstep: the size in chars for the steps of the moving window. Use
nmatch: if the number of matches per window is greater than or equal to
col: color of points passed to
xlab: label of x-axis passed to
ylab: label of y-axis passed to
...: further arguments passed to
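One possible reading of how wsize, wstep and nmatch interact is sketched below. It assumes that two windows match when at least nmatch of their wsize positions carry the same character; that interpretation, and the helper name `window_dot_matrix`, are assumptions made for illustration and are not taken from the seqinr source.

```r
# Hypothetical sketch (not seqinr's implementation).
# Assumption: windows are compared position by position, and a pair of
# windows counts as a match when at least nmatch positions agree.
window_dot_matrix <- function(seq1, seq2, wsize = 1, wstep = 1, nmatch = 1) {
  stopifnot(length(seq1) >= wsize, length(seq2) >= wsize)
  # Start positions of the moving window along each sequence.
  starts1 <- seq(from = 1, to = length(seq1) - wsize + 1, by = wstep)
  starts2 <- seq(from = 1, to = length(seq2) - wsize + 1, by = wstep)
  hits <- matrix(FALSE, nrow = length(starts1), ncol = length(starts2))
  for (a in seq_along(starts1)) {
    w1 <- seq1[starts1[a]:(starts1[a] + wsize - 1)]
    for (b in seq_along(starts2)) {
      w2 <- seq2[starts2[b]:(starts2[b] + wsize - 1)]
      hits[a, b] <- sum(w1 == w2) >= nmatch
    }
  }
  hits
}

s <- c("a", "c", "g", "a", "c", "g")
u <- c("a", "c", "g", "t", "c", "g")
strict <- window_dot_matrix(s, u, wsize = 3, wstep = 3, nmatch = 3)
loose  <- window_dot_matrix(s, u, wsize = 3, wstep = 3, nmatch = 2)
```

With wstep equal to wsize the windows do not overlap. Under the assumption above, lowering nmatch below wsize tolerates mismatches inside a window, so `loose` marks more window pairs than `strict`.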
NULL.
J.R. Lobry
Maizel, J.V. and Lenk, R.P. (1981) Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences. Proceedings of the National Academy of Sciences USA, 78:7665-7669.
citation("seqinr")
library(seqinr)

# Identity is on the main diagonal:
dotPlot(letters, letters, main = "Direct repeat")

# Internal repeats are off the main diagonal:
dotPlot(rep(letters, 2), rep(letters, 2), main = "Internal repeats")

# Inversions are orthogonal to the main diagonal:
dotPlot(letters, rev(letters), main = "Inversion")

# An insertion in the second sequence yields a vertical jump:
dotPlot(letters, c(letters[1:10], s2c("insertion"), letters[11:26]),
  main = "Insertion in the second sequence", asp = 1)

# An insertion in the first sequence yields a horizontal jump:
dotPlot(c(letters[1:10], s2c("insertion"), letters[11:26]), letters,
  main = "Insertion in the first sequence", asp = 1)

# Protein sequences usually have a good signal/noise ratio because
# there are 20 possible amino acids:
aafile <- system.file("sequences/seqAA.fasta", package = "seqinr")
protein <- read.fasta(aafile)[[1]]
dotPlot(protein, protein,
  main = "Dot plot of a protein\nwsize = 1, wstep = 1, nmatch = 1")

# Nucleic acid sequences usually have a poor signal/noise ratio because
# there are only 4 different bases:
dnafile <- system.file("sequences/malM.fasta", package = "seqinr")
dna <- read.fasta(dnafile)[[1]]
dotPlot(dna[1:200], dna[1:200],
  main = "Dot plot of a nucleic acid sequence\nwsize = 1, wstep = 1, nmatch = 1")

# Play with the wsize, wstep and nmatch arguments to increase the
# signal/noise ratio:
dotPlot(dna[1:200], dna[1:200], wsize = 3, wstep = 3, nmatch = 3,
  main = "Dot plot of a nucleic acid sequence\nwsize = 3, wstep = 3, nmatch = 3")