MultipleAlignment objects
The MultipleAlignment class is a container for storing multiple sequence alignments.
## Constructors:
DNAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)
RNAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)
AAMultipleAlignment(x=character(), start=NA, end=NA, width=NA,
    use.names=TRUE, rowmask=NULL, colmask=NULL)

## Read functions:
readDNAMultipleAlignment(filepath, format)
readRNAMultipleAlignment(filepath, format)
readAAMultipleAlignment(filepath, format)

## Write functions:
write.phylip(x, filepath)

## ... and more (see below)
x |
Either a character vector (with no NAs), or an XString, XStringSet or XStringViews object containing strings with the same number of characters. When writing out a Phylip file, x is a MultipleAlignment object. |
start,end,width |
Either |
use.names |
|
filepath |
A character vector (of arbitrary length when reading, of length 1
when writing) containing the paths to the files to read or write.
Note that special values like |
format |
Either |
rowmask |
A NormalIRanges object that sets the mask for rows. |
colmask |
A NormalIRanges object that sets the mask for columns. |
The MultipleAlignment class is designed to hold and represent multiple sequence alignments. The rows and columns within an alignment can be masked for ad hoc analyses.
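As a minimal sketch of the constructor described above, the following builds a small DNAMultipleAlignment from a named character vector. The sequences and the names seqA, seqB and seqC are made-up toy data, not part of the package.

```r
library(Biostrings)

# Three hypothetical aligned sequences; all strings must have the same width
seqs <- c(seqA = "ACGT-ACGT",
          seqB = "ACGTTACGT",
          seqC = "AC-TTACGA")

# use.names=TRUE (the default) keeps the vector names as row names
aln <- DNAMultipleAlignment(seqs)

dim(aln)        # number of sequences, then number of alignment columns
rownames(aln)   # "seqA" "seqB" "seqC"
```

No rows or columns are masked at this point, because rowmask and colmask default to NULL.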
In the code snippets below, x is a MultipleAlignment object.
unmasked(x): The underlying XStringSet object containing the multiple sequence alignment.
rownames(x): NULL or a character vector of the same length as x containing a short user-provided description or comment for each sequence in x.
rowmask(x), rowmask(x, append, invert) <- value: Gets and sets the NormalIRanges object representing the masked rows in x. The append argument takes union, replace or intersect to indicate how to combine the new value with rowmask(x). The invert argument takes a logical argument to indicate whether or not to invert the new mask. The value argument can be of any class that is coercible to a NormalIRanges via the as function.
colmask(x), colmask(x, append, invert) <- value: Gets and sets the NormalIRanges object representing the masked columns in x. The append argument takes union, replace or intersect to indicate how to combine the new value with colmask(x). The invert argument takes a logical argument to indicate whether or not to invert the new mask. The value argument can be of any class that is coercible to a NormalIRanges via the as function.
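The way append and invert combine with an existing mask can be sketched on a toy alignment. The four identical sequences below are hypothetical data chosen only so that the row and column positions are easy to follow.

```r
library(Biostrings)

# Hypothetical toy alignment: 4 rows, 6 columns
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGTAC",
                              s3 = "ACGTAC", s4 = "ACGTAC"))

# Default behaviour replaces the mask: rows 1-2 are now masked
rowmask(aln) <- IRanges(start = 1, end = 2)

# "union" adds row 4 to the existing mask: rows 1, 2 and 4 are masked
rowmask(aln, append = "union") <- IRanges(start = 4, end = 4)
afterUnion <- maskednrow(aln)

# "intersect" keeps only rows masked both before and in the new value:
# {1, 2, 4} intersected with {2, 3, 4} leaves rows 2 and 4
rowmask(aln, append = "intersect") <- IRanges(start = 2, end = 4)
afterIntersect <- maskednrow(aln)

# invert=TRUE masks everything except the given range, so columns 2-5
# stay visible and columns 1 and 6 are masked
colmask(aln, invert = TRUE) <- IRanges(start = 2, end = 5)
colmask(aln)

# Assigning NULL removes a mask
rowmask(aln) <- NULL
```

Inverting is a convenient way to "select" a region of interest rather than listing everything outside it.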
maskMotif(x, motif, min.block.width=1, ...): Returns a MultipleAlignment object with a modified column mask based upon motifs found in the consensus string, where the consensus string keeps all the columns but drops the masked rows.
motif: The motif to mask.
min.block.width: The minimum width of the blocks to mask.
...: Additional arguments for matchPattern.
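A small sketch of maskMotif on hypothetical data: the three rows are identical, so the consensus string equals each row and the motif position is unambiguous.

```r
library(Biostrings)

# Hypothetical toy alignment whose consensus contains "TATA" at columns 3-6
aln <- DNAMultipleAlignment(c(s1 = "GGTATAGG",
                              s2 = "GGTATAGG",
                              s3 = "GGTATAGG"))

# Mask the columns where the motif matches the consensus string
tataMasked <- maskMotif(aln, "TATA")

colmask(tataMasked)      # the column ranges covered by the motif
maskedncol(tataMasked)   # number of masked columns
```

The original object is left unchanged; maskMotif returns a new alignment with the extended column mask.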
maskGaps(x, min.fraction, min.block.width): Returns a MultipleAlignment object with a modified column mask based upon gaps in the columns. In particular, this mask is defined by min.block.width or more consecutive columns that have min.fraction or more of their non-masked rows containing gap codes.
min.fraction: A value in [0, 1] that indicates the minimum fraction needed to call a gap in the consensus string (default is 0.5).
min.block.width: A positive integer that indicates the minimum number of consecutive gaps to mask, as defined by min.fraction (default is 4).
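The interaction between min.fraction and min.block.width can be illustrated with a hypothetical alignment in which columns 5-6 are half gaps and columns 7-8 are three-quarters gaps.

```r
library(Biostrings)

# Hypothetical toy alignment: gap fraction is 2/4 in columns 5-6
# and 3/4 in columns 7-8
aln <- DNAMultipleAlignment(c(s1 = "ACGT----ACGT",
                              s2 = "ACGT----ACGT",
                              s3 = "ACGTAC--ACGT",
                              s4 = "ACGTACGTACGT"))

# Columns 5-8 all reach a gap fraction of 0.5 and form a block of width 4
loose <- maskGaps(aln, min.fraction = 0.5, min.block.width = 4)
colmask(loose)

# Only columns 7-8 reach 0.75; that block is narrower than 4, so nothing is masked
strict <- maskGaps(aln, min.fraction = 0.75, min.block.width = 4)
colmask(strict)

# Lowering the block width lets the 2-column block through
narrow <- maskGaps(aln, min.fraction = 0.75, min.block.width = 2)
colmask(narrow)
```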
nrow(x): Returns the number of sequences aligned in x.
ncol(x): Returns the number of characters for each alignment in x.
dim(x): Equivalent to c(nrow(x), ncol(x)).
maskednrow(x): Returns the number of masked aligned sequences in x.
maskedncol(x): Returns the number of masked aligned characters in x.
maskeddim(x): Equivalent to c(maskednrow(x), maskedncol(x)).
maskedratio(x): Equivalent to maskeddim(x) / dim(x).
nchar(x): Returns the number of unmasked aligned characters in x, i.e. ncol(x) - maskedncol(x).
alphabet(x): Equivalent to alphabet(unmasked(x)).
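The relationships between these size accessors can be checked on a toy alignment with one masked row and two masked columns. The data is hypothetical.

```r
library(Biostrings)

# Hypothetical toy alignment: 4 rows, 6 columns
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGTAC",
                              s3 = "ACG-AC", s4 = "AC--AC"))
rowmask(aln) <- IRanges(start = 4, end = 4)   # mask row 4
colmask(aln) <- IRanges(start = 3, end = 4)   # mask columns 3-4

dim(aln)          # 4 rows, 6 columns (masks do not change the dimensions)
maskeddim(aln)    # 1 masked row, 2 masked columns
maskedratio(aln)  # maskeddim(aln) / dim(aln)
nchar(aln)        # ncol(aln) - maskedncol(aln), i.e. 4
```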
In the code snippets below, x is a MultipleAlignment object.
as(from, "DNAStringSet"), as(from, "RNAStringSet"), as(from, "AAStringSet"), as(from, "BStringSet"): Creates an instance of the specified XStringSet object subtype that contains the unmasked regions of the multiple sequence alignment in x.
as.character(x, use.names): Convert x to a character vector containing the unmasked regions of the multiple sequence alignment. use.names controls whether or not rownames(x) should be used to set the names of the returned vector (default is TRUE).
as.matrix(x, use.names): Returns a character matrix containing the "exploded" representation of the unmasked regions of the multiple sequence alignment. use.names controls whether or not rownames(x) should be used to set the row names of the returned matrix (default is TRUE).
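The following sketch shows how masked columns drop out of the coerced results while unmasked(x) still returns the full alignment. The sequences are hypothetical.

```r
library(Biostrings)

# Hypothetical toy alignment with columns 3-4 masked
aln <- DNAMultipleAlignment(c(s1 = "ACGTAC", s2 = "ACGTAC", s3 = "ACG-AC"))
colmask(aln) <- IRanges(start = 3, end = 4)

chr <- as.character(aln)        # named character vector, masked columns removed
mat <- as.matrix(aln)           # one letter per cell, 3 rows by 4 columns
set <- as(aln, "DNAStringSet")  # XStringSet holding only the unmasked regions

width(set)            # 4 for every sequence
width(unmasked(aln))  # 6 for every sequence: the mask is ignored here
```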
In the code snippets below, x is a MultipleAlignment object.
consensusMatrix(x, as.prob, baseOnly): Creates an integer matrix containing the column frequencies of the underlying alphabet with masked columns being represented with NA values. If as.prob is TRUE, then probabilities are reported, otherwise counts are reported (the default). If baseOnly is TRUE, then the non-base letters are collapsed into an "other" category.
consensusString(x, ...): Creates a consensus string for x with the symbol "#" representing a masked column. See consensusString for details on the arguments.
consensusViews(x, ...): Similar to the consensusString method. It returns an XStringViews on the consensus string containing subsequence contigs of non-masked columns. Unlike the consensusString method, the masked columns in the underlying string contain a consensus value rather than the "#" symbol.
alphabetFrequency(x, as.prob, collapse): Creates an integer matrix containing the row frequencies of the underlying alphabet. If as.prob is TRUE, then probabilities are reported, otherwise counts are reported (the default). If collapse is TRUE, then returns the overall frequency instead of the frequency by row.
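A short sketch of how a column mask shows up in the consensus utilities, using hypothetical data with the first column masked.

```r
library(Biostrings)

# Hypothetical toy alignment; column 1 is masked
aln <- DNAMultipleAlignment(c(s1 = "ACGT", s2 = "ACGT", s3 = "ACGA"))
colmask(aln) <- IRanges(start = 1, end = 1)

# Column counts; the masked column is reported as NA
cm <- consensusMatrix(aln, baseOnly = TRUE)
cm[, 1]   # all NA
cm[, 4]   # one A and two T in the last column

# The masked column appears as "#" in the consensus string
cs <- consensusString(aln)
cs

# Views over the contiguous non-masked columns of the consensus
consensusViews(aln)

# Per-row letter counts, and the overall counts with collapse=TRUE
alphabetFrequency(aln)
alphabetFrequency(aln, collapse = TRUE)
```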
detail(x, invertColMask, hideMaskedCols): Provides a full pager-driven display of the object so that masked columns and rows can be removed and the entire sequence can be visually inspected. If hideMaskedCols is set to its default value of TRUE, then the output will hide all the masked columns. Otherwise, all columns will be displayed along with a row to indicate the masking status. If invertColMask is TRUE, then any displayed mask will be flipped so as to represent things in a way consistent with Phylip-style files instead of the mask that is actually stored in the MultipleAlignment object. Note that invertColMask is ignored if hideMaskedCols is set to its default value of TRUE, since in that case it would not make sense to show any masking information in the output. Masked rows are always hidden in the output.
The letters in a DNAMultipleAlignment or RNAMultipleAlignment object are colored when displayed by the show() method. Set global option Biostrings.coloring to FALSE to turn off this coloring.
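For instance, the global option can be set with base R's options() before printing an object. The alignment below is hypothetical toy data.

```r
library(Biostrings)

# Hypothetical toy alignment
aln <- DNAMultipleAlignment(c(s1 = "ACGT", s2 = "ACGA"))

# Turn off colored letters for subsequent show() calls in this session
options(Biostrings.coloring = FALSE)
aln
```

Setting the option back to TRUE restores the colored display.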
P. Aboyoun and M. Carlson
## create an object from file
origMAlign <-
  readDNAMultipleAlignment(filepath =
                             system.file("extdata",
                                         "msx2_mRNA.aln",
                                         package="Biostrings"),
                           format="clustal")

## list the names of the sequences in the alignment
rownames(origMAlign)

## rename the sequences to be the underlying species for MSX2
rownames(origMAlign) <- c("Human","Chimp","Cow","Mouse","Rat",
                          "Dog","Chicken","Salmon")
origMAlign

## See a detailed pager view
if (interactive()) {
  detail(origMAlign)
}

## operations to mask rows
## For columns, just use colmask() and do the same kinds of operations
rowMasked <- origMAlign
rowmask(rowMasked) <- IRanges(start=1,end=3)
rowMasked

## remove row masks
rowmask(rowMasked) <- NULL
rowMasked

## "select" rows of interest
rowmask(rowMasked, invert=TRUE) <- IRanges(start=4,end=7)
rowMasked

## or mask the rows that intersect with masked rows
rowmask(rowMasked, append="intersect") <- IRanges(start=1,end=5)
rowMasked

## TATA-masked
tataMasked <- maskMotif(origMAlign, "TATA")
colmask(tataMasked)

## automatically mask columns based on consecutive gaps
autoMasked <- maskGaps(origMAlign, min.fraction=0.5, min.block.width=4)
colmask(autoMasked)
autoMasked

## calculate frequencies
alphabetFrequency(autoMasked)
consensusMatrix(autoMasked, baseOnly=TRUE)[, 84:90]

## get consensus values
consensusString(autoMasked)
consensusViews(autoMasked)

## cluster the masked alignments
sdist <- stringDist(as(autoMasked,"DNAStringSet"), method="hamming")
clust <- hclust(sdist, method = "single")
plot(clust)
fourgroups <- cutree(clust, 4)
fourgroups

## write out the alignment object (with current masks) to Phylip format
write.phylip(x = autoMasked, filepath = tempfile("foo.txt",tempdir()))