Maintain and use BAM files
Use BamFile()
to create a reference to a BAM file (and
optionally its index). The reference remains open across calls to
methods, avoiding costly index re-loading.
BamFileList()
provides a convenient way of managing a list of
BamFile
instances.
## Constructors BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA) BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA) ## Opening / closing ## S3 method for class 'BamFile' open(con, ...) ## S3 method for class 'BamFile' close(con, ...) ## accessors; also path(), index(), yieldSize() ## S4 method for signature 'BamFile' isOpen(con, rw="") ## S4 method for signature 'BamFile' isIncomplete(con) ## S4 method for signature 'BamFile' obeyQname(object, ...) obeyQname(object, ...) <- value ## S4 method for signature 'BamFile' asMates(object, ...) asMates(object, ...) <- value ## S4 method for signature 'BamFile' qnamePrefixEnd(object, ...) qnamePrefixEnd(object, ...) <- value ## S4 method for signature 'BamFile' qnameSuffixStart(object, ...) qnameSuffixStart(object, ...) <- value ## actions ## S4 method for signature 'BamFile' scanBamHeader(files, ..., what=c("targets", "text")) ## S4 method for signature 'BamFile' seqinfo(x) ## S4 method for signature 'BamFileList' seqinfo(x) ## S4 method for signature 'BamFile' filterBam(file, destination, index=file, ..., filter=FilterRules(), indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat())) ## S4 method for signature 'BamFile' indexBam(files, ...) ## S4 method for signature 'BamFile' sortBam(file, destination, ..., byQname=FALSE, maxMemory=512) ## S4 method for signature 'BamFileList' mergeBam(files, destination, ...) ## reading ## S4 method for signature 'BamFile' scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat())) ## counting ## S4 method for signature 'BamFile' idxstatsBam(file, index=file, ...) ## S4 method for signature 'BamFile' countBam(file, index=file, ..., param=ScanBamParam()) ## S4 method for signature 'BamFileList' countBam(file, index=file, ..., param=ScanBamParam()) ## S4 method for signature 'BamFile' quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)
... |
Additional arguments. For |
con |
An instance of |
x, object, file, files |
A character vector of BAM file paths
(for |
index |
character(1); the BAM index file path (for
|
yieldSize |
Number of records to yield each time the file
is read from with |
asMates |
Logical indicating if records should be paired as mates. See ‘Fields’ section for details. |
qnamePrefixEnd |
Single character (or NA) marking the
end of the qname prefix. When specified, all characters prior to
and including the |
qnameSuffixStart |
Single character (or NA) marking the
start of the qname suffix. When specified, all characters following
and including the |
obeyQname |
Logical indicating if the BAM file is sorted
by |
value |
Logical value for setting |
what |
For |
filter |
A |
destination |
character(1) file path to write filtered reads to. |
indexDestination |
logical(1) indicating whether the destination file should also be indexed. |
byQname, maxMemory |
See |
param |
An optional |
rw |
Mode of file; ignored. |
main.groups.only |
See |
Objects are created by calls of the form BamFile()
.
The BamFile
class inherits fields from the
RsamtoolsFile
class and has fields:
Number of records to yield each time the file is
read from using scanBam
or, when length(bamWhich())
!= 0
, a threshold which yields records in complete ranges whose
sum first exceeds yieldSize
. Setting yieldSize
on a
BamFileList
does not alter existing yield sizes set on the
individual BamFile
instances.
A logical indicating if the records should be
returned as mated pairs. When TRUE
scanBam
attempts
to mate (pair) the records and returns two additional fields
groupid
and mate_status
. groupid
is an integer
vector of unique group ids; mate_status
is a factor with level
mated
for records successfully paired by the algorithm,
ambiguous
for records that are possibly mates but cannot be
assigned unambiguously, or unmated
for reads that did not
have valid mates.
Mate criteria:
Bit 0x40 and 0x80: Segments are a pair of first/last OR neither segment is marked first/last
Bit 0x100: Both segments are secondary OR both not secondary
Bit 0x10 and 0x20: Segments are on opposite strands
mpos match: segment1 mpos matches segment2 pos AND segment2 mpos matches segment1 pos
tid match
Flags, tags and ranges may be specified in the ScanBamParam
for fine tuning of results.
A logical(0) indicating if the file was sorted by
qname. In Bioconductor > 2.12 paired-end files do not need to be
sorted by qname
. Instead set asMates=TRUE
in the
BamFile
when using the readGAlignmentsList
function from the GenomicAlignments package.
BamFileList
inherits additional methods from
RsamtoolsFileList
and SimpleList
.
Opening / closing:
Opens the (local or remote) path
and
index
(if bamIndex
is not character(0)
),
files. Returns a BamFile
instance.
Closes the BamFile
con
; returning
(invisibly) the updated BamFile
. The instance may be
re-opened with open.BamFile
.
Tests whether the BamFile
con
has been
opened for reading.
Tests whether the BamFile
con
is
niether closed nor at the end of the file.
Accessors:
Returns a character(1) vector of BAM path names.
Returns a character(0) or character(1) vector of BAM index path names.
Return or set an integer(1) vector indicating yield size.
Return or set a logical(0) indicating if the file was sorted by qname.
Return or set a logical(0) indicating if the records should be returned as mated pairs.
Methods:
Visit the path in path(file)
, returning
the information contained in the file header; see
scanBamHeader
.
Visit the path in
path(file)
, returning a Seqinfo
,
character, or named integer vector containing information on the
anmes and / or lengths of each sequence. Seqnames are ordered
as they appear in the file.
Visit the path in path(file)
, returning the
result of scanBam
applied to the specified path.
Visit the path(s) in path(file)
, returning
the result of countBam
applied to the specified
path.
Visit the index in index(file)
, quickly
returning a data.frame
with columns seqnames
,
seqlength
, mapped
(number of mapped reads on
seqnames
) and unmapped
(number of unmapped reads).
Visit the path in path(file)
, returning the
result of filterBam
applied to the specified
path. A single file can be filtered to one or several
destinations, as described in filterBam
.
Visit the path in path(file)
, returning
the result of indexBam
applied to the specified
path.
Visit the path in path(file)
, returning the
result of sortBam
applied to the specified path.
Merge several BAM files into a single BAM file. See
mergeBam
for details; additional arguments supported
by mergeBam,character-method
are also available for
BamFileList
.
Compactly display the object.
Martin Morgan and Marc Carlson
The readGAlignments
,
readGAlignmentPairs
,
and readGAlignmentsList
functions defined in the GenomicAlignments package.
summarizeOverlaps
and
findSpliceOverlaps-methods in the
GenomicAlignments package for methods that work on a
BamFile and BamFileList objects.
## ## BamFile options. ## fl <- system.file("extdata", "ex1.bam", package="Rsamtools") bf <- BamFile(fl) bf ## When 'asMates=TRUE' scanBam() reads the data in as ## pairs. See 'asMates' above for details of the pairing ## algorithm. asMates(bf) <- TRUE ## When 'yieldSize' is set, scanBam() will iterate ## through the file in chunks. yieldSize(bf) <- 500 ## Some applications append a filename (e.g., NCBI Sequence Read ## Archive (SRA) toolkit) or allele identifier to the sequence qname. ## This may result in a unique qname for each record which presents a ## problem when mating paired-end reads (identical qnames is one ## criteria for paired-end mating). 'qnamePrefixEnd' and ## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix. qnamePrefixEnd(bf) <- "/" qnameSuffixStart(bf) <- "." ## ## Reading Bam files. ## fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE) (bf <- BamFile(fl)) head(seqlengths(bf)) # sequences and lengths in BAM file if (require(RNAseqData.HNRNPC.bam.chr14)) { bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES) bfl bfl[1:2] # subset bfl[[1]] # select first element -- BamFile ## merged across BAM files seqinfo(bfl) head(seqlengths(bfl)) } length(scanBam(fl)[[1]][[1]]) # all records bf <- open(BamFile(fl)) # implicit index bf identical(scanBam(bf), scanBam(fl)) close(bf) ## Use 'yieldSize' to iterate through a file in chunks. bf <- open(BamFile(fl, yieldSize=1000)) while (nrec <- length(scanBam(bf)[[1]][[1]])) cat("records:", nrec, "\n") close(bf) ## Repeatedly visit multiple ranges in the BamFile. rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584))) bf <- open(BamFile(fl)) sapply(seq_len(length(rng)), function(i, bamFile, rng) { param <- ScanBamParam(which=rng[i], what="seq") bam <- scanBam(bamFile, param=param)[[1]] alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE) }, bf, rng) close(bf)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.