Finding hits between reads and transcripts that are compatible with the splicing of the transcript
In the context of an RNA-seq experiment, findCompatibleOverlaps (or countCompatibleOverlaps) can be used for finding (or counting) hits between reads and transcripts that are compatible with the splicing of the transcript.
findCompatibleOverlaps(query, subject)
countCompatibleOverlaps(query, subject)
query
    A GAlignments or GAlignmentPairs object representing the aligned reads.
subject
    A GRangesList object representing the transcripts.
findCompatibleOverlaps is a specialized version of findOverlaps that uses encodeOverlaps internally to keep only the hits where the junctions in the aligned read are compatible with the splicing of the annotated transcript.
The topic of working with overlap encodings is covered in detail in the "OverlapEncodings" vignette located in this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").
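To make the filtering described above concrete, here is a minimal sketch on toy data. The transcript `tx1`, the two reads, and all coordinates are made up for illustration and do not come from the package. The "by hand" pipeline at the end (findOverlaps, then encodeOverlaps, then isCompatibleWithSplicing) is an assumption about what happens internally, based only on the description above; it is not copied from the package source, and the argument choices (`ignore.strand=TRUE`, `flip.query.if.wrong.strand=TRUE`) are illustrative.

```r
library(GenomicAlignments)

## Hypothetical transcript with 2 exons on chr1: [1, 10] and [21, 30].
tx <- GRangesList(
    tx1 = GRanges("chr1",
                  IRanges(start = c(1, 21), end = c(10, 30)),
                  strand = "+")
)

## Two hypothetical single-end reads, both starting at position 5:
##   "spliced"   covers 5-10, skips 11-20 (junction), covers 21-25;
##   "unspliced" covers 5-14, i.e. it runs past the end of exon 1
##               into the intron.
reads <- GAlignments(
    seqnames = Rle(factor(c("chr1", "chr1"))),
    pos      = c(5L, 5L),
    cigar    = c("6M10N5M", "10M"),
    strand   = strand(c("+", "+")),
    names    = c("spliced", "unspliced")
)

## Hits restricted to reads compatible with the transcript's splicing,
## and the corresponding per-read counts:
hits     <- findCompatibleOverlaps(reads, tx)
n_compat <- countCompatibleOverlaps(reads, tx)

## Assumed equivalent, step by step (a sketch, not the package source):
grl   <- grglist(reads, order.as.in.query = TRUE)    # reads as ranges
ov    <- findOverlaps(grl, tx, ignore.strand = TRUE) # all overlaps
ovenc <- encodeOverlaps(grl, tx, hits = ov,
                        flip.query.if.wrong.strand = TRUE)
manual <- ov[isCompatibleWithSplicing(ovenc)]        # keep compatible hits
```

Comparing `hits` with `ov` shows which plain overlaps were discarded: the read whose junction matches the intron between the two exons is kept, while the read that extends into the intron is dropped.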
A Hits object for findCompatibleOverlaps.
An integer vector parallel to (i.e. same length as) query for countCompatibleOverlaps.
Hervé Pagès
The findOverlaps generic function defined in the IRanges package.
The encodeOverlaps generic function and OverlapEncodings class.
The "OverlapEncodings" vignette in this package.
GAlignments and GAlignmentPairs objects.
GRangesList objects in the GenomicRanges package.
## Here we only show a simple example illustrating the use of
## countCompatibleOverlaps() on a very small data set. Please
## refer to the "OverlapEncodings" vignette in the GenomicAlignments
## package for a comprehensive presentation of "overlap
## encodings" and related tools/concepts (e.g. "compatible"
## overlaps, "almost compatible" overlaps etc...), and for more
## examples.

## sm_treated1.bam contains a small subset of treated1.bam, a BAM
## file containing single-end reads from the "Pasilla" experiment
## (RNA-seq, Fly, see the pasilla data package for the details)
## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
## the UCSC Genome Browser):
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
                           package="GenomicAlignments", mustWork=TRUE)

## Load the alignments:
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)

## Load the transcripts (IMPORTANT: Like always, the reference genome
## of the transcripts must be *exactly* the same as the reference
## genome used to align the reads):
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)

## Number of "compatible" transcripts per alignment in 'gal':
gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
mcols(gal)$ncomptx <- gal_ncomptx
table(gal_ncomptx)
mean(gal_ncomptx >= 1)
## --> 33% of the alignments in 'gal' are "compatible" with at least
## 1 transcript in 'exbytx'.

## Keep only alignments compatible with at least 1 transcript in
## 'exbytx':
compgal <- gal[gal_ncomptx >= 1]
head(compgal)