Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

exonicParts

Extract non-overlapping exonic or intronic parts from a TxDb-like object


Description

exonicParts and intronicParts extract the non-overlapping (a.k.a. disjoint) exonic or intronic parts from a TxDb-like object.

Usage

exonicParts(txdb, linked.to.single.gene.only=FALSE)
intronicParts(txdb, linked.to.single.gene.only=FALSE)

## 3 helper functions used internally by exonicParts() and intronicParts():
tidyTranscripts(txdb, drop.geneless=FALSE)
tidyExons(txdb, drop.geneless=FALSE)
tidyIntrons(txdb, drop.geneless=FALSE)

Arguments

txdb

A TxDb object, or any TxDb-like object that supports the transcripts() and exonsBy() extractors (e.g. an EnsDb object).

linked.to.single.gene.only

TRUE or FALSE.

If FALSE (the default), then the disjoint parts are obtained by calling disjoin() on all the exons (or introns) in txdb, including on exons (or introns) not linked to a gene or linked to more than one gene.

If TRUE, then the disjoint parts are obtained in 2 steps:

  1. call disjoin() on the exons (or introns) linked to at least one gene,

  2. then drop the parts linked to more than one gene from the set of exonic (or intronic) parts obtained previously.

drop.geneless

If FALSE (the default), then all the transcripts (or exons, or introns) get extracted from the TxDb object.

If TRUE, then only the transcripts (or exons, or introns) that are linked to a gene get extracted from the TxDb object.

Note that drop.geneless also impacts the order in which the features are returned:

  • Transcripts: If drop.geneless is FALSE then transcripts are returned in the same order as with transcripts, which is expected to be by internal transcript id (tx_id). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id.

  • Exons: If drop.geneless is FALSE then exons are ordered first by internal transcript id (tx_id), then by exon rank (exon_rank). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id, and then by exon rank.

  • Introns: If drop.geneless is FALSE then introns are ordered by internal transcript id (tx_id). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id.

Value

exonicParts returns a disjoint and strictly sorted GRanges object with 1 range per exonic part and with metadata columns tx_id, tx_name, gene_id, exon_id, exon_name, and exon_rank. If linked.to.single.gene.only was set to TRUE, an additional exonic_part metadata column is added that indicates the rank of each exonic part within all the exonic parts linked to the same gene.

intronicParts returns a disjoint and strictly sorted GRanges object with 1 range per intronic part and with metadata columns tx_id, tx_name, and gene_id. If linked.to.single.gene.only was set to TRUE, an additional intronic_part metadata column is added that indicates the rank of each intronic part within all the intronic parts linked to the same gene.

tidyTranscripts returns a GRanges object with 1 range per transcript and with metadata columns tx_id, tx_name, and gene_id.

tidyExons returns a GRanges object with 1 range per exon and with metadata columns tx_id, tx_name, gene_id, exon_id, exon_name, and exon_rank.

tidyIntrons returns a GRanges object with 1 range per intron and with metadata columns tx_id, tx_name, and gene_id.

Note

exonicParts is a replacement for disjointExons with the following differences/improvements:

  • Argument linked.to.single.gene.only in exonicParts replaces argument aggregateGenes in disjointExons, but has opposite meaning i.e. exonicParts(txdb, linked.to.single.gene.only=TRUE) returns the same exonic parts as disjointExons(txdb, aggregateGenes=FALSE).

  • Unlike disjointExons(txdb, aggregateGenes=TRUE), exonicParts(txdb, linked.to.single.gene.only=FALSE) does NOT discard exon parts that are not linked to a gene.

  • exonicParts is almost 2x more efficient than disjointExons.

  • exonicParts works out-of-the-box on any TxDb-like object that supports the transcripts() and exonsBy() extractors (e.g. on an EnsDb object).

Author(s)

Hervé Pagès

See Also

Examples

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## ---------------------------------------------------------------------
## exonicParts()
## ---------------------------------------------------------------------

exonic_parts1 <- exonicParts(txdb)
exonic_parts1

## Mapping from exonic parts to genes is many-to-many:
gene_id1 <- mcols(exonic_parts1)$gene_id
gene_id1  # CharacterList object
table(lengths(gene_id1))
## The number of known genes a Human exonic part can be linked to
## varies from 0 to 22!

exonic_parts2 <- exonicParts(txdb, linked.to.single.gene.only=TRUE)
exonic_parts2

## Mapping from exonic parts to genes now is many-to-one:
gene_id2 <- mcols(exonic_parts2)$gene_id
gene_id2[1:20]  # character vector

## Select exonic parts for a given gene:
exonic_parts2[gene_id2 %in% "643837"]

## Sanity checks:
stopifnot(isDisjoint(exonic_parts1), isStrictlySorted(exonic_parts1))
stopifnot(isDisjoint(exonic_parts2), isStrictlySorted(exonic_parts2))
stopifnot(all(exonic_parts2 %within% reduce(exonic_parts1)))
stopifnot(identical(
    lengths(gene_id1) == 1L,
    exonic_parts1 %within% exonic_parts2
))

## ---------------------------------------------------------------------
## intronicParts()
## ---------------------------------------------------------------------

intronic_parts1 <- intronicParts(txdb)
intronic_parts1

## Mapping from intronic parts to genes is many-to-many:
mcols(intronic_parts1)$gene_id
table(lengths(mcols(intronic_parts1)$gene_id))
## A Human intronic part can be linked to 0 to 22 known genes!

intronic_parts2 <- intronicParts(txdb, linked.to.single.gene.only=TRUE)
intronic_parts2

## Mapping from intronic parts to genes now is many-to-one:
class(mcols(intronic_parts2)$gene_id)  # character vector

## Sanity checks:
stopifnot(isDisjoint(intronic_parts1), isStrictlySorted(intronic_parts1))
stopifnot(isDisjoint(intronic_parts2), isStrictlySorted(intronic_parts2))
stopifnot(all(intronic_parts2 %within% reduce(intronic_parts1)))
stopifnot(identical(
    lengths(mcols(intronic_parts1)$gene_id) == 1L,
    intronic_parts1 %within% intronic_parts2
))

## ---------------------------------------------------------------------
## Helper functions
## ---------------------------------------------------------------------

tidyTranscripts(txdb)                      # Ordered by 'tx_id'.
tidyTranscripts(txdb, drop.geneless=TRUE)  # Ordered first by 'gene_id',
                                           # then by 'tx_id'.

tidyExons(txdb)                            # Ordered first by 'tx_id',
                                           # then by 'exon_rank'.
tidyExons(txdb, drop.geneless=TRUE)        # Ordered first by 'gene_id',
                                           # then by 'tx_id',
                                           # then by 'exon_rank'.

tidyIntrons(txdb)                          # Ordered by 'tx_id'.
tidyIntrons(txdb, drop.geneless=TRUE)      # Ordered first by 'gene_id',
                                           # then by 'tx_id'.

GenomicFeatures

Conveniently import and query gene models

v1.42.3
Artistic-2.0
Authors
M. Carlson, H. Pagès, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar, M. Lawrence, V. Obenchain
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.