Extract genomic features from a TxDb-like object
Generic functions to extract genomic features from a TxDb-like object. This page documents the methods for TxDb objects only.
transcripts(x, ...) ## S4 method for signature 'TxDb' transcripts(x, columns=c("tx_id", "tx_name"), filter=NULL, use.names=FALSE) exons(x, ...) ## S4 method for signature 'TxDb' exons(x, columns="exon_id", filter=NULL, use.names=FALSE) cds(x, ...) ## S4 method for signature 'TxDb' cds(x, columns="cds_id", filter=NULL, use.names=FALSE) genes(x, ...) ## S4 method for signature 'TxDb' genes(x, columns="gene_id", filter=NULL, single.strand.genes.only=TRUE) ## S4 method for signature 'TxDb' promoters(x, upstream=2000, downstream=200, use.names=TRUE, ...)
x |
A TxDb object. |
... |
For the For the |
columns |
Columns to include in the output.
Must be
If the vector is named, those names are used for the corresponding column in the element metadata of the returned object. |
filter |
Either |
use.names |
|
single.strand.genes.only |
If |
upstream |
For |
downstream |
For |
These are the main functions for extracting transcript information
from a TxDb-like object. These methods can restrict the output
based on categorical information. To restrict the output based on interval
information, use the transcriptsByOverlaps
,
exonsByOverlaps
, and cdsByOverlaps
functions.
The promoters
function computes user-defined promoter regions
for the transcripts in a TxDb-like object. The return object is
a GRanges
of promoter regions around the transcription start
site the span of which is defined by upstream
and downstream
.
For additional details on how the promoter range is computed and the
handling of +
and -
strands see
?`promoters,GRanges-method`
.
A GRanges object. The only exception being
when genes
is used with single.strand.genes.only=FALSE
,
in which case a GRangesList object is returned.
M. Carlson, P. Aboyoun and H. Pagès
transcriptsBy
and transcriptsByOverlaps
for more ways to extract genomic features
from a TxDb-like object.
transcriptLengths
for extracting the transcript
lengths (and other metrics) from a TxDb object.
exonicParts
and intronicParts
for
extracting non-overlapping exonic or intronic parts from a
TxDb-like object.
extractTranscriptSeqs
for extracting transcript
(or CDS) sequences from reference sequences.
coverageByTranscript
for computing coverage by
transcript (or CDS) of a set of ranges.
select-methods for how to use the simple "select" interface to extract information from a TxDb object.
microRNAs
and tRNAs
for extracting
microRNA or tRNA genomic ranges from a TxDb object.
id2name
for mapping TxDb internal ids
to external names for a given feature type.
The TxDb class.
txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite", package="GenomicFeatures") txdb <- loadDb(txdb_file) ## --------------------------------------------------------------------- ## transcripts() ## --------------------------------------------------------------------- tx1 <- transcripts(txdb) tx1 transcripts(txdb, use.names=TRUE) transcripts(txdb, columns=NULL, use.names=TRUE) filter <- list(tx_chrom = c("chr3", "chr5"), tx_strand = "+") tx2 <- transcripts(txdb, filter=filter) tx2 ## Sanity checks: stopifnot( identical(mcols(tx1)$tx_id, seq_along(tx1)), identical(tx2, tx1[seqnames(tx1) == "chr3" & strand(tx1) == "+"]) ) ## --------------------------------------------------------------------- ## exons() ## --------------------------------------------------------------------- exons(txdb, columns=c("EXONID", "TXNAME"), filter=list(exon_id=1)) exons(txdb, columns=c("EXONID", "TXNAME"), filter=list(tx_name="uc009vip.1")) ## --------------------------------------------------------------------- ## genes() ## --------------------------------------------------------------------- genes(txdb) # a GRanges object cols <- c("tx_id", "tx_chrom", "tx_strand", "exon_id", "exon_chrom", "exon_strand") ## By default, genes are returned in a GRanges object and those that ## cannot be represented by a single genomic range (because they have ## exons located on both strands of the same reference sequence or on ## more than one reference sequence) are dropped with a message: single_strand_genes <- genes(txdb, columns=cols) ## Because we've returned single strand genes only, the "tx_chrom" ## and "exon_chrom" metadata columns are guaranteed to match ## 'seqnames(single_strand_genes)': stopifnot(identical(as.character(seqnames(single_strand_genes)), as.character(mcols(single_strand_genes)$tx_chrom))) stopifnot(identical(as.character(seqnames(single_strand_genes)), as.character(mcols(single_strand_genes)$exon_chrom))) ## and also the "tx_strand" and "exon_strand" metadata columns are ## guaranteed to match 'strand(single_strand_genes)': stopifnot(identical(as.character(strand(single_strand_genes)), as.character(mcols(single_strand_genes)$tx_strand))) stopifnot(identical(as.character(strand(single_strand_genes)), as.character(mcols(single_strand_genes)$exon_strand))) all_genes <- genes(txdb, columns=cols, single.strand.genes.only=FALSE) all_genes # a GRangesList object multiple_strand_genes <- all_genes[elementNROWS(all_genes) >= 2] multiple_strand_genes mcols(multiple_strand_genes) ## --------------------------------------------------------------------- ## promoters() ## --------------------------------------------------------------------- ## This: promoters(txdb, upstream=100, downstream=50) ## is equivalent to: promoters(transcripts(txdb, use.names=TRUE), upstream=100, downstream=50) ## Extra arguments are passed to transcripts(). So this: columns <- c("tx_name", "gene_id") promoters(txdb, upstream=100, downstream=50, columns=columns) ## is equivalent to: promoters(transcripts(txdb, columns=columns, use.names=TRUE), upstream=100, downstream=50)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.