GenomicFeatures: transcripts – R documentation


transcripts

Extract genomic features from a TxDb-like object

Description

Generic functions to extract genomic features from a TxDb-like object. This page documents the methods for TxDb objects only.

Usage

transcripts(x, ...)
## S4 method for signature 'TxDb'
transcripts(x, columns=c("tx_id", "tx_name"), filter=NULL, use.names=FALSE)

exons(x, ...)
## S4 method for signature 'TxDb'
exons(x, columns="exon_id", filter=NULL, use.names=FALSE)

cds(x, ...)
## S4 method for signature 'TxDb'
cds(x, columns="cds_id", filter=NULL, use.names=FALSE)

genes(x, ...)
## S4 method for signature 'TxDb'
genes(x, columns="gene_id", filter=NULL, single.strand.genes.only=TRUE)

## S4 method for signature 'TxDb'
promoters(x, upstream=2000, downstream=200, use.names=TRUE, ...)

Arguments

`x`	A TxDb object.
`...`	For the `transcripts`, `exons`, `cds`, and `genes` generic functions: arguments to be passed to methods. For the `promoters` method for TxDb objects: arguments to be passed to the internal call to `transcripts`.
`columns`	Columns to include in the output. Must be `NULL` or a character vector of values returned by the `columns` method, with the following restrictions: `"TXCHROM"` and `"TXSTRAND"` are not allowed for `transcripts`; `"EXONCHROM"` and `"EXONSTRAND"` are not allowed for `exons`; `"CDSCHROM"` and `"CDSSTRAND"` are not allowed for `cds`. If the vector is named, those names are used for the corresponding columns in the element metadata of the returned object.
`filter`	Either `NULL` or a named list of vectors to be used to restrict the output. Valid names for this list are: `"gene_id"`, `"tx_id"`, `"tx_name"`, `"tx_chrom"`, `"tx_strand"`, `"exon_id"`, `"exon_name"`, `"exon_chrom"`, `"exon_strand"`, `"cds_id"`, `"cds_name"`, `"cds_chrom"`, `"cds_strand"` and `"exon_rank"`.
`use.names`	`TRUE` or `FALSE`. If `TRUE`, the feature names are set as the names of the returned object, with NAs being replaced with empty strings.
`single.strand.genes.only`	`TRUE` or `FALSE`. If `TRUE` (the default), genes are returned in a GRanges object, and genes that cannot be represented by a single genomic range (because they have exons located on both strands of the same reference sequence or on more than one reference sequence) are dropped with a message. If `FALSE`, all genes are returned in a GRangesList object, with the columns specified through the `columns` argument set as top-level metadata columns. (Keep in mind that the top-level metadata columns of a GRangesList object are not displayed by the `show()` method.)
`upstream`	For `promoters`: an `integer(1)` value indicating the number of bases upstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
`downstream`	For `promoters`: an `integer(1)` value indicating the number of bases downstream from the transcription start site. For additional details see ?`promoters,GRanges-method`.
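
As an illustration of how `columns`, `filter`, and `use.names` combine, here is a minimal sketch using the sample database shipped with the package. The labels `id`, `name`, and `gene` are arbitrary names chosen for this example, not anything the package requires.

```r
library(GenomicFeatures)  # also attaches AnnotationDbi, which provides loadDb()

txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

## A named 'columns' vector renames the metadata columns of the result.
## "id", "name" and "gene" are illustrative labels only.
tx_minus <- transcripts(txdb,
                        columns=c(id="tx_id", name="tx_name", gene="gene_id"),
                        filter=list(tx_strand="-"),
                        use.names=TRUE)

colnames(mcols(tx_minus))   # metadata columns carry the chosen labels
head(names(tx_minus))       # feature names set because use.names=TRUE
```

The `filter` list restricts the result to minus-strand transcripts before the object is returned, so no subsetting is needed afterwards.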

Details

These are the main functions for extracting transcript information from a TxDb-like object. These methods can restrict the output based on categorical information. To restrict the output based on interval information, use the transcriptsByOverlaps, exonsByOverlaps, and cdsByOverlaps functions.
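
The difference between the two kinds of restriction can be sketched as follows. The query region here is simply the range of the first transcript in the sample database, chosen only so that the example is guaranteed to return at least one hit.

```r
library(GenomicFeatures)

txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)
tx <- transcripts(txdb)

## Categorical restriction: keep transcripts by attribute value.
tx_chr3 <- transcripts(txdb, filter=list(tx_chrom="chr3"))

## Interval restriction: keep transcripts overlapping a query region.
## granges() drops the metadata columns and keeps only the range.
query <- granges(tx[1])
hits <- transcriptsByOverlaps(txdb, query)
hits
```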

The promoters function computes user-defined promoter regions for the transcripts in a TxDb-like object. It returns a GRanges object containing one promoter region per transcript, located around the transcription start site; the span of each region is defined by upstream and downstream. For additional details on how the promoter range is computed and on the handling of + and - strands see ?`promoters,GRanges-method`.
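
To make the strand handling concrete, the sketch below applies the GRanges method of promoters to two hand-made ranges. The sequence name `chrA` and the coordinates are invented for illustration and do not come from any annotation.

```r
library(GenomicRanges)

## Two hypothetical transcripts with identical coordinates on opposite strands.
gr <- GRanges(seqnames="chrA",
              ranges=IRanges(start=c(1000, 1000), end=c(2000, 2000)),
              strand=c("+", "-"))

p <- promoters(gr, upstream=100, downstream=50)

## On the + strand the transcription start site is start(gr);
## on the - strand it is end(gr), so the region flips accordingly.
start(p)  # 900 1951
end(p)    # 1049 2100
width(p)  # 150 150
```

In both cases the region covers upstream + downstream bases, with the transcription start site counted as the first downstream base.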

Value

A GRanges object, except when genes is used with single.strand.genes.only=FALSE, in which case a GRangesList object is returned.

Author(s)

M. Carlson, P. Aboyoun and H. Pagès

Examples

txdb_file <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                         package="GenomicFeatures")
txdb <- loadDb(txdb_file)

## ---------------------------------------------------------------------
## transcripts()
## ---------------------------------------------------------------------

tx1 <- transcripts(txdb)
tx1

transcripts(txdb, use.names=TRUE)
transcripts(txdb, columns=NULL, use.names=TRUE)

filter <- list(tx_chrom = c("chr3", "chr5"), tx_strand = "+")
tx2 <- transcripts(txdb, filter=filter)
tx2

## Sanity checks (the second one mirrors the filter used for 'tx2'):
stopifnot(
  identical(mcols(tx1)$tx_id, seq_along(tx1)),
  identical(tx2, tx1[seqnames(tx1) %in% c("chr3", "chr5") &
                     strand(tx1) == "+"])
)

## ---------------------------------------------------------------------
## exons()
## ---------------------------------------------------------------------

exons(txdb, columns=c("EXONID", "TXNAME"),
            filter=list(exon_id=1))
exons(txdb, columns=c("EXONID", "TXNAME"),
            filter=list(tx_name="uc009vip.1"))

## ---------------------------------------------------------------------
## genes()
## ---------------------------------------------------------------------

genes(txdb)  # a GRanges object
cols <- c("tx_id", "tx_chrom", "tx_strand",
          "exon_id", "exon_chrom", "exon_strand")
## By default, genes are returned in a GRanges object and those that
## cannot be represented by a single genomic range (because they have
## exons located on both strands of the same reference sequence or on
## more than one reference sequence) are dropped with a message:
single_strand_genes <- genes(txdb, columns=cols)

## Because we've returned single strand genes only, the "tx_chrom"
## and "exon_chrom" metadata columns are guaranteed to match
## 'seqnames(single_strand_genes)':
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_chrom)))
stopifnot(identical(as.character(seqnames(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_chrom)))

## and also the "tx_strand" and "exon_strand" metadata columns are
## guaranteed to match 'strand(single_strand_genes)':
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$tx_strand)))
stopifnot(identical(as.character(strand(single_strand_genes)),
                    as.character(mcols(single_strand_genes)$exon_strand)))

all_genes <- genes(txdb, columns=cols, single.strand.genes.only=FALSE)
all_genes  # a GRangesList object
multiple_strand_genes <- all_genes[elementNROWS(all_genes) >= 2]
multiple_strand_genes
mcols(multiple_strand_genes)

## ---------------------------------------------------------------------
## promoters()
## ---------------------------------------------------------------------

## This:
promoters(txdb, upstream=100, downstream=50)
## is equivalent to:
promoters(transcripts(txdb, use.names=TRUE), upstream=100, downstream=50)

## Extra arguments are passed to transcripts(). So this:
columns <- c("tx_name", "gene_id")
promoters(txdb, upstream=100, downstream=50, columns=columns)
## is equivalent to:
promoters(transcripts(txdb, columns=columns, use.names=TRUE),
          upstream=100, downstream=50)

GenomicFeatures

Conveniently import and query gene models

v1.42.3

Artistic-2.0

Authors

M. Carlson, H. Pagès, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar, M. Lawrence, V. Obenchain
