ensembldb: EnsDb-AnnotationDbi – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

EnsDb-AnnotationDbi

Integration into the AnnotationDbi framework

Description

Several of the methods available for AnnotationDbi objects are also implemented for EnsDb objects. This enables to extract data from EnsDb objects in a similar fashion than from objects inheriting from the base annotation package class AnnotationDbi. In addition to the standard usage, the select and mapIds for EnsDb objects support also the filter framework of the ensembdb package and thus allow to perform more fine-grained queries to retrieve data.

Usage

## S4 method for signature 'EnsDb'
columns(x)
## S4 method for signature 'EnsDb'
keys(x, keytype, filter,...)
## S4 method for signature 'EnsDb'
keytypes(x)
## S4 method for signature 'EnsDb'
mapIds(x, keys, column, keytype, ..., multiVals)
## S4 method for signature 'EnsDb'
select(x, keys, columns, keytype, ...)

Arguments

(In alphabetic order)

`column`	For `mapIds`: the column to search on, i.e. from which values should be retrieved.
`columns`	For `select`: the columns from which values should be retrieved. Use the `columns` method to list all possible columns.
`keys`	The keys/ids for which data should be retrieved from the database. This can be either a character vector of keys/IDs, a single filter object extending `AnnotationFilter`, an combination of filters `AnnotationFilterList` or a `formula` representing a filter expression (see `AnnotationFilter` for more details).
`keytype`	For `mapIds` and `select`: the type (column) that matches the provided keys. This argument does not have to be specified if argument `keys` is a filter object extending `AnnotationFilter` or a `list` of such objects. For `keys`: which keys should be returned from the database.
`filter`	For `keys`: either a single object extending `AnnotationFilter` or a list of such object to retrieve only specific keys from the database.
`multiVals`	What should `mapIds` do when there are multiple values that could be returned? Options are: `"first"` (default), `"list"`, `"filter"`, `"asNA"`. See `mapIds` in the `AnnotationDbi` package for a detailed description.
`x`	The `EnsDb` object.
`...`	Not used.

Value

See method description above.

Methods and Functions

columns

List all the columns that can be retrieved by the mapIds and select methods. Note that these column names are different from the ones supported by the genes, transcripts etc. methods that can be listed by the listColumns method.

Returns a character vector of supported column names.

keys

Retrieves all keys from the column name specified with keytype. By default (if keytype is not provided) it returns all gene IDs. Note that keytype="TXNAME" will return transcript ids, since no transcript names are available in the database.

Returns a character vector of IDs.

keytypes

List all supported key types (column names).

Returns a character vector of key types.

mapIds

Retrieve the mapped ids for a set of keys that are of a particular keytype. Argument keys can be either a character vector of keys/IDs, a single filter object extending AnnotationFilter or a list of such objects. For the latter, the argument keytype does not have to be specified. Importantly however, if the filtering system is used, the ordering of the results might not represent the ordering of the keys.

The method usually returns a named character vector or, depending on the argument multiVals a named list, with names corresponding to the keys (same ordering is only guaranteed if keys is a character vector).

select

Retrieve the data as a data.frame based on parameters for selected keys, columns and keytype arguments. Multiple matches of the keys are returned in one row for each possible match. Argument keys can be either a character vector of keys/IDs, a single filter object extending AnnotationFilter or a list of such objects. For the latter, the argument keytype does not have to be specified.

Note that values from a column "TXNAME" will be the same than for a column "TXID", since internally no database column "tx_name" is present and the column is thus mapped to "tx_id".

Returns a data.frame with the column names corresponding to the argument columns and rows with all data matching the criteria specified with keys.

The use of select without filters or keys and without restricting to specicic columns is strongly discouraged, as the SQL query to join all of the tables, especially if protein annotation data is available is very expensive.

Author(s)

Johannes Rainer

Examples

library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86

## List all supported keytypes.
keytypes(edb)

## List all supported columns for the select and mapIds methods.
columns(edb)

## List /real/ database column names.
listColumns(edb)

## Retrieve all keys corresponding to transcript ids.
txids <- keys(edb, keytype = "TXID")
length(txids)
head(txids)

## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
gids <- keys(edb, keytype = "GENENAME", filter = SeqNameFilter("X"))
length(gids)
head(gids)

## Get a mapping of the genes BCL2 and BCL2L11 to all of their
## transcript ids and return the result as list
maps <- mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID",
               keytype = "GENENAME", multiVals = "list")
maps

## Perform the same query using a combination of a GeneNameFilter and a
## TxBiotypeFilter to just retrieve protein coding transcripts for these
## two genes.
mapIds(edb, keys = list(GeneNameFilter(c("BCL2", "BCL2L11")),
                        TxBiotypeFilter("protein_coding")), column = "TXID",
       multiVals = "list")

## select:
## Retrieve all transcript and gene related information for the above example.
select(edb, keys = list(GeneNameFilter(c("BCL2", "BCL2L11")),
                        TxBiotypeFilter("protein_coding")),
       columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART",
                   "TXSEQEND", "SEQNAME", "SEQSTRAND"))

## Get all data for genes encoded on chromosome Y
Y <- select(edb, keys = "Y", keytype = "SEQNAME")
head(Y)
nrow(Y)

## Get selected columns for all lincRNAs encoded on chromosome Y. Here we use
## a filter expression to define what data to retrieve.
Y <- select(edb, keys = ~ seq_name == "Y" & gene_biotype == "lincRNA",
            columns = c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
head(Y)
nrow(Y)

ensembldb

Utilities to create and use Ensembl-based annotation databases

v2.14.1

LGPL

Authors

Johannes Rainer <johannes.rainer@eurac.edu> with contributions from Tim Triche, Sebastian Gibb, Laurent Gatto and Christian Weichenberger.

Initial release