makeTxDbFromUCSC

Make a TxDb object from annotations available at the UCSC Genome Browser


Description

The makeTxDbFromUCSC function allows the user to make a TxDb object from transcript annotations available at the UCSC Genome Browser.

Note that it uses the RMariaDB package internally, so make sure that this package is installed.
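A minimal sketch of how one might check for RMariaDB before calling makeTxDbFromUCSC. The variable name has_rmariadb is only illustrative, and the suggested install command assumes the package is installed from CRAN.

```r
## Check whether the RMariaDB package can be loaded, without attaching it.
has_rmariadb <- requireNamespace("RMariaDB", quietly=TRUE)

if (!has_rmariadb) {
    ## Assumption: RMariaDB is installed from CRAN in the usual way.
    message("RMariaDB is not installed; try install.packages(\"RMariaDB\")")
}
```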

Usage

makeTxDbFromUCSC(genome="hg19", tablename="knownGene",
        transcript_ids=NULL,
        circ_seqs=NULL,
        url="http://genome.ucsc.edu/cgi-bin/",
        goldenPath.url=getOption("UCSC.goldenPath.url"),
        taxonomyId=NA,
        miRBaseBuild=NA)

supportedUCSCtables(genome="hg19", url="http://genome.ucsc.edu/cgi-bin/")

browseUCSCtrack(genome="hg19", tablename="knownGene",
                url="http://genome.ucsc.edu/cgi-bin/")

Arguments

genome

The name of a UCSC genome assembly e.g. "hg19" or "panTro6". You can use rtracklayer::ucscGenomes()[ , "db"] to obtain the current list of valid UCSC genome assemblies.

tablename

The name of the UCSC table containing the transcript genomic locations to retrieve. Use the supportedUCSCtables utility function to get the list of tables known to work with makeTxDbFromUCSC.

transcript_ids

Optionally, only retrieve transcript locations for the specified set of transcript ids. If this is used, then the meta information displayed for the resulting TxDb object will say 'Full dataset: no'. Otherwise it will say 'Full dataset: yes'.

circ_seqs

Like GRanges objects, SummarizedExperiment objects, and many other objects in Bioconductor, the TxDb object returned by makeTxDbFromUCSC contains a seqinfo component that can be accessed with seqinfo(). This component contains various sequence-level information like the sequence names, lengths, and circularity flag for the genome assembly of the TxDb object.

As far as we know, information about which sequences are circular is not available in the UCSC Genome Browser. However, for the most commonly used UCSC genome assemblies, makeTxDbFromUCSC will get this information from a knowledge database stored in the GenomeInfoDb package (see ?registered_UCSC_genomes).

For less commonly used UCSC genome assemblies, makeTxDbFromUCSC will make a guess based on the chromosome names (e.g. chrM or 2micron will be assumed to be circular). Even though this works most of the time, it is not guaranteed to work all the time. So in this case a warning is issued. If you think the guess is incorrect then you can supply your own list of circular sequences (as a character vector) via the circ_seqs argument.
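As a hedged illustration of the circ_seqs argument, the sketch below passes the circular sequences explicitly and then inspects the circularity flags of the result. It needs network access to UCSC. The sacCer3 genome and ensGene table are borrowed from the Examples section; for a commonly used assembly such as this one the flags may already be known from GenomeInfoDb, so the argument mainly matters for less common assemblies.

```r
library(GenomicFeatures)

## Supply the circular sequences as a character vector
## (assumption: chrM is the only circular sequence in this assembly).
txdb <- makeTxDbFromUCSC("sacCer3", tablename="ensGene", circ_seqs="chrM")

## Named logical vector with one element per sequence in the TxDb object.
isCircular(seqinfo(txdb))
```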

url, goldenPath.url

Use these arguments to specify the location of an alternate UCSC Genome Browser.

taxonomyId

By default this value is NA, and the taxonomy ID is looked up from the inferred organism. Use this argument to supply your own valid taxonomy ID instead.

miRBaseBuild

Specify the string for the appropriate build information from mirbase.db to use for microRNAs. The valid values can be listed by calling supportedMiRBaseBuildValues(). By default, this value is set to NA, which disables the microRNAs accessor.

Details

makeTxDbFromUCSC is a convenience function that feeds data from the UCSC source to the lower level makeTxDb function. See ?makeTxDbFromEnsembl for a similar function that feeds data from an Ensembl database.

Value

For makeTxDbFromUCSC: A TxDb object.

For supportedUCSCtables: A data frame with 3 columns (tablename, track, and subtrack) and 1 row per table known to work with makeTxDbFromUCSC. IMPORTANT NOTE: In the returned data frame, the set of tables associated with a track with subtracks might contain tables that don't exist for the specified genome.
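A short sketch of how the returned data frame might be inspected. It needs network access to UCSC, and it assumes that tables belonging to a track without subtracks have NA in the subtrack column; the variable names tables and with_subtracks are illustrative only.

```r
library(GenomicFeatures)

## One row per table known to work with makeTxDbFromUCSC().
tables <- supportedUCSCtables("hg19")
colnames(tables)

## Rows that belong to a track with subtracks. Per the note above, some of
## these tables might not exist for the specified genome.
## Assumption: rows without a subtrack have NA in the 'subtrack' column.
with_subtracks <- tables[!is.na(tables$subtrack), ]
with_subtracks
```

A table listed here can be checked by opening its track page with browseUCSCtrack() before passing it to makeTxDbFromUCSC.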

Author(s)

M. Carlson and H. Pagès

Examples

## ---------------------------------------------------------------------
## A. BASIC USAGE
## ---------------------------------------------------------------------

## Use ucscGenomes() from the rtracklayer package to display the list of
## genomes available at UCSC:
library(rtracklayer)
ucscGenomes()[ , "db"]

## Display the list of tables known to work with makeTxDbFromUCSC():
supportedUCSCtables("hg38")
supportedUCSCtables("hg19")

## Open the UCSC track page for a given organism/table:
browseUCSCtrack("hg38", tablename="knownGene")
browseUCSCtrack("hg19", tablename="knownGene")

browseUCSCtrack("hg38", tablename="ncbiRefSeqSelect")
browseUCSCtrack("hg19", tablename="ncbiRefSeqSelect")

browseUCSCtrack("hg19", tablename="pseudoYale60")

browseUCSCtrack("sacCer3", tablename="ensGene")

## Retrieve a full transcript dataset for Yeast from UCSC:
txdb1 <- makeTxDbFromUCSC("sacCer3", tablename="ensGene")
txdb1

## Retrieve an incomplete transcript dataset for Mouse from UCSC (only
## transcripts linked to Entrez Gene ID 22290):
transcript_ids <- c(
    "uc009uzf.1",
    "uc009uzg.1",
    "uc009uzh.1",
    "uc009uzi.1",
    "uc009uzj.1"
)

txdb2 <- makeTxDbFromUCSC("mm10", tablename="knownGene",
                          transcript_ids=transcript_ids)
txdb2

## ---------------------------------------------------------------------
## B. IMPORTANT NOTE ABOUT supportedUCSCtables()
## ---------------------------------------------------------------------

## In the data frame returned by supportedUCSCtables(), the set of
## tables associated with a track with subtracks might contain tables
## that don't exist for the specified genome:
supportedUCSCtables("mm10")
browseUCSCtrack("mm10", tablename="ncbiRefSeqSelect")  # no such table

GenomicFeatures

Conveniently import and query gene models

v1.42.3
Artistic-2.0
Authors
M. Carlson, H. Pagès, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar, M. Lawrence, V. Obenchain