Make a TxDb object from annotations available at the UCSC Genome Browser
The makeTxDbFromUCSC
function allows the user to make a
TxDb object from transcript annotations available at the
UCSC Genome Browser.
Note that it uses the RMariaDB package internally so make sure that this package is installed.
makeTxDbFromUCSC(genome="hg19", tablename="knownGene", transcript_ids=NULL, circ_seqs=NULL, url="http://genome.ucsc.edu/cgi-bin/", goldenPath.url=getOption("UCSC.goldenPath.url"), taxonomyId=NA, miRBaseBuild=NA) supportedUCSCtables(genome="hg19", url="http://genome.ucsc.edu/cgi-bin/") browseUCSCtrack(genome="hg19", tablename="knownGene", url="http://genome.ucsc.edu/cgi-bin/")
genome |
The name of a UCSC genome assembly e.g. |
tablename |
The name of the UCSC table containing the transcript genomic locations
to retrieve. Use the |
transcript_ids |
Optionally, only retrieve transcript locations for the specified set of transcript ids. If this is used, then the meta information displayed for the resulting TxDb object will say 'Full dataset: no'. Otherwise it will say 'Full dataset: yes'. |
circ_seqs |
Like GRanges objects,
SummarizedExperiment objects,
and many other objects in Bioconductor, the TxDb object
returned by As far as we know the information of which sequences are circular
is not available in the UCSC Genome Browser. However, for the
most commonly used UCSC genome assemblies For less commonly used UCSC genome assemblies, |
url,goldenPath.url |
Use to specify the location of an alternate UCSC Genome Browser. |
taxonomyId |
By default this value is NA and the organism inferred will be used to look up the correct value for this. But you can use this argument to supply your own valid taxId here. |
miRBaseBuild |
Specify the string for the appropriate build information from
mirbase.db to use for microRNAs. This can be learned by
calling |
makeTxDbFromUCSC
is a convenience function that feeds
data from the UCSC source to the lower level makeTxDb
function.
See ?makeTxDbFromEnsembl
for a similar function that
feeds data from an Ensembl database.
For makeTxDbFromUCSC
: A TxDb object.
For supportedUCSCtables
: A data frame with 3 columns
(tablename
, track
, and subtrack
) and 1 row
per table known to work with makeTxDbFromUCSC
.
IMPORTANT NOTE: In the returned data frame, the set of tables associated
with a track with subtracks might contain tables that don't exist for the
specified genome.
M. Carlson and H. Pagès
makeTxDbFromEnsembl
and
makeTxDbFromBiomart
for making a TxDb
object from other online resources.
makeTxDbFromGRanges
and makeTxDbFromGFF
for making a TxDb object from a GRanges
object, or from a GFF or GTF file.
ucscGenomes
in the rtracklayer
package.
The supportedMiRBaseBuildValues
function for
listing all the possible values for the miRBaseBuild
argument.
The TxDb class.
makeTxDb
for the low-level function used by the
makeTxDbFrom*
functions to make the TxDb object
returned to the user.
## --------------------------------------------------------------------- ## A. BASIC USAGE ## --------------------------------------------------------------------- ## Use ucscGenomes() from the rtracklayer package to display the list of ## genomes available at UCSC: library(rtracklayer) ucscGenomes()[ , "db"] ## Display the list of tables known to work with makeTxDbFromUCSC(): supportedUCSCtables("hg38") supportedUCSCtables("hg19") ## Open the UCSC track page for a given organism/table: browseUCSCtrack("hg38", tablename="knownGene") browseUCSCtrack("hg19", tablename="knownGene") browseUCSCtrack("hg38", tablename="ncbiRefSeqSelect") browseUCSCtrack("hg19", tablename="ncbiRefSeqSelect") browseUCSCtrack("hg19", tablename="pseudoYale60") browseUCSCtrack("sacCer3", tablename="ensGene") ## Retrieve a full transcript dataset for Yeast from UCSC: txdb1 <- makeTxDbFromUCSC("sacCer3", tablename="ensGene") txdb1 ## Retrieve an incomplete transcript dataset for Mouse from UCSC (only ## transcripts linked to Entrez Gene ID 22290): transcript_ids <- c( "uc009uzf.1", "uc009uzg.1", "uc009uzh.1", "uc009uzi.1", "uc009uzj.1" ) txdb2 <- makeTxDbFromUCSC("mm10", tablename="knownGene", transcript_ids=transcript_ids) txdb2 ## --------------------------------------------------------------------- ## B. IMPORTANT NOTE ABOUT supportedUCSCtables() ## --------------------------------------------------------------------- ## In the data frame returned by supportedUCSCtables(), the set of ## tables associated with a track with subtracks might contain tables ## that don't exist for the specified genome: supportedUCSCtables("mm10") browseUCSCtrack("mm10", tablename="ncbiRefSeqSelect") # no such table
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.