Making a TxDb object from user-supplied annotations
makeTxDb is a low-level constructor for making a TxDb object from user-supplied transcript annotations.
Note that the end user will rarely need to use makeTxDb directly but will typically use one of the high-level constructors makeTxDbFromUCSC, makeTxDbFromEnsembl, or makeTxDbFromGFF.
makeTxDb(transcripts, splicings, genes=NULL, chrominfo=NULL, metadata=NULL, reassign.ids=FALSE, on.foreign.transcripts=c("error", "drop"))
transcripts |
Data frame containing the genomic locations of a set of transcripts. |
splicings |
Data frame containing the exon and CDS locations of a set of transcripts. |
genes |
Data frame containing the genes associated to a set of transcripts. |
chrominfo |
Data frame containing information about the chromosomes hosting the set of transcripts. |
metadata |
2-column data frame containing metadata about this set of
transcripts, such as organism, genome, UCSC table, etc.
The names of the columns must be |
reassign.ids |
|
on.foreign.transcripts |
Controls what to do when the input contains foreign transcripts
i.e. transcripts that are on sequences not in |
The transcripts (required), splicings (required) and genes (optional) arguments must be data frames that describe a set of transcripts and the genomic features related to them (exons, CDS and genes at the moment).
The chrominfo (optional) argument must be a data frame containing chromosome information like the length of each chromosome.
transcripts must have 1 row per transcript and the following columns:
tx_id: Transcript ID. Integer vector. No NAs. No duplicates.
tx_chrom: Transcript chromosome. Character vector (or factor) with no NAs.
tx_strand: Transcript strand. Character vector (or factor) with no NAs where each element is either "+" or "-".
tx_start, tx_end: Transcript start and end. Integer vectors with no NAs.
tx_name: [optional] Transcript name. Character vector (or factor). NAs and/or duplicates are ok.
tx_type: [optional] Transcript type (e.g. mRNA, ncRNA, snoRNA). Character vector (or factor). NAs and/or duplicates are ok.
gene_id: [optional] Associated gene. Character vector (or factor). NAs and/or duplicates are ok.
Other columns, if any, are ignored (with a warning).
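The column requirements above can be checked before calling makeTxDb. The following is a minimal sketch in base R; check_transcripts is a hypothetical helper written for illustration, not a function of the package, and it only covers the required columns.

```r
# Hypothetical helper (not part of any package): reports violations of the
# requirements listed above for the required 'transcripts' columns.
check_transcripts <- function(transcripts) {
    required <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
    missing_cols <- setdiff(required, colnames(transcripts))
    if (length(missing_cols) != 0L)
        stop("missing required column(s): ",
             paste(missing_cols, collapse=", "))
    problems <- character(0)
    # tx_id: no NAs, no duplicates
    if (anyNA(transcripts$tx_id) || anyDuplicated(transcripts$tx_id) != 0L)
        problems <- c(problems, "tx_id must have no NAs and no duplicates")
    # tx_chrom: no NAs
    if (anyNA(transcripts$tx_chrom))
        problems <- c(problems, "tx_chrom must have no NAs")
    # tx_strand: every element is "+" or "-" (an NA fails this test too)
    if (!all(as.character(transcripts$tx_strand) %in% c("+", "-")))
        problems <- c(problems, "tx_strand must be \"+\" or \"-\"")
    # tx_start, tx_end: no NAs
    if (anyNA(transcripts$tx_start) || anyNA(transcripts$tx_end))
        problems <- c(problems, "tx_start and tx_end must have no NAs")
    problems
}

transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199),
    stringsAsFactors=FALSE)
ok_problems <- check_transcripts(transcripts)

# Same data with a duplicated tx_id and an invalid strand.
bad <- transcripts
bad$tx_id[3] <- 2L
bad$tx_strand[1] <- "*"
bad_problems <- check_transcripts(bad)
```

Here ok_problems is empty, while bad_problems holds one message per violated requirement.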
splicings must have N rows per transcript, where N is the number of exons in the transcript. Each row describes an exon plus, optionally, the CDS contained in this exon. Its columns must be:
tx_id: Foreign key that links each row in the splicings data frame to a unique row in the transcripts data frame. Note that more than 1 row in splicings can be linked to the same row in transcripts (many-to-one relationship). Same type as transcripts$tx_id (integer vector). No NAs. All the values in this column must be present in transcripts$tx_id.
exon_rank: The rank of the exon in the transcript. Integer vector with no NAs. (tx_id, exon_rank) pairs must be unique.
exon_id: [optional] Exon ID. Integer vector with no NAs.
exon_name: [optional] Exon name. Character vector (or factor). NAs and/or duplicates are ok.
exon_chrom: [optional] Exon chromosome. Character vector (or factor) with no NAs. If missing then transcripts$tx_chrom is used. If present then exon_strand must also be present.
exon_strand: [optional] Exon strand. Character vector (or factor) with no NAs. If missing then transcripts$tx_strand is used and exon_chrom must also be missing.
exon_start, exon_end: Exon start and end. Integer vectors with no NAs.
cds_id: [optional] CDS ID. Integer vector. If present then cds_start and cds_end must also be present. NAs are allowed and must match those in cds_start and cds_end.
cds_name: [optional] CDS name. Character vector (or factor). If present then cds_start and cds_end must also be present. NAs and/or duplicates are ok. Must contain NAs at least where cds_start and cds_end contain them.
cds_start, cds_end: [optional] CDS start and end. Integer vectors. If one of the 2 columns is missing then all cds_* columns must be missing. NAs are allowed and must occur at the same positions in cds_start and cds_end.
cds_phase: [optional] CDS phase. Integer vector. If present then cds_start and cds_end must also be present. NAs are allowed and must match those in cds_start and cds_end.
Other columns, if any, are ignored (with a warning).
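Three of the constraints above lend themselves to a quick check in base R: the tx_id foreign key, the uniqueness of (tx_id, exon_rank) pairs, and the matching NA positions in cds_start and cds_end. The sketch below uses a hypothetical helper, check_splicings, written for illustration only. It does not cover the other columns.

```r
# Hypothetical helper (not part of any package): checks the foreign key,
# the (tx_id, exon_rank) uniqueness, and the cds_start/cds_end NA pattern.
check_splicings <- function(splicings, transcripts) {
    problems <- character(0)
    # Every splicings$tx_id must be non-NA and present in transcripts$tx_id.
    if (anyNA(splicings$tx_id) ||
        !all(splicings$tx_id %in% transcripts$tx_id))
        problems <- c(problems,
                      "tx_id values must all be present in transcripts$tx_id")
    # (tx_id, exon_rank) pairs must be unique.
    if (anyDuplicated(splicings[ , c("tx_id", "exon_rank")]) != 0L)
        problems <- c(problems, "(tx_id, exon_rank) pairs must be unique")
    # cds_start and cds_end go together, with NAs at the same positions.
    has_start <- "cds_start" %in% colnames(splicings)
    has_end <- "cds_end" %in% colnames(splicings)
    if (has_start != has_end) {
        problems <- c(problems,
                      "cds_start and cds_end must both be present or missing")
    } else if (has_start &&
               !identical(is.na(splicings$cds_start),
                          is.na(splicings$cds_end))) {
        problems <- c(problems,
                      "NAs must match in cds_start and cds_end")
    }
    problems
}

transcripts <- data.frame(tx_id=1:3)
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA))
ok_problems <- check_splicings(splicings, transcripts)

# Break each of the three constraints once.
bad <- splicings
bad$exon_rank[3] <- 1    # rows 2 and 3 now share the pair (2, 1)
bad$cds_end[1] <- NA     # NA in cds_end but not in cds_start
bad$tx_id[6] <- 4L       # 4 is not a transcript ID
bad_problems <- check_splicings(bad, transcripts)
```

With the unmodified data, ok_problems is empty. The modified copy yields one message per broken constraint.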
genes should not be supplied if transcripts has a gene_id column. If supplied, it must have N rows per transcript, where N is the number of genes linked to the transcript (N will be 1 most of the time). Its columns must be:
tx_id: [optional] genes must have either a tx_id or a tx_name column but not both. Like splicings$tx_id, this is a foreign key that links each row in the genes data frame to a unique row in the transcripts data frame.
tx_name: [optional] Can be used as an alternative to the genes$tx_id foreign key.
gene_id: Gene ID. Character vector (or factor). No NAs.
Other columns, if any, are ignored (with a warning).
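As an illustration of the two ways of keying the genes data frame, the sketch below builds the same transcript-to-gene links once with a tx_id column and once with a tx_name column. All transcript names and gene IDs are made up for the example, and the final makeTxDb call is left commented out because it also needs a splicings data frame.

```r
# Illustrative data only: names and IDs below are invented.
transcripts <- data.frame(
    tx_id=1:3,
    tx_name=c("txA", "txB", "txC"),
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199),
    stringsAsFactors=FALSE)

# Variant 1: link genes to transcripts through tx_id.
# Transcript 3 is linked to 2 genes, so it gets 2 rows (N = 2).
genes_by_id <- data.frame(
    tx_id=c(1L, 2L, 3L, 3L),
    gene_id=c("geneX", "geneY", "geneY", "geneZ"),
    stringsAsFactors=FALSE)

# Variant 2: the same links expressed through tx_name instead of tx_id.
genes_by_name <- data.frame(
    tx_name=transcripts$tx_name[match(genes_by_id$tx_id, transcripts$tx_id)],
    gene_id=genes_by_id$gene_id,
    stringsAsFactors=FALSE)

# Number of genes linked to each transcript (N in the text above).
genes_per_tx <- table(genes_by_id$tx_id)

# Either variant (but not both columns at once) would then be passed as:
# txdb <- makeTxDb(transcripts, splicings, genes=genes_by_id)
```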
chrominfo must have 1 row per chromosome and the following columns:
chrom: Chromosome name. Character vector (or factor) with no NAs and no duplicates.
length: Chromosome length. Integer vector with either all NAs or no NAs.
is_circular: [optional] Chromosome circularity flag. Logical vector. NAs are ok.
Other columns, if any, are ignored (with a warning).
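A small chrominfo data frame that satisfies these requirements could look like the sketch below. The chromosome lengths are illustrative values, not real genome coordinates, and check_chrominfo is a hypothetical helper that only restates the rules for chrom and length.

```r
# Illustrative values only: the lengths are not real chromosome lengths.
chrominfo <- data.frame(
    chrom=c("chr1", "chrM"),
    length=c(5000L, 1200L),
    is_circular=c(FALSE, TRUE),
    stringsAsFactors=FALSE)

# Hypothetical helper (not part of any package).
check_chrominfo <- function(chrominfo) {
    problems <- character(0)
    # chrom: no NAs, no duplicates
    if (anyNA(chrominfo$chrom) || anyDuplicated(chrominfo$chrom) != 0L)
        problems <- c(problems, "chrom must have no NAs and no duplicates")
    # length: either all NAs or no NAs
    n_na <- sum(is.na(chrominfo$length))
    if (n_na != 0L && n_na != nrow(chrominfo))
        problems <- c(problems, "length must be all NAs or contain no NAs")
    problems
}

ok_problems <- check_chrominfo(chrominfo)

# A partially missing length column violates the all-or-nothing rule.
bad <- chrominfo
bad$length[2] <- NA
bad_problems <- check_chrominfo(bad)
```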
A TxDb object.
Hervé Pagès
makeTxDbFromUCSC, makeTxDbFromBiomart, and makeTxDbFromEnsembl, for making a TxDb object from online resources.
makeTxDbFromGRanges and makeTxDbFromGFF for making a TxDb object from a GRanges object, or from a GFF or GTF file.
The TxDb class.
saveDb and loadDb in the AnnotationDbi package for saving and loading a TxDb object as an SQLite file.
transcripts <- data.frame(
    tx_id=1:3,
    tx_chrom="chr1",
    tx_strand=c("-", "+", "+"),
    tx_start=c(1, 2001, 2001),
    tx_end=c(999, 2199, 2199))
splicings <- data.frame(
    tx_id=c(1L, 2L, 2L, 2L, 3L, 3L),
    exon_rank=c(1, 1, 2, 3, 1, 2),
    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
    cds_start=c(1, 2022, 2101, 2131, NA, NA),
    cds_end=c(999, 2085, 2144, 2193, NA, NA),
    cds_phase=c(0, 0, 2, 0, NA, NA))

txdb <- makeTxDb(transcripts, splicings)
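The cds_phase values given for transcript 2 in the example can be cross-checked from the CDS coordinates. The sketch below assumes the GFF3 convention for phase (the number of bases to skip at the start of a CDS segment to reach the next codon boundary), that segments are supplied in exon_rank order, and that the first segment starts on a codon boundary. This page does not state any of these assumptions, and cds_phase_from_widths is a hypothetical helper.

```r
# Hypothetical helper (not part of any package). Assumes GFF3-style phase,
# segments in exon_rank order, and a first segment with phase 0.
cds_phase_from_widths <- function(cds_start, cds_end) {
    width <- cds_end - cds_start + 1
    # Number of coding bases that precede each segment in the transcript.
    preceding <- cumsum(c(0, head(width, -1)))
    # Bases needed to complete the codon left open by the preceding segments.
    as.integer((-preceding) %% 3)
}

# CDS segments of transcript 2 from the example (widths 64, 44 and 63).
phase_tx2 <- cds_phase_from_widths(
    cds_start=c(2022, 2101, 2131),
    cds_end=c(2085, 2144, 2193))
```

Under these assumptions phase_tx2 is 0, 2, 0, which agrees with the cds_phase column supplied for transcript 2.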