Convert Gene Aliases to Official Gene Symbols
Maps gene alias names to official gene symbols.
alias2Symbol(alias, species = "Hs", expand.symbols = FALSE)
alias2SymbolTable(alias, species = "Hs")
alias2SymbolUsingNCBI(alias, gene.info.file, required.columns = c("GeneID","Symbol","description"))
Argument | Description
alias | character vector of gene aliases
species | character string specifying the species. Possible values include
expand.symbols | logical. This affects those elements of
gene.info.file | either the name of a gene information file downloaded from the NCBI or a data.frame resulting from reading such a file.
required.columns | character vector of columns from the gene information file that are required in the output.
Aliases are mapped via NCBI Entrez Gene identity numbers using Bioconductor organism packages.
alias2Symbol maps a set of aliases to a set of symbols, without necessarily preserving order.
The output vector may be longer or shorter than the original vector, because some aliases might not be found and some aliases may map to more than one symbol.
alias2SymbolTable returns a vector of the same length as the vector of aliases.
If an alias maps to more than one symbol, then the one with the lowest Entrez ID number is returned.
If an alias can't be mapped, then NA is returned.
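The difference between the two return rules can be sketched in base R with a toy lookup table. This is only an illustration of the behaviour described above, not limma's implementation; the aliases, Entrez IDs, symbols and the helper names `map_table` and `map_set` are all hypothetical.

```r
# Hypothetical alias table: each row links one alias to one Entrez Gene ID.
alias_to_id <- data.frame(
  alias  = c("A1", "A1", "B1"),
  entrez = c(200L, 100L, 300L),
  stringsAsFactors = FALSE
)
# Hypothetical official symbols, keyed by Entrez Gene ID.
id_to_symbol <- c("100" = "GENEA", "200" = "GENEB", "300" = "GENEC")

# alias2SymbolTable-like rule: one result per input alias,
# lowest Entrez ID wins, NA when the alias is unknown.
map_table <- function(alias) {
  vapply(alias, function(a) {
    ids <- alias_to_id$entrez[alias_to_id$alias == a]
    if (length(ids) == 0L) return(NA_character_)
    unname(id_to_symbol[as.character(min(ids))])
  }, character(1), USE.NAMES = FALSE)
}

# alias2Symbol-like rule: the set of all matching symbols,
# with no link back to the order of the input.
map_set <- function(alias) {
  ids <- alias_to_id$entrez[alias_to_id$alias %in% alias]
  unique(unname(id_to_symbol[as.character(sort(ids))]))
}

query <- c("B1", "ZZ", "A1")
table_result <- map_table(query)  # same length and order as query
set_result   <- map_set(query)    # unknown aliases dropped, multi-maps expanded
```

In this toy table, "A1" maps to two genes, so the set-style result for "A1" alone has two symbols while the table-style result has one; an unknown alias such as "ZZ" gives an empty set-style result and NA in the table-style result.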
species can be any character string XX for which an organism package org.XX.eg.db exists and is installed.
The only requirement of the organism package is that it contains objects org.XX.egALIAS2EG and org.XX.egSYMBOL linking the aliases and symbols to Entrez Gene Ids.
At the time of writing, the following organism packages are available from Bioconductor 3.6:
Package | Species
org.Ag.eg.db | Anopheles
org.Bt.eg.db | Bovine
org.Ce.eg.db | Worm
org.Cf.eg.db | Canine
org.Dm.eg.db | Fly
org.Dr.eg.db | Zebrafish
org.EcK12.eg.db | E. coli strain K12
org.EcSakai.eg.db | E. coli strain Sakai
org.Gg.eg.db | Chicken
org.Hs.eg.db | Human
org.Mm.eg.db | Mouse
org.Mmu.eg.db | Rhesus
org.Pt.eg.db | Chimp
org.Rn.eg.db | Rat
org.Ss.eg.db | Pig
org.Xl.eg.db | Xenopus
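Because the species code determines the organism package name, it can be useful to check that the package is installed before calling the mapping functions. The following is a minimal sketch; `org_package_name` is a hypothetical helper, and the mouse symbol passed to alias2Symbol is only a placeholder input.

```r
# Hypothetical helper: build the organism package name implied by a species code.
org_package_name <- function(species) paste0("org.", species, ".eg.db")

species <- "Mm"
pkg <- org_package_name(species)

# requireNamespace() returns FALSE instead of stopping when a package is absent.
if (requireNamespace(pkg, quietly = TRUE) &&
    requireNamespace("limma", quietly = TRUE)) {
  symbols <- limma::alias2Symbol("Trp53", species = species)
} else {
  message("Install ", pkg, " and limma from Bioconductor to run this mapping")
}
```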
alias2SymbolUsingNCBI is analogous to alias2SymbolTable but uses a gene-info file from NCBI instead of a Bioconductor organism package.
It also gives the option of returning multiple columns from the gene-info file.
NCBI gene-info files can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO.
For example, the human file is ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz and the mouse file is ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz.
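A possible workflow for the NCBI route is sketched below. The wrapper `map_with_ncbi` is a hypothetical helper, not part of limma, and the sketch assumes the compressed file can be read directly by alias2SymbolUsingNCBI; if that fails in a given setup, decompress the file first or read it into a data.frame and pass that instead.

```r
# Hypothetical helper: download an NCBI gene-info file once,
# then map aliases with limma using that file.
map_with_ncbi <- function(alias, url, destfile = basename(url)) {
  if (!requireNamespace("limma", quietly = TRUE)) {
    stop("limma must be installed from Bioconductor")
  }
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the gzip file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  limma::alias2SymbolUsingNCBI(alias, gene.info.file = destfile)
}

human_url <- "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz"

# Not run here because it needs network access:
# map_with_ncbi(c("PUMA", "NOXA", "BIM"), human_url)
```

Keeping the downloaded file on disk avoids repeated transfers, since the gene-info files are large.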
alias2Symbol and alias2SymbolTable produce a character vector of gene symbols.
alias2SymbolTable returns a vector of the same length and order as alias, including NA values where no gene symbol was found.
alias2Symbol returns an unordered vector that may be longer or shorter than alias.
alias2SymbolUsingNCBI returns a data.frame with rows corresponding to the entries of alias and columns as specified by required.columns.
Gordon Smyth and Yifang Hu
These functions are often used to assist gene set testing; see 10.GeneSetTests.
alias2Symbol(c("PUMA","NOXA","BIM"), species="Hs")
alias2Symbol("RS1", expand.symbols=TRUE)