Get chromosome information for an NCBI assembly
getChromInfoFromNCBI returns chromosome information, such as sequence names, lengths, and circularity flags, for a given NCBI assembly (e.g. GRCh38, ARS-UCD1.2, or R64).
Note that getChromInfoFromNCBI behaves slightly differently depending on whether or not the assembly is registered in the GenomeInfoDb package. See below for the details.
Use registered_NCBI_assemblies to list all the NCBI assemblies currently registered in the GenomeInfoDb package.
getChromInfoFromNCBI(assembly,
    assembled.molecules.only=FALSE,
    assembly.units=NULL,
    recache=FALSE,
    as.Seqinfo=FALSE)

registered_NCBI_assemblies()
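assembly: A single string specifying the name of an NCBI assembly (e.g. "GRCh38"). An assembly accession (GenBank or RefSeq) is also accepted.
assembled.molecules.only: If FALSE (the default), chromosome information is returned for all the sequences in the assembly. If TRUE, it is returned only for the assembled molecules.
assembly.units: If NULL (the default), sequences from all assembly units are returned. Otherwise, a character vector of assembly unit names (e.g. "non-nuclear") that restricts the result to the sequences in those units.
recache: TRUE or FALSE (the default). If TRUE, the chromosome information is downloaded again and re-cached.
as.Seqinfo: TRUE or FALSE (the default). If TRUE, a Seqinfo object is returned instead of a data frame.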
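One way to picture the two filtering arguments is as row filters on the returned data frame. The sketch below runs offline on a small hand-made stand-in rather than a real download. The rows and the role and unit labels ("assembled-molecule", "unplaced-scaffold", "Primary Assembly") are assumptions chosen for illustration, and the equivalence to what getChromInfoFromNCBI does internally is also an assumption, not something this page confirms.

```r
# Hypothetical stand-in for a getChromInfoFromNCBI() result.
# Only three of the ten columns are reproduced; all values are invented.
chrominfo <- data.frame(
  SequenceName = c("1", "MT", "Un_scaffold1"),
  SequenceRole = factor(c("assembled-molecule",
                          "assembled-molecule",
                          "unplaced-scaffold")),
  AssemblyUnit = factor(c("Primary Assembly",
                          "non-nuclear",
                          "Primary Assembly")),
  stringsAsFactors = FALSE
)

# Roughly what assembled.molecules.only=TRUE is assumed to select:
assembled <- chrominfo[chrominfo$SequenceRole == "assembled-molecule", ]

# Roughly what assembly.units="non-nuclear" is assumed to select:
non_nuclear <- chrominfo[chrominfo$AssemblyUnit %in% "non-nuclear", ]

assembled$SequenceName
non_nuclear$SequenceName
```

Filtering on the server-side arguments is usually preferable when they are available; the manual filters are mainly useful for inspecting a result that has already been fetched.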
registered vs unregistered NCBI assemblies:
All NCBI assemblies can be looked up by assembly accession (GenBank or RefSeq) but only registered assemblies can also be looked up by assembly name.
For registered assemblies, the returned circularity flags are guaranteed to be accurate. For unregistered assemblies, a heuristic is used to determine the circular sequences.
Please contact the maintainer of the GenomeInfoDb package to request registration of additional assemblies.
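Since only registered assemblies can be looked up by name, it can help to check registration before choosing between a name and an accession. The helper below is a minimal sketch: can_lookup_by_name is a hypothetical function, and the `assembly` column it reads is an assumption about the layout of the data frame returned by registered_NCBI_assemblies(). A hand-made table stands in for that data frame so the sketch runs offline.

```r
# Hypothetical helper: TRUE if `name` appears among the registered
# assembly names. The `assembly` column name is an assumption.
can_lookup_by_name <- function(name, registered) {
  stopifnot(is.character(name), length(name) == 1L,
            is.data.frame(registered))
  name %in% as.character(registered$assembly)
}

# Stand-in for registered_NCBI_assemblies() (illustrative rows only).
registered <- data.frame(
  assembly = c("GRCh38", "R64"),
  stringsAsFactors = FALSE
)

by_name_ok  <- can_lookup_by_name("GRCh38", registered)
by_name_bad <- can_lookup_by_name("MyLabAsm1", registered)  # made-up name
```

With the package loaded, the stand-in would be replaced by the result of registered_NCBI_assemblies(). When the check fails, the assembly accession (GenBank or RefSeq) is the identifier to pass instead.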
For getChromInfoFromNCBI: By default, a 10-column data frame with columns:
SequenceName: character.
SequenceRole: factor.
AssignedMolecule: factor.
GenBankAccn: character.
Relationship: factor.
RefSeqAccn: character.
AssemblyUnit: factor.
SequenceLength: integer. Note that this column **can** contain NAs! For example, this is the case in assembly Amel_HAv3.1, where the length of sequence MT is missing, or in assembly Release 5, where the length of sequence Un is missing.
UCSCStyleName: character.
circular: logical.
For registered_NCBI_assemblies: A data frame summarizing all the NCBI assemblies currently registered in the GenomeInfoDb package.
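Because SequenceLength can contain NAs, length-based computations on the returned data frame need to handle missing values explicitly. The following is a minimal offline sketch using a hand-made stand-in with invented values. Only the column names SequenceName, SequenceLength, and circular come from the list above.

```r
# Hypothetical stand-in for a getChromInfoFromNCBI() result; the values
# are invented and only three of the ten columns are reproduced.
chrominfo <- data.frame(
  SequenceName   = c("1", "2", "MT"),
  SequenceLength = c(1000L, 2000L, NA_integer_),
  circular       = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Names of the sequences whose length is missing.
missing_len <- chrominfo$SequenceName[is.na(chrominfo$SequenceLength)]

# Total length over the sequences with a known length.
total_len <- sum(chrominfo$SequenceLength, na.rm = TRUE)

# Rows that are safe to use in length-based computations.
known <- chrominfo[!is.na(chrominfo$SequenceLength), ]
```

Without na.rm = TRUE, the sum would itself be NA, so it is worth deciding up front whether sequences of unknown length should be dropped or reported.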
H. Pagès
getChromInfoFromUCSC for getting chromosome information for a UCSC genome.
getChromInfoFromEnsembl for getting chromosome information for an Ensembl species.
Seqinfo objects.
## Internet access required!

getChromInfoFromNCBI("GRCh37")

getChromInfoFromNCBI("GRCh37", as.Seqinfo=TRUE)

getChromInfoFromNCBI("GRCh37", assembled.molecules.only=TRUE)

getChromInfoFromNCBI("TAIR10.1")

getChromInfoFromNCBI("TAIR10.1", assembly.units="non-nuclear")

## List of NCBI assemblies currently registered in the package:
registered_NCBI_assemblies()

## The GRCh38.p12 assembly only adds "patch sequences" to the GRCh38
## assembly:
GRCh38 <- getChromInfoFromNCBI("GRCh38")
table(GRCh38$SequenceRole)
GRCh38.p12 <- getChromInfoFromNCBI("GRCh38.p12")
table(GRCh38.p12$SequenceRole)  # 140 patch sequences (70 fix + 70 novel)

## Sanity checks:
idx <- match(GRCh38$SequenceName, GRCh38.p12$SequenceName)
stopifnot(!anyNA(idx))
tmp1 <- GRCh38.p12[idx, ]
rownames(tmp1) <- NULL
tmp2 <- GRCh38.p12[-idx, ]
stopifnot(
    identical(tmp1[ , -(5:7)], GRCh38[ , -(5:7)]),
    identical(tmp2, GRCh38.p12[GRCh38.p12$AssemblyUnit == "PATCHES", ])
)