dia.bactgensize

Distribution of bacterial genome size from GOLD


Description

This function tries to download the latest update of GOLD (Genomes OnLine Database) and extracts bacterial genome sizes where available. It produces a histogram together with the default density() estimate. Optionally, a maximum likelihood fit of a superposition of two or three normal distributions is drawn as well.
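The fitting step described above can be sketched on simulated data. This is a minimal illustration, not the code used inside dia.bactgensize(): the data are simulated rather than taken from GOLD, negloglik is a hypothetical helper, and the choice of optim() with the "L-BFGS-B" method is an assumption. Only the starting values are taken from the defaults listed under Usage.

```r
# Simulated genome sizes (in Kb) standing in for the GOLD data; not real data.
set.seed(1)
gs <- c(rnorm(400, mean = 2000, sd = 600),
        rnorm(600, mean = 4500, sd = 1000))
gs <- gs[gs > 0]

# Negative log-likelihood of a two-component normal mixture.
# par = c(p, m1, sd1, m2, sd2); p is the proportion of the first component.
negloglik <- function(par, x) {
  p <- par[1]
  dens <- p * dnorm(x, par[2], par[3]) + (1 - p) * dnorm(x, par[4], par[5])
  # Guard against log(0) so the optimiser always sees a finite value.
  -sum(log(pmax(dens, .Machine$double.xmin)))
}

# Start from the same initial guesses as the documented defaults.
start <- c(p = 0.5, m1 = 2000, sd1 = 600, m2 = 4500, sd2 = 1000)
fit <- optim(start, negloglik, x = gs, method = "L-BFGS-B",
             lower = c(0.01, 0, 1, 0, 1),
             upper = c(0.99, Inf, Inf, Inf, Inf),
             control = list(parscale = c(0.1, 1000, 100, 1000, 100)))
est <- fit$par

# Histogram, default kernel density estimate, and fitted mixture density.
hist(gs, freq = FALSE, breaks = 30,
     main = "Simulated genome sizes", xlab = "Genome size (Kb)")
lines(density(gs), lty = 2)
curve(est[["p"]] * dnorm(x, est[["m1"]], est[["sd1"]]) +
        (1 - est[["p"]]) * dnorm(x, est[["m2"]], est[["sd2"]]),
      add = TRUE, col = "red")
```

A three-component fit would add a third term with its own proportion, mean and standard deviation (p3, m3, sd3), with the proportions constrained to sum to one.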

Usage

dia.bactgensize(fit = 2, p = 0.5, m1 = 2000, sd1 = 600, m2 = 4500,
       sd2 = 1000, p3 = 0.05, m3 = 9000, sd3 = 1000, maxgensize = 20000,
       source = c("https://pbil.univ-lyon1.fr/datasets/seqinr/data/goldtable15Dec07.txt"))

Arguments

fit

integer value. If fit == 0, no normal fit is produced; if fit == 2, a superposition of two normal distributions is fitted; if fit == 3, a superposition of three normal distributions is fitted.

p

initial guess for the proportion of the first population.

m1

initial guess for the mean of the first population.

sd1

initial guess for the standard deviation of the first population.

m2

initial guess for the mean of the second population.

sd2

initial guess for the standard deviation of the second population.

p3

initial guess for the proportion of the third population.

m3

initial guess for the mean of the third population.

sd3

initial guess for the standard deviation of the third population.

maxgensize

maximum admissible value in Kb for a bacterial genome size: only values less than or equal to this threshold are considered.

source

the file with raw data. By default a local (outdated) copy is used.

Value

An invisible data frame with three components:

genus

genus name

species

species name

gs

genome size in Kb
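Because the result is returned invisibly, nothing is printed unless the value is assigned or explicitly printed. The sketch below illustrates this with toy_gensize(), a hypothetical stand-in that returns a small made-up data frame with the three documented columns. The genus names, species names and sizes are invented for illustration and do not come from GOLD.

```r
# Hypothetical stand-in for dia.bactgensize(): returns a made-up data frame
# invisibly, with the three documented columns.
toy_gensize <- function() {
  res <- data.frame(
    genus   = c("Genus1", "Genus2", "Genus3"),
    species = c("alpha", "beta", "gamma"),
    gs      = c(1800, 4600, 9100)  # genome size in Kb (invented values)
  )
  invisible(res)
}

toy_gensize()              # nothing is printed at top level
res <- toy_gensize()       # assign the result to keep it
summary(res$gs)            # summary statistics of the genome sizes
small <- res[res$gs <= 5000, ]  # rows at or below a size threshold
small
```

The same pattern applies to the real function: assign its result to a variable to work with the genus, species and gs columns after the plot has been drawn.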

Author(s)

J.R. Lobry

References

Please cite the following references when using data from GOLD:

Kyrpides, N.C. (1999) Genomes OnLine Database (GOLD 1.0): a monitor of complete and ongoing genome projects world-wide. Bioinformatics, 15:773-774.

Bernal, A., Ear, U., Kyrpides, N. (2001) Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Research, 29:126-127.

Liolios, K., Tavernarakis, N., Hugenholtz, P., Kyrpides, N.C. (2006) The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Research, 34:D332-D334.

Liolios, K., Mavrommatis, K., Tavernarakis, N., Kyrpides, N.C. (2008) The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research, in press:D000-D000.

citation("seqinr")

Examples

## Not run: # Need internet connection
#
# With a local outdated copy from GOLD:
#
   dia.bactgensize()
#
# With last GOLD data:
#
  # The URL is no longer accessible.
  # dia.bactgensize(source = "http://www.genomesonline.org/DBs/goldtable.txt")
  
## End(Not run)

seqinr

Biological Sequences Retrieval and Analysis

v4.2-16
GPL (>= 2)
Authors
Delphine Charif [aut], Olivier Clerc [ctb], Carolin Frank [ctb], Jean R. Lobry [aut, cph], Anamaria Necşulea [ctb], Leonor Palmeira [ctb], Simon Penel [cre], Guy Perrière [ctb]
Initial release
2022-05-19
