gl.nhybrids

Create an input file for the program NewHybrids and run it if NewHybrids is installed


Description

This function compares two sets of parental populations to identify loci that exhibit a fixed difference, returns a genlight object with the reduced data, and creates an input file for the program NewHybrids using the top 200 (or hard specified loc.limit) loci. In the absence of two identified parental populations, the script will select either a random set of 200 loci (method="random") or the first 200 loci ranked on information content (method="AvgPIC").

Usage

gl.nhybrids(
  gl,
  outfile = "nhyb.txt",
  outpath = tempdir(),
  p0 = NULL,
  p1 = NULL,
  threshold = 0,
  method = "random",
  plot = TRUE,
  pprob = 0.95,
  nhyb.directory = NULL,
  BurnIn = 10000,
  sweeps = 10000,
  GtypFile = "TwoGensGtypFreq.txt",
  AFPriorFile = NULL,
  PiPrior = "Jeffreys",
  ThetaPrior = "Jeffreys",
  verbose = NULL
)

Arguments

gl

– name of the genlight object containing the SNP data [required]

outfile

– name of the file that will be the input file for NewHybrids [default nhyb.txt]

outpath

– path where to save the output file (set to tempdir by default)

p0

– list of populations to be regarded as parental population 0 [default NULL]

p1

– list of populations to be regarded as parental population 1 [default NULL]

threshold

– sets the level at which a gene frequency difference is considered to be fixed [default 0]

method

– specifies the method (random or AvgPIC) to select 200 loci for NewHybrids [default random]

plot

– if TRUE, a plot of the frequency of homozygous reference, heterozygotes and homozygous alternate (SNP) is produced for the F1 individuals [default TRUE, applies only if both parental populations are specified]

pprob

– threshold level for assignment to likelihood bins [default 0.95, used only if plot=TRUE]

nhyb.directory

– directory that holds the NewHybrids executable file e.g. C:/NewHybsPC [default NULL]

BurnIn

– number of sweeps to use in the burn in [default 10000]

sweeps

– number of sweeps to use in computing the actual Monte Carlo averages [default 10000]

GtypFile

– name of a file containing the genotype frequency classes [default TwoGensGtypFreq.txt]

AFPriorFile

– name of the file containing prior allele frequency information [default NULL]

PiPrior

– Jeffreys-like priors or Uniform priors for the parameter pi [default Jeffreys]

ThetaPrior

– Jeffreys-like priors or Uniform priors for the parameter theta [default Jeffreys]

verbose

– verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]

Details

A fixed difference occurs when a SNP allele is present in all individuals of one population and absent in the other. There is provision for setting a level of tolerance, e.g. threshold = 0.05, which treats an allele present at greater than 95% in one population and less than 5% in the other as a fixed difference. Only 200 loci are retained, because of limitations of NewHybrids.
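The effect of the threshold can be illustrated with a minimal base R sketch. It does not reproduce the internals of gl.nhybrids: the allele frequencies are made up, is_fixed_difference is a hypothetical helper, and treating the boundary values as inclusive is an assumption made so that threshold = 0 matches the strict definition.

```r
# Made-up frequencies of the alternate (SNP) allele at five loci
# in two parental populations (illustration only).
freq_p0 <- c(loc1 = 1.00, loc2 = 0.97, loc3 = 0.50, loc4 = 0.00, loc5 = 0.04)
freq_p1 <- c(loc1 = 0.00, loc2 = 0.02, loc3 = 0.45, loc4 = 1.00, loc5 = 0.90)

# Hypothetical helper (not part of dartR): TRUE where the allele frequency is
# at or above 1 - threshold in one population and at or below threshold
# in the other.
is_fixed_difference <- function(f0, f1, threshold = 0) {
  hi <- 1 - threshold
  (f0 >= hi & f1 <= threshold) | (f1 >= hi & f0 <= threshold)
}

strict  <- is_fixed_difference(freq_p0, freq_p1, threshold = 0)
relaxed <- is_fixed_difference(freq_p0, freq_p1, threshold = 0.05)

names(freq_p0)[strict]   # loci fixed under the strict definition
names(freq_p0)[relaxed]  # loci fixed when a 5% tolerance is allowed
```

With these values, loc2 (0.97 versus 0.02) qualifies only under the relaxed threshold, while loc5 (0.04 versus 0.90) fails both.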

If you specify a directory for the NewHybrids executable file, then the script will create the input file from the SNP data and then run NewHybrids. If the directory is set to NULL, execution will stop once the input file (default="nhyb.txt") has been written to disk.

Refer to the NewHybrids manual for further information on the parameters to set – http://ib.berkeley.edu/labs/slatkin/eriq/software/new_hybs_doc1_1Beta3.pdf

It is important to stringently filter the data on RepAvg and CallRate if using the random option. If 200 loci are considered insufficient, one might repeat the analysis (method="random") and combine the resulting posterior probabilities.
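One way to combine posterior probabilities from repeated runs is sketched below in base R. This is an assumption, not a procedure prescribed by dartR or NewHybrids: the probabilities are made up, the class labels P0, P1 and F1 are placeholders, and a simple average across runs is only one possible way of combining them. The pprob value mirrors the argument of the same name.

```r
# Made-up posterior probabilities for three individuals from two
# independent runs, each imagined to use a different random set of loci.
classes <- c("P0", "P1", "F1")
inds <- c("ind1", "ind2", "ind3")
run1 <- matrix(c(0.98, 0.01, 0.01,
                 0.02, 0.03, 0.95,
                 0.55, 0.05, 0.40),
               nrow = 3, byrow = TRUE, dimnames = list(inds, classes))
run2 <- matrix(c(0.96, 0.02, 0.02,
                 0.01, 0.01, 0.98,
                 0.45, 0.05, 0.50),
               nrow = 3, byrow = TRUE, dimnames = list(inds, classes))

# Average the posterior probabilities across runs (assumed combination rule).
runs <- list(run1, run2)
combined <- Reduce(`+`, runs) / length(runs)

# Assign each individual to its most probable class, but only when the
# combined probability reaches the pprob threshold.
pprob <- 0.95
best <- classes[max.col(combined, ties.method = "first")]
assignment <- ifelse(apply(combined, 1, max) >= pprob, best, "unassigned")
assignment
```

In this toy data ind3 stays unassigned because no class reaches the threshold after averaging.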

The F1 individuals should be heterozygous at all loci for which the parental populations are fixed and different, assuming parental populations have been specified. Sampling errors can result in this not being the case, especially where the sample sizes for the parental populations are small. Alternatively, the threshold for posterior probabilities used to determine assignment (pprob) or the definition of a fixed difference (threshold) may be too lax. To assess the error rate in the assignment of F1 individuals, a plot of the frequency of homozygous reference, heterozygous and homozygous alternate (SNP) genotypes can be produced by setting plot=TRUE (the default).
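The tabulation behind such a plot can be sketched in base R as follows. The genotype matrix is made up, and the coding 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate follows the usual genlight convention; the actual plotting code in gl.nhybrids may differ.

```r
# Made-up genotypes for four putative F1 individuals (rows) at six
# fixed-difference loci (columns); NA marks a missing call.
geno <- matrix(c(1, 1,  1, 1, 1, 1,
                 1, 1,  0, 1, 1, 1,
                 1, 1,  1, 1, 2, 1,
                 1, NA, 1, 1, 1, 1),
               nrow = 4, byrow = TRUE)

# Count each genotype class; missing calls are dropped by table().
counts <- table(factor(geno, levels = 0:2,
                       labels = c("hom_ref", "het", "hom_alt")))
prop <- counts / sum(counts)

# Offspring of parents fixed for alternate alleles carry one allele from each,
# so the proportion of non-heterozygous calls gives a rough error rate.
error_rate <- 1 - as.vector(prop)[2]

counts
round(error_rate, 3)
```

A call such as barplot(prop) would then give a quick visual check on the same proportions.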

Value

The reduced genlight object, if parentals are provided; output of NewHybrids is saved to disk

Author(s)

Arthur Georges (Post to https://groups.google.com/d/forum/dartr)

Examples

## Not run: 
m <- gl.nhybrids(testset.gl, outfile="nhyb.txt", 
p0=NULL, p1=NULL, 
nhyb.directory="C:/workspace/R/NewHybsPC",
BurnIn=100,
sweeps=100,
verbose=3)

## End(Not run)

dartR

Importing and Analysing SNP and Silicodart Data Generated by Genome-Wide Restriction Fragment Analysis

v1.9.6
GPL-2
Authors
Bernd Gruber [aut, cre], Arthur Georges [aut], Jose L. Mijangos [aut], Peter J. Unmack [ctb], Oliver Berry [ctb], Lindsay V. Clark [ctb], Floriaan Devloo-Delva [ctb]
Initial release
2021-04-29
