Report loci containing secondary SNPs in sequence tags
SNP datasets generated by DArT include fragments with more than one SNP (that is, with secondaries) and record them separately with the same CloneID (=AlleleID). These multiple SNP loci within a fragment are likely to be linked, and so you may wish to remove secondaries.
gl.report.secondaries(x, boxplot = "adjusted", range = 1.5, verbose = 2)
x: name of the genlight object containing the SNP data [required]
boxplot: if 'standard', plots a standard box and whisker plot; if 'adjusted', plots a boxplot adjusted for skewed distributions [default 'adjusted']
range: specifies the range for delimiting outliers [default = 1.5 interquartile ranges]
verbose: verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]
The script reports statistics associated with secondaries and the consequences of filtering them out, and provides three plots. The first is a box and whisker plot adjusted to account for skewness, the second is a bar graph of the frequency of secondaries per sequence tag, and the third is the Poisson expectation for those frequencies, including an estimate of the zero class (the number of sequence tags with no SNP scored).
Heterozygosity as reported by gl.report.heterozygosity is in a sense relative, because it is calculated against a background of only those loci that are polymorphic somewhere in the dataset. To allow comparison across studies and species, any measure of heterozygosity needs to accommodate loci that are invariant. However, the number of invariant loci is unknown, because SNPs are detected as single point mutational variants and invariant sequences are discarded, and because of the particular additional filtering applied before analysis. Modelling the counts of SNPs per sequence tag as a Poisson distribution in this script allows the zero class, that is, the number of invariant loci, to be estimated. This estimate is reported, and its veracity can be assessed by how well the observed frequencies correspond to those under Poisson expectation in the associated graphs. The number of invariant loci can then optionally be provided to gl.report.heterozygosity via the parameter n.invariants.
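The idea of estimating an unobserved zero class from a Poisson model can be illustrated with a minimal base R sketch. This is not necessarily how gl.report.secondaries computes its estimate; it assumes a zero-truncated Poisson fitted by matching the observed mean, and the counts and variable names below are made up for illustration only.

```r
# Hypothetical data: number of sequence tags carrying 1, 2, 3 or 4 SNPs.
# These values are illustrative only and do not come from a real dataset.
snps_per_tag <- c(1, 2, 3, 4)
n_tags <- c(700, 230, 55, 15)

n_observed <- sum(n_tags)
mean_observed <- sum(snps_per_tag * n_tags) / n_observed

# For a zero-truncated Poisson, E[K | K >= 1] = lambda / (1 - exp(-lambda)).
# Find the lambda whose truncated mean matches the observed mean.
truncated_mean_gap <- function(lambda) {
  lambda / (1 - exp(-lambda)) - mean_observed
}
lambda_hat <- uniroot(truncated_mean_gap, interval = c(1e-6, 50),
                      tol = 1e-10)$root

# Estimated size of the zero class: tags with no SNP, never observed.
n_invariant <- n_observed * exp(-lambda_hat) / (1 - exp(-lambda_hat))

# Expected frequencies for the observed classes under the fitted Poisson,
# to compare against n_tags as a rough check on the fit.
expected <- (n_observed + n_invariant) * dpois(snps_per_tag, lambda_hat)

print(data.frame(snps_per_tag, observed = n_tags, expected = round(expected, 1)))
cat("Estimated lambda:", lambda_hat, "\n")
cat("Estimated number of invariant tags:", round(n_invariant), "\n")
```

If the expected frequencies depart strongly from the observed ones, the Poisson assumption is doubtful and the zero-class estimate should be treated with corresponding caution.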
Returns a genlight object containing the loci with multiple SNP calls.
Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
out <- gl.report.secondaries(bandicoot.gl)