
util.outflank

OutFLANK: An Fst outlier approach by Mike Whitlock and Katie Lotterhos, University of British Columbia.


Description

This function is the original implementation of OutFLANK by Whitlock and Lotterhos. dartR simply provides a convenient wrapper around their functions and an easier install as an R package (for more information, please refer to their GitHub repository).

The method looks for Fst outliers in a list of Fst values for different loci. It assumes that each locus has been genotyped in all populations with approximately equal coverage. OutFLANK assumes that the majority of loci in the center of the distribution are neutral and infers the shape of the distribution of neutral Fst from a trimmed set of loci. Loci with the highest and lowest Fst values are trimmed from the data set before this inference, and the quantity Fst × df / (mean Fst) is assumed to follow a chi-square distribution with df degrees of freedom. Each locus is then given a q-value based on its quantile in the inferred null distribution.

The main procedure is called OutFLANK; its input and output formats are described below. The other functions are necessary and must be loaded, but are not necessarily needed by the user directly.
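
The trimming and chi-square idea described above can be illustrated with a minimal base R sketch. It is not the OutFLANK implementation: the data are simulated, the degrees of freedom are treated as known (OutFLANK infers them by likelihood), the trimmed mean is only a crude stand-in for the inferred mean Fst, and Benjamini-Hochberg adjustment via `p.adjust` is swapped in for the q-value calculation. All variable names are hypothetical.

```r
set.seed(1)

# Simulated neutral loci (hypothetical data): Fst * df / mean(Fst) ~ chi-square(df)
true_df  <- 9
true_bar <- 0.05
n_loci   <- 2000
fst <- true_bar * rchisq(n_loci, df = true_df) / true_df

# Replace a few loci with unusually high Fst to mimic selection
fst[1:10] <- 0.4

# Trim the lowest and highest 5% of loci before estimating the mean
bounds  <- quantile(fst, c(0.05, 0.95))
trimmed <- fst[fst >= bounds[1] & fst <= bounds[2]]
fst_bar <- mean(trimmed)  # crude stand-in; biased slightly low by the trimming

# Right-tail p-value of each locus under the assumed chi-square null
p_right <- pchisq(fst * true_df / fst_bar, df = true_df, lower.tail = FALSE)

# Benjamini-Hochberg adjustment as a simple substitute for q-values
q_bh    <- p.adjust(p_right, method = "BH")
flagged <- which(q_bh < 0.05)
```

Because the trimmed mean underestimates the neutral mean, this sketch is somewhat anti-conservative; the likelihood step in OutFLANK exists to correct for exactly that.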

Usage

util.outflank(
  FstDataFrame,
  LeftTrimFraction = 0.05,
  RightTrimFraction = 0.05,
  Hmin = 0.1,
  NumberOfSamples,
  qthreshold = 0.05
)

Arguments

FstDataFrame

A data frame that includes a row for each locus, with columns as follows:

  • $LocusName: a character string that uniquely names each locus.

  • $FST: Fst calculated for this locus. (Kept here to report the unbiased Fst in the results)

  • $T1: The numerator of the estimator for Fst (necessary, with $T2, to calculate mean Fst)

  • $T2: The denominator of the estimator of Fst

  • $FSTNoCorr: Fst calculated for this locus without sample size correction. (Used to find outliers)

  • $T1NoCorr: The numerator of the estimator for Fst without sample size correction (necessary, with $T2NoCorr, to calculate mean Fst)

  • $T2NoCorr: The denominator of the estimator of Fst without sample size correction

  • $He: The heterozygosity of the locus (used to screen out low heterozygosity loci that have a different distribution)
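
As a rough illustration of the layout described above, the following sketch builds a toy data frame with the listed columns and checks that none is missing. All values are invented, and `fst_df`, `required`, and `mean_fst` are hypothetical names. The mean Fst is computed here as the ratio of summed numerators to summed denominators, which is an assumption about how $T1 and $T2 are combined.

```r
# Toy input (values are invented for illustration only)
fst_df <- data.frame(
  LocusName = c("loc1", "loc2", "loc3"),
  T1        = c(0.010, 0.002, 0.030),
  T2        = c(0.200, 0.100, 0.250),
  T1NoCorr  = c(0.012, 0.003, 0.033),
  T2NoCorr  = c(0.200, 0.100, 0.250),
  He        = c(0.40, 0.05, 0.45),
  stringsAsFactors = FALSE
)

# Per-locus Fst is the numerator divided by the denominator
fst_df$FST       <- fst_df$T1 / fst_df$T2
fst_df$FSTNoCorr <- fst_df$T1NoCorr / fst_df$T2NoCorr

# Check that every documented column is present
required <- c("LocusName", "FST", "T1", "T2",
              "FSTNoCorr", "T1NoCorr", "T2NoCorr", "He")
missing_cols <- setdiff(required, names(fst_df))

# Mean Fst across loci as a ratio of sums (assumed convention)
mean_fst <- sum(fst_df$T1) / sum(fst_df$T2)
```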

LeftTrimFraction

The proportion of loci that are trimmed from the lower end of the range of Fst before the likelihood function is applied.

RightTrimFraction

The proportion of loci that are trimmed from the upper end of the range of Fst before the likelihood function is applied.

Hmin

The minimum heterozygosity required before including calculations from a locus.

NumberOfSamples

The number of spatial locations included in the data set.

qthreshold

The desired false discovery rate threshold for calculating q-values.

Value

The function returns a list with the following elements:

  • FSTbar: the mean FST inferred from loci not marked as outliers

  • FSTNoCorrbar: the mean FST not corrected for sample size (gives an upwardly biased estimate of FST)

  • dfInferred: the inferred number of degrees of freedom for the chi-square distribution of neutral FST

  • numberLowFstOutliers: the number of loci flagged as having a significantly low FST (not reliable)

  • numberHighFstOutliers: the number of loci identified as having a significantly high FST

  • results: a data frame with a row for each locus. This data frame includes all the original columns in the data set, and six new ones:

    • $indexOrder (the original order of the input data set)

    • $GoodH (Boolean variable which is TRUE if the expected heterozygosity is greater than the Hmin set by input)

    • $OutlierFlag (TRUE if the method identifies the locus as an outlier, FALSE otherwise)

    • $q (the q-value for the test of neutrality for the locus)

    • $pvalues (the p-value for the test of neutrality for the locus)

    • $pvaluesRightTail (the one-sided, right-tail p-value for the locus)
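
A short sketch of how the returned list might be inspected follows. The object `out` is a hand-built mock of the structure described above with invented values, not real output, and it assumes that $GoodH and $OutlierFlag are logical columns as documented; running `str(out$results)` on real output is a sensible first check.

```r
# Mock of the documented return structure (values invented for illustration)
out <- list(
  FSTbar                = 0.05,
  FSTNoCorrbar          = 0.06,
  dfInferred            = 9,
  numberLowFstOutliers  = 0,
  numberHighFstOutliers = 1,
  results = data.frame(
    LocusName   = c("loc1", "loc2", "loc3"),
    FST         = c(0.04, 0.45, 0.06),
    GoodH       = c(TRUE, TRUE, FALSE),
    OutlierFlag = c(FALSE, TRUE, FALSE),
    q           = c(0.80, 0.001, 0.60),
    stringsAsFactors = FALSE
  )
)

# Keep loci that passed the heterozygosity screen and were flagged as outliers
res      <- out$results
outliers <- res[res$GoodH & res$OutlierFlag, ]

# Order by q-value and keep the columns of interest
outliers <- outliers[order(outliers$q), c("LocusName", "FST", "q")]
```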

Author(s)

Bernd Gruber (glbugs@aerg.canberra.edu.au); original implementation of Whitlock & Lotterhos


dartR

Importing and Analysing SNP and Silicodart Data Generated by Genome-Wide Restriction Fragment Analysis

v1.9.6
GPL-2
Authors
Bernd Gruber [aut, cre], Arthur Georges [aut], Jose L. Mijangos [aut], Peter J. Unmack [ctb], Oliver Berry [ctb], Lindsay V. Clark [ctb], Floriaan Devloo-Delva [ctb]
Initial release
2021-04-29
