Filter loci or specimens in a genlight {adegenet} object based on call rate
SNP datasets generated by DArT have missing values primarily arising from failure to call a SNP because of a mutation at one or both of the restriction enzyme recognition sites. This script reports the number of missing values for each of several percentiles. The function gl.filter.callrate() filters out the loci (or, with method = "ind", the specimens) with call rates below a specified threshold.
gl.filter.callrate(
  x,
  method = "loc",
  threshold = 0.95,
  mono.rm = FALSE,
  recalc = FALSE,
  recursive = FALSE,
  plot = TRUE,
  bins = 25,
  verbose = NULL
)
x |
– name of the genlight object containing the SNP data, or the genind object containing the SilicoDArT data [required] |
method |
– "loc" to specify that loci are to be filtered, "ind" to specify that specimens are to be filtered, "pop" to remove loci that fail to meet the specified threshold in any one population [default "loc"] |
threshold |
– threshold value below which loci will be removed [default 0.95] |
mono.rm |
– Remove monomorphic loci after analysis is complete [default FALSE] |
recalc |
– Recalculate the locus metadata statistics if any individuals are deleted in the filtering [default FALSE] |
recursive |
– Repeatedly filter individuals on call rate, each time removing monomorphic loci. Only applies if method="ind" and mono.rm=TRUE [default FALSE] |
plot |
– specify whether histograms of call rate, before and after filtering, are to be produced [default TRUE] |
bins |
– number of bins to display in histograms [default 25] |
verbose |
– verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity] |
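To make the method and threshold arguments concrete, here is a minimal base-R sketch of what call-rate filtering means on a toy genotype matrix. It is an illustration only, not dartR's implementation: the matrix `geno`, the population vector `pop`, and the threshold of 0.75 are all hypothetical, and a plain matrix stands in for a genlight object.

```r
# Toy genotype matrix (hypothetical data): rows are individuals, columns are loci.
# Values 0/1/2 are SNP genotypes; NA is a missing call.
geno <- matrix(
  c(0,  1, NA, 2,
    1, NA, NA, 2,
    0,  1,  2, 2,
    2,  1, NA, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4))
)
pop <- c("A", "A", "B", "B")  # hypothetical population assignment
threshold <- 0.75             # illustrative value, not the function default

# Call rate = proportion of non-missing calls.
callrate_loc <- colMeans(!is.na(geno))  # per locus
callrate_ind <- rowMeans(!is.na(geno))  # per individual

# Analogue of method = "loc": drop loci whose call rate is below the threshold.
filtered_loc <- geno[, callrate_loc >= threshold, drop = FALSE]

# Analogue of method = "ind": drop individuals whose call rate is below the threshold.
filtered_ind <- geno[callrate_ind >= threshold, , drop = FALSE]

# Analogue of method = "pop": drop a locus if it fails the threshold
# in any one population.
callrate_by_pop <- sapply(
  split(seq_len(nrow(geno)), pop),
  function(rows) colMeans(!is.na(geno[rows, , drop = FALSE]))
)
keep_pop <- apply(callrate_by_pop >= threshold, 1, all)
filtered_pop <- geno[, keep_pop, drop = FALSE]
```

In this toy data, loc2 has a pooled call rate of 0.75 and so passes the "loc" style filter, but its call rate within population A is only 0.5, so the "pop" style filter removes it.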
Tag Presence/Absence datasets (SilicoDArT) have missing values where it is not possible to determine reliably whether the sequence tag can be called at a particular locus.
method = 'ind': Because this filter operates on call rate, the function recalculates the call rate, if necessary, before filtering. If individuals are removed using method='ind', then the call rate stored in the genlight object is, optionally, recalculated after filtering.
recursive=TRUE: Note that when filtering individuals on call rate, the initial call rate is calculated and compared against the threshold. After filtering, if mono.rm=TRUE, the removal of monomorphic loci will alter the call rates. Some individuals with a call rate initially greater than the nominated threshold, and so retained, may come to have a call rate lower than the threshold. If this is a problem, repeated iterations of this function will resolve the issue. This is done by setting mono.rm=TRUE and recursive=TRUE, or it can be done manually.
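The interaction described above can be sketched in base R. This is a hedged illustration rather than dartR's own code: the helpers `is_monomorphic()`, `one_pass()` and `filter_ind_recursive()` are hypothetical names, the data are invented, and a locus is treated as monomorphic here simply when it shows fewer than two distinct non-missing genotype values, which may differ from the definition dartR uses.

```r
# Hypothetical toy data: rows are individuals, columns are loci, NA is a missing call.
geno <- matrix(
  c(0,  1,  0, 2,
    0,  0,  1, 1,
    0, NA,  1, 2,
    2, NA, NA, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("loc", 1:4))
)

# Simplified monomorphism check (an assumption of this sketch):
# fewer than two distinct non-missing values at a locus.
is_monomorphic <- function(g) {
  apply(g, 2, function(x) length(unique(x[!is.na(x)])) < 2)
}

# One pass: filter individuals on call rate, then remove monomorphic loci.
one_pass <- function(g, threshold) {
  g <- g[rowMeans(!is.na(g)) >= threshold, , drop = FALSE]
  g[, !is_monomorphic(g), drop = FALSE]
}

# Repeat the pass until nothing more is removed (or nothing is left).
filter_ind_recursive <- function(g, threshold) {
  repeat {
    before <- dim(g)
    g <- one_pass(g, threshold)
    if (identical(dim(g), before) || any(dim(g) == 0)) break
  }
  g
}

single <- one_pass(geno, threshold = 0.7)
final  <- filter_ind_recursive(geno, threshold = 0.7)

# Call rates of the individuals retained after a single pass
callrate_after_single <- rowMeans(!is.na(single))
```

With these data, ind4 is removed in the first pass, which leaves loc1 monomorphic. Once loc1 is dropped, the call rate of ind3 falls from 0.75 to 2/3, below the 0.7 threshold, so only the repeated version removes it.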
The reduced genlight or genind object, plus a summary
Arthur Georges and Bernd Gruber (Post to https://groups.google.com/d/forum/dartr)
# SNP data
result <- gl.filter.callrate(testset.gl, method="loc", threshold=0.95, verbose=3)
result <- gl.filter.callrate(testset.gl, method="ind", threshold=0.8, verbose=3)
# Tag P/A data
result <- gl.filter.callrate(testset.gs, method="loc", threshold=0.95, verbose=3)
result <- gl.filter.callrate(testset.gs, method="ind", threshold=0.8, verbose=3)