Fit flexdog to multiple SNPs.
This is a convenience function that will run flexdog
over many SNPs.
Support is provided for parallel computing through the doParallel package.
This function has not been extensively tested. Please report any bugs to
http://github.com/dcgerard/updog/issues.
multidog( refmat, sizemat, ploidy, model = c("norm", "hw", "bb", "s1", "s1pp", "f1", "f1pp", "flex", "uniform", "custom"), nc = 1, p1_id = NULL, p2_id = NULL, bias_init = exp(c(-1, -0.5, 0, 0.5, 1)), prior_vec = NULL, ... )
refmat |
A matrix of reference read counts. The columns index
the individuals and the rows index the markers (SNPs). This matrix must have
rownames (for the names of the markers) and column names (for the names
of the individuals). These names must match the names in |
sizemat |
A matrix of total read counts. The columns index
the individuals and the rows index the markers (SNPs). This matrix must have
rownames (for the names of the markers) and column names (for the names
of the individuals). These names must match the names in |
ploidy |
The ploidy of the species. Assumed to be the same for each individual. |
model |
What form should the prior (genotype distribution) take? See Details for possible values. |
nc |
The number of computing cores to use. This should never be
more than the number of cores available in your computing environment.
You can determine the maximum number of available cores by running
|
p1_id |
The ID of the first parent. This should be a character of
length 1. This should correspond to a single column name in |
p2_id |
The ID of the second parent. This should be a character of
length 1. This should correspond to a single column name in |
bias_init |
A vector of initial values for the bias parameter
over the multiple runs of |
prior_vec |
The pre-specified genotype distribution. Only used if
|
... |
Additional parameters to pass to |
You should format your reference counts and total read counts in two separate matrices. The rows should index the markers (SNPs) and the columns should index the individuals. Row names are how we ID the SNPs and column names are how we ID the individuals, and so they are required attributes.
If your data are in VCF files, I would recommend importing them using the VariantAnnotation package from Bioconductor https://bioconductor.org/packages/VariantAnnotation/. It's a great VCF parser.
See the details of flexdog
for the possible values of
model
.
If model = "f1"
, model = "s1"
, model = "f1pp"
or model = "s1pp"
then the user may
provide the individual ID for parent(s) via the p1_id
and p2_id
arguments.
The output is a list containing two data frames. The first data frame,
called snpdf
, contains information on each SNP, such as the allele bias
and the sequencing error rate. The second data frame, called inddf
,
contains information on each individual at each SNP, such as the estimated
genotype and the posterior probability of being classified correctly.
Using an nc
value greater than 1
will allow you to
run flexdog
in parallel. Only set nc
greater than
1
if you are sure you have access to the proper number of cores.
The upper bound on the value of nc
you should try can be determined
by running parallel::detectCores()
in R. Most admins of high
performance computing environments place limits on the number of cores
you can use at any one time. So if you are using multidog()
on
a supercomputer, do not use parallel::detectCores()
and discuss
with your admin how you can safely run parallel jobs.
SNPs that contain 0 reads (or all missing data) are entirely removed.
A list-like object of two data frames.
snpdf
A data frame containing properties of the SNPs (markers). The rows index the SNPs. The variables include:
snp
The name of the SNP (marker).
bias
The estimated allele bias of the SNP.
seq
The estimated sequencing error rate of the SNP.
od
The estimated overdispersion parameter of the SNP.
prop_mis
The estimated proportion of individuals misclassified in the SNP.
num_iter
The number of iterations performed during the EM algorithm for that SNP.
llike
The maximum marginal likelihood of the SNP.
ploidy
The provided ploidy of the species.
model
The provided model for the prior genotype distribution.
p1ref
The user-provided reference read counts of parent 1.
p1size
The user-provided total read counts of parent 1.
p2ref
The user-provided reference read counts of parent 2.
p2size
The user-provided total read counts of parent 2.
Pr_k
The estimated frequency of individuals with genotype k, where k can be any integer between 0 and the ploidy level.
See the return value of
par
in the help page of flexdog
.
inddf
A data frame containing the properties of the individuals at each SNP. The variables include:
snp
The name of the SNP (marker).
ind
The name of the individual.
ref
The provided reference counts for that individual at that SNP.
size
The provided total counts for that individual at that SNP.
geno
The posterior mode genotype for that individual at that SNP. This is the estimated reference allele dosage for a given individual at a given SNP.
postmean
The posterior mean genotype for that individual at that SNP. This is a continuous genotype estimate of the reference allele dosage for a given individual at a given SNP.
maxpostprob
The maximum posterior probability. This is the posterior probability that the individual was genotyped correctly.
Pr_k
The posterior probability that a given individual at a given SNP has genotype k, where k can vary from 0 to the ploidy level of the species.
logL_k
The genotype log-likelihoods for dosage k for a given individual at a given SNP, where k can vary f rom 0 to the ploidy level of the species.
David Gerard
flexdog()
:For the underlying genotyping function.
format_multidog()
:For converting the output
of multidog()
to a matrix.
filter_snp()
:For filtering SNPs using the
output of multidog()
.
## Not run: data("uitdewilligen") mout <- multidog(refmat = t(uitdewilligen$refmat), sizemat = t(uitdewilligen$sizemat), ploidy = uitdewilligen$ploidy, nc = 2) mout$inddf mout$snpdf ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.