Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

snapclust

Maximum-likelihood genetic clustering using EM algorithm


Description

Do not use. We work on that stuff. Contact us if interested.

Usage

snapclust(
  x,
  k,
  pop.ini = "ward",
  max.iter = 100,
  n.start = 10,
  n.start.kmeans = 50,
  hybrids = FALSE,
  dim.ini = 100,
  hybrid.coef = NULL,
  parent.lab = c("A", "B"),
  ...
)

Arguments

x

a genind object

k

the number of clusters to look for

pop.ini

parameter indicating how the initial group membership should be found. If NULL, groups are chosen at random, and the algorithm will be run n.start times. If "kmeans", then the function find.clusters is used to define initial groups using the K-means algorithm. If "ward", then the function find.clusters is used to define initial groups using the Ward algorithm. Alternatively, a factor defining the initial cluster configuration can be provided.

max.iter

the maximum number of iteration of the EM algorithm

n.start

the number of times the EM algorithm is run, each time with different random starting conditions

n.start.kmeans

the number of times the K-means algorithm is run to define the starting point of the ML-EM algorithm, each time with different random starting conditions

hybrids

a logical indicating if hybrids should be modelled explicitely; this is currently implemented for 2 groups only.

dim.ini

the number of PCA axes to retain in the dimension reduction step for find.clusters, if this method is used to define initial group memberships (see argument pop.ini).

hybrid.coef

a vector of hybridization coefficients, defining the proportion of hybrid gene pool coming from the first parental population; this is symmetrized around 0.5, so that e.g. c(0.25, 0.5) will be converted to c(0.25, 0.5, 0.75)

parent.lab

a vector of 2 character strings used to label the two parental populations; only used if hybrids are detected (see argument hybrids)

...

further arguments passed on to find.clusters

Value

The function snapclust returns a list with the following components:

  • $group a factor indicating the maximum-likelihood assignment of individuals to groups; if identified, hybrids are labelled after hybridization coefficients, e.g. 0.5_A - 0.5_B for F1, 0.75_A - 0.25_B for backcross F1 / A, etc.

  • $ll: the log-likelihood of the model

  • $proba: a matrix of group membership probabilities, with individuals in rows and groups in columns; each value correspond to the probability that a given individual genotype was generated under a given group, under Hardy-Weinberg hypotheses.

  • $converged a logical indicating if the algorithm converged; if FALSE, it is doubtful that the result is an actual Maximum Likelihood estimate.

  • $n.iter an integer indicating the number of iterations the EM algorithm was run for.

Author(s)

Thibaut Jombart thibautjombart@gmail.com and Marie-Pauline Beugin

Examples

## Not run: 
data(microbov)

## try function using k-means initialization
grp.ini <- find.clusters(microbov, n.clust=15, n.pca=150)

## run EM algo
res <- snapclust(microbov, 15, pop.ini = grp.ini$grp)
names(res)
res$converged
res$n.iter

## plot result
compoplot(res)

## flag potential hybrids
to.flag <- apply(res$proba,1,max)<.9
compoplot(res, subset=to.flag, show.lab=TRUE,
                 posi="bottomleft", bg="white")


## Simulate hybrids F1
zebu <- microbov[pop="Zebu"]
salers <- microbov[pop="Salers"]
hyb <- hybridize(zebu, salers, n=30)
x <- repool(zebu, salers, hyb)

## method without hybrids
res.no.hyb <- snapclust(x, k=2, hybrids=FALSE)
compoplot(res.no.hyb, col.pal=spectral, n.col=2)

## method with hybrids
res.hyb <- snapclust(x, k=2, hybrids=TRUE)
compoplot(res.hyb, col.pal =
          hybridpal(col.pal = spectral), n.col = 2)


## Simulate hybrids backcross (F1 / parental)
f1.zebu <- hybridize(hyb, zebu, 20, pop = "f1.zebu")
f1.salers <- hybridize(hyb, salers, 25, pop = "f1.salers")
y <- repool(x, f1.zebu, f1.salers)

## method without hybrids
res2.no.hyb <- snapclust(y, k = 2, hybrids = FALSE)
compoplot(res2.no.hyb, col.pal = hybridpal(), n.col = 2)

## method with hybrids F1 only
res2.hyb <- snapclust(y, k = 2, hybrids = TRUE)
compoplot(res2.hyb, col.pal = hybridpal(), n.col = 2)

## method with back-cross
res2.back <- snapclust(y, k = 2, hybrids = TRUE, hybrid.coef = c(.25,.5))
 compoplot(res2.hyb, col.pal = hybridpal(), n.col = 2)


## End(Not run)

adegenet

Exploratory Analysis of Genetic and Genomic Data

v2.1.3
GPL (>= 2)
Authors
Thibaut Jombart [aut] (<https://orcid.org/0000-0003-2226-8692>), Zhian N. Kamvar [aut, cre] (<https://orcid.org/0000-0003-1458-7108>), Caitlin Collins [ctb], Roman Lustrik [ctb], Marie-Pauline Beugin [ctb], Brian J. Knaus [ctb], Peter Solymos [ctb], Vladimir Mikryukov [ctb], Klaus Schliep [ctb], Tiago Maié [ctb], Libor Morkovsky [ctb], Ismail Ahmed [ctb], Anne Cori [ctb], Federico Calboli [ctb], RJ Ewing [ctb], Frédéric Michaud [ctb], Rebecca DeCamp [ctb], Alexandre Courtiol [ctb] (<https://orcid.org/0000-0003-0637-2959>)
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.