Extended HGDP-CEPH dataset
This dataset consists of 1350 individuals from native human populations distributed worldwide, typed at 678 microsatellite loci. The original HGDP-CEPH panel [1-3] has been extended by several Native American populations [4]. This dataset was used to illustrate the Discriminant Analysis of Principal Components (DAPC, [5]).
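DAPC first summarizes the genetic data with a principal component analysis (PCA), then runs a discriminant analysis on the retained principal components so that between-group differences are maximized [5]. The sketch below illustrates only these two steps on simulated genotypes. It is not the adegenet implementation: the simulated allele counts, the group labels `A`/`B`/`C` and the choice of `n.pca <- 10` are assumptions made for illustration, and `MASS::lda()` stands in for the discriminant step. Use `adegenet::dapc()` for real analyses such as the one on `eHGDP`.

```r
# Illustrative sketch of the two steps behind DAPC on simulated data
# (assumption: MASS::lda stands in for the discriminant step; not adegenet's code)
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(1)
n_per_group <- 50
n_loci <- 40
groups <- factor(rep(c("A", "B", "C"), each = n_per_group))  # hypothetical groups

# Hypothetical allele counts (0, 1 or 2) drawn from group-specific allele frequencies
freqs <- matrix(runif(3 * n_loci, 0.1, 0.9), nrow = 3,
                dimnames = list(levels(groups), NULL))
X <- t(sapply(as.character(groups),
              function(g) rbinom(n_loci, size = 2, prob = freqs[g, ])))

# Step 1: PCA on centred, unscaled data; keep the first n.pca principal components
n.pca <- 10
pca <- prcomp(X, center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:n.pca]

# Step 2: discriminant analysis of the retained components
# (at most K - 1 = 2 discriminant axes for 3 groups)
da <- lda(pcs, grouping = groups)
pred <- predict(da)
ind.coord <- pred$x       # individual coordinates on the discriminant axes
assigned <- pred$class    # group each individual is assigned to
mean(assigned == groups)  # proportion of individuals reassigned to their own group
```

Keeping only some principal components before the discriminant step is what allows discriminant analysis to be applied when loci (here alleles) outnumber individuals or are strongly correlated, as with the 678 loci of `eHGDP`.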
eHGDP is a genind object with a data frame named popInfo as a supplementary component (eHGDP@other$popInfo), which contains the following variables:
- a character vector indicating populations;
- a character vector indicating the geographic region of each population;
- a character vector indicating the correspondence with population labels used in the genind object (i.e., as output by pop(eHGDP));
- geographic coordinates of the populations (columns Latitude and Longitude), indicated as north and east degrees.
Original panel by the Human Genome Diversity Project (HGDP) and the Centre d'Etude du Polymorphisme Humain (CEPH). See reference [4] for the Native American populations.
This copy of the dataset was prepared by Francois Balloux.
[1] Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, et al. (2002) Genetic structure of human populations. Science 298: 2381-2385.
[2] Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, et al. (2005) Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 102: 15942-15947.
[3] Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, et al. (2002) A human genome diversity cell line panel. Science 296: 261-262.
[4] Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, et al. (2007) Genetic Variation and Population Structure in Native Americans. PLoS Genetics 3: e185.
[5] Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. Submitted to BMC Genetics.
## Not run:
## LOAD DATA
library(adegenet)  # provides eHGDP, dapc(), find.clusters(), colorplot(), add.scatter.eig(); attaches ade4 (s.label())
data(eHGDP)
eHGDP

## PERFORM DAPC - USE POPULATIONS AS CLUSTERS
## to reproduce exactly analyses from the paper, use "n.pca=1000"
dapc1 <- dapc(eHGDP, all.contrib=TRUE, scale=FALSE, n.pca=200, n.da=80) # takes 2 minutes
dapc1 ## (see ?dapc for details about the output)

## SCREEPLOT OF EIGENVALUES
barplot(dapc1$eig, main="eHGDP - DAPC eigenvalues",
        col=c("red","green","blue", rep("grey", 1000)))

## SCATTERPLOTS
## (!) Note: colors may be inverted with respect to [5]
## as signs of principal components are arbitrary
## and change from one computer to another
##
## axes 1-2
s.label(dapc1$grp.coord[,1:2], clab=0, sub="Axes 1-2")
par(xpd=TRUE)
colorplot(dapc1$grp.coord[,1:2], dapc1$grp.coord, cex=3, add=TRUE)
add.scatter.eig(dapc1$eig, 10, 1, 2, posi="bottomright", ratio=.3, csub=1.25)

## axes 2-3 (eigenvalue inset highlights axes 2 and 3)
s.label(dapc1$grp.coord[,2:3], clab=0, sub="Axes 2-3")
par(xpd=TRUE)
colorplot(dapc1$grp.coord[,2:3], dapc1$grp.coord, cex=3, add=TRUE)
add.scatter.eig(dapc1$eig, 10, 2, 3, posi="bottomright", ratio=.3, csub=1.25)

## MAP DAPC1 RESULTS
if(require(maps)){
  xy <- cbind(eHGDP$other$popInfo$Longitude, eHGDP$other$popInfo$Latitude)
  par(mar=rep(.1,4))
  map(fill=TRUE, col="lightgrey")
  colorplot(xy, -dapc1$grp.coord, cex=3, add=TRUE, trans=FALSE)
}

## LOOK FOR OTHER CLUSTERS
## to reproduce results of the reference paper, use:
## grp <- find.clusters(eHGDP, max.n=50, n.pca=200, scale=FALSE)
## and then
## plot(grp$Kstat, type="b", col="blue")
grp <- find.clusters(eHGDP, max.n=30, n.pca=200, scale=FALSE, n.clust=4) # takes about 2 minutes
names(grp) ## (see ?find.clusters for details about the output)

## PERFORM DAPC - USE INFERRED GROUPS (grp$grp) AS CLUSTERS
## to reproduce exactly analyses from the paper, use "n.pca=1000"
dapc2 <- dapc(eHGDP, pop=grp$grp, all.contrib=TRUE, scale=FALSE, n.pca=200, n.da=80) # takes around 1 minute
dapc2

## PRODUCE SCATTERPLOT
scatter(dapc2) # axes 1-2
scatter(dapc2, 2, 3) # axes 2-3

## MAP DAPC2 RESULTS
if(require(maps)){
  xy <- cbind(eHGDP$other$popInfo$Longitude, eHGDP$other$popInfo$Latitude)
  ## average the individual coordinates within each population
  myCoords <- apply(dapc2$ind.coord, 2, tapply, pop(eHGDP), mean)
  par(mar=rep(.1,4))
  map(fill=TRUE, col="lightgrey")
  colorplot(xy, myCoords, cex=3, add=TRUE, trans=FALSE)
}
## End(Not run)