Baker's Gamma correlation coefficient
Calculate Baker's Gamma correlation coefficient for two trees (also known as Goodman-Kruskal-gamma index).
Assumes the labels in the two trees fully match. If they do not please first use intersect_trees to have them matched.
WARNING: this can be quite slow for medium/large trees.
cor_bakers_gamma(dend1, ...) ## Default S3 method: cor_bakers_gamma(dend1, dend2, ...) ## S3 method for class 'dendrogram' cor_bakers_gamma( dend1, dend2, use_labels_not_values = TRUE, to_plot = FALSE, warn = dendextend_options("warn"), ... ) ## S3 method for class 'hclust' cor_bakers_gamma( dend1, dend2, use_labels_not_values = TRUE, to_plot = FALSE, warn = dendextend_options("warn"), ... ) ## S3 method for class 'dendlist' cor_bakers_gamma(dend1, which = c(1L, 2L), ...)
dend1 |
a tree (dendrogram/hclust/phylo) |
... |
Passed to cutree. |
dend2 |
a tree (dendrogram/hclust/phylo) |
use_labels_not_values |
logical (TRUE). Should labels be used in the k matrix when using cutree? Set to FALSE will make the function a bit faster BUT, it assumes the two trees have the exact same leaves order values for each labels. This can be assured by using match_order_by_labels. |
to_plot |
logical (FALSE). Passed to bakers_gamma_for_2_k_matrix |
warn |
logical (default from dendextend_options("warn") is FALSE). Set if warning are to be issued, it is safer to keep this at TRUE, but for keeping the noise down, the default is FALSE. should a warning be issued when using cutree? |
which |
an integer vector of length 2, indicating which of the trees in the dendlist object should be plotted (relevant for dendlist) |
Baker's Gamma (see reference) is a measure of accosiation (similarity) between two trees of heirarchical clustering (dendrograms).
It is calculated by taking two items, and see what is the heighst possible level of k (number of cluster groups created when cutting the tree) for which the two item still belongs to the same tree. That k is returned, and the same is done for these two items for the second tree. There are n over 2 combinations of such pairs of items from the items in the tree, and all of these numbers are calculated for each of the two trees. Then, these two sets of numbers (a set for the items in each tree) are paired according to the pairs of items compared, and a spearman correlation is calculated.
The value can range between -1 to 1. With near 0 values meaning that the two trees are not statistically similar. For exact p-value one should result to a permutation test. One such option will be to permute over the labels of one tree many times, and calculating the distriubtion under the null hypothesis (keeping the trees topologies constant).
Notice that this measure is not affected by the height of a branch but only of its relative position compared with other branches.
Baker's Gamma association Index between two trees (a number between -1 to 1)
Baker, F. B., Stability of Two Hierarchical Grouping Techniques Case 1: Sensitivity to Data Errors. Journal of the American Statistical Association, 69(346), 440 (1974).
## Not run: set.seed(23235) ss <- sample(1:150, 10) hc1 <- hclust(dist(iris[ss, -5]), "com") hc2 <- hclust(dist(iris[ss, -5]), "single") dend1 <- as.dendrogram(hc1) dend2 <- as.dendrogram(hc2) # cutree(dend1) cor_bakers_gamma(hc1, hc2) cor_bakers_gamma(dend1, dend2) dend1 <- match_order_by_labels(dend1, dend2) # if you are not sure cor_bakers_gamma(dend1, dend2, use_labels_not_values = FALSE) library(microbenchmark) microbenchmark( with_labels = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE), with_values = cor_bakers_gamma(dend1, dend2, use_labels_not_values = FALSE, try_cutree_hclust = FALSE ), times = 10 ) cor_bakers_gamma(dend1, dend1, use_labels_not_values = FALSE) cor_bakers_gamma(dend1, dend1, use_labels_not_values = TRUE) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.