
cor_bakers_gamma

Baker's Gamma correlation coefficient


Description

Calculate Baker's Gamma correlation coefficient for two trees (also known as Goodman-Kruskal-gamma index).

Assumes the labels in the two trees fully match. If they do not, please first use intersect_trees to have them matched.

WARNING: this can be quite slow for medium/large trees.

Usage

cor_bakers_gamma(dend1, ...)

## Default S3 method:
cor_bakers_gamma(dend1, dend2, ...)

## S3 method for class 'dendrogram'
cor_bakers_gamma(
  dend1,
  dend2,
  use_labels_not_values = TRUE,
  to_plot = FALSE,
  warn = dendextend_options("warn"),
  ...
)

## S3 method for class 'hclust'
cor_bakers_gamma(
  dend1,
  dend2,
  use_labels_not_values = TRUE,
  to_plot = FALSE,
  warn = dendextend_options("warn"),
  ...
)

## S3 method for class 'dendlist'
cor_bakers_gamma(dend1, which = c(1L, 2L), ...)

Arguments

dend1

a tree (dendrogram/hclust/phylo)

...

Passed to cutree.

dend2

a tree (dendrogram/hclust/phylo)

use_labels_not_values

logical (TRUE). Should labels be used in the k matrix when using cutree? Setting this to FALSE makes the function a bit faster, but it assumes the two trees have exactly the same leaf order values for each label. This can be ensured by using match_order_by_labels.

to_plot

logical (FALSE). Passed to bakers_gamma_for_2_k_matrix

warn

logical (the default, taken from dendextend_options("warn"), is FALSE). Should a warning be issued when using cutree? It is safer to set this to TRUE, but the default is FALSE to keep the noise down.

which

an integer vector of length 2, indicating which of the trees in the dendlist object should be compared (relevant for dendlist)

Details

Baker's Gamma (see reference) is a measure of association (similarity) between two trees of hierarchical clustering (dendrograms).

It is calculated by taking two items and finding the highest possible level of k (the number of cluster groups created when cutting the tree) for which the two items still belong to the same cluster. That k is recorded, and the same is done for these two items in the second tree. There are n choose 2, i.e. n(n-1)/2, such pairs of items in a tree, and these numbers are calculated for every pair in each of the two trees. Then these two sets of numbers (one set per tree) are paired according to the pairs of items compared, and a Spearman correlation is calculated.
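The calculation described above can be sketched in base R. This is an illustration of the idea only, not the package's implementation: the helper name pair_levels is hypothetical, and it assumes both trees are hclust objects built from the same rows, so that the rows returned by cutree line up.

```r
# Hypothetical helper (not part of dendextend): for every pair of items,
# return the highest k at which the two items still share a cluster.
pair_levels <- function(hc) {
  n <- length(hc$order)
  memb <- cutree(hc, k = 1:n)   # n x n matrix: rows are items, columns are k
  pairs <- combn(n, 2)          # all choose(n, 2) pairs of items
  same <- memb[pairs[1, ], , drop = FALSE] == memb[pairs[2, ], , drop = FALSE]
  # Clusters are nested, so a pair is together for k = 1, ..., k_max;
  # counting the matching columns therefore gives k_max.
  rowSums(same)
}

set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- hclust(dist(iris[ss, -5]), "complete")
hc2 <- hclust(dist(iris[ss, -5]), "single")

levels1 <- pair_levels(hc1)
levels2 <- pair_levels(hc2)

# Spearman correlation between the paired levels of the two trees
bakers_gamma <- cor(levels1, levels2, method = "spearman")
bakers_gamma
```

With matching labels, the result should agree with cor_bakers_gamma(hc1, hc2) up to implementation details, but this has not been checked here.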

The value ranges from -1 to 1, with values near 0 meaning that the two trees are not statistically similar. For an exact p-value one should resort to a permutation test. One such option is to permute the labels of one tree many times and calculate the distribution under the null hypothesis (keeping the tree topologies constant).
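A permutation test of this kind might look as follows in base R. This is a sketch under stated assumptions rather than a dendextend function: pair_levels is a hypothetical helper, shuffling the rows of the membership matrix stands in for permuting the labels of one tree, and the number of permutations (200) and the one-sided p-value are arbitrary choices made for illustration.

```r
# Hypothetical helper: highest shared k for every pair of items,
# given the n x n membership matrix from cutree(hc, k = 1:n).
pair_levels <- function(memb) {
  pairs <- combn(nrow(memb), 2)
  rowSums(memb[pairs[1, ], , drop = FALSE] == memb[pairs[2, ], , drop = FALSE])
}

set.seed(23235)
ss <- sample(1:150, 10)
x <- iris[ss, -5]
n <- nrow(x)
memb1 <- cutree(hclust(dist(x), "complete"), k = 1:n)
memb2 <- cutree(hclust(dist(x), "single"), k = 1:n)

levels1 <- pair_levels(memb1)
observed <- cor(levels1, pair_levels(memb2), method = "spearman")

# Null distribution: reassign the items of the second tree at random.
# The topology of both trees stays fixed; only the labels move.
n_perm <- 200
null_gamma <- replicate(n_perm, {
  shuffled <- memb2[sample(n), , drop = FALSE]
  cor(levels1, pair_levels(shuffled), method = "spearman")
})

# One-sided p-value; the added 1 counts the observed value itself
p_value <- (1 + sum(null_gamma >= observed)) / (n_perm + 1)
p_value
```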

Notice that this measure is not affected by the height of a branch, but only by its position relative to other branches.
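One way to see this, assuming the measure is built from cuts by k as described above, is to change the branch heights with an order-preserving transformation and confirm that the cluster memberships at every k stay the same. The base R sketch below squares the heights of an hclust tree; the object name hc_squared is illustrative only.

```r
set.seed(23235)
ss <- sample(1:150, 10)
hc <- hclust(dist(iris[ss, -5]), "complete")

# Same merge order, different branch heights
# (squaring keeps the order of non-negative heights)
hc_squared <- hc
hc_squared$height <- hc$height^2

n <- length(hc$order)
# Memberships for every k depend on the merge order only,
# so both trees yield the same matrix
same_cuts <- identical(cutree(hc, k = 1:n), cutree(hc_squared, k = 1:n))
same_cuts
```

Because the memberships are unchanged for every k, any statistic computed from them, including the per-pair levels, is unchanged as well.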

Value

Baker's Gamma association index between two trees (a number between -1 and 1)

References

Baker, F. B., Stability of Two Hierarchical Grouping Techniques Case 1: Sensitivity to Data Errors. Journal of the American Statistical Association, 69(346), 440 (1974).

See Also

Examples

## Not run: 

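library(dendextend)

set.seed(23235)
ss <- sample(1:150, 10)
# Two clusterings of the same 10 rows, using different linkage methods
hc1 <- hclust(dist(iris[ss, -5]), "complete")
hc2 <- hclust(dist(iris[ss, -5]), "single")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
#    cutree(dend1)

# Works on hclust and on dendrogram objects
cor_bakers_gamma(hc1, hc2)
cor_bakers_gamma(dend1, dend2)

# Align the leaf order values first, so that labels can be skipped
dend1 <- match_order_by_labels(dend1, dend2) # if you are not sure
cor_bakers_gamma(dend1, dend2, use_labels_not_values = FALSE)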

library(microbenchmark)
microbenchmark(
  with_labels = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE),
  with_values = cor_bakers_gamma(dend1, dend2,
    use_labels_not_values = FALSE, try_cutree_hclust = FALSE
  ),
  times = 10
)


cor_bakers_gamma(dend1, dend1, use_labels_not_values = FALSE)
cor_bakers_gamma(dend1, dend1, use_labels_not_values = TRUE)

## End(Not run)

dendextend

Extending 'dendrogram' Functionality in R

v1.15.1
GPL-2 | GPL-3
Authors
Tal Galili [aut, cre, cph] (https://www.r-statistics.com), Yoav Benjamini [ths], Gavin Simpson [ctb], Gregory Jefferis [aut, ctb] (imported code from his dendroextras package), Marco Gallotta [ctb] (a.k.a: marcog), Johan Renaudie [ctb] (https://github.com/plannapus), The R Core Team [ctb] (Thanks for the Infrastructure, and code in the examples), Kurt Hornik [ctb], Uwe Ligges [ctb], Andrej-Nikolai Spiess [ctb], Steve Horvath [ctb], Peter Langfelder [ctb], skullkey [ctb], Mark Van Der Loo [ctb] (https://github.com/markvanderloo d3dendrogram), Andrie de Vries [ctb] (ggdendro author), Zuguang Gu [ctb] (circlize author), Cath [ctb] (https://github.com/CathG), John Ma [ctb] (https://github.com/JohnMCMa), Krzysiek G [ctb] (https://github.com/storaged), Manuela Hummel [ctb] (https://github.com/hummelma), Chase Clark [ctb] (https://github.com/chasemc), Lucas Graybuck [ctb] (https://github.com/hypercompetent), jdetribol [ctb] (https://github.com/jdetribol), Ben Ho [ctb] (https://github.com/SplitInf), Samuel Perreault [ctb] (https://github.com/samperochkin), Christian Hennig [ctb] (http://www.homepages.ucl.ac.uk/~ucakche/), David Bradley [ctb] (https://github.com/DBradley27), Houyun Huang [ctb] (https://github.com/houyunhuang)
Initial release
2021-05-08
