Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

cor_cophenetic

Cophenetic correlation between two trees


Description

Cophenetic correlation coefficient for two trees.

Assumes the labels in the two trees fully match. If they do not please first use intersect_trees to have them matched.

Usage

cor_cophenetic(dend1, ...)

## Default S3 method:
cor_cophenetic(
  dend1,
  dend2,
  method_coef = c("pearson", "kendall", "spearman"),
  ...
)

## S3 method for class 'dendlist'
cor_cophenetic(
  dend1,
  which = c(1L, 2L),
  method_coef = c("pearson", "kendall", "spearman"),
  ...
)

Arguments

dend1

a tree (dendrogram/hclust/phylo, or dendlist)

...

Ignored.

dend2

Either a tree (dendrogram/hclust/phylo), or a dist object (for example, from the original data matrix).

method_coef

a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman", can be abbreviated. Passed to cor.

which

an integer vector of length 2, indicating which of the trees in a dendlist object should have their cor_cophenetic calculated.

Details

From cophenetic: The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster. Note that this distance has many ties and restrictions.

cor_cophenetic calculates the correlation between two cophenetic distance matrices of the two trees.

The value can range between -1 to 1. With near 0 values meaning that the two trees are not statistically similar. For exact p-value one should result to a permutation test. One such option will be to permute over the labels of one tree many times, and calculating the distriubtion under the null hypothesis (keeping the trees topologies constant).

Notice that this measure IS affected by the height of a branch.

Value

The correlation between cophenetic

References

Sokal, R. R. and F. J. Rohlf. 1962. The comparison of dendrograms by objective methods. Taxon, 11:33-40

Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification, p. 278 ff; Freeman, San Francisco.

See Also

Examples

## Not run: 

set.seed(23235)
ss <- sample(1:150, 10)
hc1 <- iris[ss, -5] %>%
  dist() %>%
  hclust("com")
hc2 <- iris[ss, -5] %>%
  dist() %>%
  hclust("single")
dend1 <- as.dendrogram(hc1)
dend2 <- as.dendrogram(hc2)
#    cutree(dend1)

cophenetic(hc1)
cophenetic(hc2)
# notice how the dist matrix for the dendrograms have different orders:
cophenetic(dend1)
cophenetic(dend2)

cor(cophenetic(hc1), cophenetic(hc2)) # 0.874
cor(cophenetic(dend1), cophenetic(dend2)) # 0.16
# the difference is becasue the order of the distance table in the case of
# stats:::cophenetic.dendrogram will change between dendrograms!

# however, this is consistant (since I force-sort the rows/columns):
cor_cophenetic(hc1, hc2)
cor_cophenetic(dend1, dend2)

cor_cophenetic(dendlist(dend1, dend2))

# we can also use different cor methods (almost the same result though):
cor_cophenetic(hc1, hc2, method = "spearman") # 0.8456014
cor_cophenetic(dend1, dend2, method = "spearman") #


# cophenetic correlation is about 10 times (!) faster than bakers_gamma cor:
library(microbenchmark)
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = FALSE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)

# but only because of the cutree for dendrogram. When allowing hclust cutree
# it is only about twice as fast:
microbenchmark(
  cor_bakers_gamma = cor_bakers_gamma(dend1, dend2, try_cutree_hclust = TRUE),
  cor_cophenetic = cor_cophenetic(dend1, dend2),
  times = 10
)

## End(Not run)

dendextend

Extending 'dendrogram' Functionality in R

v1.15.1
GPL-2 | GPL-3
Authors
Tal Galili [aut, cre, cph] (https://www.r-statistics.com), Yoav Benjamini [ths], Gavin Simpson [ctb], Gregory Jefferis [aut, ctb] (imported code from his dendroextras package), Marco Gallotta [ctb] (a.k.a: marcog), Johan Renaudie [ctb] (https://github.com/plannapus), The R Core Team [ctb] (Thanks for the Infastructure, and code in the examples), Kurt Hornik [ctb], Uwe Ligges [ctb], Andrej-Nikolai Spiess [ctb], Steve Horvath [ctb], Peter Langfelder [ctb], skullkey [ctb], Mark Van Der Loo [ctb] (https://github.com/markvanderloo d3dendrogram), Andrie de Vries [ctb] (ggdendro author), Zuguang Gu [ctb] (circlize author), Cath [ctb] (https://github.com/CathG), John Ma [ctb] (https://github.com/JohnMCMa), Krzysiek G [ctb] (https://github.com/storaged), Manuela Hummel [ctb] (https://github.com/hummelma), Chase Clark [ctb] (https://github.com/chasemc), Lucas Graybuck [ctb] (https://github.com/hypercompetent), jdetribol [ctb] (https://github.com/jdetribol), Ben Ho [ctb] (https://github.com/SplitInf), Samuel Perreault [ctb] (https://github.com/samperochkin), Christian Hennig [ctb] (http://www.homepages.ucl.ac.uk/~ucakche/), David Bradley [ctb] (https://github.com/DBradley27), Houyun Huang [ctb] (https://github.com/houyunhuang)
Initial release
2021-05-08

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.