
FM_index_R

Calculating Fowlkes-Mallows index in R


Description

Calculates the Fowlkes-Mallows (FM) index between two clusterings.

The FM_index_R function also calculates the expectation and variance of the FM index under the null hypothesis of no relation between the two clusterings.

Usage

FM_index_R(
  A1_clusters,
  A2_clusters,
  assume_sorted_vectors = FALSE,
  warn = dendextend_options("warn"),
  ...
)

Arguments

A1_clusters

a numeric vector of cluster assignments for the items in group A1, with a names attribute giving the item name of each element. Such vectors are often obtained by cutting a dendrogram into k clusters.

A2_clusters

a numeric vector of cluster assignments for the items in group A2, with a names attribute giving the item name of each element. Such vectors are often obtained by cutting a dendrogram into k clusters.

assume_sorted_vectors

logical (FALSE). Can we assume that the two group vectors are sorted so that they have the same order of items? If FALSE (default), the vectors will be sorted based on their names attribute.

warn

logical (default from dendextend_options("warn") is FALSE). Sets whether warnings are to be issued. It is safer to keep this at TRUE, but to keep the noise down, the default is FALSE.

...

Ignored.
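
The effect of assume_sorted_vectors can be illustrated without the package. The following is a minimal base R sketch of name-based alignment, assuming that sorting both vectors by their names is what puts the items in a common order; the vectors a1 and a2 are made-up data, and the code is not taken from dendextend itself.

```r
# Sketch: aligning two named cluster vectors by item name (base R only).
# a1 and a2 are hypothetical inputs with the same assignments in a different item order.
a1 <- c(x = 1, y = 1, z = 2)
a2 <- c(z = 2, x = 1, y = 1)

# Compared position by position, the two vectors look different...
identical(unname(a1), unname(a2))

# ...but after sorting each by its names they match exactly.
a1_sorted <- a1[order(names(a1))]
a2_sorted <- a2[order(names(a2))]
identical(a1_sorted, a2_sorted)
```

Setting assume_sorted_vectors = TRUE skips this kind of alignment, so it is only appropriate when both vectors already list the items in the same order.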

Details

From Wikipedia:

The Fowlkes-Mallows index (see references) is an external evaluation method used to determine the similarity between two clusterings (the clusters obtained from a clustering algorithm). The comparison can be between two hierarchical clusterings, or between a clustering and a benchmark classification. A higher value of the Fowlkes-Mallows index indicates greater similarity between the clusters and the benchmark classification.
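
To make the definition concrete, here is a minimal base R sketch that computes the index from the contingency table of two label vectors, following the pair-counting form in Fowlkes and Mallows (1983). The helper name fm_sketch and the toy vectors are assumptions for illustration; the function is not part of dendextend and does no name matching or input checking beyond length.

```r
# Sketch: Fowlkes-Mallows index from two label vectors (base R only).
# fm_sketch is a hypothetical helper, not the dendextend implementation.
fm_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  n <- length(a)
  m <- table(a, b)               # contingency table of the two clusterings
  tk <- sum(m^2) - n             # twice the number of pairs grouped together in both
  pk <- sum(rowSums(m)^2) - n    # twice the number of pairs grouped together in a
  qk <- sum(colSums(m)^2) - n    # twice the number of pairs grouped together in b
  tk / sqrt(pk * qk)
}

a <- c(1, 1, 1, 2, 2, 2)
b <- c(1, 1, 2, 2, 3, 3)

fm_sketch(a, a)  # identical clusterings give 1
fm_sketch(a, b)  # 2 / sqrt(18), about 0.471
```

If either clustering puts every item in its own cluster, pk or qk is zero and this sketch returns NaN.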

Value

The Fowlkes-Mallows index between two vectors of clustering groups.

The returned value carries the attributes E_FM and V_FM, which hold the expectation and variance of the index under the null hypothesis of no relation.
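
As a rough illustration of the null expectation, the base R sketch below evaluates the expected FM index when one labelling is randomly permuted relative to the other, using the expression given by Fowlkes and Mallows (1983). The helper name e_fm_sketch and the toy vectors are assumptions; whether dendextend computes E_FM in exactly this way is not confirmed by this page.

```r
# Sketch: expected FM index under random relabelling (base R only).
# e_fm_sketch is a hypothetical helper, not the dendextend implementation.
e_fm_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  n <- length(a)
  pk <- sum(table(a)^2) - n   # twice the number of pairs grouped together in a
  qk <- sum(table(b)^2) - n   # twice the number of pairs grouped together in b
  sqrt(pk * qk) / (n * (n - 1))
}

a <- c(1, 1, 1, 2, 2, 2)
b <- c(1, 1, 2, 2, 3, 3)

e_fm_sketch(a, b)  # sqrt(72) / 30, about 0.283
```

With dendextend loaded, the stored values on a result x would be read with attr(x, "E_FM") and attr(x, "V_FM").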

References

Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.

See Also

Examples

## Not run: 

set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# dend1 <- as.dendrogram(hc1)
# dend2 <- as.dendrogram(hc2)
#    cutree(dend1)

FM_index_R(cutree(hc1, k = 3), cutree(hc1, k = 3)) # 1
set.seed(1341)
FM_index_R(cutree(hc1, k = 3),
           sample(cutree(hc1, k = 3)), 
           assume_sorted_vectors = TRUE) # 0.38037
FM_index_R(cutree(hc1, k = 3), 
           sample(cutree(hc1, k = 3)), 
           assume_sorted_vectors = FALSE) # 1 again :)
FM_index_R(cutree(hc1, k = 3), 
           cutree(hc2, k = 3)) # 0.8059
FM_index_R(cutree(hc1, k = 30), 
           cutree(hc2, k = 30)) # 0.4529

fo <- function(k) FM_index_R(cutree(hc1, k), cutree(hc2, k))
lapply(1:4, fo)
ks <- 1:150
plot(sapply(ks, fo) ~ ks, type = "b", main = "Bk plot for the iris dataset")

clu_1 <- cutree(hc2, k = 100) # misleading: a cut at k = 100 is NOT well defined for this tree
clu_2 <- cutree(as.dendrogram(hc2), k = 100) # the dendrogram method returns a vector of NAs instead

FM_index_R(clu_1, clu_2) # NA

## End(Not run)

dendextend

Extending 'dendrogram' Functionality in R

v1.15.1
GPL-2 | GPL-3
Authors
Tal Galili [aut, cre, cph] (https://www.r-statistics.com), Yoav Benjamini [ths], Gavin Simpson [ctb], Gregory Jefferis [aut, ctb] (imported code from his dendroextras package), Marco Gallotta [ctb] (a.k.a: marcog), Johan Renaudie [ctb] (https://github.com/plannapus), The R Core Team [ctb] (Thanks for the Infrastructure, and code in the examples), Kurt Hornik [ctb], Uwe Ligges [ctb], Andrej-Nikolai Spiess [ctb], Steve Horvath [ctb], Peter Langfelder [ctb], skullkey [ctb], Mark Van Der Loo [ctb] (https://github.com/markvanderloo d3dendrogram), Andrie de Vries [ctb] (ggdendro author), Zuguang Gu [ctb] (circlize author), Cath [ctb] (https://github.com/CathG), John Ma [ctb] (https://github.com/JohnMCMa), Krzysiek G [ctb] (https://github.com/storaged), Manuela Hummel [ctb] (https://github.com/hummelma), Chase Clark [ctb] (https://github.com/chasemc), Lucas Graybuck [ctb] (https://github.com/hypercompetent), jdetribol [ctb] (https://github.com/jdetribol), Ben Ho [ctb] (https://github.com/SplitInf), Samuel Perreault [ctb] (https://github.com/samperochkin), Christian Hennig [ctb] (http://www.homepages.ucl.ac.uk/~ucakche/), David Bradley [ctb] (https://github.com/DBradley27), Houyun Huang [ctb] (https://github.com/houyunhuang)
Initial release
2021-05-08
