fastcluster

Fast hierarchical, agglomerative clustering routines for R and Python


Description

The fastcluster package provides efficient algorithms for hierarchical, agglomerative clustering. In addition to the R interface, there is also a Python interface to the underlying C++ library, to be found in the source distribution.

Details

The function hclust provides clustering when the input is a dissimilarity matrix. A dissimilarity matrix can be computed from vector data by dist. The hclust function can be used as a drop-in replacement for existing routines: stats::hclust and flashClust::hclust (also available as flashClust::flashClust). Once the fastcluster package is loaded at the beginning of the code, every program that uses hierarchical clustering benefits from the performance gain without further changes.
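As a minimal sketch of the drop-in claim, the following runs both implementations on the same dissimilarity matrix. Calling each function through its namespace is a choice made here for illustration, so the comparison does not depend on which package is attached; the variable names are hypothetical.

```r
# Dissimilarity matrix from vector data (Euclidean distance by default).
d <- dist(USArrests)

# Same call signature, two implementations.
hc_stats <- stats::hclust(d, method = "average")
hc_fast  <- fastcluster::hclust(d, method = "average")

# Both return objects of class "hclust", so downstream tools such as
# cutree() and plot() work unchanged.
class(hc_fast)
memb <- cutree(hc_fast, k = 4)
table(memb)

# Compare the merge heights of the two dendrograms numerically.
all.equal(hc_stats$height, hc_fast$height)
```

Because the result has the usual hclust structure, existing plotting and tree-cutting code should not need to change.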

When the package is loaded, it masks stats::hclust, so unqualified calls to hclust use the new code.
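One way to check which implementation an unqualified call will reach is to inspect the environment the function lives in. This is a small sketch using base R introspection only; it assumes no other attached package also defines hclust.

```r
library(fastcluster)

# After attaching, the unqualified name resolves to the fastcluster version.
environmentName(environment(hclust))

# The original implementation stays reachable through its namespace.
environmentName(environment(stats::hclust))

# conflicts() lists names that are defined in more than one attached package.
"hclust" %in% conflicts()
```

Writing stats::hclust(...) explicitly remains an option whenever the original routine is wanted, for example when timing the two against each other.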

The function hclust.vector provides memory-saving routines when the input is vector data.
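The memory saving can be illustrated by comparing the two routes on the same data: a dist object stores n(n-1)/2 dissimilarities, while hclust.vector takes the data matrix itself. The sketch below uses simulated data and single linkage with the Euclidean metric; the data size and variable names are assumptions chosen for illustration.

```r
set.seed(1)
x <- matrix(rnorm(2000 * 3), ncol = 3)  # 2000 points in 3 dimensions

# Route 1: build the full dissimilarity matrix, then cluster it.
d <- dist(x)
hc_dist <- fastcluster::hclust(d, method = "single")

# Route 2: cluster the vector data directly, no dist object needed.
hc_vec <- fastcluster::hclust.vector(x, method = "single", metric = "euclidean")

# Storage needed for the input of each route.
format(object.size(d), units = "MB")
format(object.size(x), units = "MB")

# Compare the merge heights of the two routes numerically.
all.equal(hc_dist$height, hc_vec$height)
```

The gap between the two input sizes grows quadratically with the number of observations, which is where the vector interface is most useful.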

Further information:

Author(s)

Daniel Müllner

References

See Also

Examples

# Taken and modified from stats::hclust
#
# hclust(...)        # new method
# hclust.vector(...) # new method
# stats::hclust(...) # old method

library(fastcluster)  # stops with an error if the package is missing
library(graphics)

hc <- hclust(dist(USArrests), "ave")
plot(hc)
plot(hc, hang = -1)

## Do the same with centroid clustering and squared Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust.vector(USArrests, "cen")
# squared Euclidean distances
hc$height <- hc$height^2
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust.vector(cent, method = "cen", members = table(memb))
# squared Euclidean distances
hc1$height <- hc1$height^2
opar <- par(mfrow = c(1, 2))
plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)

Version: 1.1.25
License: FreeBSD | GPL-2 | file LICENSE
Authors: Daniel Müllner [aut, cph, cre], Google Inc. [cph]
Initial release: 2018-05-29