Fast hierarchical, agglomerative clustering routines for R and Python
The fastcluster package provides efficient algorithms for hierarchical, agglomerative clustering. In addition to the R interface, there is also a Python interface to the underlying C++ library, to be found in the source distribution.
The function hclust
provides clustering when the
input is a dissimilarity matrix. A dissimilarity matrix can be
computed from vector data by dist
. The
hclust
function can be used as a drop-in replacement for
existing routines: stats::hclust
and
flashClust::hclust
alias
flashClust::flashClust
. Once the
fastcluster library is loaded at the beginning of the code, every
program that uses hierarchical clustering can benefit immediately and
effortlessly from the performance gain
When the package is loaded, it overwrites the function
hclust
with the new code.
The function hclust.vector
provides memory-saving routines
when the input is vector data.
Further information:
R documentation pages: hclust
,
hclust.vector
A comprehensive User's manual:
fastcluster.pdf. Get this from the R
command line with vignette('fastcluster')
.
JSS paper: https://www.jstatsoft.org/v53/i09/.
See the author's home page for a performance comparison: http://danifold.net/fastcluster.html.
Daniel Müllner
# Taken and modified from stats::hclust # # hclust(...) # new method # hclust.vector(...) # new method # stats::hclust(...) # old method require(fastcluster) require(graphics) hc <- hclust(dist(USArrests), "ave") plot(hc) plot(hc, hang = -1) ## Do the same with centroid clustering and squared Euclidean distance, ## cut the tree into ten clusters and reconstruct the upper part of the ## tree from the cluster centers. hc <- hclust.vector(USArrests, "cen") # squared Euclidean distances hc$height <- hc$height^2 memb <- cutree(hc, k = 10) cent <- NULL for(k in 1:10){ cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE])) } hc1 <- hclust.vector(cent, method = "cen", members = table(memb)) # squared Euclidean distances hc1$height <- hc1$height^2 opar <- par(mfrow = c(1, 2)) plot(hc, labels = FALSE, hang = -1, main = "Original Tree") plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters") par(opar)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.