fastcluster: hclust.vector – R documentation

hclust.vector

Fast hierarchical, agglomerative clustering of vector data

Description

This function implements hierarchical, agglomerative clustering with memory-saving algorithms.

Usage

hclust.vector(X, method="single", members=NULL, metric='euclidean', p=NULL)

Arguments

`X`	an (N×D) matrix of 'double' values: N observations in D variables.
`method`	the agglomeration method to be used. This must be (an unambiguous abbreviation of) one of `"single"`, `"ward"`, `"centroid"` or `"median"`.
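`members`	`NULL` or a vector whose length equals the number of observations. Each entry gives the number of original observations that the corresponding row of `X` represents, as in the Examples, where the rows are cluster centers.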
`metric`	the distance measure to be used. This must be one of `"euclidean"`, `"maximum"`, `"manhattan"`, `"canberra"`, `"binary"` or `"minkowski"`. Any unambiguous substring can be given.
`p`	the power (exponent) of the Minkowski metric; only relevant when `metric="minkowski"`.
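As a rough illustration of how these arguments fit together, the sketch below clusters a small random matrix with the Minkowski metric. The data, the seed and the choice `p = 3` are made up for the example. The second call relies on the partial matching of `method` and `metric` described above.

```r
library(fastcluster)

set.seed(42)                                  # arbitrary seed, for reproducibility only
X <- matrix(rnorm(60), nrow = 20, ncol = 3)   # 20 observations in 3 variables

# Single linkage with the Minkowski metric; p = 3 is an arbitrary choice
hc <- hclust.vector(X, method = "single", metric = "minkowski", p = 3)

# Unambiguous abbreviations of method and metric should select the same options
hc_abbr <- hclust.vector(X, method = "sin", metric = "mink", p = 3)

inherits(hc, "hclust")                        # result can be passed to plot() and cutree()
length(hc$height) == nrow(X) - 1              # one merge height per agglomeration step
isTRUE(all.equal(hc$height, hc_abbr$height))  # both calls should agree
```

Note that a one-letter abbreviation such as `metric = "m"` would be ambiguous, since three of the metric names start with that letter.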

Details

The function hclust.vector clusters data that is given as vectors (one observation per row) rather than as a dissimilarity object. It uses memory-saving algorithms, which allow it to process larger data sets than hclust can.

The "ward", "centroid" and "median" methods require metric="euclidean" and cluster the data set with respect to Euclidean distances.
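The restriction can be probed with a small sketch like the following. It assumes that a disallowed combination is rejected with an R error, which is how the requirement above reads; the exact message is not specified here, so the code only records whether an error occurred.

```r
library(fastcluster)

X <- as.matrix(USArrests)   # built-in data set, converted to a numeric matrix

# Try a Euclidean-only method with a non-Euclidean metric and capture any error
res <- tryCatch(
  hclust.vector(X, method = "ward", metric = "manhattan"),
  error = function(e) conditionMessage(e)
)

# A character result means the call was rejected; print the message if so
if (is.character(res)) message("Rejected: ", res)

# The same method with the default Euclidean metric is a valid call
hc_ok <- hclust.vector(X, method = "ward")
```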

For "single" linkage clustering, any dissimilarity measure may be chosen. Currently, the same metrics are implemented as the dist function provides.

The call

hclust.vector(X, method='single', metric=[...])

gives the same result as

hclust(dist(X, method=[...]), method='single')

but uses less memory and is equally fast.
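A quick way to check this equivalence on a small example is sketched below. The random data and the choice of the Manhattan metric are arbitrary. `stats::hclust` is called explicitly because loading fastcluster masks `hclust` with its own implementation.

```r
library(fastcluster)

set.seed(1)                                    # arbitrary seed
X <- matrix(rnorm(200), nrow = 50, ncol = 4)   # 50 observations in 4 variables

# Memory-saving route: cluster the vectors directly
hc_vec <- hclust.vector(X, method = "single", metric = "manhattan")

# Reference route: build the full distance object first, then cluster it
hc_ref <- stats::hclust(dist(X, method = "manhattan"), method = "single")

# Merge heights should agree up to floating-point tolerance
same_heights <- isTRUE(all.equal(sort(hc_vec$height), sort(hc_ref$height)))
same_heights
```

The comparison uses `all.equal` rather than `identical` because the two routes may compute distances in a different order of floating-point operations.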

For the Euclidean methods, care must be taken since hclust expects squared Euclidean distances. Hence, the call

hclust.vector(X, method='centroid')

is, aside from the lesser memory requirements, equivalent to

d = dist(X)
hc = hclust(d^2, method='centroid')
hc$height = sqrt(hc$height)

The same applies to the "median" method. The "ward" method in hclust.vector is equivalent to hclust with method "ward.D2" applied to the unsquared distances; it matches method "ward.D" only after squaring the distances and taking the square root of the heights, as above.
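The relationships described above can be checked numerically with a sketch along these lines. The data are random and purely illustrative, and `stats::hclust` is used as the reference implementation. Heights are sorted before comparison because the centroid method can produce non-monotone heights, and the order in which merges are reported is not assumed here.

```r
library(fastcluster)

set.seed(7)                                    # arbitrary seed
X <- matrix(rnorm(120), nrow = 40, ncol = 3)   # 40 observations in 3 variables
d <- dist(X)                                   # plain Euclidean distances

# Centroid: hclust needs squared distances, and its heights come out squared
hc_cen  <- hclust.vector(X, method = "centroid")
ref_cen <- stats::hclust(d^2, method = "centroid")
ref_cen$height <- sqrt(ref_cen$height)
centroid_ok <- isTRUE(all.equal(sort(hc_cen$height), sort(ref_cen$height)))

# Ward: matches "ward.D2" on the unsquared distances ...
hc_ward <- hclust.vector(X, method = "ward")
ref_d2  <- stats::hclust(d, method = "ward.D2")
ward_d2_ok <- isTRUE(all.equal(sort(hc_ward$height), sort(ref_d2$height)))

# ... and "ward.D" only after squaring the input and un-squaring the heights
ref_d <- stats::hclust(d^2, method = "ward.D")
ward_d_ok <- isTRUE(all.equal(sort(hc_ward$height), sort(sqrt(ref_d$height))))

c(centroid = centroid_ok, ward.D2 = ward_d2_ok, ward.D = ward_d_ok)
```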

More details are in the User's manual fastcluster.pdf, which is available as a vignette. Get this from the R command line with vignette('fastcluster').

Author(s)

Daniel Müllner

References

http://danifold.net/fastcluster.html

Examples

# Taken and modified from stats::hclust
## Perform centroid clustering with squared Euclidean distances,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust.vector(USArrests, "cen")
# squared Euclidean distances
hc$height <- hc$height^2
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust.vector(cent, method = "cen", members = table(memb))
# squared Euclidean distances
hc1$height <- hc1$height^2
opar <- par(mfrow = c(1, 2))
plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)

fastcluster

Fast Hierarchical Clustering Routines for R and 'Python'

v1.1.25

FreeBSD | GPL-2 | file LICENSE

Authors

Daniel Müllner [aut, cph, cre], Google Inc. [cph]

Initial release

2018-05-29
