cl_validity

Validity Measures for Partitions and Hierarchies


Description

Compute validity measures for partitions and hierarchies, attempting to measure how well these clusterings capture the underlying structure in the data they were obtained from.

Usage

cl_validity(x, ...)
## Default S3 method:
cl_validity(x, d, ...)

Arguments

x

an object representing a partition or hierarchy.

d

a dissimilarity object from which x was obtained.

...

arguments to be passed to or from methods.

Details

cl_validity is a generic function.

For partitions, its default method gives the “dissimilarity accounted for”, defined as 1 - a_w / a_t, where a_t is the average total dissimilarity, and the “average within dissimilarity” a_w is given by

(sum_{i,j} sum_k m_{ik} m_{jk} d_{ij}) / (sum_{i,j} sum_k m_{ik} m_{jk})

where d and m are the dissimilarities and memberships, respectively, and the sums are over all pairs of objects and all classes.
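
The formula above can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name daf_sketch is hypothetical, and it assumes that the sums run over all ordered pairs (i, j), including i = j, and that a_t is the mean over the same set of pairs. The actual method in clue may treat the diagonal differently.

```r
# Hypothetical helper (not part of clue): "dissimilarity accounted for"
# for a membership matrix m (objects x classes) and a dissimilarity d.
# Assumption: sums run over all ordered pairs (i, j), including i = j.
daf_sketch <- function(m, d) {
  D <- as.matrix(d)                # full symmetric matrix, zero diagonal
  W <- m %*% t(m)                  # W[i, j] = sum_k m[i, k] * m[j, k]
  a_w <- sum(W * D) / sum(W)       # average within dissimilarity
  a_t <- mean(D)                   # average total dissimilarity
  1 - a_w / a_t
}

# Toy data: two well-separated groups on a line.
x <- c(0, 1, 10, 11)
d <- dist(x)
cl <- c(1, 1, 2, 2)
m <- diag(2)[cl, ]                 # hard (0/1) memberships

daf_two_groups <- daf_sketch(m, d)
daf_one_group <- daf_sketch(matrix(1, nrow = 4, ncol = 1), d)
print(c(two_groups = daf_two_groups, one_group = daf_one_group))
```

Under these assumptions, a partition that puts every object in a single class has a_w equal to a_t and therefore accounts for none of the dissimilarity, while the two-group partition accounts for most of it.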

For hierarchies, the validity measures computed by default are “variance accounted for” (VAF, e.g., Hubert, Arabie & Meulman, 2006) and “deviance accounted for” (DEV, e.g., Smith, 2001). If u is the ultrametric corresponding to the hierarchy x and d the dissimilarity x was obtained from, these validity measures are given by

max(0, 1 - sum_{i,j} (d_{ij} - u_{ij})^2 / sum_{i,j} (d_{ij} - mean(d))^2)

and

max(0, 1 - sum_{i,j} |d_{ij} - u_{ij}| / sum_{i,j} |d_{ij} - median(d)|)

respectively. Note that VAF and DEV are not invariant under rescaling u, and may be “arbitrarily small” (i.e., 0 using the above definitions) even though u and d are “structurally close” in some sense.
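
As a rough illustration of the two definitions and of the rescaling remark, the following base R sketch computes both measures from a dissimilarity and an ultrametric. The helper name vaf_dev_sketch is hypothetical, and the sketch assumes that the sums, mean and median are taken over the entries of the "dist" objects (each unordered pair once); it uses the cophenetic distances of an hclust fit as the ultrametric u.

```r
# Hypothetical helper (not part of clue): VAF and DEV as defined above.
# Assumption: d and u are "dist" objects over the same objects, and the
# sums, mean and median run over each unordered pair once.
vaf_dev_sketch <- function(d, u) {
  d <- as.vector(d)
  u <- as.vector(u)
  vaf <- max(0, 1 - sum((d - u)^2) / sum((d - mean(d))^2))
  dev <- max(0, 1 - sum(abs(d - u)) / sum(abs(d - median(d))))
  c(VAF = vaf, DEV = dev)
}

d <- dist(USArrests)
hc <- hclust(d, method = "average")
u <- cophenetic(hc)                      # ultrametric induced by the tree

fit <- vaf_dev_sketch(d, u)              # measures for the fitted tree
perfect <- vaf_dev_sketch(d, d)          # u identical to d
rescaled <- vaf_dev_sketch(d, 100 * u)   # same tree, rescaled heights

print(rbind(fit = fit, perfect = perfect, rescaled = rescaled))
```

The rescaled case keeps the tree structure unchanged, yet both measures are clamped to 0, which is the lack of scale invariance noted above.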

For objects returned by agnes and diana (package cluster), the agglomerative and divisive coefficients, respectively, are provided in addition to the default measures.

Value

A list of class "cl_validity" with the computed validity measures.
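
A minimal usage sketch, assuming the clue package is installed (the code is skipped otherwise). The choice of the USArrests data, average linkage, and three k-means centers is arbitrary and only for illustration; the calls follow the signature cl_validity(x, d) given under Usage.

```r
# Usage sketch; runs only if the clue package is available.
if (requireNamespace("clue", quietly = TRUE)) {
  d <- dist(USArrests)

  # Hierarchy: the default measures are VAF and DEV.
  hc <- hclust(d, method = "average")
  v_hier <- clue::cl_validity(hc, d)
  print(v_hier)

  # Partition: the default measure is "dissimilarity accounted for".
  set.seed(1)
  km <- kmeans(USArrests, centers = 3)
  v_part <- clue::cl_validity(km, d)
  print(v_part)
}
```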

References

L. Hubert, P. Arabie and J. Meulman (2006). The structural representation of proximity matrices with MATLAB. Philadelphia, PA: SIAM.

T. J. Smith (2001). Constructing ultrametric and additive trees based on the L_1 norm. Journal of Classification, 18/2, 185–207. https://link.springer.com/article/10.1007/s00357-001-0015-0.

See Also

cluster.stats in package fpc for a variety of cluster validation statistics; fclustIndex in package e1071 for several fuzzy cluster indexes; clustIndex in package cclust; silhouette in package cluster.


clue

Cluster Ensembles

v0.3-59
GPL-2
Authors
Kurt Hornik [aut, cre] (<https://orcid.org/0000-0003-4198-9911>), Walter Böhm [ctb]