Cluster validation statistics - object
The objects of class "valstat" store cluster validation statistics from various clustering methods run with various numbers of clusters.

A legitimate valstat object is a list. The format of the list depends on the number of involved clustering methods, nmethods, say, i.e., the length of the method-component explained below. The first nmethods elements of the valstat-list are just numbered. These are themselves lists that are numbered between 1 and the maxG-component defined below. Element [[i]][[j]] refers to the clustering from clustering method number i with number of clusters j. Every such element is a list with components avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy. Further optional components are pamc, kdnorm, kdunif, dmode, aggregated. All these are cluster validation indexes, as follows.
avewithin | average distance within clusters (reweighted so that every observation, rather than every distance, has the same weight).
mnnd | average distance to
cvnnd | coefficient of variation of dissimilarities to
maxdiameter | maximum cluster diameter.
widestgap | widest within-cluster gap or average of cluster-wise widest within-cluster gap, depending on parameter
sindex | separation index. Defined based on the distances for every point to the closest point not in the same cluster. The separation index is then the mean of the smallest proportion
minsep | minimum cluster separation.
asw | average silhouette width. See
dindex | this index measures to what extent the density decreases from the cluster mode to the outskirts; I-densdec in Sec. 3.6 of Hennig (2019); low values are good.
denscut | this index measures whether cluster boundaries run through density valleys; I-densbound in Sec. 3.6 of Hennig (2019); low values are good.
highdgap | this measures whether there is a large within-cluster gap with high density on both sides; I-highdgap in Sec. 3.6 of Hennig (2019); low values are good.
pearsongamma | correlation between distances and a 0-1-vector where 0 means same cluster, 1 means different clusters. "Normalized gamma" in Halkidi et al. (2001).
withinss | a generalisation of the within clusters sum of squares (k-means objective function), which is obtained if
entropy | entropy of the distribution of cluster memberships, see Meila (2007).
pamc | average distance to cluster centroid, which is the observation that minimises this average distance.
kdnorm | Kolmogorov distance between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).
kdunif | Kolmogorov distance between distribution of distances to
dmode | aggregated density mode index equal to
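Two of the indexes listed above have definitions simple enough to compute by hand. The following base R sketch does so for entropy and pearsongamma on made-up toy data. The use of the natural logarithm in the entropy, the toy data, and all variable names are assumptions made for illustration; this is not the package's own code.

```r
# Toy data (made up): five points on a line, assigned to two clusters.
x <- c(0, 1, 2, 10, 11)
clustering <- c(1, 1, 1, 2, 2)

# entropy: entropy of the distribution of cluster memberships.
# Assumption: natural logarithm.
p <- as.vector(table(clustering)) / length(clustering)
entropy <- -sum(p * log(p))

# pearsongamma: correlation between the pairwise distances and a
# 0-1 vector (0 = same cluster, 1 = different clusters).
# dist() returns the pairs in the same order in both calls.
d <- as.vector(dist(x))
different <- as.numeric(as.vector(dist(clustering)) > 0)
pearsongamma <- cor(d, different)
```

With well separated clusters, as in this toy example, pairs from different clusters have much larger distances than pairs from the same cluster, so pearsongamma comes out close to 1.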
Furthermore, a valstat object has the following list components:
maxG | maximum number of clusters.
minG | minimum number of clusters (list entries below that number are empty lists).
method | vector of names (character strings) of clustering CBI-functions, see
name | vector of names (character strings) of clustering methods. These can be user-chosen names (see argument
statistics | vector of names (character strings) of cluster validation indexes.
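The list layout described above can be mocked in base R to see how the numbered and the named components sit side by side. This is only a sketch of the shape: the method names, the cluster range, the reduced set of statistics, and the NA placeholder values are all chosen for illustration, and the result is deliberately not given class "valstat", since it is not a real object of that class.

```r
# Illustrative settings (assumptions, not output of any real run).
method <- c("kmeansCBI", "claraCBI")
minG <- 2
maxG <- 4
statistics <- c("avewithin", "asw", "entropy")

# One clustering's entry: a named list of index values.
# NA placeholders stand in for computed statistics.
one_clustering <- function() {
  setNames(as.list(rep(NA_real_, length(statistics))), statistics)
}

# Numbered part: one list per method, each with maxG entries;
# entries below minG are empty lists.
mock <- lapply(seq_along(method), function(i) {
  lapply(seq_len(maxG), function(j) {
    if (j < minG) list() else one_clustering()
  })
})

# Named components follow the numbered ones.
mock <- c(mock, list(maxG = maxG, minG = minG, method = method,
                     name = c("kmeans", "clara"),
                     statistics = statistics))

# Element [[i]][[j]]: method number 2 with 3 clusters.
entry <- mock[[2]][[3]]

# One index across all numbers of clusters for method number 1.
asw_by_g <- sapply(mock[[1]][minG:maxG], function(s) s$asw)
```

In this layout, `mock[[1]][[1]]` is an empty list because 1 is below minG, while `mock$method` and `mock$statistics` are reached by name.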
These objects are generated as part of the clusterbenchstats-output.
The valstat class has methods for the following generic functions: print, plot, see plot.valstat.
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822