Standardise cluster validation statistics by random clustering results
Standardises cluster validity statistics as produced by clustatsum relative to results obtained from random clusterings of the same data, as generated by randomclustersim. The aim is to make differences between values comparable between indexes, see Hennig (2019), Akhanli and Hennig (2020). This is mainly for use within clusterbenchstats.
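The idea of standardising an index value relative to random clusterings can be sketched in a few lines of base R. This is a minimal illustration with made-up numbers, not the code used by cgrestandard: the exact formulas are not given in this text, and both the mean/standard deviation variant and the share-below variant shown here are assumptions about what such a standardisation could look like. The names random_values, observed, z and share_below are hypothetical.

```r
# Hypothetical values of a single validity index (not computed by fpc here).
random_values <- c(0.42, 0.51, 0.38, 0.47, 0.55)  # index on random clusterings
observed <- 0.63                                   # index on the clustering of interest

# Assumed variant 1: centre and scale by the random clustering results.
z <- (observed - mean(random_values)) / sd(random_values)

# Assumed variant 2: share of random clustering values below the observed one.
share_below <- mean(random_values < observed)

print(z)
print(share_below)
```

Either way, the result no longer depends on the original scale of the index, which is what makes values of different indexes comparable.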
cgrestandard(clusum, clusim, G, percentage=FALSE, useallmethods=FALSE, useallg=FALSE, othernc=list())
clusum | object of class "valstat", see valstat.object |
clusim | list; output object of randomclustersim |
G | vector of integers. Numbers of clusters to consider. |
percentage | logical. If |
useallmethods | logical. If |
useallg | logical. If |
othernc | list of integer vectors of length 2. This allows the incorporation of methods that bring forth other numbers of clusters than those in G |
cgrestandard adds a statistic named dmode to the input set of validation statistics. It is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.
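As a small worked instance of the dmode formula above, the following sketch combines two made-up values. The list stats and its numbers are hypothetical stand-ins for the dindex and highdgap statistics; whether cgrestandard applies the weights before or after standardisation is not stated here.

```r
# Hypothetical values standing in for the dindex and highdgap statistics.
stats <- list(dindex = 0.80, highdgap = 0.60)

# Weighted aggregate as defined in the text: 0.75*dindex + 0.25*highdgap.
dmode <- 0.75 * stats$dindex + 0.25 * stats$highdgap

print(dmode)
```

The weights put three quarters of the emphasis on dindex, so dmode stays closer to dindex whenever the two statistics disagree.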
List of class "valstat", see valstat.object, with standardised results as explained above.
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
library(fpc)
set.seed(20000)
options(digits=3)
# Small artificial data set and its distance matrix
face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
dif <- dist(face)
# Cluster with two methods, each with 2 and 3 clusters
cl12 <- kmeansCBI(face,2)
cl13 <- kmeansCBI(face,3)
cl22 <- claraCBI(face,2)
cl23 <- claraCBI(face,3)  # 3 clusters, matching the clusum[[2]][[3]] slot below
ccl12 <- clustatsum(dif,cl12$partition)
ccl13 <- clustatsum(dif,cl13$partition)
ccl22 <- clustatsum(dif,cl22$partition)
ccl23 <- clustatsum(dif,cl23$partition)
# Assemble the statistics as clusum[[method]][[number of clusters]]
clusum <- list()
clusum[[2]] <- list()
clusum[[1]] <- list()
clusum[[1]][[2]] <- ccl12
clusum[[1]][[3]] <- ccl13
clusum[[2]][[2]] <- ccl22
clusum[[2]][[3]] <- ccl23
clusum$maxG <- 3
clusum$minG <- 2
clusum$method <- c("kmeansCBI","claraCBI")
clusum$name <- c("kmeansCBI","claraCBI")
# Random clusterings on the same data, used as the reference for standardisation
clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
  fnruns=1,avenruns=1,monitor=FALSE)
cgr <- cgrestandard(clusum,clusim,2:3)
cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
print(str(cgr))
print(str(cgr2))
print(cgr3[[1]][[2]])