Distance based validity criteria for large data sets
Approximates the average silhouette width or the Pearson version of Hubert's gamma criterion by splitting the dataset into subsets and averaging the subset-wise values; see Hennig and Liao (2013).
distcritmulti(x, clustering, part=NULL, ns=10, criterion="asw", fun="dist", metric="euclidean", count=FALSE, seed=NULL, ...)
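x | cases times variables data matrix.
clustering | vector of integers indicating the clustering.
part | vector of integer subset sizes; the sum should be smaller than or equal to the number of cases of x.
ns | integer. Number of subsets, only used if part is NULL.
criterion | "asw" or "pearsongamma"; selects the average silhouette width or the Pearson version of Hubert's gamma.
fun | "dist" or "daisy"; the function used to compute the dissimilarities.
metric | passed on to dist or daisy to determine which dissimilarity is used.
count | logical. If TRUE, the number of the subset currently being processed is printed.
seed | integer, random seed. (If NULL, the result depends on random numbers.)
... | further arguments passed on to dist or daisy.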
A list with components crit.overall, crit.sub, crit.sd, subsets.
crit.overall | value of criterion.
crit.sub | vector of subset-wise criterion values.
crit.sd | standard deviation of crit.sub.
subsets | list of case indexes in subsets.
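The subset-averaging idea from the description can be illustrated with a minimal sketch. This is not the fpc implementation: the function name approx_asw is hypothetical, the silhouette widths are taken from cluster::silhouette, and the overall value is assumed here to be a plain unweighted mean of the subset-wise values.

```r
# Sketch only: approximate the average silhouette width (ASW) by splitting
# the cases into random subsets and averaging the subset-wise ASW values.
# approx_asw is a hypothetical helper, not part of fpc.
library(cluster)  # provides silhouette()

approx_asw <- function(x, clustering, ns = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(x)
  # Randomly assign every case to one of ns roughly equal-sized subsets.
  subset_id <- sample(rep_len(seq_len(ns), n))
  subsets <- split(seq_len(n), subset_id)
  crit_sub <- vapply(subsets, function(idx) {
    d <- dist(x[idx, , drop = FALSE])
    sil <- silhouette(clustering[idx], d)
    # silhouette() returns NA if the subset contains fewer than two clusters.
    if (!is.matrix(sil)) return(NA_real_)
    mean(sil[, "sil_width"])
  }, numeric(1))
  list(crit.overall = mean(crit_sub, na.rm = TRUE),  # assumption: unweighted mean
       crit.sub = crit_sub,
       crit.sd = sd(crit_sub, na.rm = TRUE),
       subsets = subsets)
}

# Usage on two well-separated artificial clusters.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
cl <- rep(1:2, each = 50)
res <- approx_asw(x, cl, ns = 4, seed = 42)
res$crit.overall
```

Each subset needs only its own dissimilarity matrix, so no full n times n matrix is ever stored; this is what makes the approach usable for large data sets.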
Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17, 107-145.
Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification, Journal of the Royal Statistical Society, Series C Applied Statistics, 62, 309-369.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
library(fpc)
set.seed(20000)
options(digits=3)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
clustering <- as.integer(attr(face,"grouping"))
distcritmulti(face,clustering,ns=3,seed=100000,criterion="pearsongamma")
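As a hedged variation on the example, the result can be stored and its components inspected one by one. The sketch assumes the fpc package is installed and uses only the arguments and list components documented on this page; the object name res is arbitrary.

```r
# Sketch: same artificial data as in the example, but with the default
# criterion ("asw") and with the returned list stored for inspection.
library(fpc)

set.seed(20000)
face <- rFace(50, dMoNo = 2, dNoEy = 0, p = 2)
clustering <- as.integer(attr(face, "grouping"))

res <- distcritmulti(face, clustering, ns = 3, seed = 100000,
                     criterion = "asw")

res$crit.overall  # approximated criterion value
res$crit.sub      # one value per subset
res$crit.sd       # spread of the subset-wise values
```

A large crit.sd relative to crit.overall suggests that the subset-wise values disagree, in which case the approximation should be read with caution.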