Partitioning around medoids with estimation of number of clusters
This calls the function pam or clara to perform a partitioning around medoids clustering, with the number of clusters estimated by optimum average silhouette width (see pam.object) or the Calinski-Harabasz index (calinhara). The Duda-Hart test (dudahart2) is applied to decide whether there should be more than one cluster (unless 1 is excluded as number of clusters or the data are dissimilarities).
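As a rough illustration of the silhouette-based selection described above, the following sketch loops over candidate numbers of clusters with pam from the cluster package and keeps the one with the largest average silhouette width. It is not the package's own implementation: the toy data and the names x, k_range, asw and best_k are assumptions made for this example, and the Duda-Hart step for a single cluster is left out.

```r
library(cluster)  # provides pam()

set.seed(1)
# Toy data (assumed for illustration): three well-separated groups in 2D
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  cbind(rnorm(20, mean = 5, sd = 0.3), rnorm(20, mean = 5, sd = 0.3)),
  cbind(rnorm(20, mean = 10, sd = 0.3), rnorm(20, mean = 0, sd = 0.3))
)

# Candidate numbers of clusters; silhouettes are undefined for k = 1
k_range <- 2:6

# Average silhouette width of the pam solution for each candidate k
asw <- sapply(k_range, function(k) pam(x, k)$silinfo$avg.width)

# Keep the k with the largest average silhouette width
best_k <- k_range[which.max(asw)]
best_k
```

Starting the range at 2 mirrors the note in the arguments that the silhouette criterion cannot assess a single cluster.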
pamk(data,krange=2:10,criterion="asw", usepam=TRUE, scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"), critout=FALSE, ns=10, seed=NULL, ...)
data | a data matrix or data frame or something that can be coerced into a matrix, or dissimilarity matrix or object. See |
krange | integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters |
criterion | one of |
usepam | logical. If |
scaling | either a logical value or a numeric vector of length equal to the number of variables. If |
alpha | numeric between 0 and 1, tuning constant for |
diss | logical flag: if |
critout | logical. If |
ns | passed on to |
seed | passed on to |
... |
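A small sketch of calling pamk on dissimilarities rather than raw data. According to the usage line, diss defaults to inherits(data, "dist"), so a dist object should be recognised without setting diss by hand. The toy data and the names x, d and pk_d are assumptions for this example; krange starts at 2 because the Duda-Hart test is not applied to dissimilarities.

```r
library(fpc)  # provides pamk()

set.seed(3)
# Toy data (assumed for illustration): two groups in 2D
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2)
)

# Euclidean distances as a "dist" object
d <- dist(x)

# diss is left at its default, inherits(data, "dist")
pk_d <- pamk(d, krange = 2:5)
pk_d$nc
```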
A list with components
pamobject | The output of the optimal run of the |
nc | the optimal number of clusters. |
crit | vector of criterion values for numbers of clusters. |
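The components listed above can be inspected as in the following sketch. The toy data and the names x and pk are assumptions for this example; clustering and medoids are components of the pam object from the cluster package, which is what pamobject holds when usepam=TRUE.

```r
library(fpc)  # provides pamk()

set.seed(2)
# Toy data (assumed for illustration): two groups in 2D
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2)
)

pk <- pamk(x, krange = 2:5)

pk$nc                            # estimated number of clusters
pk$crit                          # criterion values
table(pk$pamobject$clustering)   # cluster sizes of the selected solution
pk$pamobject$medoids             # medoids of the selected solution
```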
Calinski, T. and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis. Communications in Statistics, 3, 1-27.
Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.
Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification. Journal of the Royal Statistical Society, Series C (Applied Statistics), 62, 309-369.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
library(fpc)  # provides pamk() and rFace()
options(digits=3)
set.seed(20000)
face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
# "multiasw" is better for larger data sets, use larger ns then.
pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)