optpart: optsil – R documentation


optsil

Clustering by Optimizing Silhouette Widths

Description

Silhouette width is a measurement of the mean similarity of each object to the other objects in its cluster, compared to its mean similarity to the most similar cluster (see silhouette). Optsil is an iterative re-allocation algorithm to maximize the mean silhouette width of a clustering for a given number of clusters.
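
As an illustration of the quantity being optimized, the sketch below computes silhouette widths by hand in base R. It assumes the standard definition on dissimilarities, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean dissimilarity of object i to the other members of its own cluster and b(i) is the smallest mean dissimilarity of i to any other cluster. The helper `sil_widths` and the toy data are hypothetical and not part of optpart; at least two clusters are assumed.

```r
# Hand-rolled silhouette widths from a dissimilarity object.
# `sil_widths` is a hypothetical helper for illustration only.
sil_widths <- function(d, cl) {
  dm <- as.matrix(d)
  ids <- sort(unique(cl))
  vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE                      # exclude the object itself
    if (!any(own)) return(0)             # singleton cluster: width taken as 0
    a <- mean(dm[i, own])                # mean dissimilarity within own cluster
    b <- min(vapply(ids[ids != cl[i]],   # nearest other cluster
                    function(k) mean(dm[i, cl == k]), numeric(1)))
    if (max(a, b) == 0) 0 else (b - a) / max(a, b)
  }, numeric(1))
}

# Toy data: two well-separated groups on a line.
x  <- c(1, 2, 3, 11, 12, 13)
d  <- dist(x)
cl <- c(1, 1, 1, 2, 2, 2)
w  <- sil_widths(d, cl)
mean(w)   # the mean silhouette width that optsil seeks to maximize
```

Widths near 1 indicate objects that sit much closer to their own cluster than to any other; negative widths indicate objects that are closer on average to a different cluster.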

Usage

optsil(x,dist,maxitr)

Arguments

`x`	an integer, a vector of integers, an object of class ‘clustering’, ‘partana’, ‘partition’, or ‘stride’
`dist`	an object of class ‘dist’ from `dist`, `dsvdis`, or `vegdist`
`maxitr`	the maximum number of iterations to perform

Details

optsil produces a partition, or clustering, of items into clusters by iteratively reallocating items to clusters so as to maximize the mean silhouette width of the classification. At each iteration optsil ranks all possible reallocations of an item from one cluster to another, and performs the reallocation that yields the largest increase in the mean silhouette width. Because silhouette widths are not independent of clusters that are not modified, only a single reallocation can be performed in a single iteration. The algorithm stops when no further reallocation results in an improvement, or when the maximum number of iterations is reached.
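
The reallocation loop described above can be sketched in base R as follows. This is a naive illustration and not the package's implementation: the helpers `mean_sil`, `best_move` and `greedy_sil` are hypothetical names, the mean silhouette is recomputed from scratch for every candidate move, and the rule that a move may not empty a cluster is an assumption made here to keep the number of clusters fixed.

```r
# Mean silhouette width for a dissimilarity matrix `dm` and labels `cl`.
mean_sil <- function(dm, cl) {
  ids <- unique(cl)
  s <- vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)             # singleton cluster: width taken as 0
    a <- mean(dm[i, own])
    b <- min(vapply(ids[ids != cl[i]],
                    function(k) mean(dm[i, cl == k]), numeric(1)))
    if (max(a, b) == 0) 0 else (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Evaluate every single-item move and return the one with the largest gain.
best_move <- function(dm, cl) {
  ids  <- unique(cl)
  base <- mean_sil(dm, cl)
  best <- list(gain = 0, item = NA_integer_, to = NA)
  for (i in seq_along(cl)) {
    if (sum(cl == cl[i]) == 1) next      # assumption: never empty a cluster
    for (k in ids[ids != cl[i]]) {
      trial <- cl
      trial[i] <- k
      gain <- mean_sil(dm, trial) - base
      if (gain > best$gain) best <- list(gain = gain, item = i, to = k)
    }
  }
  best
}

# One reallocation per iteration, until no move improves or maxitr is reached.
greedy_sil <- function(d, cl, maxitr = 100) {
  dm <- as.matrix(d)
  sils <- numeric(0)
  for (itr in seq_len(maxitr)) {
    mv <- best_move(dm, cl)
    if (is.na(mv$item)) break            # no improving move left
    cl[mv$item] <- mv$to
    sils <- c(sils, mean_sil(dm, cl))
  }
  list(clustering = cl, sils = sils, numitr = length(sils))
}

# Toy data with one misassigned item (the fourth).
x     <- c(1, 2, 3, 11, 12, 13)
start <- c(1, 1, 1, 1, 2, 2)
res   <- greedy_sil(dist(x), start, maxitr = 10)
res$clustering
```

Every candidate move is scored against the whole partition because moving one item changes the silhouette widths of items in other clusters as well, which is why only the single best move is applied before everything is re-ranked.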

Optsil is an unweighted algorithm, i.e. each of the objects is included in the calculation exactly once.

Optsil can be extremely slow to converge, and is best used to ‘polish’ an existing partition, such as one obtained by slicing an hclust or from optpart, pam, diana, or another initial clustering. It is possible to run optsil from a random start, but it is then EXTREMELY SLOW to converge, and this should be done only with caution.
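
A minimal sketch of the ‘polish’ workflow, assuming a starting partition obtained by slicing an average-linkage hclust with cutree. The simulated data, the choice of three clusters, and the iteration limit of 50 are illustrative assumptions; the call to optsil follows the Usage section and is only attempted if optpart is installed.

```r
# Simulated data standing in for a real data set (assumption):
# three groups of 20 points in two dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 5),  ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
d <- dist(x)

# Starting partition: slice an average-linkage dendrogram into 3 clusters.
init <- cutree(hclust(d, method = "average"), k = 3)
print(table(init))

# Polish the starting partition; skipped when optpart is not available.
if (requireNamespace("optpart", quietly = TRUE)) {
  polished <- optpart::optsil(init, d, 50)
  # Cross-tabulate to see which items, if any, were reallocated.
  print(table(init, polished$clustering))
}
```

Starting from a reasonable partition means optsil typically has few reallocations left to make, which is where the one-move-per-iteration design is affordable.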

Value

a list with elements:

`clustering`	a vector of integers giving the cluster assignment for each object
`sils`	a vector of the silhouette widths achieved at each iteration
`numitr`	the number of iterations performed

Author(s)

David W. Roberts droberts@montana.edu

Examples

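library(optpart) # provides shoshveg, optpart and optsil
data(shoshveg)
dis.bc <- dsvdis(shoshveg,'bray/curtis') # Bray-Curtis dissimilarity
opt.5 <- optpart(5,dis.bc) # initial partition into 5 clusters
sil.5 <- optsil(opt.5,dis.bc,100) # may take a few minutes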
summary(silhouette(sil.5,dis.bc))
## Not run: plot(silhouette(sil.5,dis.bc))

optpart

Optimal Partitioning of Similarity Relations

v3.0-3

GPL (>= 2)
