tclust: tkmeans – R documentation


tkmeans

Trimmed k-means Cluster Analysis

Description

tkmeans searches for k (or fewer) spherical clusters in a data matrix x, while the ceiling(alpha * n) most outlying observations (n being the number of rows of x) are trimmed.
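The trimming rule above can be illustrated with a small base-R sketch. This is only an illustration under stated assumptions, not the tclust implementation. The helper name `trim_assign` is hypothetical. Distances are plain squared Euclidean distances, corresponding to spherical clusters with equal weights. `centers` is taken as a `p` x `k` matrix with one center per column, matching the layout of the `centers` value listed under Value.

```r
# Hypothetical helper (not part of tclust): one trimmed assignment step.
# x: n x p numeric matrix; centers: p x k matrix, one center per column.
trim_assign <- function(x, centers, alpha) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- ncol(centers)
  # Squared Euclidean distance of every observation to every center (n x k)
  d2 <- matrix(sapply(seq_len(k), function(j) colSums((t(x) - centers[, j])^2)),
               nrow = n, ncol = k)
  nearest <- max.col(-d2, ties.method = "first")   # index of the closest center
  dmin <- d2[cbind(seq_len(n), nearest)]
  # Trim the ceiling(alpha * n) observations farthest from their closest center
  n_trim <- ceiling(alpha * n)
  cluster <- nearest
  if (n_trim > 0) cluster[order(dmin, decreasing = TRUE)[seq_len(n_trim)]] <- 0L
  cluster                                          # 0 marks trimmed observations
}

# Usage: two tight pairs of points and one far-away outlier
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
centers <- cbind(c(0, 0.5), c(10, 10.5))   # p = 2 rows, k = 2 columns
trim_assign(x, centers, alpha = 0.2)        # returns 1 1 2 2 0
```

With `n = 5` and `alpha = 0.2`, exactly one observation is trimmed: the point at (100, 100), which is farthest from its closest center.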

Usage

tkmeans (x, k = 3, alpha = 0.05, nstart = 50, iter.max = 20, 
         equal.weights = FALSE, center = 0, scale = 1, store.x = TRUE,
         drop.empty.clust = TRUE, trace = 0, warnings = 2, zero.tol = 1e-16)

Arguments

`x`	A matrix or data.frame of dimension `n` x `p`, containing the observations (row-wise).
`k`	The number of clusters initially searched for.
`alpha`	The proportion of observations to be trimmed.
`nstart`	The number of random initializations to be performed.
`iter.max`	The maximum number of concentration steps to be performed. The concentration steps stop as soon as two consecutive steps lead to the same data partition.
`equal.weights`	A logical value, specifying whether equal cluster weights (`TRUE`) or not (`FALSE`) shall be considered in the concentration and assignment steps.
`center, scale`	A center and a scale vector, each of length `p`, which can optionally be specified for centering and scaling `x` before the calculation.
`store.x`	A logical value specifying whether the data matrix `x` shall be included in the result structure. By default this value is set to `TRUE`, because the function `plot.tkmeans` depends on this information. However, when large data matrices are handled, the size of the result structure can be decreased noticeably by setting this parameter to `FALSE`.
`drop.empty.clust`	Logical value specifying whether empty clusters shall be omitted in the resulting object. (The result structure then no longer contains center and covariance estimates of empty clusters.) Cluster names are reassigned such that the first `l` clusters (`l <= k`) always have at least one observation.
`trace`	Defines the tracing level, which is set to `0` by default. Tracing level `2` gives additional information on the iteratively decreasing objective function's value.
`warnings`	The warning level (0: no warnings; 1: warnings on unexpected behavior).
`zero.tol`	The zero tolerance used. By default set to 1e-16.
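The way `nstart`, `iter.max`, `center` and `scale` work together can be sketched in base R. The function `tkmeans_sketch` below is hypothetical and simplified, not the package's code. It ignores `equal.weights`, so every observation simply goes to its nearest center, as in the classical trimmed k-means of Cuesta-Albertos et al. It recycles scalar `center`/`scale` values to length `p`, which is an assumption. The objective it tracks is the sum of squared distances of the untrimmed observations; the `obj` value reported by tkmeans may be defined or scaled differently.

```r
# Hypothetical, simplified trimmed k-means (base R only; not the tclust C code).
tkmeans_sketch <- function(x, k = 3, alpha = 0.05, nstart = 50, iter.max = 20,
                           center = 0, scale = 1) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  # Center and scale column-wise; scalar values are recycled to length p (assumption)
  x <- sweep(x, 2, rep_len(center, p), "-")
  x <- sweep(x, 2, rep_len(scale, p), "/")
  n_trim <- ceiling(alpha * n)

  # Trimmed assignment: nearest center, then drop the n_trim largest distances
  assign_step <- function(centers) {
    d2 <- matrix(sapply(seq_len(k), function(j) colSums((t(x) - centers[, j])^2)),
                 nrow = n, ncol = k)
    cl <- max.col(-d2, ties.method = "first")
    dmin <- d2[cbind(seq_len(n), cl)]
    if (n_trim > 0) cl[order(dmin, decreasing = TRUE)[seq_len(n_trim)]] <- 0L
    list(cluster = cl, obj = sum(dmin[cl > 0]))   # objective over untrimmed points
  }

  best <- NULL
  for (s in seq_len(nstart)) {
    # Random initialization: k distinct observations as centers (p x k)
    centers <- t(x[sample.int(n, k), , drop = FALSE])
    cur <- assign_step(centers)
    trace_obj <- cur$obj
    for (it in seq_len(iter.max)) {
      # Concentration step: move each center to the mean of its untrimmed members
      for (j in seq_len(k)) {
        members <- x[cur$cluster == j, , drop = FALSE]
        if (nrow(members) > 0) centers[, j] <- colMeans(members)
      }
      nxt <- assign_step(centers)
      trace_obj <- c(trace_obj, nxt$obj)
      same <- identical(nxt$cluster, cur$cluster)
      cur <- nxt
      if (same) break   # two consecutive steps gave the same partition
    }
    # Keep the best solution over all random starts
    if (is.null(best) || cur$obj < best$obj)
      best <- list(centers = centers, cluster = cur$cluster,
                   obj = cur$obj, trace = trace_obj)
  }
  best
}

# Usage: two well-separated Gaussian groups plus one gross outlier
set.seed(1)
x_demo <- rbind(matrix(rnorm(100), ncol = 2),
                matrix(rnorm(100, mean = 8), ncol = 2),
                c(50, -50))
res <- tkmeans_sketch(x_demo, k = 2, alpha = 0.05, nstart = 10)
table(res$cluster)   # 0 counts the ceiling(0.05 * 101) trimmed rows
```

Because each concentration step first moves the centers to the means and then reassigns with trimming, the recorded objective cannot increase from one step to the next. This is the decreasing value that `trace = 2` is described as reporting.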

Value

The function returns an S3 object of class `tkmeans`, containing the following values:

`centers`	A matrix of size `p` x `k` containing the centers (column-wise) of each cluster.
`cluster`	A numerical vector of size `n` containing the cluster assignment for each observation. Cluster names are integer numbers from `1` to `k`, `0` indicates trimmed observations.
`par`	A list, containing the parameters the algorithm has been called with (`x`, if not suppressed by `store.x = FALSE`, `k`, `alpha`, `restr.fact`, `nstart`, `KStep`, and `equal.weights`).
`k`	The final number of clusters. When the option `equal.weights = FALSE` is used, solutions with fewer than `k` clusters might be found.
`obj`	The value of the objective function of the best (returned) solution.
`size`	An integer vector of length k, containing the number of observations assigned to each cluster.
`weights`	A numerical vector of length k, containing the weights of each cluster.
`int`	A list of values used internally by functions related to `tkmeans` objects.
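The `cluster`, `size` and `weights` values, together with the relabelling performed when `drop.empty.clust = TRUE`, can be derived from a plain cluster vector in which `0` marks trimmed observations. The helper `summarise_clusters` below is hypothetical. Computing `weights` as each cluster's share of the untrimmed observations is an assumption about how tkmeans defines them, not something this page states.

```r
# Hypothetical helper: summarise a cluster vector (0 = trimmed observation).
summarise_clusters <- function(cluster, k, drop.empty = TRUE) {
  size <- tabulate(cluster[cluster > 0], nbins = k)   # observations per cluster 1..k
  if (drop.empty && any(size == 0)) {
    keep <- which(size > 0)                           # old labels of non-empty clusters
    # Relabel so that the first l clusters (l <= k) are exactly the non-empty ones
    cluster[cluster > 0] <- match(cluster[cluster > 0], keep)
    size <- size[keep]
  }
  list(cluster = cluster,
       k = length(size),
       size = size,
       weights = size / sum(size))   # assumption: relative size among untrimmed points
}

# Usage: cluster 2 is empty, so old cluster 3 becomes cluster 2
s <- summarise_clusters(c(1L, 3L, 0L, 3L, 1L, 3L), k = 3)
s$cluster   # 1 2 0 2 1 2
s$size      # 2 3
```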

Author(s)

Agustin Mayo Iscar, Luis Angel Garcia Escudero, Heinrich Fritz

References

Cuesta-Albertos, J. A.; Gordaliza, A. and Matrán, C. (1997), "Trimmed k-means: an attempt to robustify quantizers". Annals of Statistics, Vol. 25 (2), 553-576.

Examples

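#--- EXAMPLE 1 ------------------------------------------
library (tclust)                  # provides tkmeans() and the data sets used below
sig <- diag (2)
cen <- rep (1, 2)
# Two Gaussian groups plus 100 widely scattered points (requires the mvtnorm package)
x <- rbind(mvtnorm::rmvnorm(360, cen * 0,   sig),
           mvtnorm::rmvnorm(540, cen * 5,   sig * 6 - 2),
           mvtnorm::rmvnorm(100, cen * 2.5, sig * 50)
           )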

# Two groups and 10% trimming level
clus <- tkmeans (x, k = 2, alpha = 0.1)

plot (clus)
plot (clus, labels = "observation")
plot (clus, labels = "cluster")

#--- EXAMPLE 2 ------------------------------------------
data (geyser2)
clus <- tkmeans (geyser2, k = 3, alpha = 0.03)
plot (clus)

#--- EXAMPLE 3 ------------------------------------------
data (swissbank)
# Two clusters and 8% trimming level
clus <- tkmeans (swissbank, k = 2, alpha = 0.08)

# Pairs plot of the clustering solution
# (trimmed observations, cluster 0, are drawn in color 1, i.e. black by default)
pairs (swissbank, col = clus$cluster + 1)
# Two coordinates
plot (swissbank[, 4], swissbank[, 6], col = clus$cluster + 1,
      xlab = "Distance of the inner frame to lower border",
      ylab = "Length of the diagonal")
plot (clus)

# Three clusters and 0% trimming level
clus <- tkmeans (swissbank, k = 3, alpha = 0.0)

# Pairs plot of the clustering solution
pairs (swissbank, col = clus$cluster + 1)

# Two coordinates
plot (swissbank[, 4], swissbank[, 6], col = clus$cluster + 1,
      xlab = "Distance of the inner frame to lower border",
      ylab = "Length of the diagonal")

plot (clus)

tclust: Robust Trimmed Clustering (version 1.4-2, license GPL-3)