kernlab: specc – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

specc

Spectral Clustering

Description

A spectral clustering algorithm. Clustering is performed by embedding the data into the subspace of the eigenvectors of an affinity matrix.

Usage

## S4 method for signature 'formula'
specc(x, data = NULL, na.action = na.omit, ...)

## S4 method for signature 'matrix'
specc(x, centers,
      kernel = "rbfdot", kpar = "automatic", 
      nystrom.red = FALSE, nystrom.sample = dim(x)[1]/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)

## S4 method for signature 'kernelMatrix'
specc(x, centers, nystrom.red = FALSE, iterations = 200, ...)

## S4 method for signature 'list'
specc(x, centers,
      kernel = "stringdot", kpar = list(length=4, lambda=0.5),
      nystrom.red = FALSE, nystrom.sample = length(x)/6,
      iterations = 200, mod.sample = 0.75, na.action = na.omit, ...)

Arguments

`x`	the matrix of data to be clustered, or a symbolic description of the model to be fit, or a kernel Matrix of class `kernelMatrix`, or a list of character vectors.
`data`	an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘specc’ is called from.
`centers`	Either the number of clusters or a set of initial cluster centers. If the first, a random set of rows in the eigenvectors matrix are chosen as the initial centers.
`kernel`	the kernel function used in computing the affinity matrix. This parameter can be set to any function, of class kernel, which computes a dot product between two vector arguments. kernlab provides the most popular kernel functions which can be used by setting the kernel parameter to the following strings: `rbfdot` Radial Basis kernel function "Gaussian" `polydot` Polynomial kernel function `vanilladot` Linear kernel function `tanhdot` Hyperbolic tangent kernel function `laplacedot` Laplacian kernel function `besseldot` Bessel kernel function `anovadot` ANOVA RBF kernel function `splinedot` Spline kernel `stringdot` String kernel The kernel parameter can also be set to a user defined function of class kernel by passing the function name as an argument.
`kpar`	a character string or the list of hyper-parameters (kernel parameters). The default character string `"automatic"` uses a heuristic to determine a suitable value for the width parameter of the RBF kernel. The second option `"local"` (local scaling) uses a more advanced heuristic and sets a width parameter for every point in the data set. This is particularly useful when the data incorporates multiple scales. A list can also be used containing the parameters to be used with the kernel function. Valid parameters for existing kernels are : `sigma` inverse kernel width for the Radial Basis kernel function "rbfdot" and the Laplacian kernel "laplacedot". `degree, scale, offset` for the Polynomial kernel "polydot" `scale, offset` for the Hyperbolic tangent kernel function "tanhdot" `sigma, order, degree` for the Bessel kernel "besseldot". `sigma, degree` for the ANOVA kernel "anovadot". `length, lambda, normalized` for the "stringdot" kernel where length is the length of the strings considered, lambda the decay factor and normalized a logical parameter determining if the kernel evaluations should be normalized. Hyper-parameters for user defined kernels can be passed through the kpar parameter as well.
`nystrom.red`	use nystrom method to calculate eigenvectors. When `TRUE` a sample of the dataset is used to calculate the eigenvalues, thus only a n x m matrix where n the sample size is stored in memory (default: `FALSE`
`nystrom.sample`	number of data points to use for estimating the eigenvalues when using the nystrom method. (default : dim(x)[1]/6)
`mod.sample`	proportion of data to use when estimating sigma (default: 0.75)
`iterations`	the maximum number of iterations allowed.
`na.action`	the action to perform on NA
`...`	additional parameters

Details

Spectral clustering works by embedding the data points of the partitioning problem into the subspace of the k largest eigenvectors of a normalized affinity/kernel matrix. Using a simple clustering method like kmeans on the embedded points usually leads to good performance. It can be shown that spectral clustering methods boil down to graph partitioning.
The data can be passed to the specc function in a matrix or a data.frame, in addition specc also supports input in the form of a kernel matrix of class kernelMatrix or as a list of character vectors where a string kernel has to be used.

Value

An S4 object of class specc which extends the class vector containing integers indicating the cluster to which each point is allocated. The following slots contain useful information

`centers`	A matrix of cluster centers.
`size`	The number of point in each cluster
`withinss`	The within-cluster sum of squares for each cluster
`kernelf`	The kernel function used

Author(s)

Alexandros Karatzoglou
alexandros.karatzoglou@ci.tuwien.ac.at

References

Andrew Y. Ng, Michael I. Jordan, Yair Weiss
On Spectral Clustering: Analysis and an Algorithm
Neural Information Processing Symposium 2001
http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf

Examples

## Cluster the spirals data set.
data(spirals)

sc <- specc(spirals, centers=2)

sc
centers(sc)
size(sc)
withinss(sc)

plot(spirals, col=sc)

kernlab

Kernel-Based Machine Learning Lab

v0.9-29

GPL-2

Authors

Alexandros Karatzoglou [aut, cre], Alex Smola [aut], Kurt Hornik [aut], National ICT Australia (NICTA) [cph], Michael A. Maniscalco [ctb, cph], Choon Hui Teo [ctb]

Initial release