Clustering by merging Gaussian mixture components
Merges the components of a Gaussian mixture hierarchically to obtain clusters. Computes all methods introduced in Hennig (2010), starting from an initial mclust clustering. See the details section.
mergenormals(xdata, mclustsummary=NULL, clustering, probs, muarray, Sigmaarray, z,
             method=NULL, cutoff=NULL, by=0.005, numberstop=NULL, renumber=TRUE,
             M=50, ...)

## S3 method for class 'mergenorm'
summary(object, ...)

## S3 method for class 'summary.mergenorm'
print(x, ...)
xdata |
data (something that can be coerced into a matrix). |
mclustsummary |
output object from
|
clustering |
vector of integers. Initial assignment of data to mixture components. |
probs |
vector of component proportions (for all components; should sum up to one). |
muarray |
matrix of component means (rows). |
Sigmaarray |
array of component covariance matrices (third dimension refers to component number). |
z |
matrix of observation- (row-)wise posterior probabilities of belonging to the components (columns). |
method |
one of |
cutoff |
numeric between 0 and 1. Tuning constant, see details and Hennig (2010). If not specified, the default values given in (9) in Hennig (2010) are used. |
by |
real between 0 and 1. Interval width for density computation
along the ridgeline, used for methods |
numberstop |
integer. If specified, |
renumber |
logical. If |
M |
integer. Number of times the dataset is divided into two
halves. Used if |
... |
additional optional parameters to pass on to
|
object |
object of class |
x |
object of class |
Mixture components are merged in a hierarchical fashion. The merging criterion is computed for all pairs of current clusters, and the two clusters with the highest criterion value (lowest, respectively, for method="predictive") are merged. Then criterion values are recomputed for the merged cluster. Merging is continued until the criterion value to merge is below (or above, for method="predictive") the cutoff value. Details are given in Hennig (2010). The following criteria are offered, specified by the method argument.
components are only merged if their mixture is unimodal according to Ray and Lindsay's (2005) ridgeline theory, see ridgeline.diagnosis. This ignores argument cutoff.
ratio between density minimum between components and minimum of density maxima according to Ray and Lindsay's (2005) ridgeline theory, see ridgeline.diagnosis.
Bhattacharyya upper bound on misclassification probability between two components, see bhattacharyya.matrix.
direct estimation of misclassification probability between components, see Hennig (2010).
this uses method="ridge.ratio" to decide which clusters to merge but stops merging according to the p-value of the dip test computed as in Hartigan and Hartigan (1985), see dip.test.
as "dipuni", but p-value of dip test computed as in Tantrum, Murua and Stuetzle (2003), see dipp.tantrum.
this uses method="demp" to decide which clusters to merge but stops merging according to the value of prediction strength (Tibshirani and Walther, 2005) as computed in mixpredictive.
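The hierarchical merging described above can be sketched in base R for the Bhattacharyya criterion. This is a minimal illustration under stated assumptions, not the code used by mergenormals: the helper names bhat_dist, merge_two and merge_components are hypothetical, the criterion is taken to be exp(-d) with d the Bhattacharyya distance, the cutoff of 0.1 is only an illustrative value, and a merged cluster is approximated by a single Gaussian with the mean and covariance of the two-component mixture.

```r
# Bhattacharyya distance between N(mu1, S1) and N(mu2, S2).
bhat_dist <- function(mu1, S1, mu2, S2) {
  Sbar <- (S1 + S2) / 2
  d <- mu1 - mu2
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  drop(t(d) %*% solve(Sbar, d)) / 8 +
    0.5 * (logdet(Sbar) - 0.5 * (logdet(S1) + logdet(S2)))
}

# Assumption: a merged cluster is summarised by the moments of the
# two-component mixture (a simplification for illustration).
merge_two <- function(p1, mu1, S1, p2, mu2, S2) {
  p <- p1 + p2
  w1 <- p1 / p
  w2 <- p2 / p
  mu <- w1 * mu1 + w2 * mu2
  S <- w1 * (S1 + tcrossprod(mu1 - mu)) + w2 * (S2 + tcrossprod(mu2 - mu))
  list(p = p, mu = mu, S = S)
}

# Merge the pair with the highest criterion until it drops below the cutoff.
merge_components <- function(probs, mus, Sigmas, cutoff = 0.1) {
  members <- as.list(seq_along(probs))  # original components per cluster
  repeat {
    k <- length(probs)
    if (k < 2) break
    best <- -Inf
    bi <- NA
    bj <- NA
    for (i in 1:(k - 1)) for (j in (i + 1):k) {
      crit <- exp(-bhat_dist(mus[[i]], Sigmas[[i]], mus[[j]], Sigmas[[j]]))
      if (crit > best) {
        best <- crit
        bi <- i
        bj <- j
      }
    }
    if (best < cutoff) break
    m <- merge_two(probs[bi], mus[[bi]], Sigmas[[bi]],
                   probs[bj], mus[[bj]], Sigmas[[bj]])
    probs[bi] <- m$p
    mus[[bi]] <- m$mu
    Sigmas[[bi]] <- m$S
    members[[bi]] <- c(members[[bi]], members[[bj]])
    probs <- probs[-bj]
    mus <- mus[-bj]
    Sigmas <- Sigmas[-bj]
    members <- members[-bj]
  }
  list(probs = probs, mus = mus, Sigmas = Sigmas, members = members)
}

# Toy mixture: components 1 and 2 overlap strongly, component 3 is far away.
res <- merge_components(
  probs  = c(0.3, 0.3, 0.4),
  mus    = list(c(0, 0), c(1, 0), c(10, 10)),
  Sigmas = list(diag(2), diag(2), diag(2))
)
res$members
```

In this toy example components 1 and 2 are merged and component 3 stays on its own, because the criterion between the merged cluster and component 3 falls far below the cutoff.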
mergenormals returns an object of class mergenorm, which is a list with components
clustering |
integer vector. Final clustering. |
clusternumbers |
vector of numbers of remaining clusters. These
are given in terms of the original clusters even of
|
defunct.components |
vector of numbers of components that were "merged away". |
valuemerged |
vector of values of the merging criterion (see details) at which components were merged. |
mergedtonumbers |
vector of numbers of clusters to which the original components were merged. |
parameters |
a list, if |
predvalues |
vector of prediction strength values for
clusternumbers from 1 to the number of components in the original
mixture, if |
orig.decisionmatrix |
square matrix with entries giving the original values of the merging criterion (see details) for every pair of original mixture components. |
new.decisionmatrix |
square matrix as |
probs |
final cluster values of |
muarray |
final cluster means, analogous to |
Sigmaarray |
final cluster covariance matrices, analogous to
|
z |
final matrix of posterior probabilities of observations
belonging to the clusters, analogous to |
noise |
logical. If |
method |
as above. |
cutoff |
as above. |
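The relation between the original and the final posterior matrix z can be illustrated in base R. The sketch below assumes that the posterior probability of a merged cluster is the sum of the posteriors of the original components it contains, and that the final clustering assigns each observation to the cluster with the largest merged posterior. The function merge_posteriors and its argument merged_to (playing the role of mergedtonumbers) are hypothetical names for illustration, and mergenormals itself may derive the final clustering differently depending on the method.

```r
# z: n x k matrix of posterior probabilities for the original components.
# merged_to: for each original component, the label of the cluster that
# contains it after merging (hypothetical input for this sketch).
merge_posteriors <- function(z, merged_to) {
  labels <- sort(unique(merged_to))
  znew <- sapply(labels, function(l) {
    rowSums(z[, merged_to == l, drop = FALSE])
  })
  znew <- matrix(znew, nrow = nrow(z))  # keep matrix shape in edge cases
  colnames(znew) <- labels
  znew
}

z <- rbind(c(0.5, 0.4, 0.1),
           c(0.1, 0.2, 0.7),
           c(0.3, 0.3, 0.4))
merged_to <- c(1, 1, 3)  # components 1 and 2 merged; labels kept in original numbering

znew <- merge_posteriors(z, merged_to)

# Assign each observation to the cluster with the largest merged posterior.
clustering <- as.numeric(colnames(znew))[apply(znew, 1, which.max)]
```

Each row of znew still sums to one, and the cluster labels stay in terms of the original component numbers, in line with the description of clusternumbers above.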
summary.mergenorm returns a list with components clustering, clusternumbers, defunct.components, valuemerged, mergedtonumbers, predvalues, probs, muarray, Sigmaarray, z, noise, method, cutoff as above, plus onc (original number of components) and mnc (number of clusters after merging).
Hartigan, J. A. and Hartigan, P. M. (1985) The Dip Test of Unimodality, Annals of Statistics, 13, 70-84.
Hennig, C. (2010) Methods for merging Gaussian mixture components, Advances in Data Analysis and Classification, 4, 3-34.
Ray, S. and Lindsay, B. G. (2005) The Topography of Multivariate Normal Mixtures, Annals of Statistics, 33, 2042-2065.
Tantrum, J., Murua, A. and Stuetzle, W. (2003) Assessment and Pruning of Hierarchical Model Based Clustering, Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, Washington, D.C., 197-205.
Tibshirani, R. and Walther, G. (2005) Cluster Validation by Prediction Strength, Journal of Computational and Graphical Statistics, 14, 511-528.
require(mclust)
require(MASS)
options(digits=3)
data(crabs)
dc <- crabs[,4:8]
cm <- mclustBIC(crabs[,4:8],G=9,modelNames="EEE")
scm <- summary(cm,crabs[,4:8])
cmnbhat <- mergenormals(crabs[,4:8],scm,method="bhat")
summary(cmnbhat)
cmndemp <- mergenormals(crabs[,4:8],scm,method="demp")
summary(cmndemp)
# Other methods take a bit longer, but try them!
# The values of by and M below are still chosen for reasonably fast execution.
# cmnrr <- mergenormals(crabs[,4:8],scm,method="ridge.ratio",by=0.05)
# cmd <- mergenormals(crabs[,4:8],scm,method="dip.tantrum",by=0.05)
# cmp <- mergenormals(crabs[,4:8],scm,method="predictive",M=3)