Fit Dissimilarities to a Hierarchy
Find the ultrametric from a target equivalence class of hierarchies which minimizes weighted Euclidean or Manhattan dissimilarity to a given dissimilarity object.
ls_fit_ultrametric_target(x, y, weights = 1) l1_fit_ultrametric_target(x, y, weights = 1)
x |
a dissimilarity object inheriting from class
|
y |
a target hierarchy. |
weights |
a numeric vector or matrix with non-negative weights
for obtaining a weighted fit. If a matrix, its numbers of rows and
columns must be the same as the number of objects in |
The target equivalence class consists of all dendrograms for which the
corresponding n-trees are the same as the one corresponding to
y
. I.e., all splits are the same as for y
, and
optimization is over the height of the splits.
The criterion function to be optimized over all ultrametrics from the equivalence class is ∑ w_{ij} |x_{ij} - u_{ij}|^p, where p = 2 in the Euclidean and p = 1 in the Manhattan case, respectively.
The optimum can be computed as follows. Suppose split s joins object classes A and B. As the ultrametric dissimilarities of all objects in A to all objects in B must be the same value, say, u_{A,B} = u_s, the contribution from the split to the criterion function is of the form f_s(u_s) = ∑_{i \in A, j \in B} w_{ij} |x_{ij} - u_s|^p. We need to minimize ∑_s f_s(u_s) under the constraint that the u_s form a non-decreasing sequence, which is accomplished by using the Pool Adjacent Violator Algorithm (PAVA) using the weighted mean (p = 2) or weighted median (p = 1) for solving the blockwise optimization problems.
An object of class "cl_ultrametric"
containing the
optimal ultrametric distances.
ls_fit_ultrametric
for finding the ultrametric
minimizing Euclidean dissimilarity (without fixing the splits).
data("Phonemes") ## Note that the Phonemes data set has the consonant misclassification ## probabilities, i.e., the similarities between the phonemes. d <- as.dist(1 - Phonemes) ## Find the maximal dominated and miminal dominating ultrametrics by ## hclust() with single and complete linkage: y1 <- hclust(d, "single") y2 <- hclust(d, "complete") ## Note that these are quite different: cl_dissimilarity(y1, y2, "gamma") ## Now find the L2 optimal members of the respective dendrogram ## equivalence classes. u1 <- ls_fit_ultrametric_target(d, y1) u2 <- ls_fit_ultrametric_target(d, y2) ## Compute the L2 optimal ultrametric approximation to d. u <- ls_fit_ultrametric(d) ## And compare ... cl_dissimilarity(cl_ensemble(Opt = u, Single = u1, Complete = u2), d) ## The solution obtained via complete linkage is quite close: cl_agreement(u2, u, "cophenetic")
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.