Matrix Distance/Similarity Computation
These functions compute and return the auto-distance/similarity matrix between either rows or columns of a matrix/data frame, or a list, as well as the cross-distance matrix between two matrices/data frames/lists.
dist(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
     pairwise = FALSE, by_rows = TRUE, convert_similarities = TRUE,
     auto_convert_data_frames = TRUE)

simil(x, y = NULL, method = NULL, ..., diag = FALSE, upper = FALSE,
      pairwise = FALSE, by_rows = TRUE, convert_distances = TRUE,
      auto_convert_data_frames = TRUE)

pr_dist2simil(x)
pr_simil2dist(x)

as.dist(x, FUN = NULL)
as.simil(x, FUN = NULL)

## S3 method for class 'dist'
as.matrix(x, diag = 0, ...)

## S3 method for class 'simil'
as.matrix(x, diag = NA, ...)
x | For |
y | |
method | a function, a registry entry, or a mnemonic string referencing the proximity measure. A list of all available measures can be obtained using |
diag | logical value indicating whether the diagonal of the distance/similarity matrix should be printed by In the context of |
upper | logical value indicating whether the upper triangle of the distance/similarity matrix should be printed by |
pairwise | logical value indicating whether distances should be computed for the pairs of |
by_rows | logical indicating whether proximities between rows, or columns should be computed. |
convert_similarities, convert_distances | logical indicating whether distances should be automatically converted into similarities (and the other way round) if needed. |
auto_convert_data_frames | logical indicating whether data frames should be converted to matrices if all variables are numeric, or all are logical, or all are complex. |
FUN | optional function to be used by |
... | further arguments passed to the proximity function. |
Missing values are allowed but are excluded from all computations involving the rows within which they occur. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used (compare dist in package stats).
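The scaling rule can be checked by hand against stats::dist, which the paragraph above names as the point of comparison. The following is a minimal sketch in base R only. The vectors a and b are made-up illustration data, and the sketch assumes, as the text suggests, that the proximity functions documented here scale in the same way for these measures.

```r
# Two illustrative observations with missing values (hypothetical data).
a <- c(1, NA, 3, 4)
b <- c(2, 5, 1, NA)

# Columns in which both observations have a value.
ok <- !is.na(a) & !is.na(b)

# Sum of squared differences over the usable columns, scaled up by
# (total columns) / (columns used), then square-rooted.
ss <- sum((a[ok] - b[ok])^2)
d_manual <- sqrt(ss * length(a) / sum(ok))

# Reference value from stats::dist, which documents the same rule.
d_stats <- as.numeric(stats::dist(rbind(a, b), method = "euclidean"))

print(c(manual = d_manual, stats = d_stats))
```

Only two of the four columns are usable here, so the sum of squares is doubled before the square root is taken.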
Data frames are silently coerced to matrix if all columns are of (same) mode numeric or logical.
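The condition for coercion can be illustrated with a small base R check. This is a sketch of the stated rule, not the package's own implementation: the helper is_single_mode and both data frames are hypothetical names introduced for illustration.

```r
# Hypothetical helper: TRUE when every column is numeric, or every
# column is logical, so that coercion to a matrix keeps a single mode.
is_single_mode <- function(df) {
  all(vapply(df, is.numeric, logical(1))) ||
    all(vapply(df, is.logical, logical(1)))
}

num_df   <- data.frame(a = c(1, 2), b = c(3.5, 4.5))
mixed_df <- data.frame(a = c(1, 2), b = c("u", "v"))

print(is_single_mode(num_df))
print(is_single_mode(mixed_df))

# as.matrix() on the all-numeric frame yields a numeric matrix; on the
# mixed frame it would fall back to a character matrix.
m <- as.matrix(num_df)
print(is.numeric(m))
```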
Distance measures can be used with simil, and similarity measures with dist. In these cases, the result is transformed accordingly using the specified coercion functions (default: pr_simil2dist(x) = 1 - abs(x) and pr_dist2simil(x) = 1 / (1 + x)).
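The two default formulas can be tried out without loading the package. The sketch below re-implements them in base R under the stand-in names simil2dist and dist2simil. These names are hypothetical and chosen so that they are not mistaken for the package's functions.

```r
# Stand-ins for the documented default formulas (hypothetical names).
simil2dist <- function(s) 1 - abs(s)
dist2simil <- function(d) 1 / (1 + d)

# Illustrative distances.
d <- c(0, 1, 3)

# Distance 0 maps to similarity 1; larger distances approach 0.
s <- dist2simil(d)
print(s)

# Converting back with the other default does not recover d.
print(simil2dist(s))
```

The two defaults are not inverses of each other, so a round trip through both changes the values. A custom function can be supplied where an exact inverse is needed.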
Objects of class simil and dist can be converted into one another using as.dist and as.simil, respectively.
Distance and similarity objects can conveniently be subset (see examples). Note that duplicate indexes are silently ignored.
Auto distances/similarities are returned as an object of class dist/simil and cross-distances/similarities as an object of class crossdist/crosssimil.
David Meyer David.Meyer@R-project.org and Christian Buchta Christian.Buchta@wu-wien.ac.at
Anderberg, M.R. (1973), Cluster analysis for applications, 359 pp., Academic Press, New York, NY, USA.
Cox, T.F. and Cox, M.A.A. (2001), Multidimensional Scaling, Chapman and Hall.
Sokal, R.R. and Sneath, P.H.A. (1963), Principles of Numerical Taxonomy, W. H. Freeman and Co., San Francisco.
### show available proximities
summary(pr_DB)

### get more information about a particular one
pr_DB$get_entry("Jaccard")

### binary data
x <- matrix(sample(c(FALSE, TRUE), 8, rep = TRUE), ncol = 2)
dist(x, method = "Jaccard")

### for real-valued data
dist(x, method = "eJaccard")

### for positive real-valued data
dist(x, method = "fJaccard")

### cross distances
dist(x, x, method = "Jaccard")

### pairwise (diagonal)
dist(x, x, method = "Jaccard", pairwise = TRUE)

### this is the same but less efficient
as.matrix(stats::dist(x, method = "binary"))

### numeric data
x <- matrix(rnorm(16), ncol = 4)

## test inheritance of names
rownames(x) <- LETTERS[1:4]
colnames(x) <- letters[1:4]
dist(x)
dist(x, x)

## custom distance function
f <- function(x, y) sum(x * y)
dist(x, f)

## working with lists
z <- unlist(apply(x, 1, list), recursive = FALSE)
(d <- dist(z))
dist(z, z)

## subsetting
d[[1:2]]
subset(d, c(1,3,4))
d[[c(1,2,2)]]   # duplicate index gets ignored

## transformations and self-proximities
as.matrix(as.simil(d, function(x) exp(-x)), diag = 1)

## row and column indexes
row.dist(d)
col.dist(d)