Gower's distance
Compute Gower's distance, pairwise between records in two data sets x
and y
. Records from the smallest data set are recycled over.
gower_dist( x, y, pair_x = NULL, pair_y = NULL, eps = 1e-08, weights = NULL, ignore_case = FALSE, nthread = getOption("gd_num_thread") )
x |
|
y |
|
pair_x |
|
pair_y |
|
eps |
|
weights |
|
ignore_case |
|
nthread |
Number of threads to use for parallelization. By default,
for a dual-core machine, 2 threads are used. For any other machine
n-1 cores are used so your machine doesn't freeze during a big computation.
The maximum nr of threads are determined using |
A numeric
vector of length max(nrow(x),nrow(y))
.
When there are no columns to compare, a message is printed and both
numeric(0)
is returned invisibly.
There are three ways to specify which columns of x
should be compared
with what columns of y
. The first option is do give no specification.
In that case columns with matching names will be used. The second option
is to use only the pairs_y
argument, specifying for each column in x
in order, which column in y
must be used to pair it with (use 0
to skip a column in x
). The third option is to explicitly specify the
columns to be matched using pair_x
and pair_y
.
Gower (1971) originally defined a similarity measure (s, say) with values ranging from 0 (completely dissimilar) to 1 (completely similar). The distance returned here equals 1-s.
Gower, John C. "A general coefficient of similarity and some of its properties." Biometrics (1971): 857-871.
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.