Relaxed Value Matching
closest(x, table, tolerance = Inf, ppm = 0,
        duplicates = c("keep", "closest", "remove"),
        nomatch = NA_integer_, .check = TRUE)

common(x, table, tolerance = Inf, ppm = 0,
       duplicates = c("keep", "closest", "remove"), .check = TRUE)

join(x, y, tolerance = 0, ppm = 0,
     type = c("outer", "left", "right", "inner"), .check = TRUE, ...)
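x | numeric vector of values to be matched; has to be sorted in increasing order.
table | numeric vector of values to be matched against; has to be sorted in increasing order.
tolerance | accepted absolute difference between matched values (default Inf for closest and common, 0 for join).
ppm | accepted relative, value-specific difference in parts per million (default 0).
duplicates | how multiple matches are handled; one of "keep", "closest" or "remove".
nomatch | value returned if no match is found (default NA_integer_).
.check | whether to run the input validation checks, including the test for increasingly sorted x and table.
y | numeric vector of values to be joined with x.
type | how x and y are joined; one of "outer", "left", "right" or "inner".
... | ignored.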
A one-to-one matching is not guaranteed in either direction, neither from x to table nor from table to x.
If multiple elements in x match a single element in table, all their corresponding indices are returned if duplicates = "keep" is set (default). This behaviour is identical to match(). For duplicates = "closest", just the closest element in x gets the corresponding index in table, and for duplicates = "remove", all elements in x that match the same element in table are set to nomatch.
If a single element in x matches multiple elements in table, the closest is returned for duplicates = "keep" or duplicates = "closest" (keeping multiple matches isn't possible in this case because the return value has to be of the same length as x). If the differences between x and the corresponding matches in table are identical, the lower index (the smaller element in table) is returned. There is one exception: if the lower index is already returned for another x with a smaller difference to this index, the higher one is returned for duplicates = "closest" (but only if there is no other x that is closer to the higher one). For duplicates = "remove", all multiple matches are returned as nomatch, as above.
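The matching rules above can be illustrated with a small base R sketch. It is a naive reference, not the package's implementation: closest_sketch is a hypothetical helper, it only covers duplicates = "keep" and duplicates = "remove", and it assumes that the accepted difference is tolerance plus ppm parts per million of each value in x.

```r
## Hypothetical helper, not part of the documented API: a naive O(n * m)
## reference for nearest-value matching within a tolerance.
closest_sketch <- function(x, table, tolerance = Inf, ppm = 0,
                           duplicates = c("keep", "remove"),
                           nomatch = NA_integer_) {
    duplicates <- match.arg(duplicates)
    ## Assumption: accepted difference = tolerance + ppm parts per million
    ## of the value in x.
    allowed <- tolerance + x * ppm * 1e-6
    idx <- vapply(seq_along(x), function(i) {
        d <- abs(table - x[i])
        j <- which.min(d)   # first minimum, i.e. the lower index on ties
        if (length(j) && d[j] <= allowed[i]) j else NA_integer_
    }, integer(1))
    if (duplicates == "remove") {
        ## table indices hit by more than one element of x
        dup <- idx[duplicated(idx) & !is.na(idx)]
        idx[idx %in% dup] <- NA_integer_
    }
    idx[is.na(idx)] <- nomatch
    idx
}

x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

closest_sketch(x, y, tolerance = 0.01)          # NA 3 5
## 20 ppm of 45.02 is about 0.0009, less than the difference of 0.001
closest_sketch(x, y, tolerance = 0, ppm = 20)   # NA NA 5
## 50 ppm of 45.02 is about 0.00225, so 45.021 is now accepted
closest_sketch(x, y, tolerance = 0, ppm = 50)   # NA 3 5

## Several elements in x hitting the same element in table
x <- c(1.6, 1.75, 1.8)
y <- 1:2
closest_sketch(x, y, tolerance = 0.5)                         # 2 2 2
closest_sketch(x, y, tolerance = 0.5, duplicates = "remove")  # NA NA NA
```

The duplicates = "closest" case is left out on purpose, because the reassignment to the higher index described above needs more bookkeeping than fits a short sketch.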
.check = TRUE tests, among other input validation checks, that the x and table arguments are sorted in increasing order, which is a mandatory assumption of the closest algorithm. These checks require looping through both vectors and comparing each element against its predecessor. Depending on the length and distribution of x and table, these checks can take as much time as the whole closest algorithm, or more. If it is ensured by other means that both x and table are sorted, the tests can be skipped with .check = FALSE. If .check = FALSE is used and x or table is not sorted (or is sorted in decreasing order), the output will be incorrect in the best case, and the call will end in an infinite loop in the average and worst case.
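One way to ensure sortedness "by other means" is to verify or establish it once, before any call that uses .check = FALSE. The sketch below uses only base R; is_sorted_increasing is a hypothetical helper name, and the NA guard is an added precaution rather than something stated in the text above.

```r
## Hypothetical helper: TRUE if v contains no NA and is sorted in
## increasing order (ties allowed).
is_sorted_increasing <- function(v) {
    !anyNA(v) && !is.unsorted(v)
}

x <- c(1.6, 1.75, 1.8)
y <- c(2, 1)

is_sorted_increasing(x)   # TRUE
is_sorted_increasing(y)   # FALSE

## Sort once and keep the permutation, so that indices obtained on the
## sorted vector can be mapped back to the original order.
o <- order(y)
y_sorted <- y[o]
is_sorted_increasing(y_sorted)   # TRUE

## An index i into y_sorted corresponds to o[i] in the original y.
o[1]   # 2: the smallest value of y sits at position 2
```

Only after such a check has passed for both vectors would skipping the built-in validation be safe.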
join: joins two numeric vectors by mapping values in x to values in y and vice versa if they are similar enough (given the specified tolerance and ppm). The function returns a matrix with the indices of mapped values in x and y. The type parameter defines how the vectors are joined:
type = "left": values in x are mapped to values in y; elements in y not matching any value in x are discarded.
type = "right": same as type = "left" but for y.
type = "outer": return matches for all values in x and in y.
type = "inner": report only indices of values that could be mapped.
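The four join types can be sketched for the exact-match case (tolerance = 0, ppm = 0) with base R's match(). join_sketch is a hypothetical helper, not the package's implementation; in particular, it appends unmatched y values at the end for type = "outer", so the row order is an assumption of this sketch.

```r
## Hypothetical helper: exact-match join returning a two-column index
## matrix with columns x and y.
join_sketch <- function(x, y, type = c("outer", "left", "right", "inner")) {
    type <- match.arg(type)
    xi <- match(x, y)   # for each x: index of its match in y, or NA
    yi <- match(y, x)   # for each y: index of its match in x, or NA
    switch(type,
        left = cbind(x = seq_along(x), y = xi),
        right = cbind(x = yi, y = seq_along(y)),
        inner = {
            keep <- !is.na(xi)
            cbind(x = which(keep), y = xi[keep])
        },
        outer = {
            unmatched_y <- which(is.na(yi))
            cbind(x = c(seq_along(x), rep(NA_integer_, length(unmatched_y))),
                  y = c(xi, unmatched_y))
        })
}

x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

ji <- join_sketch(x, y, type = "inner")
x[ji[, "x"]]   # 3 6
y[ji[, "y"]]   # 3 6

jl <- join_sketch(x, y, type = "left")
y[jl[, "y"]]   # NA NA 3 6

jo <- join_sketch(x, y, type = "outer")
nrow(jo)       # 7: four rows for x plus three unmatched values of y
```

Indexing the original vectors with the two columns yields aligned values, with NA wherever one side has no partner.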
closest returns an integer vector of the same length as x giving the position in table of the closest match, or nomatch if there is no match.
common returns a logical vector of the same length as x that is TRUE if the element in x was found in table. It is similar to %in%.
join returns a matrix with two columns, namely x and y, representing the index of the values in x matching the corresponding value in y (or NA if the value does not match).
join is based on closest(x, y, tolerance, duplicates = "closest"). That means that for multiple matches just the closest one is reported.
Sebastian Gibb, Johannes Rainer
Other grouping/matching functions:
bin()
## Define two vectors to match
x <- c(1, 3, 5)
y <- 1:10

## Compare match and closest
match(x, y)
closest(x, y)

## If there is no exact match
x <- x + 0.1
match(x, y) # no match
closest(x, y)

## Some new values
x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

## Using a single tolerance value
closest(x, y, tolerance = 0.01)

## Using a value-specific tolerance accepting differences of 20 ppm
closest(x, y, ppm = 20)

## Same using 50 ppm
closest(x, y, ppm = 50)

## Sometimes multiple elements in `x` match to `table`
x <- c(1.6, 1.75, 1.8)
y <- 1:2
closest(x, y, tolerance = 0.5)
closest(x, y, tolerance = 0.5, duplicates = "closest")
closest(x, y, tolerance = 0.5, duplicates = "remove")

## Are there any common values?
x <- c(1.6, 1.75, 1.8)
y <- 1:2
common(x, y, tolerance = 0.5)
common(x, y, tolerance = 0.5, duplicates = "closest")
common(x, y, tolerance = 0.5, duplicates = "remove")

## Join two vectors
x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

jo <- join(x, y, type = "outer")
jo
x[jo$x]
y[jo$y]

jl <- join(x, y, type = "left")
jl
x[jl$x]
y[jl$y]

jr <- join(x, y, type = "right")
jr
x[jr$x]
y[jr$y]

ji <- join(x, y, type = "inner")
ji
x[ji$x]
y[ji$y]