
ft_lsh_utils

Utility functions for LSH models


Description

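Utility functions for locality-sensitive hashing (LSH) models: approximate nearest-neighbor search within a dataset, and approximate similarity join between two datasets.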

Usage

ml_approx_nearest_neighbors(
  model,
  dataset,
  key,
  num_nearest_neighbors,
  dist_col = "distCol"
)

ml_approx_similarity_join(
  model,
  dataset_a,
  dataset_b,
  threshold,
  dist_col = "distCol"
)
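A minimal end-to-end sketch of both calls. It assumes a local Spark installation plus the sparklyr and dplyr packages. The toy points, the table name lsh_points, and the bucket_length, num_hash_tables and seed values are illustrative assumptions. The model here is a bucketed random projection LSH model, which measures Euclidean distance. Only the id and distance columns are collected back into R because the results also carry vector and struct columns.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is installed locally (e.g. via spark_install())
sc <- spark_connect(master = "local")

# Illustrative toy data: four 2-D points at the corners of a square
points_df <- data.frame(
  id = 0:3,
  x1 = c(1, 1, -1, -1),
  x2 = c(1, -1, -1, 1)
)

# Copy to Spark and assemble x1, x2 into a single vector column
points <- copy_to(sc, points_df, "lsh_points", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x1", "x2"), output_col = "features")

# Bucketed random projection LSH estimator (Euclidean distance);
# bucket_length, num_hash_tables and seed are illustrative choices
brp <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3,
  seed = 42
)
model <- ml_fit(brp, points)

# At most 2 rows approximately closest to the key (1, 0)
nn <- ml_approx_nearest_neighbors(
  model, points,
  key = c(1, 0),
  num_nearest_neighbors = 2
) %>%
  select(id, distCol) %>%
  collect()

# Approximate self-join: pairs whose Euclidean distance is below 2.5.
# The original rows are kept in the struct columns datasetA and datasetB.
pairs <- ml_approx_similarity_join(
  model, points, points,
  threshold = 2.5,
  dist_col = "euclidean"
) %>%
  select(euclidean) %>%
  collect()

spark_disconnect(sc)
```

Because the search is approximate, nn may hold fewer than two rows, and pairs may miss some qualifying pairs.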

Arguments

model

A fitted LSH model, obtained from either ft_minhash_lsh() or ft_bucketed_random_projection_lsh() (for example, by passing the estimator to ml_fit()).

dataset

The dataset to search for nearest neighbors of the key.

key

Feature vector representing the item to search for.

num_nearest_neighbors

The maximum number of nearest neighbors to return.

dist_col

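Output column for storing distances: between each result row and the key for ml_approx_nearest_neighbors(), or between the two rows of each pair for ml_approx_similarity_join().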

dataset_a

The first dataset to join.

dataset_b

The second dataset to join.

threshold

The distance threshold: only row pairs whose distance is smaller than this value are returned.
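Because both functions are approximate, they can miss rows or pairs that an exact search would return. On small data, a brute-force computation in base R gives a reference to compare against. This sketch assumes Euclidean distance (the metric of ft_bucketed_random_projection_lsh(); ft_minhash_lsh() uses Jaccard distance instead). The helpers exact_nearest_neighbors() and exact_similarity_join() are hypothetical and not part of sparklyr. Like Spark's similarity join, the join helper keeps pairs whose distance is strictly below the threshold.

```r
# Exact (brute-force) counterparts of the two LSH queries, for small
# in-memory data only. Each matrix row is one feature vector and
# distances are Euclidean, as in bucketed random projection LSH.

# Hypothetical helper: the (at most) k rows of `m` closest to `key`
exact_nearest_neighbors <- function(m, key, k) {
  # Subtract key from every row, then take row-wise Euclidean norms
  d <- sqrt(rowSums(sweep(m, 2, key)^2))
  ord <- order(d)[seq_len(min(k, nrow(m)))]
  data.frame(row = ord, dist = d[ord])
}

# Hypothetical helper: every pair (row of a, row of b) whose distance
# is strictly below threshold
exact_similarity_join <- function(a, b, threshold) {
  pairs <- expand.grid(row_a = seq_len(nrow(a)), row_b = seq_len(nrow(b)))
  diffs <- a[pairs$row_a, , drop = FALSE] - b[pairs$row_b, , drop = FALSE]
  pairs$dist <- sqrt(rowSums(diffs^2))
  pairs[pairs$dist < threshold, , drop = FALSE]
}

# Illustrative toy data: four 2-D points at the corners of a square
m <- rbind(c(1, 1), c(1, -1), c(-1, -1), c(-1, 1))
exact_nearest_neighbors(m, key = c(1, 0), k = 2)
exact_similarity_join(m, m, threshold = 2.5)
```

Comparing these exact results with the collected Spark output shows how many neighbors or pairs an LSH model missed. Using more hash tables generally improves accuracy at the cost of extra computation.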


sparklyr

R Interface to Apache Spark

v1.6.2
Apache License 2.0 | file LICENSE
Authors
Javier Luraschi [aut], Kevin Kuo [aut] (<https://orcid.org/0000-0001-7803-7901>), Kevin Ushey [aut], JJ Allaire [aut], Samuel Macedo [ctb], Hossein Falaki [aut], Lu Wang [aut], Andy Zhang [aut], Yitao Li [aut, cre] (<https://orcid.org/0000-0002-1261-905X>), Jozef Hajnala [ctb], Maciej Szymkiewicz [ctb] (<https://orcid.org/0000-0003-1469-9396>), Wil Davis [ctb], RStudio [cph], The Apache Software Foundation [aut, cph]
Initial release
