keras: text_hashing_trick – R documentation

text_hashing_trick

Converts a text to a sequence of indexes in a fixed-size hashing space.

text_hashing_trick(
  text,
  n,
  hash_function = NULL,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
  lower = TRUE,
  split = " "
)

`text`	Input text (string).
`n`	Dimension of the hashing space.
`hash_function`	Hash function to use. If `NULL`, Python's built-in `hash` function is used. Can also be 'md5' or any function that takes a string as input and returns an integer. Note that `hash` is not a stable hashing function, so its results are not consistent across runs, whereas 'md5' is stable.
`filters`	Sequence of characters to filter out such as punctuation. Default includes basic punctuation, tabs, and newlines.
`lower`	Whether to convert the input to lowercase.
`split`	Separator used to split the text into words (string).

Two or more words may be assigned to the same index, due to possible collisions by the hashing function.
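
The collision behaviour can be illustrated with a minimal base R sketch of the general technique. It is not the keras implementation: `toy_hash()` and `hashing_trick_sketch()` are hypothetical helpers, the toy hash stands in for 'md5', and the filter pattern is a simplified regular expression rather than the default `filters` string. The mapping into the range 1 to `n - 1` (with 0 never assigned) follows the underlying Python Keras function as far as I know, and should be treated as an assumption here.

```r
# Toy stand-in for a stable hash (hypothetical, NOT what keras uses):
# a polynomial rolling hash over the Unicode code points of a word.
toy_hash <- function(word) {
  h <- 0
  for (code in utf8ToInt(word)) {
    h <- (h * 31 + code) %% 1000003
  }
  h
}

# Sketch of the hashing trick: lowercase, replace filtered characters with
# the split marker, split into words, then hash each word into 1..(n - 1).
hashing_trick_sketch <- function(text, n,
                                 filters = "[[:punct:]\t\n]",  # simplified filter
                                 lower = TRUE,
                                 split = " ") {
  if (lower) text <- tolower(text)
  text <- gsub(filters, split, text)
  words <- strsplit(text, split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]  # drop empty tokens
  vapply(
    words,
    function(w) as.integer(toy_hash(w) %% (n - 1) + 1),
    integer(1),
    USE.NAMES = FALSE
  )
}

# Five distinct words but only three possible indices (1, 2, 3),
# so at least two words must share an index.
idx <- hashing_trick_sketch("The quick brown fox jumps", n = 4)
print(idx)
print(anyDuplicated(idx) > 0)
```

Choosing a larger `n` lowers the chance of collisions but does not rule them out, since the mapping from words to indices is many-to-one by construction.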

Returns a list of integer word indices (uniqueness is not guaranteed).
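
A short usage example may help. It assumes the keras R package is installed with a working TensorFlow backend; the example sentence and the choice of `n = 50` are arbitrary. 'md5' is passed so that the indices are reproducible across sessions, and the call is wrapped in `tryCatch()` only so the script does not fail where keras is unavailable.

```r
# Assumes the keras R package and a TensorFlow backend are installed.
text <- "The quick brown fox jumps over the lazy dog."

idx <- tryCatch(
  keras::text_hashing_trick(text, n = 50, hash_function = "md5"),
  error = function(e) {
    # Reached when keras or its Python backend cannot be loaded.
    message("keras is not available: ", conditionMessage(e))
    NULL
  }
)

if (!is.null(idx)) {
  # One integer index per word; the trailing "." is removed by `filters`.
  print(idx)
}
```

With the default `hash_function = NULL`, the same call can return different indices in different sessions, because Python's `hash` is not stable across runs.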