Converts a text to a sequence of indexes in a fixed-size hashing space.
text_hashing_trick( text, n, hash_function = NULL, filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE, split = " " )
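| Argument | Description |
|---|---|
| `text` | Input text (string). |
| `n` | Dimension of the hashing space. |
| `hash_function` | If `NULL`, uses the Python `hash()` function. Can also be `'md5'` or any function that takes a string as input and returns an integer. Note that `hash()` is not a stable hashing function, so its results are not consistent across runs, while `'md5'` is stable. |
| `filters` | Sequence of characters to filter out, such as punctuation. The default includes basic punctuation, tabs, and newlines. |
| `lower` | Whether to convert the input to lowercase. |
| `split` | Sentence split marker (string). |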
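The `filters`, `lower`, and `split` arguments control how the text is tokenized before any hashing happens. The following is a rough base-R sketch of that preprocessing, not the package's implementation: the helper name `text_to_words_sketch` is hypothetical, and it assumes the behaviour matches `text_to_word_sequence()` (lowercase, replace each filter character with `split`, split on `split`, drop empty tokens).

```r
# Hypothetical helper sketching the tokenization step that precedes hashing.
text_to_words_sketch <- function(text,
                                 filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                                 lower = TRUE, split = " ") {
  if (lower) text <- tolower(text)
  # Replace every filter character with the split marker (literal match)
  for (ch in strsplit(filters, "")[[1]]) {
    text <- gsub(ch, split, text, fixed = TRUE)
  }
  # Split on the marker and drop the empty tokens left by adjacent separators
  words <- strsplit(text, split, fixed = TRUE)[[1]]
  words[nzchar(words)]
}

text_to_words_sketch("Hello, World!\tFoo")
```

Punctuation, the tab, and case differences all disappear, leaving the three tokens `"hello"`, `"world"`, and `"foo"`.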
Two or more words may be assigned to the same index because of hash collisions. The probability of a collision depends on the dimension of the hashing space and the number of distinct words.
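To get a feel for how large `n` should be, the chance that a vocabulary is hashed without any collision can be estimated with the birthday-problem product. This is a sketch under two stated assumptions: the hash spreads words uniformly, and, as in the Keras Python implementation, index 0 is reserved so words land in `n - 1` buckets. The function name `prob_no_collision` is hypothetical.

```r
# Probability that m distinct words all land in different buckets when
# hashed uniformly into k = n - 1 buckets (index 0 assumed reserved).
prob_no_collision <- function(m, n) {
  k <- n - 1
  if (m > k) return(0)  # pigeonhole: a collision is certain
  # Word i (0-based) must avoid the i buckets already taken
  prod(1 - (seq_len(m) - 1) / k)
}

prob_no_collision(100, 10000)
```

For 100 distinct words and `n = 10000`, the result is a little over 0.6, so collisions are far from rare even when the hashing space is much larger than the vocabulary.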
Returns a list of integer word indices. Uniqueness is not guaranteed, since distinct words can hash to the same index.
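The core of the hashing trick is mapping each token to `hash(word) mod (n - 1) + 1`, which keeps every index in `1..(n - 1)`. The base-R sketch below illustrates that mapping on already-tokenized words. It is not the package's code: `string_hash_sketch` is a hypothetical polynomial rolling hash standing in for Python's `hash()` or `'md5'`, and the reserved index 0 is assumed from the Keras Python implementation.

```r
# Hypothetical stable string hash: polynomial rolling hash over Unicode
# code points, reduced modulo 2^31 - 1 so double arithmetic stays exact.
string_hash_sketch <- function(word) {
  h <- 0
  for (code in utf8ToInt(word)) h <- (h * 31 + code) %% 2147483647
  h
}

# Map each word to an integer index in 1..(n - 1); index 0 is left unused.
hashing_trick_sketch <- function(words, n) {
  lapply(words, function(w) as.integer(string_hash_sketch(w) %% (n - 1) + 1))
}

hashing_trick_sketch(c("the", "cat", "sat", "on", "the", "mat"), n = 10)
```

Repeated words such as `"the"` always receive the same index. Distinct words may also share one; with `n = 2` every word maps to index 1, which is the collision case described above taken to its extreme.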
Other text preprocessing: make_sampling_table(), pad_sequences(), skipgrams(), text_one_hot(), text_to_word_sequence()