Tokenize String
h2o.tokenize is similar to h2o.strsplit. The difference is that h2o.tokenize stores the tokenized text in a single column, which makes additional processing easier (filtering stop words, the word2vec algorithm, ...).
h2o.tokenize(x, split)
x | The column or columns whose strings to tokenize.
split | The regular expression to split on.
An H2OFrame with a single column containing the tokenized strings. The tokens from each original row of the input frame are separated by NA.
## Not run:
library(h2o)
h2o.init()
string_to_tokenize <- as.h2o("Split at every character and tokenize.")
tokenize_string <- h2o.tokenize(as.character(string_to_tokenize), "")
## End(Not run)
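The single-column layout of the return value can be pictured with a small base-R sketch. It does not call h2o and needs no running cluster; it only mimics the shape of the result using strsplit. The names sentences, single_column and stop_words are hypothetical, and placing an NA after the final row (rather than only between rows) is an assumption about the layout.

```r
# Base-R emulation of the layout only; this is NOT the h2o implementation.
sentences <- c("the quick brown fox", "jumps over")

# Split each input row on whitespace (a regular expression, as with `split`).
tokens <- strsplit(sentences, "\\s+")

# Flatten into one column, appending NA after the tokens of each row
# (assumption: every row, including the last, is terminated by NA).
single_column <- unlist(lapply(tokens, function(t) c(t, NA_character_)))

# Filter stop words while keeping the NA row separators in place.
stop_words <- c("the")
keep <- is.na(single_column) | !(single_column %in% stop_words)
filtered <- single_column[keep]

print(single_column)
print(filtered)
```

Because the NA separators survive the filter, the original row boundaries can still be recovered after stop words are removed.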