
h2o.tokenize

Tokenize String


Description

h2o.tokenize is similar to h2o.strsplit. The difference is that h2o.tokenize stores the tokenized text in a single column, which makes additional processing (filtering stop words, running the word2vec algorithm, ...) easier.

Usage

h2o.tokenize(x, split)

Arguments

x

The column or columns containing the strings to tokenize.

split

The regular expression to split on.

Value

An H2OFrame with a single column containing the tokenized strings. Tokens that come from different rows of the input frame are separated by NA.
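
The single-column layout can be pictured with a small base-R emulation. This is only a sketch of the output shape described here, not h2o's implementation: `tokenize_sketch` is a hypothetical helper, it works on a plain character vector rather than an H2OFrame, and it assumes an NA follows every input row, including the last one. It also shows why the NA markers matter for later filtering, such as removing stop words.

```r
# Base-R emulation of the single-column layout; NOT part of the h2o package.
# `tokenize_sketch` is a hypothetical helper introduced for illustration.
tokenize_sketch <- function(x, split) {
  # One character vector of tokens per input row.
  pieces <- strsplit(x, split)
  # Assumption: append an NA after each row's tokens so that row
  # boundaries survive flattening into one long vector.
  unlist(lapply(pieces, function(p) c(p, NA_character_)))
}

rows <- c("the quick fox", "jumps over the dog")
tokens <- tokenize_sketch(rows, " ")

# Drop stop words but keep the NA row markers in place.
stop_words <- c("the", "over")
filtered <- tokens[is.na(tokens) | !(tokens %in% stop_words)]

# Recover the per-row grouping from the NA markers.
row_id <- cumsum(is.na(filtered)) + 1L
by_row <- split(filtered[!is.na(filtered)], row_id[!is.na(filtered)])
```

Because the markers are kept during filtering, the remaining tokens can still be grouped by the row they came from, which is what `by_row` does.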

Examples

## Not run: 
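library(h2o)
# Start, or connect to, a local H2O cluster.
h2o.init()
# Upload the string to the cluster as an H2OFrame.
string_to_tokenize <- as.h2o("Split at every character and tokenize.")
# Convert to a string column, then split on the empty pattern.
tokenize_string <- h2o.tokenize(as.character(string_to_tokenize), "")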

## End(Not run)

h2o

R Interface for the 'H2O' Scalable Machine Learning Platform

v3.32.1.2
Apache License (== 2.0)
Authors
Erin LeDell [aut, cre], Navdeep Gill [aut], Spencer Aiello [aut], Anqi Fu [aut], Arno Candel [aut], Cliff Click [aut], Tom Kraljevic [aut], Tomas Nykodym [aut], Patrick Aboyoun [aut], Michal Kurka [aut], Michal Malohlava [aut], Ludi Rehak [ctb], Eric Eckstrand [ctb], Brandon Hill [ctb], Sebastian Vidrio [ctb], Surekha Jadhawani [ctb], Amy Wang [ctb], Raymond Peck [ctb], Wendy Wong [ctb], Jan Gorecki [ctb], Matt Dowle [ctb], Yuan Tang [ctb], Lauren DiPerna [ctb], Tomas Fryda [ctb], H2O.ai [cph, fnd]
Initial release
2021-04-29
