R/Weka Tokenizers
Description:

R interfaces to Weka tokenizers.
Usage:

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments:

x: a character vector with strings to be tokenized.

control: an object of class Weka_control, or a character vector of control options, or NULL (default).
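As a short sketch of supplying control options (the WOW() call is an assumption that RWeka's Weka Option Wizard resolves the interface function by name):

library(RWeka)

## Construct control options; the argument names map to the Weka
## command-line flags -min and -max.
ctrl <- Weka_control(min = 2, max = 2)

## List the options the underlying Weka tokenizer class accepts
## (assumption: WOW() accepts the R interface function's name).
WOW("NGramTokenizer")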
Details:

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
Value:

A character vector with the tokenized strings.
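Examples:

A minimal sketch, assuming the RWeka package and a working Java runtime are available:

library(RWeka)

x <- "The quick brown fox jumps over the lazy dog"

## Simple word tokenization.
WordTokenizer(x)

## Tokens are formed only from contiguous alphabetic sequences,
## so digits and punctuation act as delimiters.
AlphabeticTokenizer("R2-D2 and C-3PO")

## Bigrams and trigrams: min and max give the n-gram sizes.
NGramTokenizer(x, Weka_control(min = 2, max = 3))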