
Weka_tokenizers

R/Weka Tokenizers


Description

R interfaces to Weka tokenizers.

Usage

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)

Arguments

x

a character vector with strings to be tokenized.

control

an object of class Weka_control, a character vector of control options, or NULL (default). The available options can be queried online via the Weka Option Wizard WOW, or found in the Weka documentation.

Details

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed from contiguous alphabetic sequences only.

NGramTokenizer splits strings into n-grams, with the minimal and maximal number of grams specified via the control options.

WordTokenizer is a simple word tokenizer.

Value

A character vector with the tokenized strings.
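
Examples

A brief sketch of typical use. It assumes the RWeka package is installed together with a working Java runtime (Weka is Java-based), so outputs are indicated only in comments.

```r
## Requires the RWeka package and a working Java installation.
library(RWeka)

x <- "The quick brown fox"

## Simple word tokenization.
WordTokenizer(x)

## Bigrams and trigrams, passing Weka's -min/-max options
## through a Weka_control object.
NGramTokenizer(x, Weka_control(min = 2, max = 3))

## Alphabetic tokens only: digits and punctuation are dropped,
## so "R2-D2" splits into the alphabetic runs "R" and "D".
AlphabeticTokenizer("R2-D2 says hello!")
```

Passing options through Weka_control (rather than a raw character vector such as c("-min", "2", "-max", "3")) keeps the call readable and lets RWeka handle the translation to Weka's command-line option syntax.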


RWeka: R/Weka Interface

Version 0.4-43, GPL-2

Authors: Kurt Hornik [aut, cre] (<https://orcid.org/0000-0003-4198-9911>), Christian Buchta [ctb], Torsten Hothorn [ctb], Alexandros Karatzoglou [ctb], David Meyer [ctb], Achim Zeileis [ctb] (<https://orcid.org/0000-0003-0918-3766>)
