R/Weka Tokenizers
Description:

R interfaces to Weka tokenizers.
Usage:

AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
Arguments:

x: a character vector with strings to be tokenized.

control: an object of class Weka_control, or a character vector of control options, or NULL (default).
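As a short sketch of supplying control options (the WOW() call is an assumption that RWeka's Weka Option Wizard resolves the interface function by name):

library(RWeka)

## Construct control options; the argument names map to the Weka
## command-line flags -min and -max.
ctrl <- Weka_control(min = 2, max = 2)

## List the options the underlying Weka tokenizer class accepts
## (assumption: WOW() accepts the R interface function's name).
WOW("NGramTokenizer")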
Details:

AlphabeticTokenizer is an alphabetic string tokenizer: tokens are formed only from contiguous alphabetic sequences.

NGramTokenizer splits strings into n-grams with given minimal and maximal numbers of grams.

WordTokenizer is a simple word tokenizer.
Value:

A character vector with the tokenized strings.
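Examples:

A minimal sketch, assuming the RWeka package and a working Java runtime are available:

library(RWeka)

x <- "The quick brown fox jumps over the lazy dog"

## Simple word tokenization.
WordTokenizer(x)

## Tokens are formed only from contiguous alphabetic sequences,
## so digits and punctuation act as delimiters.
AlphabeticTokenizer("R2-D2 and C-3PO")

## Bigrams and trigrams: min and max give the n-gram sizes.
NGramTokenizer(x, Weka_control(min = 2, max = 3))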