
kRp.text-class

S4 Class kRp.text


Description

This class is used for objects that are returned by treetag or tokenize.

Slots

lang

A character string, naming the language that is assumed for the tokenized text in this object.

desc

Descriptive statistics of the tagged text.

tokens

Results of the called tokenizer and POS tagger. The data.frame usually has eleven columns:

doc_id:

Factor, optional document identifier.

token:

Character, the tokenized text.

tag:

Factor, POS tags for each token.

lemma:

Character, lemma for each token.

lttr:

Integer, number of letters.

wclass:

Factor, word class.

desc:

Factor, a short description of the POS tag.

stop:

Logical, TRUE if token is a stopword.

stem:

Character, stemmed token.

idx:

Integer, index number of token in this document.

sntc:

Integer, index number of the sentence in this document.

This data.frame structure adheres to the "Text Interchange Formats" guidelines set out by rOpenSci[1].
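To make the column layout concrete, the following base R sketch builds a small data.frame with the eleven columns and types listed above. It does not use koRpus at all. The token values, tags, word classes, and descriptions are invented for illustration and are not output of tokenize or treetag.

```r
# Hand-built illustration of the column layout described above.
# All values are made up; real objects get this table from tokenize() or treetag().
tokens <- data.frame(
  doc_id = factor(rep("doc1", 3)),                      # optional document identifier
  token  = c("Cats", "sleep", "."),                     # the tokenized text
  tag    = factor(c("NNS", "VBP", "SENT")),             # POS tag per token (hypothetical tagset)
  lemma  = c("cat", "sleep", "."),                      # lemma per token
  lttr   = c(4L, 5L, 1L),                               # number of characters
  wclass = factor(c("noun", "verb", "fullstop")),       # word class
  desc   = factor(c("Noun, plural",
                    "Verb, present tense",
                    "Sentence ending punctuation")),    # short description of the tag
  stop   = c(FALSE, FALSE, FALSE),                      # TRUE if token is a stopword
  stem   = c("cat", "sleep", "."),                      # stemmed token
  idx    = 1:3,                                         # index of token in the document
  sntc   = c(1L, 1L, 1L),                               # index of sentence in the document
  stringsAsFactors = FALSE                              # keep token, lemma, stem as character
)

# Show column names and types
str(tokens)
```

Keeping token, lemma, and stem as character while tag, wclass, and desc are factors mirrors the types given in the column list.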

features

A named logical vector, indicating which features are available in this object's feat_list slot. Common features are listed in the description of the feat_list slot.

feat_list

A named list with optional analysis results or other content as used by the defined features:

See the getter and setter methods for easy access to these sub-slots. There can be any number of additional features; the above lists only those already defined by this package.

Constructor function

Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp_text(...) can be used instead of new("kRp.text", ...).
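Since manual construction is rarely needed, the more typical route is to let tokenize build the object and then inspect it. The following is a minimal sketch under stated assumptions: it presumes the koRpus package and the English language support package koRpus.lang.en are installed, and it uses the koRpus getters language(), taggedText(), and describe(), which are not named on this page. The sample text is made up, and the code does nothing if the packages are missing.

```r
# Assumption: koRpus and koRpus.lang.en are installed; otherwise the example is skipped.
have_koRpus <- requireNamespace("koRpus", quietly = TRUE) &&
  requireNamespace("koRpus.lang.en", quietly = TRUE)

if (have_koRpus) {
  library(koRpus)
  library(koRpus.lang.en)

  # Tokenize a character string directly (format = "obj") instead of reading a file
  tagged <- tokenize("Cats sleep. Dogs bark.", format = "obj", lang = "en")

  print(language(tagged))           # the lang slot
  print(head(taggedText(tagged)))   # the tokens data.frame
  print(describe(tagged))           # the desc slot
}
```

Going through the getters rather than accessing slots with @ keeps the code independent of the internal layout of the class.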

Note

There are also as() methods to transform objects from other koRpus classes into kRp.text.

References

[1] Text Interchange Formats (https://github.com/ropensci/tif)


koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6
GPL (>= 3)
Authors
Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Initial release
2021-05-08
