RcppCWB: cl_lexicon_size – R documentation


cl_lexicon_size

Get Lexicon Size.

Description

Get the number of unique tokens (types), and hence of token ids, in the lexicon of a positional attribute. Token ids are zero-based: when iterating through them, start at 0; the largest id is cl_lexicon_size() minus 1.

Usage

cl_lexicon_size(corpus, p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

`corpus`	name of a CWB corpus (upper case)
`p_attribute`	name of positional attribute
`registry`	path to the registry directory, defaults to the value of the environment variable CORPUS_REGISTRY

Examples

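library(RcppCWB)
# Locate the registry for the package's sample data (a temporary one is used if the packaged files do not check out)
registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()
Sys.setenv(CORPUS_REGISTRY = registry)
# Number of unique tokens of the "word" attribute in the REUTERS sample corpus
lexicon_size <- cl_lexicon_size("REUTERS", p_attribute = "word")
# Token ids are zero-based, so they run from 0 to lexicon_size - 1
token_ids <- seq.int(from = 0, to = lexicon_size - 1)
# Decode every id to its string value
cl_id2str("REUTERS", p_attribute = "word", id = token_ids)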
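Because R vectors are one-based while token ids are zero-based, off-by-one errors are easy to make when iterating over a lexicon. The following is a minimal sketch in plain R that does not call RcppCWB at all: `lexicon`, `lexicon_size_sketch()` and `id2str_sketch()` are hypothetical stand-ins introduced only for illustration. It also shows that `seq_len(n) - 1L` may be a safer way to build the id sequence, since it is empty for an empty lexicon.

```r
# Hypothetical stand-in for a lexicon; NOT part of RcppCWB.
lexicon <- c("the", "oil", "price")

# Assumed helper for illustration: the number of unique tokens.
lexicon_size_sketch <- function(lex) length(lex)

# Assumed helper for illustration: zero-based lookup, so id 0 is the
# first element of the (one-based) R vector.
id2str_sketch <- function(lex, id) {
  stopifnot(all(id >= 0L), all(id < length(lex)))
  lex[id + 1L]
}

n <- lexicon_size_sketch(lexicon)

# seq_len(n) - 1L yields 0, 1, ..., n - 1 and is empty when n == 0,
# whereas seq.int(from = 0, to = n - 1) would yield c(0, -1) for n == 0.
token_ids <- seq_len(n) - 1L
tokens <- id2str_sketch(lexicon, token_ids)
```

With a real corpus, the same id sequence could be passed as the `id` argument of cl_id2str(), as in the example above.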

RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

v0.3.2

GPL-3

Authors

Andreas Blaette [aut, cre], Bernard Desgraupes [aut], Sylvain Loiseau [aut], Oliver Christ [ctb], Bruno Maximilian Schulze [ctb], Stefan Evert [ctb], Arne Fitschen [ctb], Jeroen Ooms [ctb], Marius Bertram [ctb]

Initial release

2021-02-03