Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

cl_lexicon_size

Get Lexicon Size.


Description

Get the total number of unique tokens/ids of a positional attribute. Note that token ids are zero-based, i.e. when iterating through tokens, start at 0, the maximum will be cl_lexicon_size() minus 1.

Usage

cl_lexicon_size(corpus, p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

corpus

name of a CWB corpus (upper case)

p_attribute

name of positional attribute

registry

path to the registry directory, defaults to the value of the environment variable CORPUS_REGISTRY

Examples

registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()
Sys.setenv(CORPUS_REGISTRY = registry)
lexicon_size <- cl_lexicon_size("REUTERS", p_attribute = "word")
token_ids <- seq.int(from = 0, to = lexicon_size - 1)
cl_id2str("REUTERS", p_attribute = "word", id = token_ids)

RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

v0.3.2
GPL-3
Authors
Andreas Blaette [aut, cre], Bernard Desgraupes [aut], Sylvain Loiseau [aut], Oliver Christ [ctb], Bruno Maximilian Schulze [ctb], Stefan Evert [ctb], Arne Fitschen [ctb], Jeroen Ooms [ctb], Marius Bertram [ctb]
Initial release
2021-02-03

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.