
get_count_vector

Get Vector with Counts for Positional Attribute.


Description

The return value is an integer vector of token counts. The length of the vector is the number of unique tokens in the corpus, i.e. the number of unique ids. The counts are ordered by token id: because ids are zero-based, element i + 1 of the vector holds the count for id i.
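The relationship between zero-based ids and positions in the count vector can be illustrated with a minimal base R sketch. It does not call RcppCWB; the lexicon, the token stream and the helper `count_of_id()` are hypothetical toy values introduced only for illustration.

```r
# Toy lexicon: the position in this vector minus one is the token id (assumption)
lexicon <- c("the", "cat", "sat", "on", "mat")   # ids 0..4

# Toy token stream "the cat sat on the mat", encoded as zero-based ids
token_ids <- c(0L, 1L, 2L, 3L, 0L, 4L)

# tabulate() counts positive integers, so shift the ids by one
counts <- tabulate(token_ids + 1L, nbins = length(lexicon))

# Hypothetical helper: element id + 1 holds the count for a given id
count_of_id <- function(id) counts[id + 1L]

result <- data.frame(
  token_id = seq_along(counts) - 1L,
  token = lexicon,
  count = counts
)
print(result)
```

As in this sketch, the counts in such a vector sum to the total number of tokens, which gives a quick sanity check on the result.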

Usage

get_count_vector(corpus, p_attribute,
  registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

corpus

name of a CWB corpus, e.g. "REUTERS"

p_attribute

name of a positional attribute, e.g. "word"

registry

path to the registry directory; defaults to the value of the CORPUS_REGISTRY environment variable

Value

an integer vector of counts, one element per unique token id

Examples

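library(RcppCWB)

# Set up a temporary registry for the sample corpus
registry <- use_tmp_registry()

# One count per token id, ordered by id
y <- get_count_vector(
  corpus = "REUTERS", p_attribute = "word",
  registry = registry
)

# Token ids are zero-based, so they run from 0 to length(y) - 1
df <- data.frame(token_id = 0:(length(y) - 1), count = y)

# Resolve the ids to their string values
df[["token"]] <- cl_id2str(
  "REUTERS", p_attribute = "word",
  id = df[["token_id"]], registry = registry
)
df <- df[, c("token", "token_id", "count")] # reorder columns
df <- df[order(df[["count"]], decreasing = TRUE), ] # most frequent first
head(df)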

RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

v0.3.2
GPL-3
Authors
Andreas Blaette [aut, cre], Bernard Desgraupes [aut], Sylvain Loiseau [aut], Oliver Christ [ctb], Bruno Maximilian Schulze [ctb], Stefan Evert [ctb], Arne Fitschen [ctb], Jeroen Ooms [ctb], Marius Bertram [ctb]
Initial release
2021-02-03
