
s_attribute_decode

Decode Structural Attribute.


Description

Get a data.frame with the left and right corpus positions (cpos) of the regions of a structural attribute, together with the value annotated for each region.

Usage

s_attribute_decode(corpus, data_dir, s_attribute, encoding = NULL,
  registry = Sys.getenv("CORPUS_REGISTRY"), method = c("R", "Rcpp"))

Arguments

corpus

a CWB corpus

data_dir

data directory where binary files for corpus are stored

s_attribute

a structural attribute

encoding

encoding of the values ("latin-1" or "utf-8")

registry

registry directory

method

character vector specifying whether to use the "R" or the "Rcpp" implementation

Details

Two approaches are implemented. The pure R implementation decodes the files directly in the directory specified by data_dir. The Rcpp implementation uses the registry file for corpus to find the data directory.
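To illustrate what decoding the files directly can look like, here is a minimal base R sketch. It rests on an assumption not stated on this page: that the regions of a structural attribute are stored in a file named `<s_attribute>.rng` as consecutive pairs of 4-byte big-endian integers (start and end cpos). The directory and the file content are synthetic, and this is not presented as the package's own code.

```r
# Create a toy data directory with a synthetic "places.rng" file.
# Assumption: regions are stored as pairs of 4-byte big-endian integers.
data_dir <- tempfile("toy_corpus_")
dir.create(data_dir)
rng_file <- file.path(data_dir, "places.rng")
writeBin(c(0L, 4L, 5L, 9L), rng_file, size = 4L, endian = "big")

# Read all integers back; the file size tells us how many there are.
n_int <- file.info(rng_file)[["size"]] / 4L
rng <- readBin(rng_file, what = integer(), size = 4L, n = n_int, endian = "big")

# Reshape into one row per region with start and end corpus positions.
regions <- as.data.frame(
  matrix(
    rng, ncol = 2L, byrow = TRUE,
    dimnames = list(NULL, c("cpos_left", "cpos_right"))
  )
)
print(regions)
```

Reading the whole file in one `readBin()` call keeps the sketch short; for very large corpora a chunked read might be preferable.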

Value

A data.frame with three columns. Column cpos_left holds the start corpus positions of the structural annotations, column cpos_right the end corpus positions. Column value holds the value of each annotation.

a character vector
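As a sketch of how such a result might be used, the following builds a hypothetical data.frame with the three columns described above (the values are made up, not taken from a real corpus) and computes the number of tokens covered per value. It assumes that cpos_left and cpos_right are both inclusive.

```r
# Hypothetical result with the documented columns (made-up data).
decoded <- data.frame(
  cpos_left = c(0L, 5L, 12L),
  cpos_right = c(4L, 11L, 20L),
  value = c("usa", "japan", "usa"),
  stringsAsFactors = FALSE
)

# Region size in tokens, assuming both boundaries are inclusive.
decoded$size <- decoded$cpos_right - decoded$cpos_left + 1L

# Total number of tokens covered by each annotation value.
tokens_per_value <- tapply(decoded$size, decoded$value, sum)
print(tokens_per_value)
```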

Examples

library(RcppCWB)

# use the registry shipped with the package, or a temporary copy of it
registry <- if (!check_pkg_registry_files()) use_tmp_registry() else get_pkg_registry()
Sys.setenv(CORPUS_REGISTRY = registry)

# pure R implementation (Rcpp implementation fails on Windows in vanilla mode)
b <- s_attribute_decode(
  data_dir = system.file(package = "RcppCWB", "extdata", "cwb", "indexed_corpora", "reuters"),
  s_attribute = "places", method = "R"
)

RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

Version: 0.3.2
License: GPL-3
Authors: Andreas Blaette [aut, cre], Bernard Desgraupes [aut], Sylvain Loiseau [aut], Oliver Christ [ctb], Bruno Maximilian Schulze [ctb], Stefan Evert [ctb], Arne Fitschen [ctb], Jeroen Ooms [ctb], Marius Bertram [ctb]
Initial release: 2021-02-03
