cTest-methods

Transform text into C-Test-like format


Description

If you feed a tagged text object to this function, its text will be transformed into a format used for C-Tests:

  • the first and last sentence will be left untouched (unless the start and end values of the intact parameter are changed)

  • of all other sentences, the second half of every 2nd word (or as specified by every) will be replaced by a line

  • words must have at least min.length characters, otherwise they are skipped

  • words with an uneven number of characters keep the extra character, i.e., a word with five characters will keep the first three and have the last two replaced
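
The word-level rule above can be sketched in base R. This is only an illustration, not the koRpus implementation: blank_half is a hypothetical helper, and it assumes that each removed character is replaced by one copy of replace.by.

```r
# Hypothetical helper (not part of koRpus): blank the second half of one word.
# Assumption: every removed character is replaced by one copy of replace.by.
blank_half <- function(word, min.length = 3, replace.by = "_") {
  n <- nchar(word)
  # words shorter than min.length are skipped and returned unchanged
  if (n < min.length) {
    return(word)
  }
  # with an uneven number of characters, the extra character is kept
  keep <- ceiling(n / 2)
  paste0(substr(word, 1, keep), strrep(replace.by, n - keep))
}

blank_half("house")  # "hou__"
blank_half("text")   # "te__"
blank_half("at")     # "at" (too short, skipped)
```

Rounding up with ceiling() is what makes a five-character word keep three characters rather than two.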

Usage

cTest(obj, ...)

## S4 method for signature 'kRp.text'
cTest(
  obj,
  every = 2,
  min.length = 3,
  intact = c(start = 1, end = 1),
  replace.by = "_"
)

Arguments

obj

An object of class kRp.text.

...

Additional arguments to the method (as described in this document).

every

Integer numeric, setting the frequency of words to be manipulated. By default, every other word is transformed.
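
As a rough illustration of what every means, the following base R sketch picks every n-th word from a plain character vector. The vector words and the way positions are counted are assumptions made for this example; the method itself may count differently, for instance with respect to punctuation or words shorter than min.length.

```r
# Illustration only: which positions does every = 2 select in a plain word list?
# Assumption: counting starts at the first word, so words 2, 4, ... are chosen.
words <- c("the", "quick", "brown", "fox", "jumps")
every <- 2
target <- seq_along(words) %% every == 0
words[target]  # "quick" "fox"
```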

min.length

Integer numeric, sets the minimum length of words to be considered (in letters).

intact

Named vector with the elements start and end. Both must be integer values and define which sentences are to be left untouched, counted in sentences from the beginning and the end of the text. The default is to leave the first and last sentence untouched.
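
To make the counting concrete, here is a small base R sketch that works out which sentence positions would remain open to manipulation for a given intact setting. altered_sentences is a hypothetical helper written for this illustration; it is not a koRpus function and only mirrors the description above.

```r
# Hypothetical helper (not part of koRpus): positions of the sentences that
# are NOT protected by intact, for a text with n.sentences sentences.
altered_sentences <- function(n.sentences, intact = c(start = 1, end = 1)) {
  first <- intact[["start"]] + 1
  last <- n.sentences - intact[["end"]]
  # nothing is left to alter if the protected ranges cover the whole text
  if (first > last) {
    return(integer(0))
  }
  seq.int(first, last)
}

altered_sentences(5)                                  # 2 3 4
altered_sentences(5, intact = c(start = 2, end = 0))  # 3 4 5
altered_sentences(2)                                  # integer(0)
```

With the defaults, a text of only two sentences would therefore be left entirely unchanged under this reading.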

replace.by

Character, will be used as the replacement for the removed word halves.

Value

An object of class kRp.text with the added feature diff.

Examples

# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  tokenized.obj <- cTest(tokenized.obj)
  pasteText(tokenized.obj)

  # diff stats are now part of the object
  hasFeature(tokenized.obj)
  diffText(tokenized.obj)
} else {}

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6
GPL (>= 3)
Authors
Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Initial release
2021-05-08
