
read.corp.custom-methods

Import custom corpus data


Description

Read data from a custom corpus into a valid object of class kRp.corp.freq.

Usage

read.corp.custom(corpus, caseSens = TRUE, log.base = 10, ...)

## S4 method for signature 'kRp.text'
read.corp.custom(
  corpus,
  caseSens = TRUE,
  log.base = 10,
  dtm = docTermMatrix(obj = corpus, case.sens = caseSens),
  as.feature = FALSE
)

Arguments

corpus

An object of class kRp.text. The "token" column of its tokens slot is used.

caseSens

Logical. If FALSE, all tokens are matched in their lower-case form.

log.base

A numeric value defining the base of the logarithm used for inverse document frequency (idf). See log for details.

...

Additional options for methods of the generic.

dtm

A document term matrix of the corpus object as generated by docTermMatrix. This argument exists only for cases where you want to re-use an already existing matrix. By default, it is created from the corpus object.

as.feature

Logical. If FALSE, only the analysis results are returned. If TRUE, the input object is returned with the results added as a feature. Use corpusCorpFreq to get the results from such an aggregated object.

Details

These methods enable a basic frequency analysis of a text corpus. That is, rather than only importing precomputed analysis results such as LCC files, they import the corpus material itself. The resulting object is of class kRp.corp.freq, so it can be used for frequency analysis by other functions and methods of this package.
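
As a rough illustration of what such a frequency analysis involves, and of what the caseSens and log.base arguments influence, the following base R sketch counts tokens in a toy corpus and computes an inverse document frequency. It is not the koRpus implementation: the toy documents, all variable names, and the textbook formula idf = log(N / df) are assumptions made for this example only.

```r
# Hypothetical toy corpus: two tiny "documents" given as token vectors
docs <- list(
  doc1 = c("The", "cat", "saw", "the", "dog"),
  doc2 = c("A", "dog", "barked")
)

# Lower-case all tokens, similar in spirit to caseSens = FALSE
docs_lower <- lapply(docs, tolower)

# Corpus-wide token frequencies
token_freq <- table(unlist(docs_lower))

# Document frequency: the number of documents each token occurs in
tokens <- sort(unique(unlist(docs_lower)))
doc_freq <- vapply(
  tokens,
  function(tok) sum(vapply(docs_lower, function(d) tok %in% d, logical(1))),
  numeric(1)
)

# Assumed textbook definition idf = log(N / df);
# the base argument plays the role that log.base plays above
n_docs <- length(docs_lower)
idf_10 <- log(n_docs / doc_freq, base = 10)
idf_2  <- log(n_docs / doc_freq, base = 2)

print(token_freq)
print(round(idf_10, 3))
```

Changing the base only rescales the idf values by a constant factor, so the ranking of tokens stays the same. A token that occurs in every document gets an idf of 0 in any base.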

Value

Depending on as.feature, either an object of class kRp.corp.freq (the default), or an object of class kRp.text with the added feature corp_freq containing it.


Examples

# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # call read.corp.custom() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # if you call read.corp.custom() without as.feature=TRUE,
  # you will get its results directly
  en_corp <- read.corp.custom(
    tokenized.obj,
    caseSens=FALSE
  )

  # alternatively, you can also store those results as a
  # feature in the object itself
  tokenized.obj <- read.corp.custom(
    tokenized.obj,
    caseSens=FALSE,
    as.feature=TRUE
  )
  # results are now part of the object
  hasFeature(tokenized.obj)
  corpusCorpFreq(tokenized.obj)
} else {}

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6
GPL (>= 3)
Authors
Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Initial release
2021-05-08
