koRpus: read.corp.celex – R documentation

read.corp.celex

Import Celex data

Description

Read data from Celex[1]-formatted corpora.

Usage

read.corp.celex(
  celex.path,
  running.words,
  fileEncoding = "ISO_8859-1",
  n = -1,
  caseSens = TRUE
)

Arguments

`celex.path`	A character string: the path to the Celex-format frequency file to read.
`running.words`	An integer value: the number of running words in the Celex corpus being read.
`fileEncoding`	A character string naming the encoding of the Celex files.
`n`	An integer value defining how many lines of data to read. Use -1 (the default) to read all lines.
`caseSens`	Logical. If `FALSE`, all frequency statistics are calculated regardless of the tokens' case. Otherwise, if the imported database supports it, the same token gets separate frequencies for each case variant (e.g., "one" and "One").
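
To illustrate what ignoring case means for frequency counts, here is a conceptual sketch in base R. It folds tokens to lower case and sums their counts on a small, invented toy table. It is not how koRpus implements `caseSens` internally, and the table is hypothetical, not real Celex data.

```r
# Hypothetical toy frequency table (invented for illustration, not Celex data)
freqs <- data.frame(
  token = c("one", "One", "two"),
  freq  = c(10, 3, 5),
  stringsAsFactors = FALSE
)

# Case-sensitive view: "one" and "One" keep separate counts
print(freqs[freqs$token %in% c("one", "One"), ])

# Case-insensitive view: fold tokens to lower case, then sum counts per token
folded <- transform(freqs, token = tolower(token))
case.insensitive <- aggregate(freq ~ token, data = folded, FUN = sum)
print(case.insensitive)
```

With case folded, "one" and "One" merge into a single entry whose count is the sum of both.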

Value

An object of class kRp.corp.freq.

References

[1] http://celex.mpi.nl

Examples

## Not run: 
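library(koRpus)

# import a German Celex frequency file (GFW.CD)
my.Celex.data <- read.corp.celex(
  file.path("~","mydata","Celex","GERMAN","GFW","GFW.CD"),
  running.words=5952000
)
# 'tokenized.obj' must be a text already tokenized with koRpus,
# e.g., the result of tokenize()
freq.analysis(
  tokenized.obj,
  corp.freq=my.Celex.data
)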

## End(Not run)
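
The `running.words` value in the example (5952000) gives the size of the corpus. A common reason to supply a corpus size is to turn raw counts into comparable rates, such as occurrences per million running words. The sketch below shows only that arithmetic in base R, using a hypothetical raw count of 100. Whether koRpus applies exactly this normalization internally is an assumption here.

```r
# Corpus size from the example call; the raw count below is hypothetical
running.words <- 5952000
raw.freq <- 100

# Normalize the raw count to a rate per million running words
per.million <- raw.freq / running.words * 1e6
print(round(per.million, 2))
```

Normalizing this way lets you compare frequencies across corpora of different sizes.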

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6

GPL (>= 3)

Authors

Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]

Initial release

2021-05-08
