read.corp.celex

Import Celex data


Description

Reads word frequency data from Celex[1]-formatted corpora.

Usage

read.corp.celex(
  celex.path,
  running.words,
  fileEncoding = "ISO_8859-1",
  n = -1,
  caseSens = TRUE
)

Arguments

celex.path

A character string: the path to the Celex-format frequency file to read.

running.words

An integer: the total number of running words (tokens) in the Celex corpus being read.

fileEncoding

A character string naming the encoding of the Celex files.

n

An integer defining how many lines of data to read. The default, -1, reads all lines.

caseSens

Logical. If FALSE, all frequency statistics are calculated regardless of the tokens' case. Otherwise, if the imported database supports it, you get separate frequencies for the same token in different cases (e.g., "one" and "One").
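The effect of caseSens = FALSE can be pictured as folding case before counting. The base-R sketch below uses a hypothetical toy table (the columns token and freq are illustrative, not the actual layout of a kRp.corp.freq object) and merges case variants by summing their counts. It shows the idea only and is not how read.corp.celex is implemented internally.

```r
# Hypothetical toy frequency table; "token" and "freq" are illustrative
# column names, not the real structure of a kRp.corp.freq object
freqs <- data.frame(
  token = c("one", "One", "two", "Two", "three"),
  freq  = c(120L, 30L, 80L, 5L, 40L),
  stringsAsFactors = FALSE
)

# Case-sensitive view (conceptually caseSens = TRUE):
# "one" and "One" remain separate entries
print(freqs)

# Case-insensitive view (conceptually caseSens = FALSE):
# lower-case every token, then sum the counts of identical forms
case.insensitive <- aggregate(
  freq ~ token,
  data = transform(freqs, token = tolower(token)),
  FUN = sum
)
print(case.insensitive)
```

Summing keeps the total token count unchanged while collapsing "one"/"One" and "two"/"Two" into single entries.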

Value

An object of class kRp.corp.freq.
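As its name suggests, running.words gives the corpus size against which raw counts can be normalized. Assuming the relative-frequency statistics in the returned object are derived this way (an assumption, not stated on this page), a word w's frequency per million running words would be computed as follows, with the running-word count from the example below and a hypothetical raw count of 5952:

```latex
f_{\text{pm}}(w) = \frac{\mathrm{freq}(w)}{\mathrm{running.words}} \times 10^{6},
\qquad
\frac{5952}{5\,952\,000} \times 10^{6} = 1000
```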

References

Examples

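## Not run: 
library(koRpus)

# import a Celex frequency file (here: the German GFW file);
# running.words is the total number of running words in that corpus
my.Celex.data <- read.corp.celex(
  file.path("~","mydata","Celex","GERMAN","GFW","GFW.CD"),
  running.words=5952000
)
# tokenized.obj: a previously tokenized text, e.g., an object
# returned by tokenize() or treetag()
freq.analysis(
  tokenized.obj,
  corp.freq=my.Celex.data
)

## End(Not run)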

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6
GPL (>= 3)
Authors
Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Initial release
2021-05-08