Extract text features for authorship analysis
This function combines several of koRpus' methods to extract the 9-Feature Set for
authorship detection (Brennan, Afroz & Greenstadt, 2011; Brennan & Greenstadt, 2009).
textFeatures(text, hyphen = NULL)
text |
An object of class |
hyphen |
An object of class |
A data.frame:
Number of unique words (tokens)
Complexity (TTR)
Sentence count
Average sentence length
Average syllable count
Character count (all characters, including spaces)
Letter count (without spaces, punctuation and digits)
Gunning FOG index
Flesch Reading Ease index
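To make the listed features more concrete, the following base R sketch approximates several of them on a plain character string. It is not how koRpus computes them: the helper names (`simple_features`, `flesch_re`, `gunning_fog`), the column names, and the regex-based word and sentence splitting are all assumptions made for illustration. The two readability helpers use the commonly published English formulas and take precomputed counts, since syllable counting requires hyphenation; koRpus may apply additional rules or parameters.

```r
# Hypothetical helpers for illustration only; not part of koRpus.

# Rough approximations of the count-based features, using naive regex splitting.
simple_features <- function(txt) {
  # words: runs of letters or apostrophes, lower-cased
  words <- tolower(unlist(regmatches(txt, gregexpr("[[:alpha:]']+", txt))))
  # sentences: number of runs of sentence-ending punctuation (at least 1)
  n_sent <- max(1, lengths(regmatches(txt, gregexpr("[.!?]+", txt))))
  data.frame(
    unique_words        = length(unique(words)),
    ttr                 = length(unique(words)) / length(words),
    sentences           = n_sent,
    avg_sentence_length = length(words) / n_sent,
    characters          = nchar(txt),                          # includes spaces
    letters             = nchar(gsub("[^[:alpha:]]", "", txt)) # letters only
  )
}

# Flesch Reading Ease (standard English formula) from raw counts.
flesch_re <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}

# Gunning FOG index from raw counts; complex_words are words
# with three or more syllables.
gunning_fog <- function(words, sentences, complex_words) {
  0.4 * ((words / sentences) + 100 * (complex_words / words))
}

# Usage on a toy string
features <- simple_features("The cat sat. The dog ran!")
print(features)
```

For real analyses, tokenized input passed to `textFeatures()` should be preferred, because proper tokenization and hyphenation handle abbreviations, digits and syllables far more carefully than these regular expressions.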
Brennan, M., Afroz, S., & Greenstadt, R. (2011). Deceiving authorship detection. Presentation at 28th Chaos Communication Congress (28C3), Berlin, Germany.
Brennan, M. & Greenstadt, R. (2009). Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA.
Tweedie, F.J., Singh, S., & Holmes, D.I. (1996). Neural Network Applications in Stylometry: The Federalist Papers. Computers and the Humanities, 30, 1–10.
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  textFeatures(tokenized.obj)
} else {}