Get features by comparison.
The features of two objects, usually a partition defining a corpus of
interest (coi) and a partition defining a reference corpus (ref), are compared.
The main application is term extraction.
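The default `method = "chisquare"` compares how often a term occurs in the coi and in the ref. The base-R sketch below shows what that means for one term. It builds the 2x2 contingency table (the term vs. all other tokens, coi vs. ref) and returns Pearson's chi-squared statistic without continuity correction. This is not polmineR's implementation. The function name `chisquare_term` is hypothetical. The handling of `included`, which subtracts the coi counts and size from the ref, is an assumption about the purpose of that argument.

```r
# Minimal sketch in base R (not polmineR's implementation): Pearson's
# chi-squared statistic for one term, comparing a corpus of interest (coi)
# with a reference corpus (ref). The function name is hypothetical.
chisquare_term <- function(count_coi, size_coi, count_ref, size_ref,
                           included = FALSE) {
  # Assumption: if the coi is part of the ref, take it out of the ref so
  # that the two samples being compared do not overlap.
  if (included) {
    count_ref <- count_ref - count_coi
    size_ref <- size_ref - size_coi
  }
  # Rows: coi, ref; columns: the term, all other tokens
  observed <- matrix(
    c(count_coi, size_coi - count_coi,
      count_ref, size_ref - count_ref),
    nrow = 2, byrow = TRUE
  )
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  sum((observed - expected)^2 / expected)
}

chisquare_term(30, 1000, 100, 10000, included = TRUE)
```

With these numbers the result is about 44.89. Base R's `chisq.test(matrix(c(30, 970, 70, 8930), nrow = 2, byrow = TRUE), correct = FALSE)` reports the same statistic.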
features(x, y, ...)
## S4 method for signature 'partition'
features(x, y, included = FALSE, method = "chisquare", verbose = FALSE)
## S4 method for signature 'count'
features(
  x,
  y,
  by = NULL,
  included = FALSE,
  method = "chisquare",
  verbose = TRUE
)
## S4 method for signature 'partition_bundle'
features(
  x,
  y,
  included = FALSE,
  method = "chisquare",
  verbose = TRUE,
  mc = getOption("polmineR.mc"),
  progress = FALSE
)
## S4 method for signature 'count_bundle'
features(
  x,
  y,
  included = FALSE,
  method = "chisquare",
  verbose = !progress,
  mc = getOption("polmineR.mc"),
  progress = FALSE
)
## S4 method for signature 'ngrams'
features(x, y, included = FALSE, method = "chisquare", verbose = TRUE, ...)
## S4 method for signature 'Cooccurrences'
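features(x, y, included = FALSE, method = "ll", verbose = TRUE)

| Argument | Description |
|---|---|
| x | the object defining the corpus of interest (coi), of one of the classes listed in the method signatures above |
| y | the object defining the reference corpus (ref) |
| ... | further parameters |
| included | logical, TRUE if the coi is part of the ref; defaults to FALSE |
| method | the statistical test to apply: "chisquare" (chi-squared) or "ll" (log-likelihood) |
| verbose | logical, whether to print messages |
| by | the columns used for merging; if NULL (default), the p-attribute of x is used |
| mc | logical, whether to use multiple cores |
| progress | logical, whether to show a progress bar |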
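The alternative `method = "ll"` is a log-likelihood test. The base-R sketch below computes the log-likelihood ratio statistic G² = 2 Σ O ln(O/E) over the same kind of 2x2 table (term vs. other tokens, coi vs. ref). It is an illustration only. Whether polmineR's `"ll"` uses exactly this formulation is not confirmed here. The function name `loglik_term` is hypothetical, and the treatment of `included` is an assumption.

```r
# Minimal sketch in base R (not polmineR's implementation): the
# log-likelihood ratio statistic G^2 = 2 * sum(O * log(O / E)) for one term.
# The function name is hypothetical.
loglik_term <- function(count_coi, size_coi, count_ref, size_ref,
                        included = FALSE) {
  # Assumption: if the coi is part of the ref, remove it from the ref first.
  if (included) {
    count_ref <- count_ref - count_coi
    size_ref <- size_ref - size_coi
  }
  # Rows: coi, ref; columns: the term, all other tokens
  observed <- matrix(
    c(count_coi, size_coi - count_coi,
      count_ref, size_ref - count_ref),
    nrow = 2, byrow = TRUE
  )
  # Expected counts under independence
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  # Empty cells contribute 0, the limit of o * log(o / e) as o -> 0
  cell <- ifelse(observed > 0, observed * log(observed / expected), 0)
  2 * sum(cell)
}

loglik_term(30, 1000, 100, 10000, included = TRUE)
```

Unlike the chi-squared formula, G² needs explicit handling of empty cells. This matters for terms that never occur in the coi.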
Andreas Blaette
Baker, Paul (2006): Using Corpora in Discourse Analysis. London: Continuum, pp. 121-149 (ch. 6).
Manning, Christopher D.; Schuetze, Hinrich (1999): Foundations of Statistical Natural Language Processing. Cambridge, Mass.: MIT Press, pp. 151-189 (ch. 5).
library(polmineR)
use("polmineR") # activate the sample corpora shipped with polmineR
kauder <- partition(
  "GERMAPARLMINI",
  speaker = "Volker Kauder", interjection = "speech",
  p_attribute = "word"
)
all <- partition("GERMAPARLMINI", interjection = "speech", p_attribute = "word")
# included = TRUE because Kauder's speeches are part of the reference corpus
terms_kauder <- features(x = kauder, y = all, included = TRUE)
top100 <- subset(terms_kauder, rank_chisquare <= 100)
head(top100)
# a different way is to compare count objects
kauder_count <- as(kauder, "count")
all_count <- as(all, "count")
terms_kauder <- features(kauder_count, all_count, included = TRUE)
top100 <- subset(terms_kauder, rank_chisquare <= 100)
head(top100)
# term extraction for several speakers at once, using a partition_bundle
speakers <- partition_bundle("GERMAPARLMINI", s_attribute = "speaker")
speakers <- enrich(speakers, p_attribute = "word")
speaker_terms <- features(speakers[[1:5]], all, included = TRUE, progress = TRUE)
dtm <- as.DocumentTermMatrix(speaker_terms, col = "chisquare")
# Get features of objects in a count_bundle
library(magrittr) # provides the pipe operator %>%
ref <- corpus("GERMAPARLMINI") %>% count(p_attribute = "word")
cois <- corpus("GERMAPARLMINI") %>%
  subset(speaker %in% c("Angela Dorothea Merkel", "Hubertus Heil")) %>%
  split(s_attribute = "speaker") %>%
  count(p_attribute = "word")
y <- features(cois, ref, included = TRUE, method = "chisquare", progress = TRUE)
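The count-based examples compare whole frequency lists rather than single terms. The base-R sketch below imitates that on two hypothetical toy data frames, which are not polmineR objects. It merges them on the token column (the role the `by` argument plays). It can remove the coi from the ref. It then computes the chi-squared statistic for every token with the closed-form 2x2 formula and ranks tokens in a column named like `rank_chisquare` from the examples. The data, the function name `compare_counts`, and the sign convention (negative values for tokens rarer in the coi than expected) are assumptions of this sketch. polmineR's own columns and conventions may differ.

```r
# Hypothetical toy frequency tables standing in for count objects;
# they are plain data frames, not polmineR classes.
coi <- data.frame(word = c("Haushalt", "Steuer", "und"),
                  count = c(30, 12, 50))
ref <- data.frame(word = c("Haushalt", "Steuer", "und", "Bahn"),
                  count = c(100, 20, 900, 40))

# Hypothetical helper, not part of polmineR
compare_counts <- function(coi, ref, by = "word", included = FALSE) {
  size_coi <- sum(coi$count)
  size_ref <- sum(ref$count)
  # Join the frequency lists on the token column(s) given by `by`;
  # tokens missing from the coi get a count of 0.
  tab <- merge(coi, ref, by = by, all.y = TRUE, suffixes = c("_coi", "_ref"))
  tab$count_coi[is.na(tab$count_coi)] <- 0
  if (included) {
    # Assumption: remove the coi from the ref so the samples do not overlap.
    tab$count_ref <- tab$count_ref - tab$count_coi
    size_ref <- size_ref - size_coi
  }
  # Cells of the 2x2 table per token: o11 = token in coi, o12 = other
  # tokens in coi, o21 = token in ref, o22 = other tokens in ref.
  o11 <- tab$count_coi
  o12 <- size_coi - o11
  o21 <- tab$count_ref
  o22 <- size_ref - o21
  n <- size_coi + size_ref
  chi <- n * (o11 * o22 - o12 * o21)^2 /
    ((o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22))
  # Sign convention of this sketch: negative when the token is rarer in
  # the coi than expected, so over-represented tokens rank first.
  expected_coi <- (o11 + o21) * size_coi / n
  tab$chisquare <- ifelse(o11 < expected_coi, -chi, chi)
  tab$rank_chisquare <- rank(-tab$chisquare, ties.method = "min")
  tab[order(tab$rank_chisquare), ]
}

compare_counts(coi, ref, included = TRUE)
```

The closed-form 2x2 formula avoids calling `chisq.test()` once per token, so the computation stays vectorised over the whole vocabulary.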