Subsetting corpora and subcorpora
The structural attributes of a corpus (s-attributes) can be used
to generate subcorpora (i.e. objects of class subcorpus) by applying
the subset-method. The subset-method can be applied to a corpus
represented by a corpus object, to a length-one character vector
giving the corpus ID (as a shortcut), and to a subcorpus object.
## S4 method for signature 'corpus'
subset(x, subset, regex = FALSE, ...)

## S4 method for signature 'character'
subset(x, ...)

## S4 method for signature 'subcorpus'
subset(x, subset, ...)

## S4 method for signature 'remote_corpus'
subset(x, subset)
x | A
subset | A
regex | A
... | An expression that will be used to create a subcorpus from s-attributes.
The methods applicable to the subcorpus object that results
from subsetting a corpus or subcorpus are described in the documentation of
the subcorpus-class. Note that the subset-method can also be
applied to textstat-class objects (and objects inheriting from
this class).
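The statement that the subset-method also accepts a subcorpus object can be illustrated with a short sketch. It assumes that polmineR is installed and that the GERMAPARLMINI sample corpus is available as in the examples on this page. The object names `merkel` and `merkel_day` are chosen here for illustration only.

```r
library(polmineR)
use("polmineR")  # make the sample corpora of the package available (assumed setup)

# first subset the corpus, then narrow the resulting subcorpus further
merkel <- subset(corpus("GERMAPARLMINI"), speaker == "Angela Dorothea Merkel")
merkel_day <- subset(merkel, date == "2009-10-28")

# the narrower subcorpus should not be larger than the one it was derived from
size(merkel_day) <= size(merkel)
```

Subsetting in two steps like this should give the same regions as combining both conditions with `&` in a single call, but keeps the intermediate subcorpus available for reuse.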
library(polmineR)
use("polmineR")
# examples for standard and non-standard evaluation
a <- corpus("GERMAPARLMINI")
# subsetting a corpus object using non-standard evaluation
sc <- subset(a, speaker == "Angela Dorothea Merkel")
sc <- subset(a, speaker == "Angela Dorothea Merkel" & date == "2009-10-28")
sc <- subset(a, grepl("Merkel", speaker))
sc <- subset(a, grepl("Merkel", speaker) & date == "2009-10-28")
# subsetting corpus specified by character vector
sc <- subset("GERMAPARLMINI", grepl("Merkel", speaker))
sc <- subset("GERMAPARLMINI", speaker == "Angela Dorothea Merkel")
sc <- subset("GERMAPARLMINI", speaker == "Angela Dorothea Merkel" & date == "2009-10-28")
sc <- subset("GERMAPARLMINI", grepl("Merkel", speaker) & date == "2009-10-28")
# subsetting a corpus using the (old) logic of the partition-method
sc <- subset(a, speaker = "Angela Dorothea Merkel")
sc <- subset(a, speaker = "Angela Dorothea Merkel", date = "2009-10-28")
sc <- subset(a, speaker = "Merkel", regex = TRUE)
sc <- subset(a, speaker = c("Merkel", "Kauder"), regex = TRUE)
sc <- subset(a, speaker = "Merkel", date = "2009-10-28", regex = TRUE)
# providing the value for s-attribute as a variable
who <- "Volker Kauder"
sc <- subset(a, quote(speaker == who))
# use bquote for quasiquotation when using a variable for subsetting in a loop
for (who in c("Angela Dorothea Merkel", "Volker Kauder", "Ronald Pofalla")){
  sc <- subset(a, bquote(speaker == .(who)))
  if (interactive()) print(size(sc))
}
# equivalent procedure with lapply (DOES NOT WORK YET)
b <- lapply(
  c("Angela Dorothea Merkel", "Volker Kauder", "Ronald Pofalla"),
  function(who) subset(a, bquote(speaker == .(who)))
)
sapply(b, size)
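To see why the loop example uses bquote() rather than quote(), the following base R sketch may help. It does not call polmineR at all: the data frame `regions` and the helper `filter_regions()` are hypothetical stand-ins for a table of s-attribute values and for a subset-like function that evaluates an unevaluated expression. It is not a description of how polmineR implements subsetting internally.

```r
# Hypothetical stand-in for a table of s-attribute values (not polmineR data).
regions <- data.frame(
  speaker = c("Angela Dorothea Merkel", "Volker Kauder", "Angela Dorothea Merkel"),
  date = c("2009-10-28", "2009-10-28", "2009-11-10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate an unevaluated expression against the table,
# looking up any remaining symbols in the environment `env`.
filter_regions <- function(expr, env = parent.frame()) {
  keep <- eval(expr, envir = regions, enclos = env)
  regions[keep, , drop = FALSE]
}

who <- "Volker Kauder"

# quote() keeps the symbol `who`; it is resolved only at evaluation time.
e1 <- quote(speaker == who)

# bquote() replaces .(who) with the current value of `who` immediately.
e2 <- bquote(speaker == .(who))

print(e1)  # speaker == who
print(e2)  # speaker == "Volker Kauder"

# Inside a function, the value travels with the expression built by bquote().
by_speaker <- lapply(
  c("Angela Dorothea Merkel", "Volker Kauder"),
  function(who) filter_regions(bquote(speaker == .(who)))
)
sapply(by_speaker, nrow)
```

Because bquote() embeds the value itself, the resulting expression no longer depends on where `who` happens to be defined when the expression is finally evaluated. That is the property that makes it the safer choice in loops and in functions passed to lapply().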