Get Number of Tokens.
The method will get the number of tokens in a corpus or partition, or the dispersion across one or more s-attributes.
size(x, ...)

## S4 method for signature 'corpus'
size(x, s_attribute = NULL, verbose = TRUE, ...)

## S4 method for signature 'character'
size(x, s_attribute = NULL, verbose = TRUE, ...)

## S4 method for signature 'partition'
size(x, s_attribute = NULL, ...)

## S4 method for signature 'partition_bundle'
size(x)

## S4 method for signature 'DocumentTermMatrix'
size(x)

## S4 method for signature 'TermDocumentMatrix'
size(x)

## S4 method for signature 'features'
size(x)

## S4 method for signature 'remote_corpus'
size(x)

## S4 method for signature 'remote_partition'
size(x)
x | An object to get size(s) for.
... | Further arguments (used only for backwards compatibility).
s_attribute | A
verbose | A
One or more s-attributes can be provided to get the dispersion of tokens across one or more dimensions. Two or more s-attributes can lead to reasonable results only if the corpus XML is flat.
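To make the notion of dispersion concrete, here is a minimal base-R sketch on a made-up token table. The data frame `tokens` and its columns are hypothetical and only illustrate the shape of the counts; this is not how polmineR computes sizes internally. In this toy table every token carries a value for both attributes, which is roughly the situation a flat annotation structure provides.

```r
# Hypothetical toy data: one row per token, annotated with two s-attributes.
tokens <- data.frame(
  date  = c("2009-11-10", "2009-11-10", "2009-11-10", "2009-11-11", "2009-11-11"),
  party = c("SPD", "SPD", "CDU", "CDU", "CDU"),
  stringsAsFactors = FALSE
)
tokens$size <- 1L  # each row counts as one token

# Total number of tokens (analogous to calling size() with s_attribute = NULL).
total_size <- nrow(tokens)

# Token counts across one dimension (one s-attribute).
size_by_date <- aggregate(size ~ date, data = tokens, FUN = sum)

# Token counts across two dimensions (two s-attributes).
size_by_date_party <- aggregate(size ~ date + party, data = tokens, FUN = sum)

print(size_by_date)
print(size_by_date_party)
```

However the tokens are split, the per-group sizes should add up to the total, which makes for a quick sanity check on real output as well.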
The size-method for features objects will return a named list with the size of the corpus of interest ("coi"), i.e. the number of tokens in the window, and the reference corpus ("ref"), i.e. the number of tokens that are not matched by the query and that are outside the window.
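The relation between the two sizes can be sketched with plain arithmetic. The corpus size, match positions, and window width below are invented for illustration, and the counting logic is an assumption derived from the description above, not polmineR's actual code.

```r
# Hypothetical setup: a corpus of 20 tokens, a query matching at two positions,
# and a context window of 2 tokens to the left and right of each match.
corpus_size <- 20L
match_positions <- c(5L, 14L)
window <- 2L

# Collect the positions inside the windows, excluding the matches themselves.
context <- unlist(lapply(
  match_positions,
  function(p) setdiff((p - window):(p + window), p)
))
# Drop positions outside the corpus, duplicates, and other matches.
context <- unique(context[context >= 1L & context <= corpus_size])
context <- setdiff(context, match_positions)

# "coi": tokens in the window; "ref": tokens neither matched nor in the window.
sizes <- list(
  coi = length(context),
  ref = corpus_size - length(context) - length(match_positions)
)
str(sizes)
```

Under these assumptions, the window tokens, the reference tokens, and the matched tokens together account for every token in the corpus.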
If x is a corpus (a corpus object or specified by corpus id), the return value is an integer vector if argument s_attribute is NULL, and a two-column data.table otherwise (first column: the s-attribute, second column: "size"). If x is a subcorpus_bundle or a partition_bundle, the return value is a data.table (with columns "name" and "size").
See the dispersion-method for counts of hits. The hits-method calls the size-method to get sizes of subcorpora.
use("polmineR")

# for corpus object
corpus("REUTERS") %>% size()
corpus("REUTERS") %>% size(s_attribute = "id")
corpus("GERMAPARLMINI") %>% size(s_attribute = c("date", "party"))

# for corpus specified by ID
size("GERMAPARLMINI")
size("GERMAPARLMINI", s_attribute = "date")
size("GERMAPARLMINI", s_attribute = c("date", "party"))

# for partition object
P <- partition("GERMAPARLMINI", date = "2009-11-11")
size(P, s_attribute = "speaker")
size(P, s_attribute = "party")
size(P, s_attribute = c("speaker", "party"))

# for subcorpus
sc <- corpus("GERMAPARLMINI") %>% subset(date == "2009-11-11")
size(sc, s_attribute = "speaker")
size(sc, s_attribute = "party")
size(sc, s_attribute = c("speaker", "party"))

# for subcorpus_bundle
subcorpora <- corpus("GERMAPARLMINI") %>% split(s_attribute = "date")
size(subcorpora)