S4 textstat superclass.
The textstat class (technically an S4 class) serves as a superclass for the classes features, context, and partition. Usually, the class will not be used directly. It offers a set of standard generic methods (such as head, tail, dim, nrow, colnames) that its child classes inherit. The core feature of textstat and its child classes is a data.table in the slot stat for keeping data on text statistics of a corpus or a partition.
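The inheritance pattern described above can be illustrated with a minimal sketch. It does not use polmineR: the class names toy_textstat and toy_count are hypothetical, and a base data.frame stands in for the data.table in the stat slot. Methods are defined once for the superclass and are then available to the child class.

```r
library(methods)

# Hypothetical stand-in for the superclass: a 'stat' slot holding a table
# (a base data.frame here, not a data.table) and a 'name' slot.
setClass("toy_textstat", representation(stat = "data.frame", name = "character"))

# Hypothetical child class that adds nothing and inherits everything.
setClass("toy_count", contains = "toy_textstat")

# Methods are defined for the superclass only; they delegate to the stat slot.
setMethod("head", "toy_textstat", function(x, ...) head(x@stat, ...))
setMethod("nrow", "toy_textstat", function(x) nrow(x@stat))

obj <- new(
  "toy_count",
  stat = data.frame(
    word = c("a", "b", "c"),
    count = c(3L, 2L, 1L),
    stringsAsFactors = FALSE
  ),
  name = "demo"
)

first_two <- head(obj, n = 2) # dispatches to the superclass method
n_rows <- nrow(obj)           # likewise inherited
```

Because toy_count contains toy_textstat, S4 dispatch falls back to the superclass methods without any further definitions in the child class.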
## S4 method for signature 'textstat'
name(x)

## S4 method for signature 'character'
name(x)

## S4 replacement method for signature 'textstat'
name(x) <- value

## S4 method for signature 'textstat'
round(x, digits = 2L)

## S4 method for signature 'textstat'
sort(x, by, decreasing = TRUE)

as.bundle(object, ...)

## S4 method for signature 'textstat,textstat'
e1 + e2

## S4 method for signature 'textstat'
subset(x, subset)

## S3 method for class 'textstat'
as.data.table(x, ...)

## S4 method for signature 'textstat'
show(object)

## S4 method for signature 'textstat'
p_attributes(.Object)

## S4 method for signature 'textstat'
knit_print(x, options = knitr::opts_chunk, ...)

## S4 method for signature 'textstat'
get_corpus(x)

## S4 method for signature 'textstat'
format(x, digits = 2L)

## S4 method for signature 'textstat'
view(.Object)
x | A textstat object.
value | The name to assign to the object.
digits | Number of digits.
by | Column that will serve as the key for sorting.
decreasing | Logical, whether to return decreasing order.
object | A textstat object.
... | Argument that will be passed into a call of the
e1 | A textstat object.
e2 | Another textstat object.
subset | A logical expression indicating elements or rows to keep.
.Object | A textstat object.
options | Chunk options.
The head method will return the first rows of the data.table in the stat slot. Use argument n to specify the number of rows.
The tail method will return the last rows of the data.table in the stat slot. Use argument n to specify the number of rows.
The methods dim, nrow and ncol will return information on the dimensions, the number of rows, or the number of columns of the data.table in the stat slot, respectively.
Objects derived from the textstat class can be indexed with single square brackets ("[") to get the rows specified by a numeric/integer vector, and with double square brackets ("[[") to get specific columns from the data.table in the slot stat.
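A minimal sketch of how such bracket methods can be defined for an S4 class follows. It is not polmineR's implementation: the class toy_textstat is hypothetical, a base data.frame stands in for the data.table, and returning an object of the same class from "[" is an assumption made for this illustration.

```r
library(methods)

# Hypothetical class with a table in the 'stat' slot (a data.frame stand-in).
setClass("toy_textstat", representation(stat = "data.frame"))

# Single brackets: keep the rows given by i and return an object of the
# same class (an assumption of this sketch).
setMethod("[", "toy_textstat", function(x, i, j, ..., drop = TRUE) {
  x@stat <- x@stat[i, , drop = FALSE]
  x
})

# Double brackets: extract one column of the table as a vector.
setMethod("[[", "toy_textstat", function(x, i, j, ...) x@stat[[i]])

obj <- new(
  "toy_textstat",
  stat = data.frame(
    word = c("Arbeit", "Sozial", "Politik"),
    ll = c(12.5, 8.1, 3.3),
    stringsAsFactors = FALSE
  )
)

rows_2_3 <- obj[2:3]       # row subset, still a toy_textstat object
words <- obj[["word"]]     # character vector of the 'word' column
chained <- obj[1:2][["ll"]] # row subset first, then column extraction
```

Chaining the two operators, as in the last line, mirrors the pattern of selecting rows first and then pulling out a single column.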
The colnames method will return the column names of the data.table in the slot stat.
The methods as.data.table and as.data.frame will extract the data.table in the slot stat as a data.table or data.frame, respectively.
textstat objects can have a name, which can be retrieved with the name method and set with name<-.
The round() method looks up all numeric columns in the data.table in the stat slot of the textstat object and rounds the values of these columns to the number of decimal places specified by argument digits.
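The rounding logic described above can be sketched in base R. The helper round_numeric_columns is hypothetical and operates on a plain data.frame rather than on the data.table of a textstat object; it only illustrates the idea of detecting numeric columns and rounding them.

```r
# Hypothetical helper: round every numeric column of a table, leave the
# other columns untouched.
round_numeric_columns <- function(df, digits = 2L) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], round, digits = digits)
  df
}

stat <- data.frame(
  word = c("Arbeit", "Sozial"),
  ll = c(10.23456, 3.14159),
  stringsAsFactors = FALSE
)

rounded <- round_numeric_columns(stat, digits = 2L)
```

The character column word passes through unchanged, while ll is rounded to two decimal places.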
The knit_print method will be called by knitr to render 'textstat' objects, or objects inheriting from the 'textstat' class, as a DataTable htmlwidget when rendering an R Markdown document as HTML. It will usually be necessary to state "render = knit_print" explicitly in the chunk options. The option 'polmineR.pagelength' controls the number of lines displayed in the resulting 'htmlwidget'. Note that including htmlwidgets in HTML documents requires that pandoc is installed. To avoid an error, knit_print returns a formatted data.table if pandoc is not available.
The format() method returns a pretty-printed and minimized version of the data.table in the stat slot of the textstat object: it rounds all numeric columns to the number of decimal places specified by digits and drops all columns with token ids. The return value is a data.table.
p_attribute
Object of class character, p-attribute of the query.
corpus
A corpus specified by a length-one character vector.
stat
A data.table with statistical information.
name
The name of the object.
annotation_cols
A character vector, column names of the data.table in slot stat that are annotations.
encoding
A length-one character vector, the encoding of the corpus.
use("polmineR")
P <- partition("GERMAPARLMINI", date = ".*", p_attribute = "word", regex = TRUE)
y <- cooccurrences(P, query = "Arbeit")

# generics defined in the polmineR package
x <- count("REUTERS", p_attribute = "word")
name(x) <- "count_reuters"
name(x)
get_corpus(x)

# Standard generic methods known from data.frames work for objects inheriting
# from the textstat class
head(y)
tail(y)
nrow(y)
ncol(y)
dim(y)
colnames(y)

# Use brackets for indexing
## Not run: 
y[1:25]
y[,c("word", "ll")]
y[1:25, "word"]
y[1:25][["word"]]
y[which(y[["word"]] %in% c("Arbeit", "Sozial"))]
y[ y[["word"]] %in% c("Arbeit", "Sozial") ]
## End(Not run)

sc <- partition("GERMAPARLMINI", speaker = "Angela Dorothea Merkel")
cnt <- count(sc, p_attribute = c("word", "pos"))
cnt_min <- subset(cnt, pos %in% c("NN", "ADJA"))
cnt_min <- subset(cnt, pos == "NE")

# Get statistics in textstat object as data.table
count_dt <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count(p_attribute = "word") %>%
  as.data.table()