S4 kwic class
S4 class for organizing information for kwic/concordance output. A set of
standard generics (show
, as.character
, as.data.frame
,
length
, sample
, subset
) as well as indexing is implemented to process
kwic class objects (see 'Usage'). See section 'Details' for the
enrich
, view
and knit_print
methods.
## S4 method for signature 'kwic' get_corpus(x) ## S4 method for signature 'kwic' count(.Object, p_attribute = "word") ## S4 method for signature 'kwic' as.DocumentTermMatrix(x, p_attribute, verbose = TRUE, ...) ## S4 method for signature 'kwic' as.TermDocumentMatrix(x, p_attribute, verbose = TRUE, ...) ## S4 method for signature 'kwic' show(object) ## S4 method for signature 'kwic' knit_print(x, options = knitr::opts_chunk) ## S4 method for signature 'kwic' as.character(x, fmt = "<i>%s</i>") ## S4 method for signature 'kwic,ANY,ANY,ANY' x[i] ## S4 method for signature 'kwic' subset(x, ...) ## S4 method for signature 'kwic' as.data.frame(x) ## S4 method for signature 'kwic' length(x) ## S4 method for signature 'kwic' sample(x, size) ## S4 method for signature 'kwic_bundle' merge(x) ## S4 method for signature 'kwic' enrich(.Object, s_attributes = NULL, extra = NULL, table = FALSE, ...) ## S4 method for signature 'kwic' format( x, node_color = "blue", align = TRUE, extra_color = "grey", lineview = getOption("polmineR.lineview") ) ## S4 method for signature 'kwic' view(.Object)
x |
A |
.Object |
A |
p_attribute |
A length-one |
verbose |
A |
... |
Used for backwards compatibility. |
object |
A |
options |
Chunk options. |
fmt |
A format string passed into |
i |
Single integer value, the kwic line for which the fulltext shall be inspected. |
size |
An |
s_attributes |
Character vector of s-attributes with metainformation. |
extra |
An |
table |
Logical, whether to turn cpos |
node_color |
If not |
align |
A |
extra_color |
If extra context has been generated using |
lineview |
A |
Applying the count
-method on a kwic
object will return
a count
object with the evaluation of the left and right context of
the match.
The knit_print
method will be called by knitr to render
'kwic' objects as a DataTable htmlwidget
when rendering a R
Markdown document as html. It will usually be necessary to explicitly state
"render = knit_print" in the chunk options. The option
'polmineR.pagelength' controls the number of lines displayed in the
resulting 'htmlwidget'. Note that including htmlwidgets in html documents
requires that pandoc is installed. To avoid an error, a formatted
data.table
is returned by knit_print
if pandoc is not
available.
The as.character
-method will return a list of
character
vectors, concatenating the columns "left", "node" and
"right" of the data.table
in the stat
-slot of the input
kwic
-class object. Optionally, the node can be formatted using a
format string that is passed into sprintf
.
The subset
-method will apply subset
to the table in
the slot stat
, e.g. for filtering query results based on metadata (i.e.
s-attributes) that need to be present.
The enrich
method is used to generate the actual output for
the kwic method. If param table
is TRUE
, corpus positions
will be turned into a data.frame
with the concordance lines. If
param s_attributes
is a character vector with s-attributes, the
respective s-attributes will be added as columns to the table with
concordance lines.
The format
-method will return a data.table
that can
serve as input for rendering a htmlwidget
, for instance using
DT::datatable
or rhandsontable::rhandsontable
. It will
include html tags, so ensure that the rendering engine does not obfuscate
the html.
metadata
A character
vector with s-attributes of the metadata
that are to be displayed.
p_attribute
The p-attribute for which the context has been generated.
left
An integer
value, words to the left of the query match.
right
An integer
value, words to the right of the query match.
corpus
Length-one character
vector, the CWB corpus.
cpos
A data.table
with the columns "match_id", "cpos", "position",
"word_id", "word" and "direction".
stat
A data.table
, a table with columns "left", "node",
"right", and metadata, if the object has been enriched.
encoding
A length-one character
vector with the encoding of the
corpus.
name
A length-one character
vector naming the object.
annotation_cols
A character
vector designating the columns of
the data.table
in the slot table
that are annotations.
The constructor for generating kwic objects is the
kwic
method.
use("polmineR") K <- kwic("GERMAPARLMINI", "Integration") get_corpus(K) length(K) K_min <- K[1] K_min <- K[1:5] # using kwic_bundle class queries <- c("oil", "prices", "barrel") li <- lapply(queries, function(q) kwic("REUTERS", query = q)) kb <- as.bundle(li) # use count-method on kwic object coi <- kwic("REUTERS", query = "oil") %>% count(p_attribute = "word") # features vs cooccurrences-method (identical results) ref <- count("REUTERS", p_attribute = "word") %>% subset(word != "oil") slot(ref, "size") <- slot(ref, "size") - count("REUTERS", "oil")[["count"]] y_features <- features(coi, ref, method = "ll", included = TRUE) y_cooc <- cooccurrences("REUTERS", query = "oil") # extract node and left and right context as character vectors oil <- kwic("REUTERS", query = "oil") as.character(oil, fmt = NULL) as.character(oil) # node wrapped into <i> tag by default as.character(oil, fmt = "<b>%s</b>") # subsetting kwic objects oil <- corpus("REUTERS") %>% kwic(query = "oil") %>% subset(grepl("prices", right)) saudi_arabia <- corpus("REUTERS") %>% kwic(query = "Arabia") %>% subset(grepl("Saudi", left)) int_spd <- corpus("GERMAPARLMINI") %>% kwic(query = "Integration") %>% enrich(s_attribute = "party") %>% subset(grepl("SPD", party)) # turn kwic object into data.frame with html tags int <- corpus("GERMAPARLMINI") %>% kwic(query = "Integration") as.data.frame(int) # Without further metadata enrich(int, s_attributes = c("date", "speaker", "party")) %>% as.data.frame() # merge bundle of kwic objects into one kwic reuters <- corpus("REUTERS") queries <- c('"Saudi" "Arabia"', "oil", '"barrel.*"') comb <- lapply(queries, function(qu) kwic(reuters, query = qu)) %>% as.bundle() %>% merge() # enrich kwic object i <- corpus("GERMAPARLMINI") %>% kwic(query = "Integration") %>% enrich(s_attributes = c("date", "speaker", "party"))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.