Get corpus positions for a query or queries.
Get matches for a query in a CQP corpus (subcorpus, partition etc.), optionally using the CQP syntax of the Corpus Workbench (CWB).
cpos(.Object, ...)

## S4 method for signature 'corpus'
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  regex = FALSE,
  check = TRUE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'character'
cpos(
  .Object,
  query,
  p_attribute = getOption("polmineR.p_attribute"),
  cqp = is.cqp,
  check = TRUE,
  verbose = TRUE,
  ...
)

## S4 method for signature 'slice'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'partition'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'subcorpus'
cpos(
  .Object,
  query,
  cqp = is.cqp,
  check = TRUE,
  p_attribute = getOption("polmineR.p_attribute"),
  verbose = TRUE,
  ...
)

## S4 method for signature 'matrix'
cpos(.Object)

## S4 method for signature 'hits'
cpos(.Object)

## S4 method for signature 'NULL'
cpos(.Object)
.Object |
A length-one |
... |
Used for reasons of backwards compatibility to
process arguments that have been renamed (e.g. |
query |
A |
p_attribute |
The p-attribute to search. Needs to be stated only if query
is not a CQP query. Defaults to |
cqp |
Either logical ( |
regex |
Interpret |
check |
A |
verbose |
A |
If the cpos()-method is applied on a character or partition object, the result is a two-column matrix with the regions (start and end corpus positions of the matches) for a query. CQP syntax can be used. The encoding of the query is adjusted to conform to the encoding of the CWB corpus. If there are no matches, NULL is returned.
If the cpos()-method is called on a matrix object, the cpos matrix is unfolded; the return value is an integer vector with the individual corpus positions.
If .Object is a hits object, an integer vector is returned with the individual corpus positions.
If .Object is a matrix, it is assumed to be a region matrix, i.e. a two-column matrix with left and right corpus positions in the first and second column, respectively. For many operations, such as decoding the token stream, it is necessary to inflate the denoted regions into a vector of all corpus positions referred to by the regions defined in the matrix. The cpos-method for matrix objects performs this task robustly.
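The unfolding of a region matrix can be illustrated with a minimal base-R sketch. It is not the polmineR implementation: the helper name inflate_regions and the example matrix are hypothetical, chosen only to show what "inflating" regions into individual corpus positions means.

```r
# Hypothetical region matrix: one row per match,
# column 1 = left corpus position, column 2 = right corpus position.
regions <- matrix(c(10L, 12L,
                    20L, 20L,
                    35L, 37L), ncol = 2, byrow = TRUE)

# Illustrative helper (an assumption, not part of polmineR):
# expand every row into the full sequence left:right and concatenate.
inflate_regions <- function(m) {
  # Treat NULL or an empty matrix as "no matches".
  if (is.null(m) || nrow(m) == 0L) return(integer(0))
  unlist(lapply(seq_len(nrow(m)), function(i) m[i, 1L]:m[i, 2L]))
}

positions <- inflate_regions(regions)
print(positions)
```

With the package loaded, calling cpos() on a region matrix is the documented way to obtain this vector; the sketch only makes the underlying idea explicit.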
If .Object is NULL, the method will return an empty integer vector. Used internally to handle NULL objects that may be returned from the cpos-method if no matches are obtained for a query.
Unless .Object is a matrix, a hits object or NULL (cases that yield an integer vector of corpus positions), the return value is a matrix with two columns. The first column reports the left/starting corpus positions (cpos) of the hits obtained. The second column reports the right/ending corpus positions of the respective hit. The number of rows is the number of hits. If there are no hits, a NULL object is returned.
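Because the result is either a two-column matrix or NULL, downstream code usually has to guard against the no-hit case. The following base-R sketch shows one way to do so; the matrix hits and the helper count_hits are hypothetical stand-ins for a real cpos() result, not polmineR functions.

```r
# Hypothetical stand-in for a value returned by cpos():
# two hits, the first spanning two tokens, the second a single token.
hits <- matrix(c(10L, 11L,
                 20L, 20L), ncol = 2, byrow = TRUE)

# Illustrative helper (an assumption): number of hits, treating NULL as zero.
count_hits <- function(x) if (is.null(x)) 0L else nrow(x)

n_hits <- count_hits(hits)

# Length of each match in tokens: right position minus left position plus one.
match_length <- hits[, 2L] - hits[, 1L] + 1L

print(n_hits)
print(match_length)
print(count_hits(NULL))
```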
use("polmineR")

# looking up single tokens
cpos("REUTERS", query = "oil")
corpus("REUTERS") %>% cpos(query = "oil")
corpus("REUTERS") %>%
  subset(grepl("saudi-arabia", places)) %>%
  cpos(query = "oil")
partition("REUTERS", places = "saudi-arabia", regex = TRUE) %>%
  cpos(query = "oil")

# using CQP query syntax
cpos("REUTERS", query = '"Saudi" "Arabia"')
corpus("REUTERS") %>% cpos(query = '"Saudi" "Arabia"')
corpus("REUTERS") %>%
  subset(grepl("saudi-arabia", places)) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)
partition("REUTERS", places = "saudi-arabia", regex = TRUE) %>%
  cpos(query = '"Saudi" "Arabia"', cqp = TRUE)