polmineR: pmi – R documentation


polmineR

pmi

Calculate Pointwise Mutual Information (PMI).

Description

Calculate Pointwise Mutual Information as an information-theoretic approach to find collocations.

Usage

pmi(.Object, ...)

## S4 method for signature 'context'
pmi(.Object)

## S4 method for signature 'Cooccurrences'
pmi(.Object)

## S4 method for signature 'ngrams'
pmi(.Object, observed, p_attribute = p_attributes(.Object)[1])

Arguments

`.Object`	An object of class `context`, `Cooccurrences` or `ngrams`.
`...`	Further arguments that methods may require.
`observed`	A `count`-object with the numbers of the observed occurrences of the tokens in the input `ngrams` object.
`p_attribute`	The positional attribute which shall be considered. Relevant only if ngrams have been calculated for more than one p-attribute.

Details

Pointwise mutual information (PMI) is calculated as follows (see Manning/Schuetze 1999):

I(x,y) = log(p(x,y)/(p(x)p(y)))

The formula is based on maximum likelihood estimates: given the number of observations for token x, o(x), the number of observations for token y, o(y), and the size of the corpus N, the probabilities for the tokens x and y, and for the co-occurrence of x and y, are as follows:

p(x) = o(x) / N

p(y) = o(y) / N

The term p(x,y) is estimated in the same way from the number of observed co-occurrences of x and y, o(x,y):

p(x,y) = o(x,y) / N
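
Substituting the estimates into the formula shows how the corpus size enters the result. The sketch below assumes that p(x,y) is estimated as o(x,y) / N, which mirrors the manual computation in the Examples section; the symbol o(x,y) for the co-occurrence count is notation introduced here for illustration, and the base-2 logarithm follows the note in this section.

```latex
I(x,y) = \log_2 \frac{o(x,y)/N}{\bigl(o(x)/N\bigr)\,\bigl(o(y)/N\bigr)}
       = \log_2 \frac{o(x,y) \cdot N}{o(x) \cdot o(y)}
```

One factor of N cancels, so for fixed counts the score grows with the size of the corpus.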

Note that the computation uses log base 2, not the natural logarithm found in some examples (e.g. https://en.wikipedia.org/wiki/Pointwise_mutual_information).
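
To make the formula and the choice of logarithm concrete, here is a minimal base R sketch that does not use polmineR at all. The counts are hypothetical and invented for illustration, the variable names are assumptions, and p(x,y) is assumed to be the co-occurrence count divided by N.

```r
# Hypothetical counts, invented for illustration (not taken from REUTERS)
N    <- 1024  # corpus size
o_x  <- 32    # observations of token x
o_y  <- 16    # observations of token y
o_xy <- 8     # observed co-occurrences of x and y (assumed notation)

# Maximum likelihood estimates of the probabilities
p_x  <- o_x / N
p_y  <- o_y / N
p_xy <- o_xy / N

# PMI with log base 2
pmi_log2 <- log2(p_xy / (p_x * p_y))
print(pmi_log2)  # 4

# The same quantity with the natural logarithm, converted back to base 2
pmi_nat <- log(p_xy / (p_x * p_y))
print(all.equal(pmi_nat / log(2), pmi_log2))  # TRUE
```

With these numbers, a score of 4 means that x and y co-occur 2^4 = 16 times more often than expected if they were independent. Scores computed with the natural logarithm differ from base-2 scores only by the constant factor log(2).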

References

Manning, Christopher D.; Schuetze, Hinrich (1999): Foundations of Statistical Natural Language Processing. MIT Press: Cambridge, Mass., pp. 178-183.

Examples

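# Load the package and make the corpora shipped with it available
library(polmineR)
use("polmineR")

# Cooccurrences: PMI scores for tokens co-occurring with "oil"
y <- cooccurrences("REUTERS", query = "oil", method = "pmi")

# Recompute the PMI by hand from the counts in the result
N <- size(y)[["partition"]]
I <- log2((y[["count_coi"]]/N) / ((count(y) / N) * (y[["count_partition"]] / N)))

# ngrams: PMI scores for bigrams
dt <- decode(
  "REUTERS",
  p_attribute = "word",
  s_attribute = character(),
  to = "data.table",
  verbose = FALSE
)
n <- ngrams(dt, n = 2L, p_attribute = "word")
obs <- count("REUTERS", p_attribute = "word")
phrases <- pmi(n, observed = obs)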

polmineR

Verbs and Nouns for Corpus Analysis

v0.8.5

GPL-3

Authors

Andreas Blaette [aut, cre] (<https://orcid.org/0000-0001-8970-8010>), Christoph Leonhardt [ctb]

Initial release

2020-09-22
