segment.optimizer

A function to optimize MSTTR segment sizes


Description

This function calculates an optimized segment size for MSTTR.

Usage

segment.optimizer(txtlgth, segment = 100, range = 20, favour.min = TRUE)

Arguments

txtlgth

Integer value, size of text in tokens.

segment

Integer value, start value of the segment size.

range

Integer value, range around segment to search for better fitting sizes.

favour.min

Logical, whether smaller or larger segment sizes should be preferred as a last resort, if in doubt.

Details

When calculating the mean segmental type-token ratio (MSTTR), tokens are divided into segments of a given size, and each segment is analyzed. If text is left over at the end that won't fill another full segment, it is discarded, i.e., information is lost. For interpretation, it is debatable which is worse: dropping more or less actual token material, or variance in segment size between analyzed texts. If you'd prefer the latter, this function might prove helpful.
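To make the loss concrete, here is a small arithmetic sketch in base R. It does not call koRpus; the variable names are illustrative only, and the text length is borrowed from the example further down the page.

```r
# Illustrative values, not taken from any real corpus
txtlgth <- 2014   # text length in tokens
segment <- 100    # fixed segment size

# Number of complete segments that fit into the text
full_segments <- txtlgth %/% segment

# Tokens left over after the last full segment; these would be discarded
dropped <- txtlgth %% segment
```

With a fixed segment size of 100, this text yields 20 full segments and leaves 14 tokens unused.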

Starting with a given text length, segment size and range to investigate, segment.optimizer iterates through the possible segment sizes. It returns the segment size which would drop the fewest tokens (zero, if you're lucky). Should more than one value fulfill this condition, the one nearest to the segment start value is taken. In cases where two values are still equally far away from the start value, the setting of favour.min determines whether the smaller or the larger segment size is returned.
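The selection rules described above can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration, not the koRpus source: the function name segment_optimizer_sketch is made up, and it assumes that the search covers every integer from segment - range to segment + range (but never below 1).

```r
# Hypothetical sketch of the described selection logic; NOT the koRpus code.
segment_optimizer_sketch <- function(txtlgth, segment = 100, range = 20,
                                     favour.min = TRUE) {
  # Assumed search space: all sizes within `range` of the start value
  candidates <- seq(max(1, segment - range), segment + range)

  # Tokens that would be left over for each candidate size
  dropped <- txtlgth %% candidates

  # Rule 1: keep the candidates that drop the fewest tokens
  best <- candidates[dropped == min(dropped)]

  # Rule 2: of those, keep the ones nearest to the start value
  distance <- abs(best - segment)
  nearest <- best[distance == min(distance)]

  # Rule 3: break a remaining tie according to favour.min
  seg <- if (favour.min) min(nearest) else max(nearest)

  c(seg = seg, drop = txtlgth %% seg)
}

# 2014 = 19 * 106, so a segment size of 106 drops nothing
res <- segment_optimizer_sketch(2014, favour.min = FALSE)

# 9900 is divisible by both 100 and 110, which are equally far from 105
tie_min <- segment_optimizer_sketch(9900, segment = 105, range = 5)
tie_max <- segment_optimizer_sketch(9900, segment = 105, range = 5,
                                    favour.min = FALSE)
```

In this sketch, res holds seg = 106 and drop = 0, while the tie case returns 100 with favour.min = TRUE and 110 with favour.min = FALSE. The actual function may differ in how it handles edge cases.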

Value

A numeric vector with two elements:

seg

The optimized segment size

drop

The number of tokens that would be dropped using this segment size

See Also

Examples

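library(koRpus)

# find the segment size near 100 that drops the fewest of 2014 tokens,
# preferring the larger size if two candidates are tied
segment.optimizer(2014, favour.min=FALSE)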

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

v0.13-6
GPL (>= 3)
Authors
Meik Michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Initial release
2021-05-08