Automatic hyphenation
These methods implement word hyphenation, based on Liang's algorithm.
For details, please refer to the documentation for the generic hyphen method in the sylly package.
## S4 method for signature 'kRp.text'
hyphen(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  quiet = FALSE,
  cache = TRUE,
  as = "kRp.hyphen",
  as.feature = FALSE
)

## S4 method for signature 'kRp.text'
hyphen_df(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)

## S4 method for signature 'kRp.text'
hyphen_c(
  words,
  hyph.pattern = NULL,
  min.length = 4,
  rm.hyph = TRUE,
  quiet = FALSE,
  cache = TRUE
)
words | Either an object of class
hyph.pattern | Either an object of class
min.length | Integer, the minimum number of letters a word must have to be considered for hyphenation.
rm.hyph | Logical, whether hyphens already present in words should be removed before pattern matching.
corp.rm.class | A character vector with word classes which should be ignored. The default value
corp.rm.tag | A character vector with POS tags which should be ignored. Relevant only if
quiet | Logical. If
cache | Logical.
as | A character string defining the class of the object to be returned. Defaults to
as.feature | Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use
An object of class kRp.text, kRp.hyphen, data.frame or a numeric vector, depending on the values of the as and as.feature arguments.
Liang, F.M. (1983). Word Hy-phen-a-tion by Com-put-er. Dissertation, Stanford University, Dept. of Computer Science.
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )

  # call hyphen on a given english word
  # "quiet=TRUE" suppresses the progress bar
  hyphen(
    "interference",
    hyph.pattern="en",
    quiet=TRUE
  )

  # call hyphen() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # language definition is defined in the object
  # if you call hyphen() without arguments,
  # you will get its results directly
  hyphen(tokenized.obj)

  # alternatively, you can also store those results as a
  # feature in the object itself
  tokenized.obj <- hyphen(
    tokenized.obj,
    as.feature=TRUE
  )
  # results are now part of the object
  hasFeature(tokenized.obj)
  corpusHyphen(tokenized.obj)
} else {}
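To give a rough idea of the pattern matching behind Liang's algorithm, here is a minimal base R sketch. It is not the implementation used by sylly or koRpus: the function names parse_pattern and toy_hyphenate are hypothetical, the nine patterns are a tiny illustrative set rather than a real pattern file, and the sketch ignores options such as min.length or rm.hyph. Each pattern carries digits between its letters; wherever a pattern matches the word (padded with dots to mark its boundaries), the highest digit per gap wins, and a gap with an odd value is treated as a permitted hyphenation point.

```r
# Toy pattern set for illustration only (assumption: not the patterns shipped with any package)
patterns <- c("hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n")

# Split one pattern into its letters and the digit weights between them
parse_pattern <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  is_digit <- grepl("[0-9]", chars)
  letters_only <- chars[!is_digit]
  # weights[i] belongs to the gap in front of letter i;
  # the last entry is the gap after the final letter
  weights <- integer(length(letters_only) + 1)
  pos <- 1
  for (i in seq_along(chars)) {
    if (is_digit[i]) {
      weights[pos] <- as.integer(chars[i])
    } else {
      pos <- pos + 1
    }
  }
  list(text = paste(letters_only, collapse = ""), weights = weights)
}

# Hypothetical helper: mark hyphenation points of a single word with "-"
toy_hyphenate <- function(word, patterns) {
  word <- tolower(word)
  dotted <- paste0(".", word, ".")  # dots mark the word boundaries
  n <- nchar(dotted)
  # scores[i] is the weight of the gap in front of character i of the dotted word
  scores <- integer(n + 1)
  for (p in lapply(patterns, parse_pattern)) {
    len <- nchar(p$text)
    if (len > n) next
    for (start in seq_len(n - len + 1)) {
      if (substr(dotted, start, start + len - 1) == p$text) {
        idx <- start:(start + len)
        # competing patterns: the highest weight per gap wins
        scores[idx] <- pmax(scores[idx], p$weights)
      }
    }
  }
  chars <- strsplit(word, "")[[1]]
  m <- length(chars)
  if (m < 2) return(word)
  gap <- scores[3:(m + 1)]           # gap[k] lies between letter k and letter k + 1
  breaks <- c(gap %% 2 == 1, FALSE)  # odd weight: hyphenation allowed here
  paste0(chars, ifelse(breaks, "-", ""), collapse = "")
}

toy_hyphenate("hyphenation", patterns)
# [1] "hy-phen-ation"
```

Even weights act as inhibitors: in the example, the pattern "2io" suppresses a break between "t" and "i" that no odd-valued pattern overrides. Real pattern sets contain thousands of such entries per language, which is why the methods documented here rely on prepared pattern objects instead.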