
about_search_coll

Locale-Sensitive Text Searching in stringi


Description

The string searching facilities described here provide a way to locate a specific piece of text. Locale-sensitive searching, especially in non-English text, is a much more complex process than it seems at first glance.

Locale-Aware String Search Engine

All stri_*_coll functions in stringi use ICU's StringSearch engine, which implements a locale-sensitive string search algorithm. The matches are defined by using the notion of “canonical equivalence” between strings.
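As a minimal sketch of what canonical equivalence means in practice, the following compares a precomposed letter with its decomposed form. The variable names are illustrative only, and the fixed-pattern function is included purely for contrast.

```r
library(stringi)

precomposed <- "\u0105"   # LATIN SMALL LETTER A WITH OGONEK (one code point)
decomposed  <- "a\u0328"  # 'a' followed by COMBINING OGONEK (two code points)

# Collator-based search: the two forms are canonically equivalent,
# so the decomposed pattern is expected to be found in the precomposed text
coll_match <- stri_detect_coll(precomposed, decomposed)

# Fixed-pattern search compares the code point sequences exactly,
# so the two forms are treated as different strings
fixed_match <- stri_detect_fixed(precomposed, decomposed)

print(c(coll = coll_match, fixed = fixed_match))
```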

Tuning the Collator's parameters allows you to perform correct matching that properly takes into account accented letters, conjoined letters, ignorable punctuation and letter case.

For more information on ICU's Collator and search engine, and on how to tune them in stringi, refer to stri_opts_collator.
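The effect of tuning can be sketched by varying the collation strength. The sample strings and the choice of the "en_US" locale below are assumptions made for illustration; other locales may define different rules for accented letters.

```r
library(stringi)

# Illustrative inputs: accented lowercase, unaccented uppercase, exact match
text <- c("hladn\u00FD", "HLADNY", "hladny")

# Default (tertiary) strength: accents and letter case are both significant
hits_default <- stri_detect_coll(text, "hladny",
    opts_collator = stri_opts_collator(locale = "en_US"))

# strength = 2: case differences are ignored, accents still matter
hits_secondary <- stri_detect_coll(text, "hladny",
    opts_collator = stri_opts_collator(locale = "en_US", strength = 2))

# strength = 1: only base letters are compared
hits_primary <- stri_detect_coll(text, "hladny",
    opts_collator = stri_opts_collator(locale = "en_US", strength = 1))

print(rbind(hits_default, hits_secondary, hits_primary))
```

Lower strength values make matching more permissive, so the weakest setting that still separates the strings you care about is usually the one to pick.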

Please note that ICU's StringSearch-based functions are often much slower than those that perform fixed pattern searches.
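One rough way to gauge the cost on your own data is to time both kinds of search side by side. The sketch below uses randomly generated lowercase ASCII strings as hypothetical test data and an assumed "en_US" locale; actual timings depend on the platform, ICU version and input, so no figures are given here.

```r
library(stringi)

set.seed(1)
# Hypothetical test data: 10,000 random strings of 50 lowercase ASCII letters
haystack <- stri_rand_strings(10000, 50, "[a-z]")

# Time a fixed-pattern search and a collator-based search for the same pattern
t_fixed <- system.time(hits_fixed <- stri_detect_fixed(haystack, "ab"))
t_coll  <- system.time(hits_coll  <- stri_detect_coll(haystack, "ab", locale = "en_US"))

# For plain lowercase ASCII input both searches should report the same matches
same_hits <- identical(hits_fixed, hits_coll)

print(c(fixed = t_fixed[["elapsed"]], coll = t_coll[["elapsed"]]))
```

When the text is known to be in a single normalization form and no locale-specific rules are needed, a fixed-pattern search is typically the cheaper choice.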

References

ICU String Search Service – ICU User Guide, http://userguide.icu-project.org/collation/icu-string-search-service

L. Werner, Efficient Text Searching in Java, 1999, https://icu-project.org/docs/papers/efficient_text_searching_in_java.html

See Also

Other search_coll: about_search, stri_opts_collator()


stringi

Character String Processing Facilities

v1.6.1
file LICENSE
Authors
Marek Gagolewski [aut, cre, cph] (<https://orcid.org/0000-0003-0637-6028>), Bartek Tartanus [ctb], and others (stringi source code); IBM, Unicode, Inc. and others (ICU4C source code, Unicode Character Database)
Initial release
2021-05-05
