Locale-Sensitive Text Searching in stringi
The string searching facilities described here locate a specific piece of text within a string. Locale-sensitive searching, especially in non-English text, is a much more complex process than it seems at first glance.
All stri_*_coll functions in stringi use ICU's StringSearch engine, which implements a locale-sensitive string search algorithm.
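The following is a minimal sketch of a few members of the stri_*_coll family. It assumes stringi is installed. The locale is fixed to "de" only so that the results do not depend on the system default. With the collator's default (tertiary) strength, differences in case and accents still prevent a match.

```r
library(stringi)

x <- c("Z\u00fcrich", "zurich", "Bern")   # "Zürich" written with a \u escape

# German collator, otherwise default settings (tertiary strength):
# "Zürich" differs from "zurich" in case and in the accent, so only the
# second element matches.
found <- stri_detect_coll(x, "zurich",
  opts_collator = stri_opts_collator(locale = "de"))
print(found)   # expected: FALSE TRUE FALSE

# Other functions in the family accept the same collator options,
# e.g. counting (non-overlapping) matches.
n <- stri_count_coll("zurich, zurich", "zurich",
  opts_collator = stri_opts_collator(locale = "de"))
print(n)       # expected: 2
```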
The matches are defined by using the notion of “canonical equivalence” between strings.
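The next snippet illustrates canonical equivalence, assuming stringi is installed. The Polish letter "ą" can be encoded either as the single code point U+0105 or as "a" followed by the combining ogonek U+0328. Code-point-based functions such as stri_cmp_eq() and stri_detect_fixed() treat the two encodings as different. The collator treats them as the same. The "pl" locale is an illustrative choice.

```r
library(stringi)

precomposed <- "\u0105"    # LATIN SMALL LETTER A WITH OGONEK
decomposed  <- "a\u0328"   # "a" + COMBINING OGONEK
pl <- stri_opts_collator(locale = "pl")

# Code-point comparison: the two encodings differ.
same_codepoints <- stri_cmp_eq(precomposed, decomposed)                        # FALSE

# Collation-based comparison: canonically equivalent strings are equal.
same_collation <- stri_cmp_equiv(precomposed, decomposed, opts_collator = pl)  # TRUE

# A fixed (code-point) search does not find one encoding inside the other.
text <- paste0("g", precomposed, "b")
fixed_hit <- stri_detect_fixed(text, decomposed)                               # FALSE

# The same search through the collation-based engine, which compares
# collation elements rather than raw code points.
coll_hit <- stri_detect_coll(text, decomposed, opts_collator = pl)
print(coll_hit)
```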
Tuning the Collator's parameters allows you to perform correct matching that properly takes into account accented letters, conjoined letters, ignorable punctuation and letter case.
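Here is a sketch of how collator parameters change what counts as a match, assuming stringi is installed. The strength option controls which differences matter. At strength 1 (primary), only base letters count. At strength 2 (secondary), accents also count. At the default strength 3 (tertiary), case counts as well. The alternate_shifted option makes spaces and punctuation ignorable; it is shown here with a comparison rather than a search. The locales are illustrative choices.

```r
library(stringi)

text <- "Le caf\u00e9 est ouvert"   # "café" with a precomposed é

# Default (tertiary) strength: case and accents are significant.
strict <- stri_detect_coll(text, "CAFE",
  opts_collator = stri_opts_collator(locale = "fr"))                    # FALSE

# Primary strength: case and accents are ignored.
loose <- stri_detect_coll(text, "CAFE",
  opts_collator = stri_opts_collator(locale = "fr", strength = 1))      # TRUE

# Secondary strength: case is ignored, but the accent on é still matters.
accent_sensitive <- stri_detect_coll(text, "CAFE",
  opts_collator = stri_opts_collator(locale = "fr", strength = 2))      # FALSE

# alternate_shifted = TRUE: the space and the hyphen become ignorable.
ignorable <- stri_cmp_equiv("above mentioned", "above-mentioned",
  opts_collator = stri_opts_collator(locale = "en", alternate_shifted = TRUE))  # TRUE
```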
For more information on ICU's Collator and search engine, and on how to tune them in stringi, refer to stri_opts_collator.
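A stri_opts_collator() object can be built once and passed to several stri_*_coll calls. The sketch below assumes stringi is installed and uses an English collator at primary strength. The variable name opts is only illustrative.

```r
library(stringi)

# Collator settings created once and reused across calls.
opts <- stri_opts_collator(locale = "en", strength = 1)

s <- "Caf\u00e9 CAFE cafe"   # "Café CAFE cafe"

n_matches <- stri_count_coll(s, "cafe", opts_collator = opts)               # 3
positions <- stri_locate_all_coll(s, "cafe", opts_collator = opts)[[1]]     # matrix with start/end columns
replaced  <- stri_replace_all_coll(s, "cafe", "tea", opts_collator = opts)  # "tea tea tea"
print(positions)
```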
Please note that ICU's StringSearch-based functions are often much slower than those that perform fixed-pattern searches.
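The rough comparison below assumes stringi is installed. It uses random ASCII data, so the timings are machine-dependent and only indicative. For plain alphanumeric text and an ASCII pattern, both engines should report the same matches under an "en" collator at default strength, and the fixed search avoids the collation overhead. When only case needs to be ignored, a case-insensitive fixed search may be enough. Note that it does not ignore accents.

```r
library(stringi)

set.seed(123)
x <- stri_rand_strings(1e5, 20)   # random strings drawn from [A-Za-z0-9]
en <- stri_opts_collator(locale = "en")

res_fixed <- stri_detect_fixed(x, "abc")
res_coll  <- stri_detect_coll(x, "abc", opts_collator = en)

# Compare running times (results vary between machines).
print(system.time(stri_detect_fixed(x, "abc")))
print(system.time(stri_detect_coll(x, "abc", opts_collator = en)))

# Case-insensitive fixed search: case is folded, accents are not.
ci <- stri_opts_fixed(case_insensitive = TRUE)
ci_case   <- stri_detect_fixed("Le CAFE", "cafe", opts_fixed = ci)           # TRUE
ci_accent <- stri_detect_fixed("Le caf\u00e9", "cafe", opts_fixed = ci)      # FALSE
```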
References
ICU String Search Service, ICU User Guide, http://userguide.icu-project.org/collation/icu-string-search-service
L. Werner, Efficient Text Searching in Java, 1999, https://icu-project.org/docs/papers/efficient_text_searching_in_java.html
Other search_coll: about_search, stri_opts_collator()
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_opts_collator(), stri_order(), stri_rank(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Other stringi_general_topics: about_arguments, about_encoding, about_locale, about_search_boundaries, about_search_charclass, about_search_fixed, about_search_regex, about_search, about_stringi