stringi: stri_unique – R documentation

Pricing

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

Get Started for Free

Documentation

stringi

stri_unique

Extract Unique Elements

Description

This function returns a character vector like str, but with duplicate elements removed.

Usage

stri_unique(str, ..., opts_collator = NULL)

Arguments

`str`	a character vector
`...`	additional settings for `opts_collator`
`opts_collator`	a named list with ICU Collator's options, see `stri_opts_collator`, `NULL` for default collation options

Details

As usual in stringi, no attributes are copied. Unlike unique, this function tests for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such an operation is locale-dependent. Hence, stri_unique is significantly slower (but much better suited for natural language processing) than its base R counterpart.

See also stri_duplicated for indicating non-unique elements.

Value

Returns a character vector.

References

Collation - ICU User Guide, http://userguide.icu-project.org/collation

Examples

# normalized and non-Unicode-normalized version of the same code point:
stri_unique(c('\u0105', stri_trans_nfkd('\u0105')))
unique(c('\u0105', stri_trans_nfkd('\u0105')))

stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)

stringi

Character String Processing Facilities

v1.6.1

file LICENSE

Authors

Marek Gagolewski [aut, cre, cph] (<https://orcid.org/0000-0003-0637-6028>), Bartek Tartanus [ctb], and others (stringi source code); IBM, Unicode, Inc. and others (ICU4C source code, Unicode Character Database)

Initial release

2021-05-05

stri_unique

Description

Usage

Arguments

Details

Value

References

See Also

Examples

stringi

We don't support your browser anymore