Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

stri_width

Determine the Width of Code Points


Description

Approximates the number of text columns the 'cat()' function should use to print a string with a mono-spaced font.

Usage

stri_width(str)

Arguments

str

character vector or an object coercible to

Details

The Unicode standard does not formalize the notion of a character width. Roughly based on https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c, https://github.com/nodejs/node/blob/master/src/node_i18n.cc, and UAX #11 we proceed as follows. The following code points are of width 0:

  • code points with general category (see stringi-search-charclass) Me, Mn, and Cf),

  • C0 and C1 control codes (general category Cc) - for compatibility with the nchar function,

  • Hangul Jamo medial vowels and final consonants (code points with enumerable property UCHAR_HANGUL_SYLLABLE_TYPE equal to U_HST_VOWEL_JAMO or U_HST_TRAILING_JAMO; note that applying the NFC normalization with stri_trans_nfc is encouraged),

  • ZERO WIDTH SPACE (U+200B),

Characters with the UCHAR_EAST_ASIAN_WIDTH enumerable property equal to U_EA_FULLWIDTH or U_EA_WIDE are of width 2.

Most emojis and characters with general category So (other symbols) are of width 2.

SOFT HYPHEN (U+00AD) (for compatibility with nchar) as well as any other characters have width 1.

Value

Returns an integer vector of the same length as str.

References

East Asian Width – Unicode Standard Annex #11, https://www.unicode.org/reports/tr11/

See Also

Other length: stri_isempty(), stri_length(), stri_numbytes()

Examples

stri_width(LETTERS[1:5])
stri_width(stri_trans_nfkd('\u0105'))
stri_width(stri_trans_nfkd('\U0001F606'))
stri_width( # Full-width equivalents of ASCII characters:
   stri_enc_fromutf32(as.list(c(0x3000, 0xFF01:0xFF5E)))
)
stri_width(stri_trans_nfkd('\ubc1f')) # includes Hangul Jamo medial vowels and final consonants

stringi

Character String Processing Facilities

v1.6.1
file LICENSE
Authors
Marek Gagolewski [aut, cre, cph] (<https://orcid.org/0000-0003-0637-6028>), Bartek Tartanus [ctb], and others (stringi source code); IBM, Unicode, Inc. and others (ICU4C source code, Unicode Character Database)
Initial release
2021-05-05

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.