Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

as_utf8_character

Coerce to a character vector and attempt encoding conversion


Description

Experimental lifecycle

Unlike specifying the encoding argument in as_string() and as_character(), which is only declarative, these functions actually attempt to convert the encoding of their input. There are two possible cases:

  • The string is tagged as UTF-8 or latin1, the only two encodings for which R has specific support. In this case, converting to the same encoding is a no-op, and converting to native always works as expected, as long as the native encoding, the one specified by the LC_CTYPE locale has support for all characters occurring in the strings. Unrepresentable characters are serialised as unicode points: "<U+xxxx>".

  • The string is not tagged. R assumes that it is encoded in the native encoding. Conversion to native is a no-op, and conversion to UTF-8 should work as long as the string is actually encoded in the locale codeset.

When translating to UTF-8, the strings are parsed for serialised unicode points (e.g. strings looking like "U+xxxx") with chr_unserialise_unicode(). This helps to alleviate the effects of character-to-symbol-to-character roundtrips on systems with non-UTF-8 native encoding.

Usage

as_utf8_character(x)

Arguments

x

An object to coerce.

Examples

# Let's create a string marked as UTF-8 (which is guaranteed by the
# Unicode escaping in the string):
utf8 <- "caf\uE9"
Encoding(utf8)
as_bytes(utf8)

rlang

Functions for Base Types and Core R and 'Tidyverse' Features

v0.4.11
MIT + file LICENSE
Authors
Lionel Henry [aut, cre], Hadley Wickham [aut], mikefc [cph] (Hash implementation based on Mike's xxhashlite), Yann Collet [cph] (Author of the embedded xxHash library), RStudio [cph]
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.