readr: tokenize – R documentation


tokenize

Tokenize a file/string.

Description

Turns input into a list of character vectors, with one element per row of tokens. Tokenization is usually done entirely in C++ and never exposed to R, because returning the tokens to R requires copying them. This function is useful for testing, or for inspecting the underlying tokens when a file doesn't parse correctly.
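As a sketch of the debugging use case, assume a CSV whose third line has one field too many. Counting the tokens in each row with base R's `lengths()` points to the row that disagrees with the header. The file contents and the names `path`, `toks`, `field_counts` and `bad_rows` are hypothetical, chosen only for illustration.

```r
library(readr)

# Hypothetical malformed CSV: the row "2,bob,extra" has an extra field.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,alice", "2,bob,extra", "3,carol"), path)

# tokenize() returns one character vector of tokens per row.
toks <- tokenize(path)

# Compare each row's token count with the header row's count.
field_counts <- lengths(toks)
bad_rows <- which(field_counts != field_counts[1])

bad_rows        # index of the offending row
toks[bad_rows]  # the raw tokens of that row
```

Checking token counts this way separates tokenizing problems (wrong number of fields) from type-parsing problems, which only appear later when `read_csv()` converts tokens to column types.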

Usage

tokenize(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)
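The defaults in the signature can be spelled out explicitly. The minimal sketch below, assuming readr is installed, confirms that the short and long calls agree, then swaps in `tokenizer_ws()` to show that the `tokenizer` argument alone decides where the text is split. The sample text `txt` and the other variable names are made up for illustration.

```r
library(readr)

txt <- "a b\n1 2\n"

# Calling with no extra arguments is the same as spelling out the defaults.
implicit <- tokenize(txt)
explicit <- tokenize(txt, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)
identical(implicit, explicit)

# The CSV tokenizer splits on commas, so "a b" stays a single token.
implicit[[1]]

# The whitespace tokenizer splits the same text on spaces instead.
ws <- tokenize(txt, tokenizer = tokenizer_ws())
ws[[1]]
```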

Arguments

`file`	Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in `.gz`, `.bz2`, `.xz`, or `.zip` will be automatically uncompressed. Files starting with `http://`, `https://`, `ftp://`, or `ftps://` will be automatically downloaded. Remote `.gz` files can also be automatically downloaded and decompressed. Literal data is most useful for examples and tests. To be recognised as data (instead of a path), it must contain at least one newline or be a vector of length greater than 1. Using a value of `clipboard()` will read from the system clipboard.
`tokenizer`	A tokenizer specification, created by one of the `tokenizer_*()` functions such as `tokenizer_csv()` (the default).
`skip`	Number of lines to skip before reading data.
`n_max`	Optionally, the maximum number of rows to tokenize. The default, `-1L`, tokenizes all rows.
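The input forms accepted by `file` can be checked against each other. The sketch below assumes readr is installed and the system temporary directory is writable; it feeds the same two lines as a string containing a newline, as a raw vector, as a plain file, and as a gzip-compressed `.gz` file, which readr should decompress automatically. All four should yield the same tokens. The variable names are hypothetical.

```r
library(readr)

rows <- c("a,b", "1,2")
txt <- paste0(paste(rows, collapse = "\n"), "\n")  # "a,b\n1,2\n"

# 1. Literal string: treated as data because it contains a newline.
from_string <- tokenize(txt)

# 2. Raw vector holding the same bytes.
from_raw <- tokenize(charToRaw(txt))

# 3. Plain file on disk.
csv_path <- tempfile(fileext = ".csv")
writeLines(rows, csv_path)
from_file <- tokenize(csv_path)

# 4. Gzip-compressed file; the ".gz" extension triggers decompression.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
writeLines(rows, con)
close(con)
from_gz <- tokenize(gz_path)

# Every form should produce identical tokens.
identical(from_string, from_raw)
identical(from_string, from_file)
identical(from_string, from_gz)
```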

Examples

tokenize("1,2\n3,4,5\n\n6")

# Only tokenize first two lines
tokenize("1,2\n3,4,5\n\n6", n_max = 2)
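The `skip` and `tokenizer` arguments combine with `n_max` in the same call. A sketch, assuming a hypothetical semicolon-delimited export with one metadata line at the top: `skip = 1` drops that line before tokenizing, `tokenizer_delim(";")` splits on semicolons, and `n_max = 2` keeps only the header row and the first data row. The input text and the name `toks` are made up for illustration.

```r
library(readr)

# Hypothetical input: one metadata line, then semicolon-separated data.
txt <- "exported by some tool\nx;y\n1;2\n3;4\n"

toks <- tokenize(
  txt,
  tokenizer = tokenizer_delim(";"),
  skip = 1,   # skip the metadata line
  n_max = 2   # keep the header row plus the first data row
)
toks
```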

readr

Read Rectangular Text Data

v1.4.0

GPL (>= 2) | file LICENSE

Authors

Hadley Wickham [aut], Jim Hester [aut, cre], Romain Francois [ctb], R Core Team [ctb] (Date time code adapted from R), RStudio [cph, fnd], Jukka Jylänki [ctb, cph] (grisu3 implementation), Mikkel Jørgensen [ctb, cph] (grisu3 implementation)
