Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

extract

Extract a character column into multiple columns using regular expression groups


Description

Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.

Usage

extract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

data

A data frame.

col

Column name or position. This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

into

Names of new variables to create as character vector. Use NA to omit the variable in the output.

regex

a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.

remove

If TRUE, remove input column from output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.

...

Additional arguments passed on to methods.

See Also

separate() to split up by a separator.

Examples

df <- data.frame(x = c(NA, "a-b", "a-d", "b-c", "d-e"))
df %>% extract(x, "A")
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")

# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")

tidyr

Tidy Messy Data

v1.1.3
MIT + file LICENSE
Authors
Hadley Wickham [aut, cre], RStudio [cph]
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.