tidyr: hoist – R documentation

hoist

Rectangle a nested list into a tidy tibble

Description

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling, collapsing deeply nested lists into regular columns. hoist() allows you to selectively pull components of a list-column out into their own top-level columns, using the same syntax as purrr::pluck(). unnest_wider() turns each element of a list-column into a column, and unnest_longer() turns each element of a list-column into a row. unnest_auto() picks between unnest_wider() and unnest_longer() based on the heuristics described below.

Learn more in vignette("rectangle").

Usage

hoist(
  .data,
  .col,
  ...,
  .remove = TRUE,
  .simplify = TRUE,
  .ptype = list(),
  .transform = list()
)

unnest_longer(
  data,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  names_repair = "check_unique",
  simplify = TRUE,
  ptype = list(),
  transform = list()
)

unnest_wider(
  data,
  col,
  names_sep = NULL,
  simplify = TRUE,
  names_repair = "check_unique",
  ptype = list(),
  transform = list()
)

unnest_auto(data, col)

Arguments

`.data, data`	A data frame.
`.col, col`	List-column to extract components from.
`...`	Components of `.col` to turn into columns in the form `col_name = "pluck_specification"`. You can pluck by name with a character vector, by position with an integer vector, or with a combination of the two with a list. See `purrr::pluck()` for details. The column names must be unique in a call to `hoist()`, although existing columns with the same name will be overwritten. When plucking with a single string you can choose to omit the name, i.e. `hoist(df, col, "x")` is short-hand for `hoist(df, col, x = "x")`.
`.remove`	If `TRUE`, the default, will remove extracted components from `.col`. This ensures that each value lives only in one place.
`.simplify, simplify`	If `TRUE`, will attempt to simplify lists of length-1 vectors to an atomic vector.
`.ptype, ptype`	Optionally, a named list of prototypes declaring the desired output type of each component. Use this argument if you want to check each element has the types you expect when simplifying.
`.transform, transform`	Optionally, a named list of transformation functions applied to each component. Use this argument if you want to transform or parse individual elements as they are hoisted.
`values_to`	Name of column to store vector values. Defaults to `col`.
`indices_to`	A string giving the name of the column that will contain the inner names or positions (if not named) of the values. Defaults to `col` with an `_id` suffix.
`indices_include`	Add an index column? Defaults to `TRUE` when `col` has inner names.
`names_repair`	Used to check that the output data frame has valid names. Must be one of the following options: "minimal": no name repair or checks, beyond basic existence; "unique": make sure names are unique and not empty; "check_unique" (the default): no name repair, but check they are unique; "universal": make the names unique and syntactic; a function: apply custom name repair; "tidyr_legacy": use the name repair from tidyr 0.8; a formula: a purrr-style anonymous function (see `rlang::as_function()`). See `vctrs::vec_as_names()` for more details on these terms and the strategies used to enforce them.
`names_sep`	If `NULL`, the default, the names will be left as is. If a string, the inner and outer names will be pasted together using `names_sep` as a separator.
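
As a minimal sketch of how some of these arguments combine, the snippet below parses one component with `.transform`, checks another with `.ptype`, and then names the value and index columns of `unnest_longer()` explicitly. The data and the names `records`, `scores`, `hoisted`, and `long_scores` are hypothetical and chosen only for illustration.

```r
library(tidyr)

# Hypothetical data: each `info` element stores `n` as a string.
records <- tibble(
  id = 1:2,
  info = list(
    list(n = "1", tag = "a"),
    list(n = "2", tag = "b")
  )
)

# Parse `n` to integer while hoisting, and require `tag` to be character.
hoisted <- hoist(
  records, info,
  n = "n",
  tag = "tag",
  .transform = list(n = as.integer),
  .ptype = list(tag = character())
)
hoisted

# Hypothetical data with named inner vectors.
scores <- tibble(
  id = 1:2,
  score = list(c(a = 1, b = 2), c(a = 10))
)

# Choose the value and index column names instead of the defaults.
long_scores <- unnest_longer(
  scores, score,
  values_to = "value",
  indices_to = "name"
)
long_scores
```

Here `.transform` and `.ptype` are applied to different components, which keeps the example easy to reason about.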

Unnest variants

The three unnest() functions differ in how they change the shape of the output data frame:

unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list column of length one. Or if you unnest_longer() a list of data frames, the number of columns must be preserved, so it creates a packed column. I'm not sure if these behaviours are useful in practice, but they are theoretically pleasing.
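
The first two shape rules can be checked on a small input. The sketch below uses a hypothetical tibble `shapes` whose list-column holds named numeric vectors. It does not cover the data frame case described above. `indices_include = FALSE` is passed so that unnest_longer() adds no index column.

```r
library(tidyr)

# Hypothetical data: two rows, each holding a named vector of length two.
shapes <- tibble(
  id = 1:2,
  y = list(c(a = 1, b = 2), c(a = 3, b = 4))
)

# Wider: the row count stays at 2, and `y` is replaced by columns `a` and `b`.
wide <- unnest_wider(shapes, y)
dim(wide)

# Longer: the columns stay `id` and `y`, and the row count grows to 4.
long <- unnest_longer(shapes, y, indices_include = FALSE)
dim(long)
```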

`unnest_auto()` heuristics

unnest_auto() inspects the inner names of the list-col:

If all elements are unnamed, it uses unnest_longer()
If all elements are named, and there's at least one name in common across all components, it uses unnest_wider()
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
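
A short sketch of the three cases, using hypothetical tibbles (`unnamed`, `shared`, and `disjoint`) built only for illustration. unnest_auto() reports which function it chose in a message.

```r
library(tidyr)

# Case 1: no element has names, so unnest_longer() is used.
unnamed <- tibble(x = 1:2, y = list(1:2, 3:4))
res_unnamed <- unnest_auto(unnamed, y)

# Case 2: all elements are named and share the names `a` and `b`,
# so unnest_wider() is used.
shared <- tibble(x = 1:2, y = list(list(a = 1, b = 2), list(a = 3, b = 4)))
res_shared <- unnest_auto(shared, y)

# Case 3: all elements are named but share no name, so the fallback
# unnest_longer(indices_include = TRUE) is used and an index column is added.
disjoint <- tibble(x = 1:2, y = list(list(a = 1), list(b = 2)))
res_disjoint <- unnest_auto(disjoint, y)

names(res_unnamed)
names(res_shared)
names(res_disjoint)
```

Because the choice depends on the data, calling unnest_wider() or unnest_longer() directly is usually the safer option in scripts where the output shape must be predictable.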

Examples

library(tidyr)

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df

# Turn all components of metadata into columns
df %>% unnest_wider(metadata)

# Extract only specified components
df %>% hoist(metadata,
  "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)

df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)

# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
# Automatically creates names if widening
df %>% unnest_wider(y)
# But you'll usually want to provide names_sep:
df %>% unnest_wider(y, names_sep = "_")

# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
df %>% unnest_longer(y)

tidyr

Tidy Messy Data

v1.1.3

MIT + file LICENSE

Authors

Hadley Wickham [aut, cre], RStudio [cph]

Initial release