
faq-external-vector

FAQ - Note: Using an external vector in selections is ambiguous


Description

Ambiguity between columns and external variables

With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:

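# Setup assumed by the examples on this page: load dplyr and convert
# mtcars to a tibble so that results print as shown.
library(dplyr)
mtcars <- as_tibble(mtcars)

mtcars %>% select(cyl, am, vs)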
#> # A tibble: 32 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1
#> # ... with 28 more rows

mtcars %>% select(mpg:disp)
#> # A tibble: 32 x 3
#>     mpg   cyl  disp
#>   <dbl> <dbl> <dbl>
#> 1  21       6   160
#> 2  21       6   160
#> 3  22.8     4   108
#> 4  21.4     6   258
#> # ... with 28 more rows

For historical reasons, it is also possible to refer to an external vector of variable names. You get the correct result, but with a note informing you that selecting with an external variable is ambiguous, because it is not clear whether you want a data frame column or an external object.

vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)
#> Note: Using an external vector in selections is ambiguous.
#> i Use `all_of(vars)` instead of `vars` to silence this message.
#> i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.

This note will become a warning in the future, and then an error. We have decided to deprecate this particular approach to using external vectors because it introduces ambiguity. Imagine that the data frame contains a column with the same name as your external variable:

some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)

These are very different objects, but that isn’t a problem if the context forces you to be specific about where to find vars:

vars
#> [1] "cyl" "am"  "vs"

some_df$vars
#> [1] 1 2 3 4

In a selection context, however, the column wins:

some_df %>% select(vars)
#> # A tibble: 4 x 1
#>    vars
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4     4

Fixing the ambiguity

To make your selection code more robust and silence the message, use all_of() to force the external vector:

some_df %>% select(all_of(vars))
#> # A tibble: 4 x 3
#>     cyl    am    vs
#>   <dbl> <dbl> <dbl>
#> 1     6     1     0
#> 2     6     1     0
#> 3     4     1     1
#> 4     6     0     1

For more information, or if you have comments about this, please see the GitHub issue tracking the deprecation process.


tidyselect

Select from a Set of Strings

v1.1.1
MIT + file LICENSE
Authors
Lionel Henry [aut, cre], Hadley Wickham [aut], RStudio [cph]
Initial release
