Subset distinct/unique rows
Select only unique/distinct rows from a data frame. This is similar
to unique.data.frame()
but considerably faster.
distinct(.data, ..., .keep_all = FALSE)
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< |
.keep_all |
If |
An object of the same type as .data
. The output has the following
properties:
Rows are a subset of the input but appear in the same order.
Columns are not modified if ...
is empty or .keep_all
is TRUE
.
Otherwise, distinct()
first calls mutate()
to create new columns.
Groups are not modified.
Data frame attributes are preserved.
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
df <- tibble( x = sample(10, 100, rep = TRUE), y = sample(10, 100, rep = TRUE) ) nrow(df) nrow(distinct(df)) nrow(distinct(df, x, y)) distinct(df, x) distinct(df, y) # You can choose to keep all other variables as well distinct(df, x, .keep_all = TRUE) distinct(df, y, .keep_all = TRUE) # You can also use distinct on computed variables distinct(df, diff = abs(x - y)) # use across() to access select()-style semantics distinct(starwars, across(contains("color"))) # Grouping ------------------------------------------------- # The same behaviour applies for grouped data frames, # except that the grouping variables are always included df <- tibble( g = c(1, 1, 2, 2), x = c(1, 1, 2, 1) ) %>% group_by(g) df %>% distinct(x)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.