Summarise each group to fewer rows
summarise()
creates a new data frame. It will have one (or more) rows for
each combination of grouping variables; if there are no grouping variables,
the output will have a single row summarising all observations in the input.
It will contain one column for each grouping variable and one column
for each of the summary statistics that you have specified.
summarise()
and summarize()
are synonyms.
summarise(.data, ..., .groups = NULL) summarize(.data, ..., .groups = NULL)
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< The value can be:
|
.groups |
Grouping structure of the result.
When
In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to |
An object usually of the same type as .data
.
The rows come from the underlying group_keys()
.
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups=
argument, the
output may be another grouped_df, a tibble or a rowwise data frame.
Data frame attributes are not preserved, because summarise()
fundamentally creates a new data frame.
The data frame backend supports creating a variable and using it in the
same summary. This means that previously created summary variables can be
further transformed or combined within the summary, as in mutate()
.
However, it also means that summary variables with the same names as previous
variables overwrite them, making those variables unavailable to later summary
variables.
This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
# A summary applied to ungrouped tbl returns a single row mtcars %>% summarise(mean = mean(disp), n = n()) # Usually, you'll want to group first mtcars %>% group_by(cyl) %>% summarise(mean = mean(disp), n = n()) # dplyr 1.0.0 allows to summarise to more than one value: mtcars %>% group_by(cyl) %>% summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75)) # You use a data frame to create multiple columns so you can wrap # this up into a function: my_quantile <- function(x, probs) { tibble(x = quantile(x, probs), probs = probs) } mtcars %>% group_by(cyl) %>% summarise(my_quantile(disp, c(0.25, 0.75))) # Each summary call removes one grouping level (since that group # is now just a single row) mtcars %>% group_by(cyl, vs) %>% summarise(cyl_n = n()) %>% group_vars() # BEWARE: reusing variables may lead to unexpected results mtcars %>% group_by(cyl) %>% summarise(disp = mean(disp), sd = sd(disp)) # Refer to column names stored as strings with the `.data` pronoun: var <- "mass" summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE)) # Learn more in ?dplyr_data_masking
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.