Split data frame, apply function, and return results in a data frame.
For each subset of a data frame, apply function then combine results into a
data frame.
To apply a function for each row, use adply with .margins set to 1.
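As a minimal sketch of the row-wise case, the snippet below calls adply with .margins = 1 on a small, made-up data frame. The data frame `df` and the column name `total` are illustrative assumptions, not part of the original documentation.

```r
library(plyr)

# Hypothetical toy data frame, used only for illustration
df <- data.frame(x = 1:3, y = c(10, 20, 30))

# .margins = 1 splits the data frame by row, so the function
# receives one single-row data frame at a time
row_sums <- adply(df, 1, function(row) data.frame(total = row$x + row$y))

print(row_sums)
```

The result has one row per input row, with the computed column added by the function.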
ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop = TRUE, .parallel = FALSE, .paropts = NULL)
.data | data frame to be processed
.variables | variables to split data frame by, as as.quoted variables, a formula or character vector
.fun | function to apply to each piece
... | other arguments passed on to .fun
.progress | name of the progress bar to use, see create_progress_bar
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)
.parallel | if TRUE, apply function in parallel, using parallel backend provided by foreach
.paropts | a list of additional options passed into the foreach function when parallel computation is enabled
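To make two of these arguments concrete, here is a small sketch showing how extra arguments travel through ... to .fun, and how .drop affects unused factor levels. The data frame `df` and the column names `m` and `n` are assumptions made up for this illustration.

```r
library(plyr)

# Hypothetical data: factor g has an unused level "c", and v contains an NA
df <- data.frame(
  g = factor(c("a", "a", "b"), levels = c("a", "b", "c")),
  v = c(1, NA, 3)
)

# na.rm = TRUE is not an argument of ddply itself; it is passed on to .fun via ...
means <- ddply(df, .(g),
               function(piece, ...) data.frame(m = mean(piece$v, ...)),
               na.rm = TRUE)

# With .drop = FALSE the unused level "c" is kept as an empty piece,
# so it shows up with a row count of zero
counts_all <- ddply(df, .(g), summarise, n = length(v), .drop = FALSE)

print(means)
print(counts_all)
```

With the default .drop = TRUE, the level "c" would not appear in the output at all.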
A data frame, as described in the output section.
This function splits data frames by variables.
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, the vectors will be combined with rbind and converted to a data frame. Any other values will result in an error.
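The two supported return types can be compared side by side. This is a sketch on made-up data; `df`, `lo` and `hi` are hypothetical names chosen for the example.

```r
library(plyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 5))

# Case 1: .fun returns a data frame for each piece
as_df <- ddply(df, .(g), function(piece) {
  data.frame(lo = min(piece$v), hi = max(piece$v))
})

# Case 2: .fun returns an atomic vector of fixed length (2 here)
as_vec <- ddply(df, .(g), function(piece) {
  c(lo = min(piece$v), hi = max(piece$v))
})

print(as_df)
print(as_vec)
```

Returning a data frame is usually the safer choice, since column names and types are then stated explicitly by the function rather than inferred from the vector.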
If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
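One way to see the empty-result case is to have .fun return NULL for every piece. The sketch below assumes a made-up data frame `df`; it is an illustration rather than a recommended pattern.

```r
library(plyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 5))

# Every piece yields NULL, so there is nothing to combine
empty <- ddply(df, .(g), function(piece) NULL)

# Check the shape before using the result downstream
dim(empty)
```

Code that consumes the output of ddply may want to check nrow() first, because the expected grouping columns are absent in this case.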
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.
tapply for similar functionality in the base package
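For comparison, the same grouped mean can be computed with tapply and with ddply. This sketch uses a made-up data frame `df`; the main visible difference is the shape of the result (a named array versus a data frame).

```r
library(plyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 5))

# Base R: returns a named one-dimensional array indexed by group
base_means <- tapply(df$v, df$g, mean)

# plyr: returns a data frame with the grouping variable as a column
plyr_means <- ddply(df, .(g), summarise, mean_v = mean(v))

print(base_means)
print(plyr_means)
```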
# Summarize a dataset by two variables
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
  mean = round(mean(age), 2),
  sd = round(sd(age), 2))

# An example using a formula for .variables
ddply(baseball[1:100,], ~ year, nrow)

# Applying two functions; nrow and ncol
ddply(baseball, .(lg), c("nrow", "ncol"))

# Calculate mean runs batted in for each year
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))

# Plot a line chart of the result
plot(mean_rbi ~ year, type = "l", data = rbi)

# make new variable career_year based on the
# start year for each player (id)
base2 <- ddply(baseball, .(id), mutate,
  career_year = year - min(year) + 1
)