Split data.table into chunks in a list
Split method for data.table. It is faster and more flexible. Be aware that processing a list of data.tables will generally be much slower than manipulating a single data.table by group using the by argument; see the data.table documentation for details.
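As a minimal sketch of the point above, the following computes the same per-group totals twice: once with the by argument on a single data.table, and once over the list returned by split. The table DT and its columns g and v are made up for illustration.

```r
library(data.table)

# Hypothetical example data: two groups, four rows
DT <- data.table(g = c("a", "b", "a", "b"), v = 1:4)

# Grouped aggregation inside a single data.table (usually the faster route)
by_group <- DT[, .(total = sum(v)), by = g]

# The same totals computed chunk by chunk over the list produced by split()
chunks  <- split(DT, by = "g")
by_list <- rbindlist(lapply(chunks, function(d) d[, .(total = sum(v))]),
                     idcol = "g")

# Both routes should agree
all.equal(by_group, by_list)
```

The list route is mainly useful when each chunk has to be handed to code that expects a separate table.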
## S3 method for class 'data.table'
split(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, ..., verbose = getOption("datatable.verbose"))
x | data.table.
f | factor or list of factors, as in the data.frame method.
drop | logical. Default FALSE.
by | character vector. Column names on which the split should be made.
sorted | logical. Default FALSE, which preserves the order of groups as they appear in the data.
keep.by | logical. Default TRUE.
flatten | logical. Default TRUE.
... | passed to the data.frame way of processing when using f.
verbose | logical. Default getOption("datatable.verbose").
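The effect of sorted and keep.by can be seen on a small table. This is an illustrative sketch only; DT and its columns g and v are hypothetical.

```r
library(data.table)

# Hypothetical example data; group "b" appears before group "a"
DT <- data.table(g = c("b", "a", "b"), v = 1:3)

# Default sorted = FALSE: list elements follow the order of first appearance
unsorted_names <- names(split(DT, by = "g"))

# sorted = TRUE: list elements follow the sorted order of the groups
sorted_names <- names(split(DT, by = "g", sorted = TRUE))

# keep.by = FALSE: the grouping column is dropped from each chunk
chunk_cols <- names(split(DT, by = "g", keep.by = FALSE)[[1L]])

print(unsorted_names)
print(sorted_names)
print(chunk_cols)
```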
The f argument exists only for consistency with the data.frame method. Using the by argument is recommended instead: it is faster, more flexible, and by default preserves the order of groups as they appear in the data.
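A short sketch of the difference in ordering between the two interfaces, using a made-up table DT with columns g and v:

```r
library(data.table)

# Hypothetical example data; group "b" appears before group "a"
DT <- data.table(g = c("b", "a", "b"), v = 1:3)

# data.frame-style interface: groups are ordered by the factor levels of f
via_f <- split(DT, f = DT$g)

# by interface: groups keep the order in which they first appear in the data
via_by <- split(DT, by = "g")

print(names(via_f))
print(names(via_by))
```

Both calls produce the same chunks; only the order of the list elements differs.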
A list of data.tables. If flatten is FALSE and length(by) > 1L, the result is a recursively nested list, grouped according to the by argument, with data.tables as leaves.
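To make the two return shapes concrete, here is a small sketch on hypothetical data (DT with columns x1, x2 and y) showing how elements are named and reached in the flat and nested results:

```r
library(data.table)

# Hypothetical example data with two grouping columns
DT <- data.table(x1 = c("a", "a", "b"),
                 x2 = c("c", "d", "c"),
                 y  = 1:3)

# Flat result: one list element per combination of x1 and x2
flat <- split(DT, by = c("x1", "x2"))

# Nested result: outer list by x1, inner lists by x2, data.tables as leaves
nested <- split(DT, by = c("x1", "x2"), flatten = FALSE)

print(names(flat))      # combined group names
print(names(nested))    # first-level groups (x1)
print(names(nested$a))  # second-level groups (x2) within x1 == "a"

# The same chunk reached through either structure
print(flat[["a.d"]])
print(nested$a$d)
```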
library(data.table)

set.seed(123)
DT = data.table(x1 = rep(letters[1:2], 6),
                x2 = rep(letters[3:5], 4),
                x3 = rep(letters[5:8], 3),
                y = rnorm(12))
DT = DT[sample(.N)]
DF = as.data.frame(DT)

# split consistency with data.frame: `x, f, drop`
all.equal(
  split(DT, list(DT$x1, DT$x2)),
  lapply(split(DF, list(DF$x1, DF$x2)), setDT)
)

# nested list using `flatten` arguments
split(DT, by=c("x1", "x2"))
split(DT, by=c("x1", "x2"), flatten=FALSE)

# dealing with factors
fdt = DT[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)

# factors having unused levels, drop FALSE, TRUE
fdt = DT[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L],
             x2 = as.factor(c("a", as.character(x2)))[-1L],
             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
             y = y)]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)
sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE)
all.equal(
  split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE),
  lapply(sdf[sort(names(sdf))], setDT)
)