bpaggregate

Apply a function on subsets of data frames


Description

This is a parallel version of aggregate.

Usage

## S4 method for signature 'formula,BiocParallelParam'
bpaggregate(x, data, FUN, ..., 
    BPREDO=list(), BPPARAM=bpparam())

## S4 method for signature 'data.frame,BiocParallelParam'
bpaggregate(x, by, FUN, ..., 
    simplify=TRUE, BPREDO=list(), BPPARAM=bpparam())

## S4 method for signature 'matrix,BiocParallelParam'
bpaggregate(x, by, FUN, ..., 
    simplify=TRUE, BPREDO=list(), BPPARAM=bpparam())

## S4 method for signature 'ANY,missing'
bpaggregate(x, ..., BPREDO=list(), BPPARAM=bpparam())

Arguments

x

A data.frame, matrix or a formula.

by

A list of factors by which x is split; applicable when x is a data.frame or matrix.

data

A data.frame; applicable when x is a formula.

FUN

Function to apply.

...

Additional arguments for FUN.

simplify

If set to TRUE, the return values of FUN will be simplified using simplify2array.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end to be used during evaluation.

BPREDO

A list of output from a previous bpaggregate call with one or more failed elements. When a list is given in BPREDO, bpok is used to identify the errors; the failed tasks are rerun and their results are inserted into the original results.
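
As a rough illustration of how BPPARAM is resolved, the sketch below assumes BiocParallel is installed and uses the built-in iris data set (an assumption of this example, not part of the original page). It registers SerialParam() as the default back-end, so the call that omits BPPARAM and the call that passes it explicitly should evaluate in the same way.

```r
library(BiocParallel)

## Illustrative data only: the four numeric columns of the built-in iris data.
x  <- iris[1:4]
by <- list(Species = iris$Species)

## Make a serial back-end the registered default returned by bpparam().
register(SerialParam())

## BPPARAM omitted: the registered default back-end is used.
res_default <- bpaggregate(x, by, mean)

## BPPARAM supplied explicitly.
res_explicit <- bpaggregate(x, by, mean, BPPARAM = SerialParam())
```

Note that register() changes the session-wide default; passing BPPARAM per call avoids that side effect.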

Details

bpaggregate is a generic with methods for data.frame, matrix and formula objects. x is divided into subsets according to the factors in by. The data chunks are sent to the workers, FUN is applied to each chunk, and the results are returned as a data.frame.

The function is similar in spirit to aggregate from the stats package, but aggregate is not explicitly called. The bpaggregate formula method reformulates the call and dispatches to the data.frame method, which in turn distributes data chunks to workers with bplapply.
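
The relationship between the data.frame and formula methods can be sketched as follows. This is a minimal example, assuming BiocParallel is installed; the built-in mtcars data set, the variable names and the choice of SerialParam() are assumptions made here for illustration and do not come from the original page.

```r
library(BiocParallel)

## Serial back-end, chosen here only to keep the sketch deterministic.
p <- SerialParam()

## data.frame method: 'x' holds the values, 'by' is a named list of
## grouping factors.
res_df <- bpaggregate(mtcars["mpg"], list(cyl = mtcars$cyl), mean, BPPARAM = p)

## formula method: the left-hand side names the values, the right-hand
## side names the grouping variable.
res_fm <- bpaggregate(mpg ~ cyl, data = mtcars, FUN = mean, BPPARAM = p)

## Reference computed with stats::aggregate for comparison.
ref <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```

If both methods behave as described above, res_df and res_fm should hold one row per value of cyl, with the same group means as ref.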

Value

See aggregate.

Author(s)

Examples

if (interactive() && require(Rsamtools) && require(GenomicAlignments)) {

  fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
  param <- ScanBamParam(what = c("flag", "mapq"))
  gal <- readGAlignments(fl, param=param) 

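  ## Report the mean map quality by range cutoff:
  ## group 0: start <= 1000; group 1: 1000 < start < 1500; group 2: start >= 1500
  cutoff <- rep(0, length(gal))
  cutoff[start(gal) > 1000 & start(gal) < 1500] <- 1
  cutoff[start(gal) >= 1500] <- 2
  ## Name the column explicitly so the result has a readable header.
  bpaggregate(data.frame(mapq = mcols(gal)$mapq), list(cutoff = cutoff), mean)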

}

BiocParallel

Bioconductor facilities for parallel evaluation

v1.24.1
GPL-2 | GPL-3
Authors
Bioconductor Package Maintainer [cre], Martin Morgan [aut], Valerie Obenchain [aut], Michel Lang [aut], Ryan Thompson [aut], Nitesh Turaga [aut], Aaron Lun [ctb]
