Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

applyPileups

Apply a user-provided function to calculate pile-up statistics across multiple BAM files.


Description

applyPileups scans one or more BAM files, returning position-specific sequence and quality summaries.

Usage

applyPileups(files, FUN, ..., param)

Arguments

files

A PileupFiles instances.

FUN

A function of 1 argument, x, to be evaluated for each yield (see yieldSize, yieldBy, yieldAll). The argument x is a list, with elements describing the current pile-up. The elements of the list are determined by the argument what, and include:

seqnames:

(Always returned) A named integer() representing the seqnames corresponding to each position reported in the pile-up. This is a run-length encoding, where the names of the elements represent the seqnames, and the values the number of successive positions corresponding to that seqname.

pos:

Always returned) A integer() representing the genomic coordinate of each pile-up position.

seq:

An array of dimensions nucleotide x file x position.

The ‘nucleotide’ dimension is length 5, corresponding to ‘A’, ‘C’, ‘G’, ‘T’, and ‘N’ respectively.

Entries in the array represent the number of times the nucleotide occurred in reads in the file overlapping the position.

qual:

Like seq, but summarizing quality; the first dimension is the Phred-encoded quality score, ranging from ‘!’ (0) to ‘~’ (93).

...

Additional arguments, passed to methods.

param

An instance of the object returned by ApplyPileupsParam.

Details

Regardless of param values, the algorithm follows samtools by excluding reads flagged as unmapped, secondary, duplicate, or failing quality control.

Value

applyPileups returns a list equal in length to the number of times FUN has been called, with each element containing the result of FUN.

ApplyPileupsParam returns an object describing the parameters.

Author(s)

Martin Morgan

References

See Also

Examples

## The examples below are currently broken and have been disabled for now
## Not run: 
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)

fls <- PileupFiles(c(fl, fl))

calcInfo <-
    function(x)
{
    ## information at each pile-up position
    info <- apply(x[["seq"]], 2, function(y) {
        y <- y[c("A", "C", "G", "T"),,drop=FALSE]
        y <- y + 1L                     # continuity
        cvg <- colSums(y)
        p <- y / cvg[col(y)]
        h <- -colSums(p * log(p))
        ifelse(cvg == 4L, NA, h)
    })
    list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info)
}
which <- GRanges(c("seq1", "seq2"), IRanges(c(1000, 1000), 2000))
param <- ApplyPileupsParam(which=which, what="seq")
res <- applyPileups(fls, calcInfo, param=param)
str(res)
head(res[[1]][["pos"]])		# positions matching param
head(res[[1]][["info"]])		# inforamtion in each file

## 'param' as part of 'files'
fls1 <- PileupFiles(c(fl, fl), param=param)
res1 <- applyPileups(fls1, calcInfo)
identical(res, res1)

## yield by position, across ranges
param <- ApplyPileupsParam(which=which, yieldSize=500L,
                           yieldBy="position", what="seq")
res <- applyPileups(fls, calcInfo, param=param)
sapply(res, "[[", "seqnames")

## End(Not run)

Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import

v2.6.0
Artistic-2.0 | file LICENSE
Authors
Martin Morgan, Hervé Pagès, Valerie Obenchain, Nathaniel Hayden
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.