BSgenome-utils

BSgenome utilities


Description

Utilities for BSgenome objects.

Usage

## S4 method for signature 'BSgenome'
vmatchPattern(pattern, subject, max.mismatch=0, min.mismatch=0,
              with.indels=FALSE, fixed=TRUE, algorithm="auto",
              exclude="", maskList=logical(0), userMask=IRangesList(),
              invertUserMask=FALSE)
## S4 method for signature 'BSgenome'
vcountPattern(pattern, subject, max.mismatch=0, min.mismatch=0,
              with.indels=FALSE, fixed=TRUE, algorithm="auto",
              exclude="", maskList=logical(0), userMask=IRangesList(),
              invertUserMask=FALSE)

## S4 method for signature 'BSgenome'
vmatchPDict(pdict, subject, max.mismatch=0, min.mismatch=0,
            fixed=TRUE, algorithm="auto", verbose=FALSE,
            exclude="", maskList=logical(0))
## S4 method for signature 'BSgenome'
vcountPDict(pdict, subject, max.mismatch=0, min.mismatch=0,
            fixed=TRUE, algorithm="auto", collapse=FALSE,
            weight=1L, verbose=FALSE, exclude="", maskList=logical(0))

## S4 method for signature 'BSgenome'
matchPWM(pwm, subject, min.score="80%", exclude="", maskList=logical(0))
## S4 method for signature 'BSgenome'
countPWM(pwm, subject, min.score="80%", exclude="", maskList=logical(0))

Arguments

pattern

A DNAString object containing the pattern sequence.

subject

A BSgenome object containing the subject sequences.

max.mismatch, min.mismatch

The maximum and minimum number of mismatching letters allowed (see ?`lowlevel-matching` for the details). If non-zero, an inexact matching algorithm is used.

with.indels

If TRUE then indels are allowed. In that case, min.mismatch must be 0 and max.mismatch is interpreted as the maximum "edit distance" allowed between any pattern and any of its matches (see ?`matchPattern` for the details).
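To make the "edit distance" wording concrete, here is a small sketch using base R's adist(), which computes the Levenshtein distance (substitutions, insertions and deletions each cost 1). It is only an illustration of the concept: the toy sequences are made up, and adist() is not the algorithm the matching functions use.

```r
# Illustration only: adist() from base R (package utils), not Biostrings.
pattern <- "ACGTAC"
candidates <- c(exact        = "ACGTAC",
                substitution = "ACGTTC",   # A -> T at position 5
                deletion     = "ACGAC",    # T removed
                insertion    = "ACGGTAC",  # extra G inserted
                two_edits    = "ACTTTC")   # two substitutions

# adist() returns a 1 x 5 matrix here; drop() turns it into a vector.
edit_dist <- drop(adist(pattern, candidates))
names(edit_dist) <- names(candidates)
print(edit_dist)

# With an edit-distance budget of 1, only the first four would qualify.
within_limit <- edit_dist <= 1
print(within_limit)
```

Under this reading, a max.mismatch of 1 combined with with.indels=TRUE would accept a single substitution, insertion or deletion, but not two of them.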

fixed

If FALSE then IUPAC extended letters are interpreted as ambiguities (see ?`lowlevel-matching` for the details).
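As a rough picture of what "interpreted as ambiguities" means, the sketch below expands IUPAC codes in a pattern into the sets of bases they stand for and tests a few toy sequences with a regular expression. The helper iupac_to_regex() and the lookup table are written here for illustration and are not part of BSgenome or Biostrings; the sketch covers only ambiguity codes in the pattern, matched against unambiguous subjects.

```r
# Hand-written IUPAC nucleotide code table (illustrative helper, base R only).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC",
           B = "CGT", D = "AGT", H = "ACT", V = "ACG", N = "ACGT")

# Hypothetical helper: turn an IUPAC pattern into a regular expression
# in which each code becomes a character class of the bases it represents.
iupac_to_regex <- function(pattern) {
  codes <- strsplit(pattern, "")[[1]]
  paste0("[", iupac[codes], "]", collapse = "")
}

rx <- iupac_to_regex("ACRYN")   # R = A or G, Y = C or T, N = any base
print(rx)

subjects <- c("ACGTA", "ACATG", "ACCTA")
is_match <- grepl(paste0("^", rx, "$"), subjects)
print(is_match)   # the third sequence fails because C is not covered by R
```

With fixed=TRUE, by contrast, a letter such as R in the pattern would only match a literal R in the subject.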

algorithm

For vmatchPattern and vcountPattern one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", "shift-or", or "indels".

For vmatchPDict and vcountPDict one of the following: "auto", "naive-exact", "naive-inexact", "boyer-moore", or "shift-or".

exclude

A character vector with strings that will be used to filter out chromosomes whose names match these strings.
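The filtering can be pictured with plain character vectors. This sketch assumes the strings are matched against the chromosome names the way grep() does (as regular expressions, anywhere in the name); that behavior, the helper keep_seqnames() and the example names are assumptions made for illustration.

```r
# Example sequence names (illustrative).
seqnames_all <- c("chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX", "chrM")

# Hypothetical helper: drop every name matched by at least one exclude string.
keep_seqnames <- function(seqnames, exclude) {
  exclude <- exclude[nzchar(exclude)]          # "" means "exclude nothing"
  if (length(exclude) == 0L) return(seqnames)
  hit <- Reduce(`|`, lapply(exclude, grepl, x = seqnames))
  seqnames[!hit]
}

print(keep_seqnames(seqnames_all, ""))            # all names kept
print(keep_seqnames(seqnames_all, "M"))           # chrM dropped
print(keep_seqnames(seqnames_all, c("M", "X")))   # chrM and chrX dropped
```

Under the same assumption, a short string can match more names than intended: "chrI" would also hit chrII, chrIII and chrIV, so an anchored pattern such as "^chrI$" is the safer way to target a single chromosome.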

maskList

A named logical vector of mask states to use with a BSgenome object. When the bsapply function is used, the masks are set to the states in this vector.

userMask

An IntegerRangesList, containing a mask to be applied to each chromosome. See bsapply.

invertUserMask

Whether the userMask should be inverted.

collapse, weight

Ignored arguments.

pdict

A PDict or DNAStringSet object containing the pattern sequences.

verbose

TRUE or FALSE.

pwm

A numeric matrix with row names A, C, G and T representing a Position Weight Matrix.

min.score

The minimum score for counting a match. Can be given as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
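The two ways of giving min.score can be reconciled as follows. This is a minimal base R sketch, assuming that the "highest possible score" of a PWM is the sum of its column maxima; the helper resolve_min_score() and the toy matrix are invented for the example.

```r
# Toy 4 x 3 position weight matrix (made-up values).
pwm <- matrix(c(0.500, 0.250, 0.125, 0.125,
                0.125, 0.125, 0.500, 0.250,
                0.250, 0.250, 0.250, 0.250),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: convert min.score to an absolute numeric threshold.
resolve_min_score <- function(min.score, pwm) {
  if (is.numeric(min.score)) return(as.numeric(min.score))
  percent <- as.numeric(sub("%$", "", min.score))
  max_score <- sum(apply(pwm, 2, max))   # best attainable score (assumption)
  max_score * percent / 100
}

print(resolve_min_score("80%", pwm))   # percentage of the best score
print(resolve_min_score(1.1, pwm))     # a plain number is used as is
```

A percentage therefore scales with the matrix, whereas a plain number is an absolute cutoff that has to be chosen with the score range of the particular PWM in mind.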

Value

A GRanges object for vmatchPattern.

A data.frame object for vcountPattern and countPWM with three columns: "seqname" (factor), "strand" (factor), and "count" (integer).
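Because the result is an ordinary data.frame, base R tools are enough to summarize it. The sketch below builds a made-up table with the three documented columns (the values are invented and are not real output) and totals the counts per sequence and per strand.

```r
# Made-up table in the documented three-column layout (not real results).
counts <- data.frame(
  seqname = factor(c("chrI", "chrI", "chrII", "chrII")),
  strand  = factor(c("+", "-", "+", "-")),
  count   = c(3L, 1L, 0L, 2L)
)

# Total matches per sequence and per strand.
per_seq    <- tapply(counts$count, counts$seqname, sum)
per_strand <- tapply(counts$count, counts$strand, sum)
print(per_seq)
print(per_strand)

# Keep only the rows with at least one match.
print(counts[counts$count > 0L, ])
```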

A GRanges object for vmatchPDict with one metadata column: "index", which represents a mapping to a position in the original pattern dictionary.

A DataFrame object for vcountPDict with four columns: "seqname" ('factor' Rle), "strand" ('factor' Rle), "index" (integer) and "count" ('integer' Rle). As with vmatchPDict the index column represents a mapping to a position in the original pattern dictionary.

A GRanges object for matchPWM with two metadata columns: "score" (numeric), and "string" (DNAStringSet).

Author(s)

P. Aboyoun

See Also

Examples

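## Load a C. elegans genome (this also attaches BSgenome and Biostrings)
## and the HNF4alpha example sequences that ship with Biostrings.
library(BSgenome.Celegans.UCSC.ce2)
data(HNF4alpha)

## Single pattern: the consensus string may contain IUPAC ambiguity codes.
## fixed="subject" interprets them as ambiguities in the pattern only.
pattern <- consensusString(HNF4alpha)
vmatchPattern(pattern, Celegans, fixed="subject")
vcountPattern(pattern, Celegans, fixed="subject")

## Pattern dictionary: match all HNF4alpha sequences at once.
pdict <- PDict(HNF4alpha)
vmatchPDict(pdict, Celegans)
vcountPDict(pdict, Celegans)

## Position weight matrix, using the default min.score of "80%".
pwm <- PWM(HNF4alpha)
matchPWM(pwm, Celegans)
countPWM(pwm, Celegans)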

BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs

v1.58.0
Artistic-2.0
Authors
Hervé Pagès
Initial release