PWM creating, matching, and related utilities
Position Weight Matrix (PWM) creating, matching, and related utilities for DNA data. (PWM for amino acid sequences are not supported.)
PWM(x, type = c("log2probratio", "prob"), prior.params = c(A=0.25, C=0.25, G=0.25, T=0.25)) matchPWM(pwm, subject, min.score="80%", with.score=FALSE, ...) countPWM(pwm, subject, min.score="80%", ...) PWMscoreStartingAt(pwm, subject, starting.at=1) ## Utility functions for basic manipulation of the Position Weight Matrix maxWeights(x) minWeights(x) maxScore(x) minScore(x) unitScale(x) ## S4 method for signature 'matrix' reverseComplement(x, ...)
x |
For For |
type |
The type of Position Weight Matrix, either "log2probratio" or "prob". See Details section for more information. |
prior.params |
A positive numeric vector, which represents the parameters of the Dirichlet conjugate prior, with names A, C, G, and T. See Details section for more information. |
pwm |
A Position Weight Matrix represented as a numeric matrix with row names A, C, G and T. |
subject |
Typically a DNAString object. A Views object on a DNAString subject, a MaskedDNAString object, or a single character string, are also supported. IUPAC ambiguity letters in |
min.score |
The minimum score for counting a match.
Can be given as a character string containing a percentage (e.g.
|
with.score |
|
starting.at |
An integer vector specifying the starting positions of the Position Weight Matrix relatively to the subject. |
... |
Additional arguments for methods. |
The PWM
function uses a multinomial model with a Dirichlet conjugate
prior to calculate the estimated probability of base b at position i. As
mentioned in the Arguments section, prior.params
supplies the
parameters for the DNA bases A, C, G, and T in the Dirichlet prior. These
values result in a position independent initial estimate of the probabilities
for the bases to be
priorProbs = prior.params/sum(prior.params)
and the
posterior (data infused) estimate for the probabilities for the bases in each
of the positions to be
postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params))
.
When type = "log2probratio"
, the PWM = unitScale(log2(postProbs/priorProbs))
.
When type = "prob"
, the PWM = unitScale(postProbs)
.
A numeric matrix representing the Position Weight Matrix for PWM
.
A numeric vector containing the Position Weight Matrix-based scores
for PWMscoreStartingAt
.
An XStringViews object for matchPWM
.
A single integer for countPWM
.
A vector containing the max weight for each position in pwm
for maxWeights
.
A vector containing the min weight for each position in pwm
for minWeights
.
The highest possible score for a given Position Weight Matrix for
maxScore
.
The lowest possible score for a given Position Weight Matrix for
minScore
.
The modified numeric matrix given by
(x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x))
for
unitScale
.
A PWM obtained by reverting the column order in PWM x
and by
reassigning each row to its complementary nucleotide
for reverseComplement
.
H. Pagès and P. Aboyoun
Wasserman, WW, Sandelin, A., (2004) Applied bioinformatics for the identification of regulatory elements, Nat Rev Genet., 5(4):276-87.
## Data setup: data(HNF4alpha) library(BSgenome.Dmelanogaster.UCSC.dm3) chr3R <- Dmelanogaster$chr3R chr3R ## Create a PWM from a PFM or directly from a rectangular ## DNAStringSet object: pfm <- consensusMatrix(HNF4alpha) pwm <- PWM(pfm) # same as 'PWM(HNF4alpha)' ## Perform some general routines on the PWM: round(pwm, 2) maxWeights(pwm) maxScore(pwm) reverseComplement(pwm) ## Score the first 5 positions: PWMscoreStartingAt(pwm, chr3R, starting.at=1:5) ## Match the plus strand: hits <- matchPWM(pwm, chr3R) nhit <- countPWM(pwm, chr3R) # same as 'length(hits)' ## Use 'with.score=TRUE' to get the scores of the hits: hits <- matchPWM(pwm, chr3R, with.score=TRUE) head(mcols(hits)$score) min(mcols(hits)$score / maxScore(pwm)) # should be >= 0.8 ## The scores can also easily be post-calculated: scores <- PWMscoreStartingAt(pwm, subject(hits), start(hits)) ## Match the minus strand: matchPWM(reverseComplement(pwm), chr3R)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.