Normalize ChIP-Seq Read Counts to Input and Test for Enrichment
Normalize ChIP-Seq read counts to input control values, then test for significant enrichment relative to the control.
normalizeChIPtoInput(input, response, dispersion=0.01, niter=6, loss="p", plot=FALSE, verbose=FALSE, ...)
calcNormOffsetsforChIP(input, response, dispersion=0.01, niter=6, loss="p", plot=FALSE, verbose=FALSE, ...)
input |
numeric vector of non-negative input values, not necessarily integer. |
response |
vector of non-negative integer counts of some ChIP-Seq mark for each gene or other genomic feature. |
dispersion |
negative binomial dispersion, must be positive. |
niter |
number of iterations. |
loss |
loss function to be used when fitting the response counts to the input: |
plot |
if |
verbose |
if |
... |
other arguments are passed to the |
normalizeChIPtoInput identifies significant enrichment for a ChIP-Seq mark relative to input values.
The ChIP-Seq mark might be, for example, transcription factor binding or an epigenetic mark.
The function works on the data from one sample.
Replicate libraries are not explicitly accounted for; this function can be run either on each sample individually or on a pool of replicates.
ChIP-Seq counts are assumed to be summarized by gene or similar genomic feature of interest.
This function makes the assumption that a non-negligible proportion of the genes, say 25% or more, are not truly marked by the ChIP-Seq feature of interest. Unmarked genes are further assumed to have counts at a background level proportional to the input. The function aligns the counts to the input so that the counts for the unmarked genes behave like a random sample. The function estimates the proportion of marked genes, and removes marked genes from the fitting process. For this purpose, marked genes are those with a Holm-adjusted mid-p-value less than 0.5.
When plot=TRUE, the genes shown in red are the marked genes (with Holm mid-p-value < 0.5) that have been removed as probably enriched during the fitting process.
The normalization line has been fitted to the non-marked genes plotted in black.
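The marking rule described above can be illustrated with a minimal base-R sketch. This is not limma's internal code: the scaling factor `scaling` is a hypothetical, already-known value (the function estimates it iteratively), the data are simulated, and the negative binomial is assumed to use R's `size = 1/dispersion` parameterization. The mid-p-value is taken here as the upper-tail probability minus half the probability of the observed count.

```r
# Sketch only: flag "marked" genes by Holm-adjusted mid-p-value < 0.5,
# given an assumed (hypothetical) scaling factor. Simulated data.
set.seed(42)
n <- 500
dispersion <- 0.01
scaling <- 0.8                                  # assumed known for illustration
input <- rgamma(n, shape = 5, rate = 0.1)       # simulated input values

# Background mean is proportional to input; first 50 genes are truly enriched
mu0 <- scaling * input
is.enriched <- seq_len(n) <= 50
mu <- ifelse(is.enriched, 10 * mu0, mu0)
response <- rnbinom(n, mu = mu, size = 1 / dispersion)

# Upper-tail mid-p-value under the background model:
# P(Y > y) + 0.5 * P(Y = y)
size <- 1 / dispersion
mid.p <- pnbinom(response, size = size, mu = mu0, lower.tail = FALSE) +
  0.5 * dnbinom(response, size = size, mu = mu0)

# Holm adjustment, then the 0.5 cutoff used to drop genes from the fit
holm.p <- p.adjust(mid.p, method = "holm")
marked <- holm.p < 0.5

table(truth = is.enriched, called = marked)
mean(marked)                                    # crude proportion of marked genes
```

The lenient 0.5 cutoff is applied to Holm-adjusted values, so a gene is only dropped from the fit when its evidence for enrichment survives a family-wise correction.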
The read counts are treated as negative binomial. The dispersion parameter is not estimated from the data; instead a reasonable value is assumed to be given.
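To give a feel for what a fixed dispersion implies, the short sketch below assumes the usual negative binomial parameterization in which the variance is mu + dispersion * mu^2 (the same convention as R's `rnbinom` with `size = 1/dispersion`). The mean values are arbitrary illustrations, not taken from any dataset.

```r
# Variance implied by a negative binomial with fixed dispersion,
# assuming var = mu + dispersion * mu^2 (illustrative means only)
dispersion <- 0.01
mu <- c(10, 100, 1000)

pois.var <- mu                          # Poisson variance, for comparison
nb.var <- mu + dispersion * mu^2        # negative binomial variance

data.frame(mu, pois.var, nb.var, cv = sqrt(nb.var) / mu)
```

Under this assumption the default dispersion of 0.01 matters little for small counts but dominates the Poisson term once the mean exceeds 1/dispersion = 100.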
calcNormOffsetsforChIP returns a numeric matrix of offsets, ready for linear modelling.
normalizeChIPtoInput returns a list with components:
p.value |
numeric vector of p-values for enrichment. |
scaling.factor |
factor by which input is scaled to align with response counts for unmarked genes. |
prop.enriched |
proportion of marked genes, as internally estimated |
calcNormOffsetsforChIP returns a numeric matrix of offsets.
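A possible usage sketch follows, run on simulated data rather than real ChIP-Seq counts. The simulation settings (gamma-distributed input, a background scaling of 0.8, 100 enriched genes with a 10-fold increase) are assumptions chosen for illustration; only the function calls and the component names come from this page.

```r
library(limma)

# Simulated data: 1000 genes, the first 100 truly enriched (illustrative only)
set.seed(2024)
n <- 1000
input <- rgamma(n, shape = 5, rate = 0.1)
mu <- 0.8 * input
mu[1:100] <- 10 * mu[1:100]
response <- rnbinom(n, mu = mu, size = 100)     # size = 1/dispersion

# Test each gene for enrichment relative to input
out <- normalizeChIPtoInput(input, response, dispersion = 0.01)
out$scaling.factor      # factor aligning input with unmarked response counts
out$prop.enriched       # internally estimated proportion of marked genes
head(out$p.value)       # per-gene p-values for enrichment

# Offsets for downstream linear modelling
offsets <- calcNormOffsetsforChIP(input, response, dispersion = 0.01)
dim(offsets)
```

Setting plot=TRUE in the first call would additionally draw the fit, with marked genes in red and the remaining genes in black.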
Gordon Smyth