
normalizeChIPtoInput

Normalize ChIP-Seq Read Counts to Input and Test for Enrichment


Description

Normalize ChIP-Seq read counts to input control values, then test for significant enrichment relative to the control.

Usage

normalizeChIPtoInput(input, response, dispersion=0.01, niter=6, loss="p", plot=FALSE,
                     verbose=FALSE, ...)
calcNormOffsetsforChIP(input, response, dispersion=0.01, niter=6, loss="p", plot=FALSE,
                       verbose=FALSE, ...)

Arguments

input

numeric vector of non-negative input values, not necessarily integer.

response

vector of non-negative integer counts of some ChIP-Seq mark for each gene or other genomic feature.

dispersion

negative binomial dispersion, must be positive.

niter

number of iterations.

loss

loss function to be used when fitting the response counts to the input: "p" for cumulative probabilities or "z" for z-value.

plot

if TRUE, a plot of the fit is produced.

verbose

if TRUE, working estimates from each iteration are output.

...

other arguments are passed to the plot function.

Details

normalizeChIPtoInput identifies significant enrichment for a ChIP-Seq mark relative to input values. The ChIP-Seq mark might be, for example, transcription factor binding or an epigenetic mark. The function works on the data from one sample. Replicate libraries are not explicitly accounted for; the function can be run either on each sample individually or on a pool of replicates.

ChIP-Seq counts are assumed to be summarized by gene or similar genomic feature of interest.

This function makes the assumption that a non-negligible proportion of the genes, say 25% or more, are not truly marked by the ChIP-Seq feature of interest. Unmarked genes are further assumed to have counts at a background level proportional to the input. The function aligns the counts to the input so that the counts for the unmarked genes behave like a random sample. The function estimates the proportion of marked genes, and removes marked genes from the fitting process. For this purpose, marked genes are those with a Holm-adjusted mid-p-value less than 0.5.

When plot=TRUE, the genes shown in red are the marked genes (with Holm mid-p-value < 0.5) that have been removed as probably enriched during the fitting process. The normalization line has been fitted to the non-marked genes plotted in black.

The read counts are treated as negative binomial. The dispersion parameter is not estimated from the data; instead, a reasonable value must be supplied through the dispersion argument.
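
The marking rule described above can be sketched in base R. This is an illustration only, not edgeR's internal code: the helper name midp_marked is hypothetical, the scaling factor is assumed to be already known rather than estimated iteratively, and the mid-p-value is assumed to be the standard upper-tail form P(Y > y) + 0.5 * P(Y = y).

```r
# Illustrative sketch only; midp_marked is a hypothetical helper,
# not part of edgeR. The scaling factor is taken as given here,
# whereas normalizeChIPtoInput estimates it over several iterations.
midp_marked <- function(input, response, scaling.factor, dispersion = 0.01) {
  mu   <- scaling.factor * input   # expected background count per gene
  size <- 1 / dispersion           # negative binomial size parameter

  # Upper-tail mid-p-value: P(Y > y) + 0.5 * P(Y = y)
  upper <- pnbinom(response, size = size, mu = mu, lower.tail = FALSE)
  point <- dnbinom(response, size = size, mu = mu)
  mid.p <- upper + 0.5 * point

  # Holm adjustment; genes below 0.5 are treated as marked
  adj.p <- p.adjust(mid.p, method = "holm")
  list(mid.p = mid.p, adj.p = adj.p, marked = adj.p < 0.5)
}

# Simulated data; all values are invented for illustration.
set.seed(1)
input <- runif(200, min = 5, max = 50)
response <- rnbinom(200, size = 100, mu = 2 * input)
response[1:20] <- response[1:20] + 500L   # 20 strongly enriched genes

out <- midp_marked(input, response, scaling.factor = 2)
table(out$marked)
```

The threshold of 0.5 on the Holm-adjusted values is deliberately lenient: its purpose is to exclude probably enriched genes from the fit, not to declare final significance.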

calcNormOffsetsforChIP returns a numeric matrix of offsets, ready for linear modelling.

Value

normalizeChIPtoInput returns a list with components

p.value

numeric vector of p-values for enrichment.

scaling.factor

factor by which input is scaled to align with response counts for unmarked genes.

prop.enriched

proportion of marked genes, as estimated internally.

calcNormOffsetsforChIP returns a numeric matrix of offsets.
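
A minimal usage sketch on simulated data follows. It assumes edgeR is installed; the simulated values (gene count, background scaling of 2, size of the spike-in) are invented for illustration and are not taken from any real experiment.

```r
# Simulated single-sample data; all numbers are invented for illustration.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(42)
  ngenes <- 1000
  input <- runif(ngenes, min = 5, max = 50)

  # Background counts proportional to input (true scaling of 2),
  # negative binomial with dispersion 0.01 (size = 1 / 0.01)
  response <- rnbinom(ngenes, size = 100, mu = 2 * input)

  # Make the first 100 genes clearly enriched
  response[1:100] <- response[1:100] + 300L

  fit <- edgeR::normalizeChIPtoInput(input, response, dispersion = 0.01)

  print(fit$scaling.factor)   # factor aligning input to unmarked counts
  print(fit$prop.enriched)    # internally estimated proportion of marked genes
  print(head(fit$p.value))    # enrichment p-values, one per gene
}
```

Setting plot=TRUE in the call would additionally draw the fit, with the removed (marked) genes shown in red.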

Author(s)

Gordon Smyth


edgeR

Empirical Analysis of Digital Gene Expression Data in R

v3.32.1
GPL (>=2)
Authors
Yunshun Chen, Aaron TL Lun, Davis J McCarthy, Matthew E Ritchie, Belinda Phipson, Yifang Hu, Xiaobei Zhou, Mark D Robinson, Gordon K Smyth
Initial release
2021-01-14
