
binomTest

Exact Binomial Tests for Comparing Two Digital Libraries


Description

Computes p-values for differential abundance for each gene between two digital libraries, conditioning on the total count for each gene. The counts in each group as a proportion of the whole are assumed to follow a binomial distribution.

Usage

binomTest(y1, y2, n1=sum(y1), n2=sum(y2), p=n1/(n1+n2))

Arguments

y1

integer vector giving the count for each gene in the first library. Non-integer values are rounded to the nearest integer.

y2

integer vector giving the count for each gene in the second library. Must have the same length as y1. Non-integer values are rounded to the nearest integer.

n1

total number of counts in the first library, across all genes. Non-integer values are rounded to the nearest integer. Not required if p is supplied.

n2

total number of counts in the second library, across all genes. Non-integer values are rounded to the nearest integer. Not required if p is supplied.

p

expected proportion of y1 to the total for each gene under the null hypothesis.

Details

This function can be used to compare two libraries from SAGE, RNA-Seq, ChIP-Seq or other sequencing technologies with respect to technical variation.

An exact two-sided binomial test is computed for each gene. This test is closely related to Fisher's exact test for 2x2 contingency tables but, unlike Fisher's test, it conditions only on the total number of counts for each gene. The null hypothesis is that the expected counts are in the same proportions as the library sizes, i.e., that the binomial probability for the first library is n1/(n1+n2).
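As a small illustration of the null hypothesis, the following base R sketch works out the null proportion and the expected count for one gene. It reuses the library sizes and the second gene from the Examples section; the variable names p_null and expected_y1 are illustrative only and are not part of binomTest.

```r
# Library sizes and counts taken from the Examples section
n1 <- 10000
n2 <- 15000
y1 <- 5
y2 <- 30

# Null binomial probability for the first library
p_null <- n1 / (n1 + n2)            # 0.4

# Expected count in the first library, given the gene total y1 + y2
expected_y1 <- (y1 + y2) * p_null   # 14
```

The observed count of 5 is well below the expected 14, which is why this gene receives a small p-value.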

The two-sided rejection region is chosen analogously to Fisher's test. Specifically, the rejection region consists of those values with smallest probabilities under the null hypothesis.
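The rejection region described above can be sketched for a single gene in base R. This is a minimal illustration, not the implementation used by binomTest: the function name two_sided_binom_p is hypothetical, and the small relative tolerance is an assumption borrowed from the convention used by binom.test to guard against floating-point ties.

```r
# Hypothetical helper: two-sided exact binomial p-value for one gene.
# Sums the null probabilities of all outcomes that are no more likely
# than the observed count y1, given the gene total y1 + y2.
two_sided_binom_p <- function(y1, y2, p) {
  size <- y1 + y2
  probs <- dbinom(0:size, size, p)     # null probability of each possible count
  observed <- dbinom(y1, size, p)      # null probability of the observed count
  tol <- 1 + 1e-7                      # assumed tolerance for near-ties
  min(1, sum(probs[probs <= observed * tol]))
}

# Second gene from the Examples section: 5 vs 30 counts, p = 0.4
p_sketch <- two_sided_binom_p(5, 30, p = 10000 / (10000 + 15000))
p_stats  <- binom.test(5, 5 + 30, p = 10000 / (10000 + 15000))$p.value
all.equal(p_sketch, p_stats)
```

The sketch loops over every possible count, so it is meant only to make the definition concrete; it is far slower than a vectorised implementation for large totals.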

When the counts are reasonably large, the binomial test, Fisher's exact test and Pearson's chi-square test all give very similar results. When the counts are smaller, the binomial test is usually to be preferred in this context.
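The three tests can be compared for a single gene using base R functions only. In this sketch the counts and library sizes are made up for illustration, and the 2x2 table layout (the gene versus all other genes, by library) is one assumed way of framing the comparison for fisher.test and chisq.test.

```r
# Made-up counts for one gene and made-up library sizes
y1 <- 500
y2 <- 900
n1 <- 1000000
n2 <- 1500000

# 2x2 table: rows are "this gene" and "all other genes", columns are libraries
tab <- matrix(c(y1, n1 - y1, y2, n2 - y2), nrow = 2,
              dimnames = list(c("gene", "other"), c("lib1", "lib2")))

p_values <- c(
  binomial = binom.test(y1, y1 + y2, p = n1 / (n1 + n2))$p.value,
  fisher   = fisher.test(tab)$p.value,
  pearson  = chisq.test(tab, correct = FALSE)$p.value
)
print(p_values)
```

Repeating the comparison with much smaller counts is a simple way to see where the three tests start to diverge.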

This function replaces the earlier sage.test functions in the statmod and sagenhaft packages. It produces the same results as binom.test in the stats package, but is much faster.

Value

Numeric vector of p-values.

Author(s)

Gordon Smyth

References

http://en.wikipedia.org/wiki/RNA-Seq

See Also

sage.test (statmod package), binom.test (stats package)

Examples

library(edgeR)
binomTest(c(0,5,10),c(0,30,50),n1=10000,n2=15000)
#  Univariate equivalents:
binom.test(5,5+30,p=10000/(10000+15000))$p.value
binom.test(10,10+50,p=10000/(10000+15000))$p.value

edgeR

Empirical Analysis of Digital Gene Expression Data in R

v3.32.1
GPL (>=2)
Authors
Yunshun Chen, Aaron TL Lun, Davis J McCarthy, Matthew E Ritchie, Belinda Phipson, Yifang Hu, Xiaobei Zhou, Mark D Robinson, Gordon K Smyth
Initial release
2021-01-14
