
dinucleotideFrequencyTest

Pearson's chi-squared Test and G-tests for String Position Dependence


Description

Performs Pearson's chi-squared test, the G-test, or Williams' corrected G-test to determine whether two nucleotide positions are dependent.

Usage

dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"),
                          simulate.p.value = FALSE, B = 2000)

Arguments

x

A DNAStringSet or RNAStringSet object.

i, j

Single integer values for positions to test for dependence.

test

One of "chisq" (Pearson's chi-squared test), "G" (G-test), or "adjG" (Williams' corrected G-test). See the Details section.

simulate.p.value

A logical indicating whether to compute p-values by Monte Carlo simulation.

B

An integer specifying the number of replicates used in the Monte Carlo test.
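The simulate.p.value and B arguments have the same names as the corresponding arguments of base R's chisq.test. As a rough illustration of what a Monte Carlo p-value with B replicates looks like, the sketch below calls chisq.test directly on a small made-up table of counts. The table is hypothetical, and whether dinucleotideFrequencyTest delegates to chisq.test internally is an assumption not confirmed by this page.

```r
# Hypothetical 2 x 2 table of base counts at two positions (illustrative data only).
O <- matrix(c(2, 2, 3, 1), nrow = 2,
            dimnames = list(pos_i = c("A", "T"), pos_j = c("C", "G")))

set.seed(1)  # make the simulated p-value reproducible

# Monte Carlo p-value from B = 2000 simulated tables with the same margins.
res_sim <- chisq.test(O, simulate.p.value = TRUE, B = 2000)
res_sim$p.value
```

Simulation is mainly useful when some expected counts are small, where the chi-squared approximation can be unreliable.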

Details

The null and alternative hypotheses for this function are:

H0:

positions i and j are independent

H1:

otherwise

Let O and E be the observed and expected probabilities, respectively, of the pairs of bases found at positions i and j. The test statistics are then calculated as:

test="chisq":

stat = sum(abs(O - E)^2/E)

test="G":

stat = 2 * sum(O * log(O/E))

test="adjG":

stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, these test statistics are approximately distributed chi-squared(df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1)).
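The "chisq" and "G" statistics above can be reproduced by hand in base R. The following is a minimal sketch, not the package's implementation: the sequences are made up, and it assumes O and E are taken as counts rather than probabilities, which is the usual form of both statistics. Williams' correction ("adjG") is left out.

```r
# Toy aligned sequences (hypothetical data, for illustration only).
seqs <- c("ACGT", "ACGA", "AGGT", "TCGT", "TGCA", "AGCT", "TCCA", "AGGA")
i <- 1L
j <- 2L

# Observed counts of base combinations at positions i and j.
O <- table(substr(seqs, i, i), substr(seqs, j, j))
n <- sum(O)

# Expected counts under independence: (row total * column total) / n.
E <- outer(rowSums(O), colSums(O)) / n

# Degrees of freedom: (distinct bases at i - 1) * (distinct bases at j - 1).
df <- (nrow(O) - 1) * (ncol(O) - 1)

# Pearson's chi-squared statistic.
chisq_stat <- sum((O - E)^2 / E)

# G statistic; cells with zero observed count contribute nothing to the sum.
nz <- O > 0
g_stat <- 2 * sum(O[nz] * log(O[nz] / E[nz]))

# Approximate p-values from the chi-squared distribution.
p_chisq <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
p_g <- pchisq(g_stat, df = df, lower.tail = FALSE)
```

The hand-computed chisq_stat should agree with chisq.test(O, correct = FALSE)$statistic, which is a convenient sanity check.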

Value

An htest object. See help(chisq.test) for more details.
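For readers unfamiliar with htest objects, the sketch below shows how the usual components are accessed. It uses base R's chisq.test on a hypothetical table of counts; it is assumed, but not stated on this page, that the object returned by dinucleotideFrequencyTest exposes the same components.

```r
# Hypothetical 2 x 2 table of counts (illustrative data only).
O <- matrix(c(12, 5, 7, 16), nrow = 2,
            dimnames = list(pos_i = c("A", "T"), pos_j = c("C", "G")))

res <- chisq.test(O, correct = FALSE)

class(res)      # "htest"
res$statistic   # value of the test statistic
res$parameter   # degrees of freedom
res$p.value     # p-value of the test
res$method      # description of the test that was run
```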

Author(s)

P. Aboyoun

References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

See Also

Examples

library(Biostrings)

data(HNF4alpha)
dinucleotideFrequencyTest(HNF4alpha, 1, 2)                 # Pearson's chi-squared test (default)
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")     # G-test
dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")  # Williams' corrected G-test

Biostrings

Efficient manipulation of biological strings

Version: 2.58.0
License: Artistic-2.0
Authors: H. Pagès, P. Aboyoun, R. Gentleman, and S. DebRoy
