G-Test for Count Data
GTest performs chi-squared contingency table tests and goodness-of-fit tests.
GTest(x, y = NULL, correct = c("none", "williams", "yates"), p = rep(1/length(x), length(x)))
x | a numeric vector or matrix.
y | a numeric vector; ignored if x is a matrix.
correct | one out of "none" (default), "williams", "yates".
p | a vector of probabilities of the same length as x.
The G-test is also called the "likelihood ratio test" and is asymptotically equivalent to the Pearson chi-squared test, but it is not usually used when analyzing 2x2 tables. It is used in logistic regression and loglinear modeling, which involve contingency tables. The G-test is also reported in the standard summary of Desc for tables.
If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
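As an illustration of the goodness-of-fit case, the uncorrected statistic can be computed by hand as G = 2 * sum(O * log(O / E)), where O are the observed counts and E = sum(O) * p are the expected counts. The following is a minimal base-R sketch of that textbook formula, not the source of GTest itself; the variable names (expected, terms, G, df, p_value) are chosen here for illustration only.

```r
# Hand computation of the goodness-of-fit G statistic (base R only).
# This is a sketch of the standard formula, not the GTest implementation.
x <- c(A = 20, B = 15, C = 25)        # observed counts
p <- rep(1 / length(x), length(x))    # null probabilities (all equal)

expected <- sum(x) * p                # expected counts under the null

# Cells with a zero observed count contribute 0 to the sum
terms <- ifelse(x > 0, x * log(x / expected), 0)

G  <- 2 * sum(terms)                  # uncorrected G statistic
df <- length(x) - 1                   # degrees of freedom
p_value <- pchisq(G, df = df, lower.tail = FALSE)

print(c(G = G, df = df, p.value = p_value))
```

With correct = "none", GTest(x) should report the same statistic and degrees of freedom, up to rounding.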
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then the G-test is performed on the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.
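For the two-dimensional case, the expected counts under independence are the products of the row and column totals divided by the grand total, and the uncorrected statistic is again G = 2 * sum(O * log(O / E)) with (r - 1) * (c - 1) degrees of freedom. The sketch below works this out in base R for the gender-by-party table used in the examples; it assumes no continuity correction and is not taken from the GTest source.

```r
# Hand computation of the G statistic for a test of independence (base R only).
# A sketch of the standard formula, assuming no continuity correction.
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party  = c("Democrat", "Independent", "Republican"))

# Expected counts: (row total * column total) / grand total
expected <- outer(rowSums(M), colSums(M)) / sum(M)

# Cells with a zero observed count contribute 0 to the sum
G  <- 2 * sum(ifelse(M > 0, M * log(M / expected), 0))
df <- (nrow(M) - 1) * (ncol(M) - 1)
p_value <- pchisq(G, df = df, lower.tail = FALSE)

print(c(G = G, df = df, p.value = p_value))
```

The expected matrix has the same row and column totals as M, which is a quick sanity check on the computation.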
For the test of independence (TOI), Yates' correction is taken from Mike Camann's 2x2 G-test function. For the goodness-of-fit (GOF) test, Yates' correction is as described in Zar (2000).
A list with class "htest" containing the following components:
statistic | the value of the chi-squared test statistic.
parameter | the degrees of freedom of the approximate chi-squared distribution of the test statistic.
p.value | the p-value for the test.
method | a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.
data.name | a character string giving the name(s) of the data.
observed | the observed counts.
expected | the expected counts under the null hypothesis.
Pete Hurd <phurd@ualberta.ca>
Hope, A. C. A. (1968) A simplified Monte Carlo significance test procedure. J. Roy. Statist. Soc. B 30, 582–598.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91–97.
Agresti, A. (2007) An Introduction to Categorical Data Analysis, 2nd ed., New York: John Wiley & Sons. Page 38.
Sokal, R. R., F. J. Rohlf (2012) Biometry: the principles and practice of statistics in biological research. 4th edition. W. H. Freeman and Co.: New York. 937 pp.
library(DescTools)   # GTest is provided by the DescTools package

## From Agresti (2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("M", "F"),
                    party = c("Democrat", "Independent", "Republican"))

(Xsq <- GTest(M))   # Prints test summary

Xsq$observed        # observed counts (same as M)
Xsq$expected        # expected counts under the null

## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
GTest(x)
GTest(as.table(x))  # the same

x <- c(89, 37, 30, 28, 2)
p <- c(40, 20, 20, 15, 5)
try(
  GTest(x, p = p)   # gives an error
)

# works
p <- c(0.40, 0.20, 0.20, 0.19, 0.01)
# Expected count in category 5
# is 1.86 < 5 ==> chi square approx.
GTest(x, p = p)     # maybe doubtful, but is ok!

## Case B. Raw data
x <- trunc(5 * runif(100))
GTest(table(x))     # NOT 'GTest(x)'!