Kappa statistic
Computes the kappa statistic and its confidence interval.
epi.kappa(dat, method = "fleiss", alternative = c("two.sided", "less", "greater"), conf.level = 0.95)
dat
    an object of class table with two rows and two columns listing the individual cell frequencies.

method
    a character string indicating the method to use. Options are "fleiss", "watson", "altman" or "cohen".

alternative
    a character string specifying the alternative hypothesis; must be one of "two.sided" (the default), "less" or "greater".

conf.level
    magnitude of the returned confidence interval. Must be a single number between 0 and 1.
Kappa is a measure of agreement beyond the level of agreement expected by chance alone. The observed agreement is the proportion of samples for which both methods (or observers) agree.
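As a sketch of what agreement beyond chance means, the base R code below computes the kappa point estimate by hand for the Example 1 data (cell counts 19, 10, 6 and 256), comparing the observed agreement with the agreement expected from the marginal totals. The method options of epi.kappa chiefly affect the standard error and confidence interval rather than this point estimate.

```r
# Cell counts using the notation a, b, c, d defined in the 2 x 2
# table given later in this document (Example 1 data):
a <- 19; b <- 10; c <- 6; d <- 256
N <- a + b + c + d

# Observed agreement: the proportion of samples on which both
# observers (or methods) agree.
p.obs <- (a + d) / N

# Expected agreement by chance alone, from the marginal totals.
p.exp <- ((a + b) * (a + c) + (c + d) * (b + d)) / N^2

# Kappa: observed agreement corrected for chance agreement.
kap <- (p.obs - p.exp) / (1 - p.exp)
round(kap, 2)
# 0.67, matching the kappa point estimate reported for Example 1.
```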
The bias and prevalence adjusted kappa (Byrt et al. 1993) provides a measure of observed agreement, an index of the bias between observers, and an index of the differences between the overall proportion of ‘yes’ and ‘no’ assessments.
Common interpretations for the kappa statistic are as follows: < 0.2 slight agreement, 0.2 - 0.4 fair agreement, 0.4 - 0.6 moderate agreement, 0.6 - 0.8 substantial agreement, > 0.8 almost perfect agreement.
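The descriptive scale above can be applied mechanically. The helper below is a hypothetical convenience function (not part of epiR) that maps a kappa estimate to the corresponding label:

```r
# Hypothetical helper: map a kappa estimate to the descriptive
# scale quoted above. Not part of epiR.
kappa.label <- function(k) {
  cut(k, breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, Inf),
      labels = c("slight", "fair", "moderate", "substantial",
                 "almost perfect"))
}

kappa.label(0.67)
# substantial
```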
The argument alternative = "greater" tests the hypothesis that kappa is greater than 0.
A list containing the following:
prop.agree
    a data frame with the observed and expected proportions of agreement.

pindex
    a data frame with the prevalence index, its standard error and the lower and upper bounds of its confidence interval.

bindex
    a data frame with the bias index, its standard error and the lower and upper bounds of its confidence interval.

pabak
    a data frame with the prevalence and bias corrected kappa statistic and the lower and upper bounds of its confidence interval.

kappa
    a data frame with the kappa statistic, its standard error and the lower and upper bounds of its confidence interval.

z
    a data frame containing the z test statistic for kappa and its associated P-value.

mcnemar
    a data frame containing the McNemar test statistic for kappa and its associated P-value.
-----------------------------------------------------------
              Obs 1 +      Obs 1 -      Total
-----------------------------------------------------------
Obs 2 +       a            b            a+b
Obs 2 -       c            d            c+d
-----------------------------------------------------------
Total         a+c          b+d          a+b+c+d = N
-----------------------------------------------------------
The kappa coefficient is influenced by the prevalence of the condition being assessed. A prevalence effect exists when the proportion of agreements on the positive classification differs from that of the negative classification. If the prevalence index is high (that is, the prevalence of a positive rating is very high or very low) chance agreement is also high and the value of kappa is reduced accordingly. The effect of prevalence on kappa is greater for large values of kappa than for small values (Byrt et al. 1993). Using the notation above, the prevalence index is calculated as ((a/N) - (d/N)). Confidence intervals for the prevalence index are based on methods used for a difference in two proportions. See Rothman (2002, p 135 equation 7-2) for details.
Bias is the extent to which raters disagree on the proportion of positive (or negative) cases. Bias affects interpretation of the kappa coefficient. When there is a large amount of bias, kappa is higher than when bias is low or absent. In contrast to prevalence, the effect of bias is greater when kappa is small than when it is large (Byrt et al. 1993). Using the notation above, the bias index is calculated as ((a + b)/N - (a + c)/N). Confidence intervals for the bias index are based on methods used for a difference in two proportions. See Rothman (2002, p 135 equation 7-2) for details.
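Both indices, and the prevalence and bias adjusted kappa (PABAK = 2 x observed agreement - 1; Byrt et al. 1993), follow directly from the cell counts. A minimal sketch using the Example 1 data:

```r
# Example 1 cell counts, using the a, b, c, d notation above:
a <- 19; b <- 10; c <- 6; d <- 256
N <- a + b + c + d

# Prevalence index: difference between the proportions of
# positive-positive and negative-negative agreements.
pindex <- (a / N) - (d / N)

# Bias index: difference between the two raters' overall
# proportions of positive assessments.
bindex <- ((a + b) / N) - ((a + c) / N)

# Prevalence and bias adjusted kappa (Byrt et al. 1993).
pabak <- 2 * ((a + d) / N) - 1

round(c(pindex = pindex, bindex = bindex, pabak = pabak), 3)
# pindex -0.814, bindex 0.014, pabak 0.890
```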
The McNemar test is used to test for the presence of bias. A statistically significant McNemar test (generally if P < 0.05) shows that there is evidence of a systematic difference between the proportion of ‘positive’ responses from the two methods. If one method provides the ‘true values’ (i.e. it is regarded as the gold standard method) the absence of a systematic difference implies that there is no bias. However, a non-significant result indicates only that there is no evidence of a systematic effect. A systematic effect may be present, but the power of the test may be inadequate to determine its presence.
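The McNemar calculation can be reproduced with base R's mcnemar.test. The sketch below uses the Example 1 table and omits the continuity correction; whether epi.kappa applies a correction is not stated here.

```r
# McNemar test for a systematic difference between the two
# laboratories in Example 1, using base R.
dat <- as.table(matrix(c(19, 10, 6, 256), nrow = 2, byrow = TRUE))
mcnemar.test(dat, correct = FALSE)
# The statistic depends only on the discordant cells:
# (b - c)^2 / (b + c) = (10 - 6)^2 / 16 = 1, P approx 0.32,
# so there is no evidence of a systematic difference here.
```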
Altman DG, Machin D, Bryant TN, Gardner MJ (2000). Statistics with Confidence, second edition. British Medical Journal, London, pp. 116 - 118.
Byrt T, Bishop J, Carlin JB (1993). Bias, prevalence and kappa. Journal of Clinical Epidemiology 46: 423 - 429.
Dohoo I, Martin W, Stryhn H (2010). Veterinary Epidemiologic Research, second edition. AVC Inc, Charlottetown, Prince Edward Island, Canada, pp. 98 - 99.
Fleiss JL, Levin B, Paik MC (2003). Statistical Methods for Rates and Proportions, third edition. John Wiley & Sons, London, pp. 598 - 626.
Rothman KJ (2002). Epidemiology: An Introduction. Oxford University Press, London, pp. 130 - 143.
Silva E, Sterry RA, Kolb D, Mathialagan N, McGrath MF, Ballam JM, Fricke PM (2007). Accuracy of a pregnancy-associated glycoprotein ELISA to determine pregnancy status of lactating dairy cows twenty-seven days after timed artificial insemination. Journal of Dairy Science 90: 4612 - 4622.
Sim J, Wright CC (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy 85: 257 - 268.
Watson PF, Petrie A (2010). Method agreement analysis: A review of correct methodology. Theriogenology 73: 1167 - 1179.
## EXAMPLE 1:
## Kidney samples from 291 salmon were split with one half of the
## samples sent to each of two laboratories where an IFAT test
## was run on each sample. The following results were obtained:

## Lab 1 positive, lab 2 positive: 19
## Lab 1 positive, lab 2 negative: 10
## Lab 1 negative, lab 2 positive: 6
## Lab 1 negative, lab 2 negative: 256

dat <- as.table(matrix(c(19,10,6,256), nrow = 2, byrow = TRUE))
colnames(dat) <- c("L1-pos","L1-neg")
rownames(dat) <- c("L2-pos","L2-neg")

epi.kappa(dat, method = "fleiss", alternative = "greater",
   conf.level = 0.95)

## The z test statistic is 11.53 (P < 0.01). We accept the alternative
## hypothesis that the kappa statistic is greater than zero.

## The proportion of agreements after chance has been excluded is
## 0.67 (95% CI 0.56 to 0.79). We conclude, on the basis of this
## sample, that there is substantial agreement between the two
## laboratories.

## EXAMPLE 2 (from Watson and Petrie 2010, page 1170):
## Silva et al. (2007) compared an early pregnancy enzyme-linked
## immunosorbent assay test for pregnancy associated glycoprotein on
## blood samples collected from lactating dairy cows at day 27 after
## artificial insemination with transrectal ultrasound (US) diagnosis
## of pregnancy at the same stage. The results were as follows:

## ELISA positive, US positive: 596
## ELISA positive, US negative: 61
## ELISA negative, US positive: 29
## ELISA negative, US negative: 987

dat <- as.table(matrix(c(596,61,29,987), nrow = 2, byrow = TRUE))
colnames(dat) <- c("US-pos","US-neg")
rownames(dat) <- c("ELISA-pos","ELISA-neg")

epi.kappa(dat, method = "watson", alternative = "greater",
   conf.level = 0.95)

## The proportion of agreements after chance has been excluded is
## 0.89 (95% CI 0.86 to 0.91). We conclude that there is almost
## perfect agreement between the two pregnancy diagnostic methods.