genefilter: rowpAUCs – R documentation

rowpAUCs

Rowwise ROC and pAUC computation

Description

Methods for fast rowwise computation of ROC curves and (partial) area under the curve (pAUC) using the simple classification rule x > theta, where theta is a value in the range of x.

Usage

rowpAUCs(x, fac, p=0.1, flip=TRUE, caseNames=c("1", "2"))

Arguments

`x`	`ExpressionSet` or numeric `matrix`. The `matrix` must not contain `NA` values.
`fac`	A `factor`, or a `numeric` or `character` vector that can be coerced to a `factor`. If `x` is an `ExpressionSet`, this may also be a character `vector` of length 1 with the name of a covariate variable in `x`. `fac` must have exactly 2 levels. For better control over the classification, use the integer values 0 and 1, where 1 indicates the "Disease" class in the sense of the Pepe et al. paper (see References).
`p`	Numeric `vector` of length 1. Upper limit, in (0,1), up to which the pAUC is integrated.
`flip`	Logical. If `TRUE`, both classification rules `x > theta` and `x < theta` are tested and the (partial) area under the curve of the better one of the two is returned. This is appropriate for the cases in which the classification is not necessarily linked to higher expression values, but instead it is symmetric and one would assume both over- and under-expressed genes for both classes. You can set `flip` to `FALSE` if you only want to screen for genes which discriminate Disease from Control with the `x > theta` rule.
`caseNames`	The class names that are used when plotting the data. If `fac` is the name of the covariate variable in the `ExpressionSet` the function will use its levels as `caseNames`.
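
The 0/1 coding of `fac` described above can be sketched in base R as follows. The `status` vector and the choice of "tumor" as the "Disease" class are hypothetical and only serve as illustration; the point is that an explicit comparison fixes which class is coded as 1, whereas `as.integer(factor) - 1` depends on the alphabetical order of the levels.

```r
# Hypothetical phenotype labels, one per sample (column of x)
status <- c("normal", "tumor", "tumor", "normal", "tumor")

# Explicit coding: 1 = "Disease" (here: tumor), 0 = "Control"
fac01 <- as.integer(status == "tumor")

# fac must describe exactly two classes
stopifnot(all(fac01 %in% 0:1), length(unique(fac01)) == 2)

# fac01 can then be supplied as the `fac` argument, e.g. rowpAUCs(x, fac01),
# where x is a numeric matrix with one column per sample.
```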

Details

Rowwise calculation of Receiver Operating Characteristic (ROC) curves and the corresponding partial area under the curve (pAUC) for a given data matrix or ExpressionSet. The function is implemented in C and thus reasonably fast and memory efficient. Cutpoints (theta) are calculated before the first, in between, and after the last data value. By default, both classification rules x > theta and x < theta are tested and the (partial) area under the curve of the better one of the two is returned. This is only valid for symmetric cases, where the classification is independent of the magnitude of x (e.g., both over- and under-expression of different genes in the same class). For asymmetric cases in which you expect x to be consistently higher or lower in one of the two classes (e.g., presence or absence of a single biomarker), set flip=FALSE or use the functionality provided in the ROC package. For better control over the classification (i.e., the choice of "Disease" and "Control" class in the sense of the Pepe et al. paper), argument fac can be an integer vector with values 0 and 1, where 1 indicates "Disease" and 0 indicates "Control".
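
To make the procedure above concrete, here is a minimal base-R sketch for a single row. It is not the C implementation used by the package. The helper names `row_roc` and `row_pauc`, the trapezoidal integration, and the choice to pick the "better" rule by total AUC are all assumptions made for illustration.

```r
# Sketch for ONE row; row_roc() and row_pauc() are hypothetical helpers.
# truth: integer vector of 0 (Control) and 1 (Disease), same length as x.
row_roc <- function(x, truth, p = 0.1) {
  stopifnot(length(x) == length(truth), all(truth %in% 0:1), p > 0, p <= 1)
  v <- sort(unique(x))
  # Cutpoints before the first, in between, and after the last data value
  theta <- c(v[1] - 1, (v[-1] + v[-length(v)]) / 2, v[length(v)] + 1)
  # Rule x > theta: sensitivity and specificity at every cutpoint
  sens <- sapply(theta, function(t) mean(x[truth == 1] > t))
  spec <- sapply(theta, function(t) mean(x[truth == 0] <= t))
  # Order the ROC points by increasing false positive rate
  fpr  <- rev(1 - spec)
  sens <- rev(sens)
  # Trapezoidal area from fpr = 0 up to fpr = upper (assumed integration rule)
  area_to <- function(upper) {
    a <- 0
    for (i in seq_len(length(fpr) - 1)) {
      x0 <- fpr[i]; x1 <- fpr[i + 1]
      if (x0 >= upper) break
      y0 <- sens[i]; y1 <- sens[i + 1]
      if (x1 > upper) {  # segment crosses the limit: interpolate linearly
        y1 <- y0 + (y1 - y0) * (upper - x0) / (x1 - x0)
        x1 <- upper
      }
      a <- a + (x1 - x0) * (y0 + y1) / 2
    }
    a
  }
  c(AUC = area_to(1), pAUC = area_to(p))
}

row_pauc <- function(x, truth, p = 0.1, flip = TRUE) {
  fwd <- row_roc(x, truth, p)
  if (!flip) return(fwd)
  # The rule x < theta is equivalent to -x > -theta
  flipped <- row_roc(-x, truth, p)
  # Assumption: "better" means larger total AUC
  if (flipped["AUC"] > fwd["AUC"]) flipped else fwd
}

# Perfectly separated toy data: Disease has the higher values
row_pauc(c(1, 2, 3, 4, 5, 6), c(0, 0, 0, 1, 1, 1), p = 0.1)
```

With the class labels swapped and `flip=FALSE`, the same toy data gives an AUC of 0 under the `x > theta` rule, which is the situation the `flip` argument is meant to handle.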

Value

An object of class rowROC with the calculated specificities and sensitivities for each row and the corresponding pAUC and AUC values. See rowROC for details.

Methods

Methods exist for rowpAUCs:

rowpAUCs: signature(x="matrix", fac="factor")
rowpAUCs: signature(x="matrix", fac="numeric")
rowpAUCs: signature(x="ExpressionSet")
rowpAUCs: signature(x="ExpressionSet", fac="character")

Author(s)

Florian Hahne <fhahne@fhcrc.org>

References

Pepe MS, Longton G, Anderson GL, Schummer M.: Selecting differentially expressed genes from microarray experiments. Biometrics. 2003 Mar;59(1):133-42.

Examples

library(Biobase)
library(genefilter)  ## provides rowttests and rowpAUCs
data(sample.ExpressionSet)

r1 = rowttests(sample.ExpressionSet, "sex")
r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)

plot(area(r2, total=TRUE), r1$statistic, pch=16)
sel <- which(area(r2, total=TRUE) > 0.7)
plot(r2[sel])

## this compares performance and output of rowpAUCs to function pAUC in
## package ROC 
if(require(ROC)){
  ## performance
  myRule = function(x)
    pAUC(rocdemo.sca(truth = as.integer(sample.ExpressionSet$sex)-1 ,
         data = x, rule = dxrule.sca), t0 = 0.1)
  nGenes = 200
  cat("computation time for ", nGenes, "genes:\n")
  cat("function pAUC: ")
  print(system.time(r3 <- esApply(sample.ExpressionSet[1:nGenes, ], 1, myRule)))
  cat("function rowpAUCs: ")
  print(system.time(r2 <- rowpAUCs(sample.ExpressionSet[1:nGenes, ],
  "sex", p=1)))

  ## compare output
  myRule2 = function(x)
   pAUC(rocdemo.sca(truth = as.integer(sample.ExpressionSet$sex)-1 ,
                    data = x, rule = dxrule.sca), t0 = 1)
  r4 <-  esApply(sample.ExpressionSet[1:nGenes, ], 1, myRule2)
  plot(r4,area(r2), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")

  plot(r4, area(rowpAUCs(sample.ExpressionSet[1:nGenes, ],
  "sex", p=1, flip=FALSE)), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")

  r4[r4<0.5] <- 1-r4[r4<0.5]
  plot(r4, area(r2), xlab="function pAUC", ylab="function rowpAUCs",
  main="pAUCs")
 }

genefilter

genefilter: methods for filtering genes from high-throughput experiments

v1.72.1

Artistic-2.0

Authors

R. Gentleman, V. Carey, W. Huber, F. Hahne
