Filter Genes By Expression Level
Determine which genes have sufficiently large counts to be retained in a statistical analysis.
## S3 method for class 'DGEList'
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)

## S3 method for class 'SummarizedExperiment'
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)

## Default S3 method:
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL,
             min.count = 10, min.total.count = 15, large.n = 10, min.prop = 0.7, ...)
y | matrix of counts, or a
design | design matrix. Ignored if
group | vector or factor giving group membership for a oneway layout, if appropriate.
lib.size | library size, defaults to
min.count | numeric. Minimum count required for at least some samples.
min.total.count | numeric. Minimum total count required.
large.n | integer. Number of samples per group that is considered to be “large”.
min.prop | numeric. Minimum proportion of samples in the smallest group that express the gene.
... | any other arguments. For the
This function implements the filtering strategy that was intuitively described by Chen et al. (2016).
Roughly speaking, the strategy keeps genes that have at least min.count reads in a worthwhile number of samples.
More precisely, the filtering keeps genes that have count-per-million (CPM) above k in n samples, where k is determined by min.count and by the sample library sizes and n is determined by the design matrix.
n is essentially the smallest group sample size or, more generally, the minimum inverse leverage of any fitted value.
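As an illustration of the inverse-leverage description, the base R sketch below checks that, for a oneway layout, the reciprocal of the largest hat value equals the smallest group size. The group labels and variable names are made up for the example, and this is not necessarily how filterByExpr computes n internally.

```r
# Hypothetical oneway layout: 3 samples in group "a", 2 in group "b"
group <- factor(c("a", "a", "a", "b", "b"))
design <- model.matrix(~ group)

# Leverages (hat values) of the design matrix; the design already
# contains an intercept column, so none is added here
h <- hat(design, intercept = FALSE)

# Minimum inverse leverage: for a oneway layout each sample has
# leverage 1 / (its group size), so this is the smallest group size
n_eff <- 1 / max(h)

print(n_eff)              # agrees with the value below up to rounding error
print(min(table(group)))  # smallest group size
```

For designs that are not a simple oneway layout (for example with covariates or blocking factors), the same quantity gives a non-integer "effective" group size.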
If all the group sizes are larger than large.n, then this is relaxed slightly, but with n always greater than min.prop of the smallest group size (70% by default).
In addition, each kept gene is required to have at least min.total.count reads across all the samples.
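The rule described above can be sketched in base R for the simple case where group is supplied. This is a simplified reading rather than the edgeR source: sketch_filter is a hypothetical helper, the use of the median library size to turn min.count into the CPM cutoff k is an assumption, and so is the exact form of the relaxation for large groups.

```r
# Hypothetical helper, not part of edgeR: a simplified reading of the rule
sketch_filter <- function(counts, group, min.count = 10, min.total.count = 15,
                          large.n = 10, min.prop = 0.7) {
  counts <- as.matrix(counts)
  lib.size <- colSums(counts)

  # n: smallest non-empty group size, relaxed when all groups are large
  # (assumed form of the relaxation; it stays above min.prop * group size)
  tab <- table(group)
  n <- min(tab[tab > 0])
  if (n > large.n) n <- large.n + (n - large.n) * min.prop

  # k: CPM cutoff (assumption: min.count scaled by the median library size)
  k <- min.count / median(lib.size) * 1e6

  # Counts-per-million for every gene in every sample
  cpm <- t(t(counts) / lib.size) * 1e6

  keep.cpm <- rowSums(cpm >= k) >= n
  keep.total <- rowSums(counts) >= min.total.count
  keep.cpm & keep.total
}

# Made-up counts: 5 genes, 2 groups of 2 samples, about 1e6 reads per library
counts <- rbind(
  high   = c(100, 120, 90, 110),  # expressed in all samples
  low    = c(0, 1, 0, 2),         # never reaches min.count
  onegrp = c(40, 50, 0, 0),       # expressed in one whole group
  spike  = c(30, 0, 0, 0),        # expressed in a single sample only
  bulk   = c(1e6, 1e6, 1e6, 1e6)  # pads the library sizes
)
group <- c("a", "a", "b", "b")

keep <- sketch_filter(counts, group)
print(keep)
```

With two samples per group, n is 2, so a gene expressed in only one group is retained while a gene expressed in a single sample is not.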
Logical vector of length nrow(y) indicating which rows of y to keep in the analysis.
Gordon Smyth
Chen Y, Lun ATL, and Smyth GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438. http://f1000research.com/articles/5-1438
## Not run: 
keep <- filterByExpr(y, design)
y <- y[keep,]
## End(Not run)
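The example above assumes that y and design already exist. A self-contained variant might look like the following sketch, which assumes the edgeR package is installed and uses simulated counts; the gene names, group labels and simulation settings are invented for illustration.

```r
library(edgeR)

# Simulated counts (illustrative only): 1000 genes, 6 samples
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 2), nrow = 1000, ncol = 6)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

# Two groups of three samples and the matching design matrix
group <- factor(rep(c("ctrl", "trt"), each = 3))
y <- DGEList(counts = counts, group = group)
design <- model.matrix(~ group)

# Logical vector with one entry per gene
keep <- filterByExpr(y, design)
table(keep)

# Subset the genes and recompute the library sizes from the kept rows
y <- y[keep, , keep.lib.sizes = FALSE]
```

Setting keep.lib.sizes = FALSE recomputes the library sizes after filtering; omit it to retain the original library sizes.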