
replaceOutliers

Replace outliers with trimmed mean


Description

Note that this function is called within DESeq, so it is not necessary to call it in addition to a DESeq call. See the minReplicatesForReplace argument documented in DESeq.

Usage

replaceOutliers(
  object,
  trim = 0.2,
  cooksCutoff,
  minReplicates = 7,
  whichSamples
)

replaceOutliersWithTrimmedMean(
  object,
  trim = 0.2,
  cooksCutoff,
  minReplicates = 7,
  whichSamples
)

Arguments

object

a DESeqDataSet object that has already been processed by DESeq, nbinomWaldTest or nbinomLRT, and therefore contains a matrix of Cook's distances in assays(dds)[["cooks"]]. These Cook's distances are used to define outlier counts.

trim

the fraction (0 to 0.5) of observations to be trimmed from each end of the normalized counts for a gene before the mean is computed
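To illustrate what the trim argument does, here is a minimal base R sketch using the trim argument of mean. The vector x is made-up data standing in for the normalized counts of one gene; it is not taken from DESeq2.

```r
# Hypothetical normalized counts for one gene across 10 samples,
# with one extreme value in the last sample
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000)

# trim = 0.2 drops floor(10 * 0.2) = 2 values from each end before averaging,
# so the extreme value no longer influences the result
trimmed <- mean(x, trim = 0.2)
untrimmed <- mean(x)

print(c(trimmed = trimmed, untrimmed = untrimmed))
```

The trimmed value is the mean of the six middle observations, which is why a single extreme count does not pull it upwards.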

cooksCutoff

the threshold for defining an outlier to be replaced. Defaults to the .99 quantile of the F(p, m - p) distribution, where p is the number of parameters and m is the number of samples.
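The default threshold described above can be reproduced with the base R quantile function qf. The values of p and m below are assumptions chosen for illustration, not values from any particular dataset.

```r
p <- 2   # assumed number of model parameters (e.g. intercept plus one condition effect)
m <- 14  # assumed number of samples

# 0.99 quantile of the F(p, m - p) distribution
cutoff <- qf(0.99, df1 = p, df2 = m - p)
print(cutoff)
```

Counts whose Cook's distance exceeds this value would be treated as outliers under the default setting.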

minReplicates

the minimum number of replicate samples necessary to consider a sample eligible for replacement (including itself). Outlier counts will not be replaced if the sample is in a cell which has fewer than minReplicates replicates.
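As a rough sketch of the eligibility rule, the following base R code flags samples whose design cell contains at least minReplicates samples. It assumes a design with a single factor, so that each factor level is one cell; the condition vector is hypothetical, and this is not the package's internal code.

```r
# Hypothetical single-factor design: 7 samples in group A, 4 in group B
condition <- factor(c(rep("A", 7), rep("B", 4)))
minReplicates <- 7

# Number of samples sharing each sample's cell (counting the sample itself)
cellSize <- ave(seq_along(condition), condition, FUN = length)

# A sample is eligible for outlier replacement only if its cell is large enough
eligible <- cellSize >= minReplicates
print(data.frame(condition, cellSize, eligible))
```

Under these assumptions only the group A samples are eligible, so outlier counts in group B would be left unchanged.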

whichSamples

optional, a numeric or logical index specifying which samples should have outliers replaced. If missing, this is determined using minReplicates.

Details

This function replaces outlier counts flagged by extreme Cook's distances, as calculated by DESeq, nbinomWaldTest or nbinomLRT, with values predicted by the trimmed mean over all samples (and adjusted by size factor or normalization factor). This function replaces the counts in the matrix returned by counts(dds) and the Cook's distances in assays(dds)[["cooks"]]. Original counts are preserved in assays(dds)[["originalCounts"]].

The DESeq function calculates a diagnostic measure called Cook's distance for every gene and every sample. The results function then sets the p-values to NA for genes which contain an outlying count, as defined by a Cook's distance above a threshold. With many degrees of freedom, i.e. many more samples than parameters to be estimated, it might be undesirable to remove entire genes from the analysis just because their data include a single count outlier. An alternative strategy is to replace the outlier counts with the trimmed mean over all samples, adjusted by the size factor or normalization factor for that sample. This function performs the replacement for the user, for samples which have at least minReplicates replicates (including that sample). For more information on Cook's distance, please see the two sections of the vignette: 'Dealing with count outliers' and 'Count outlier detection'.
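The replacement strategy described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the DESeq2 implementation: the count matrix, the size factors and the outlier flags are all made up, the flags stand in for "Cook's distance above the cutoff", and truncation to integer is an assumed rounding choice.

```r
# Hypothetical raw counts: 2 genes x 7 samples, with one extreme count
counts <- matrix(
  c(10L, 12L, 11L,  9L, 10L, 13L, 500L,
    20L, 22L, 19L, 21L, 20L, 18L,  23L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), paste0("s", 1:7))
)

# Hypothetical size factors: sample s7 was sequenced twice as deeply
sizeFactors <- c(1, 1, 1, 1, 1, 1, 2)

# Assumed outlier flags (in practice derived from Cook's distances)
isOutlier <- matrix(FALSE, nrow = 2, ncol = 7, dimnames = dimnames(counts))
isOutlier["gene1", "s7"] <- TRUE

# Per-gene trimmed mean of the normalized counts
normalized <- sweep(counts, 2, sizeFactors, "/")
trimBaseMean <- apply(normalized, 1, mean, trim = 0.2)

# Scale the trimmed mean back to each sample's depth, truncated to integers
replacement <- outer(trimBaseMean, sizeFactors, "*")
storage.mode(replacement) <- "integer"

# Replace only the flagged counts and keep a copy of the originals
originalCounts <- counts
newCounts <- counts
newCounts[isOutlier] <- replacement[isOutlier]

print(newCounts)
```

Only the flagged cell changes; every other count is carried over untouched, and originalCounts plays the role of the preserved original matrix.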

Value

a DESeqDataSet with replaced counts in the slot returned by counts and the original counts preserved in assays(dds)[["originalCounts"]]

See Also


DESeq2

Differential gene expression analysis based on the negative binomial distribution

v1.30.1
LGPL (>= 3)
Authors
Michael Love [aut, cre], Constantin Ahlmann-Eltze [ctb], Kwame Forbes [ctb], Simon Anders [aut, ctb], Wolfgang Huber [aut, ctb]
