
impute.MinProb

Imputation of left-censored missing data using a stochastic minimal value approach.


Description

Imputes left-censored missing data by random draws from a Gaussian distribution centered on a minimal value. Given a peptide/protein expression data matrix with n columns (biological samples) and p rows (peptides/proteins), the mean of the Gaussian distribution for each sample (column) is set to a minimal value observed in that sample, estimated as the q-th quantile (e.g. q = 0.01) of that sample's observed values. The standard deviation is shared across samples and is estimated as the median of the peptide/protein-wise standard deviations, multiplied by tune.sigma. Only peptides/proteins with more than 50% recorded values are considered when estimating the standard deviation.
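The two parameters of the Gaussian distribution can be computed by hand on a small matrix. The following is a minimal base R sketch of the estimation steps described above, not the package code itself; the names toy, centers, keep, feature_sd and sigma are illustrative only.

```r
# Toy log-intensity matrix: 8 features (rows) x 4 samples (columns).
set.seed(1)
toy <- matrix(rnorm(32, mean = 20, sd = 1), nrow = 8, ncol = 4)
toy[c(1, 2), 1] <- NA   # features 1 and 2 are missing in sample 1
toy[3, 1:3] <- NA       # feature 3 has only 25% recorded values
q <- 0.01

# Per-sample center: q-th quantile of the observed values in each column.
centers <- apply(toy, 2, quantile, probs = q, na.rm = TRUE)

# Keep only features with more than 50% recorded values.
keep <- rowMeans(!is.na(toy)) > 0.5

# Feature-wise standard deviations, then their median.
# sd() is called without na.rm, as in the function definition shown in the
# Examples section, so features with any NA give NA and are skipped by median().
feature_sd <- apply(toy[keep, , drop = FALSE], 1, sd)
sigma <- median(feature_sd, na.rm = TRUE)

centers   # one center per sample
sigma     # one standard deviation shared by all samples
```

Missing entries in sample i would then be filled with draws from rnorm(mean = centers[i], sd = sigma * tune.sigma).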

Usage

impute.MinProb(dataSet.mvs, q = 0.01, tune.sigma = 1)

Arguments

dataSet.mvs

A data matrix containing left-censored missing data.

q

A scalar with 0 < q < 1 giving the quantile of the observed values in each sample that is used as the center of the imputation distribution. It should be set to a low value so that imputed values fall at the low end of the expression range. The default value is q = 0.01.

tune.sigma

A scalar used to control the standard deviation of the Gaussian distribution used for random draws: the estimated standard deviation is multiplied by tune.sigma. If the estimated sd is too large, set 0 < tune.sigma < 1. The default value is tune.sigma = 1.
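To get a feel for how q and tune.sigma shape the imputed values, the draws can be simulated directly. This is a hedged base R sketch for a single hypothetical sample; observed and sigma are made-up stand-ins for one column of data and for the median feature-wise standard deviation.

```r
set.seed(42)
observed <- rnorm(1000, mean = 20, sd = 1)  # hypothetical observed values, one sample
sigma <- 0.3                                # hypothetical median feature-wise sd

# A larger q moves the center of the imputation distribution upwards.
center_q01 <- quantile(observed, probs = 0.01)
center_q10 <- quantile(observed, probs = 0.10)

# tune.sigma rescales the spread of the draws around the same center.
draws_full <- rnorm(5000, mean = center_q01, sd = sigma * 1)
draws_half <- rnorm(5000, mean = center_q01, sd = sigma * 0.5)

c(q01 = unname(center_q01), q10 = unname(center_q10))
c(sd_full = sd(draws_full), sd_half = sd(draws_half))
```

Plotting hist(draws_full) next to hist(draws_half) shows the narrower spread obtained with tune.sigma = 0.5.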

Value

A complete expression data matrix with missing values imputed.
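A returned matrix can be sanity-checked against its input. The helper below, check_imputation, is hypothetical and not part of imputeLCMD; it is a small base R sketch that assumes the imputed matrix keeps the input's dimensions and leaves recorded values untouched, as the function definition in the Examples section suggests. The toy after matrix uses a constant fill as a stand-in, not impute.MinProb.

```r
# Hypothetical helper: compare a matrix with missing values to its imputed version.
check_imputation <- function(before, after) {
  observed <- !is.na(before)
  c(same_dim = identical(dim(before), dim(after)),
    complete = !anyNA(after),
    observed_unchanged = isTRUE(all.equal(before[observed], after[observed])))
}

# Toy data: 3 features x 2 samples with two missing entries.
before <- matrix(c(20.1, NA, 19.5, 21.0, NA, 18.7), nrow = 3)
after <- before
after[is.na(after)] <- 18   # stand-in fill, for illustration only

check_imputation(before, after)
```

With the objects from the Examples section, the same check would be check_imputation(exprsData.MD, exprsData.imputed).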

Author(s)

Cosmin Lazar

Examples

library(imputeLCMD)

# generate expression data matrix
exprsDataObj = generate.ExpressionData(nSamples1 = 6, nSamples2 = 6,
                          meanSamples = 0, sdSamples = 0.2,
                          nFeatures = 1000, nFeaturesUp = 50, nFeaturesDown = 50,
                          meanDynRange = 20, sdDynRange = 1,
                          meanDiffAbund = 1, sdDiffAbund = 0.2)
exprsData = exprsDataObj[[1]]
  
# insert 15% missing data, 50% of which are missing not at random (MNAR.rate = 50)
m.THR = quantile(exprsData, probs = 0.15)
sd.THR = 0.1
MNAR.rate = 50
exprsData.MD.obj = insertMVs(exprsData, m.THR, sd.THR, MNAR.rate)
exprsData.MD = exprsData.MD.obj[[2]]

# perform missing data imputation
exprsData.imputed = impute.MinProb(exprsData.MD, q = 0.01, tune.sigma = 1)

## Not run: 
hist(exprsData[,1])
hist(exprsData.MD[,1])
hist(exprsData.imputed[,1])

## End(Not run)

## The function is currently defined as
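function (dataSet.mvs, q = 0.01, tune.sigma = 1) 
{
    nSamples = dim(dataSet.mvs)[2]
    nFeatures = dim(dataSet.mvs)[1]
    dataSet.imputed = dataSet.mvs
    # per-sample center: q-th quantile of the observed values in each column
    min.samples = apply(dataSet.imputed, 2, quantile, prob = q, 
        na.rm = T)
    # despite its name, count.NAs holds the fraction of recorded (non-NA)
    # values per feature
    count.NAs = apply(!is.na(dataSet.mvs), 1, sum)
    count.NAs = count.NAs/nSamples
    # keep features with more than 50% recorded values
    dataSet.filtered = dataSet.mvs[which(count.NAs > 0.5), ]
    # sd() is called without na.rm, so a feature with any NA yields NA here
    # and is dropped by the median below
    protSD = apply(dataSet.filtered, 1, sd)
    sd.temp = median(protSD, na.rm = T) * tune.sigma
    # the standard deviation used for the draws is printed to the console
    print(sd.temp)
    for (i in 1:(nSamples)) {
        # draw one candidate value per feature, then keep only the draws
        # at the positions that are missing in sample i
        dataSet.to.impute.temp = rnorm(nFeatures, 
                                        mean = min.samples[i], 
                                        sd = sd.temp)
        dataSet.imputed[which(is.na(dataSet.mvs[, i])), i] = 
        dataSet.to.impute.temp[which(is.na(dataSet.mvs[,i]))]
    }
    return(dataSet.imputed)
}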

imputeLCMD

A collection of methods for left-censored missing data imputation

v2.0
GPL (>= 2)
Authors
Cosmin Lazar
Initial release
2015-01-18
