Estimate the true trait underlying a list of surrogate markers.
Assume an imprecisely measured trait y that is related to the true, unobserved trait yTRUE as follows: yTRUE = y + noise, where noise is assumed to have mean zero and a constant variance. Assume you have one or more surrogate markers for yTRUE, corresponding to the columns of datX. The function implements several approaches for estimating yTRUE based on the inputs y and/or datX.
TrueTrait(datX, y, datXtest=NULL, corFnc = "bicor", corOptions = "use = 'pairwise.complete.obs'", LeaveOneOut.CV=FALSE, skipMissingVariables=TRUE, addLinearModel=FALSE)
datX |
is a vector or data frame whose columns correspond to the surrogate markers (variables) for the true underlying trait. The number of rows of |
y |
is a numeric vector which specifies the observed trait. |
datXtest |
can be set as a matrix or data frame of a second, independent test data set. Its columns should correspond to those of |
corFnc |
Character string specifying the correlation function to be used in the calculations.
Recommended values are the Pearson correlation |
corOptions |
Character string giving additional arguments to the function specified in |
LeaveOneOut.CV |
logical. If TRUE then leave one out cross validation estimates will be calculated for |
skipMissingVariables |
logical. If TRUE then variables whose values are missing for a given observation will be skipped when estimating the true trait of that particular observation. Thus, the estimate for a particular observation is determined by all the variables whose values are non-missing for it. |
addLinearModel |
logical. If TRUE then the function also estimates the true trait based on the predictions of the linear model |
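The effect of skipMissingVariables can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's internal code: the helper name weightedRowMeanSkipNA, the matrix of per-marker estimates, and the weights are all hypothetical. For each observation, the weighted average is taken over only those markers that are non-missing for that observation.

```r
# Hypothetical helper (not part of WGCNA): weighted row means that skip
# missing entries separately for each observation (row).
weightedRowMeanSkipNA <- function(est, w) {
  est <- as.matrix(est)
  stopifnot(length(w) == ncol(est))
  # Expand the weights to a matrix and zero them wherever est is missing
  W <- matrix(w, nrow = nrow(est), ncol = ncol(est), byrow = TRUE)
  W[is.na(est)] <- 0
  est0 <- est
  est0[is.na(est)] <- 0
  # A row with no usable marker yields 0/0 = NaN
  rowSums(W * est0) / rowSums(W)
}

# Toy input: three observations, three markers, some values missing
est <- rbind(c(10, 12, 14),
             c(10, NA, 14),
             c(NA, NA, 20))
w <- c(1, 2, 1)
res <- weightedRowMeanSkipNA(est, w)
print(res)
```

Dropping the incomplete observations altogether would discard rows 2 and 3 in this toy input, which is the behavior that skipping variables per observation avoids.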
This R function implements formulas described in Klemera and Doubal (2006). The assumptions underlying these formulas are described in that article. Briefly, the function provides several estimates of the true underlying trait under the following assumptions:
1) There is a true underlying trait that affects y and a list of surrogate markers corresponding to the columns of datX.
2) There is a linear relationship between the true underlying trait and y and the surrogate markers.
3) yTRUE = y + Noise, where the Noise term has a mean of zero and a fixed variance.
4) Weighted least squares estimation is used to relate the surrogate markers to the underlying trait, where the weights are proportional to 1/ssq.j and ssq.j is the noise variance of the j-th marker.
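The weighted least squares idea in assumption 4 can be illustrated with a short base R sketch. It is not the WGCNA implementation and is not claimed to reproduce any numbered formula of Klemera and Doubal; the simulated data and all names (slope, intercept, q, k, ssq, estWLS) are assumptions made for illustration. Each marker is regressed on y to obtain an intercept q.j, a slope k.j and a residual variance ssq.j. The value t that minimizes sum_j (x.j - q.j - k.j * t)^2 / ssq.j is then computed for every observation.

```r
set.seed(1)
n <- 500   # observations
p <- 10    # surrogate markers

y     <- rnorm(n, mean = 50, sd = 20)   # observed trait
yTRUE <- y + rnorm(n, sd = 10)          # unobserved true trait

# Hypothetical markers: linear in yTRUE plus marker-specific noise
slope     <- runif(p, 0.5, 2)
intercept <- rnorm(p, 0, 5)
datX <- sapply(seq_len(p), function(j)
  intercept[j] + slope[j] * yTRUE + rnorm(n, sd = 10))

# Regress each marker on y: intercept q.j, slope k.j, residual variance ssq.j
fits <- lapply(seq_len(p), function(j) lm(datX[, j] ~ y))
q    <- sapply(fits, function(f) coef(f)[[1]])
k    <- sapply(fits, function(f) coef(f)[[2]])
ssq  <- sapply(fits, function(f) summary(f)$sigma^2)

# Weighted least squares solution with weights 1/ssq.j:
#   t = sum_j k.j * (x.j - q.j) / ssq.j  /  sum_j k.j^2 / ssq.j
centered <- sweep(datX, 2, q, "-")
estWLS   <- rowSums(sweep(centered, 2, k / ssq, "*")) / sum(k^2 / ssq)

# Compare mean absolute deviations from the (simulated) true trait
print(c(y = mean(abs(y - yTRUE)), estWLS = mean(abs(estWLS - yTRUE))))
```

Markers with a steep slope and little residual noise receive the largest weight, so a noisy marker cannot dominate the combined estimate.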
Specifically, output y.true1 corresponds to formula 31, y.true2 corresponds to formula 25, and y.true3 corresponds to formula 34.
Although the true underlying trait yTRUE is not known, one can estimate the standard deviation between the estimate y.true2 and yTRUE using formula 33. Similarly, one can estimate the SD for the estimate y.true3 using formula 42. These estimated SDs correspond to output components 2 and 3, respectively. They are valuable because they indicate how accurate each estimate is.
To estimate the correlations between y and the surrogate markers, one can specify different correlation measures through the argument corFnc. As the usage above shows, the default is the biweight midcorrelation, "bicor"; see help(bicor) to learn more. The Pearson correlation can be chosen instead by setting corFnc = "cor".
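Because corFnc and corOptions are both character strings, it may help to see how such strings can be turned into an actual function call. The sketch below is an assumption about one possible mechanism, not a description of how TrueTrait works internally; the helper name corFromStrings is hypothetical, and base R's cor is used so that the example runs without WGCNA.

```r
# Hypothetical helper: build and evaluate a correlation call from strings
corFromStrings <- function(x, y, corFnc = "cor",
                           corOptions = "use = 'pairwise.complete.obs'") {
  # Text of the call, e.g. cor(x, y, use = 'pairwise.complete.obs')
  extra <- if (nzchar(corOptions)) paste0(", ", corOptions) else ""
  callText <- paste0(corFnc, "(x, y", extra, ")")
  # Evaluated in this function's frame, where x and y are defined
  eval(parse(text = callText))
}

set.seed(2)
datX <- matrix(rnorm(200), nrow = 50, ncol = 4)
y <- datX[, 1] + rnorm(50)
datX[3, 2] <- NA   # a missing value, handled by the 'use' option

r <- corFromStrings(datX, y)
print(r)

# Extra options are passed the same way, here a rank-based correlation
rSpearman <- corFromStrings(
  datX, y, corOptions = "use = 'pairwise.complete.obs', method = 'spearman'")
```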
When datX comprises observations measured in different strata (e.g. different batches or independent data sets), one can obtain stratum-specific estimates by specifying the strata using the argument Strata. In this case, the estimation focuses on one stratum at a time.
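The idea of estimating one stratum at a time can be sketched in base R as follows. This is a simplified illustration, not the package's code: estimateByStratum and toyEstimator are hypothetical names, and the toy estimator (fitted values from regressing y on the row means of the markers) merely stands in for the estimators described above. The simulated second batch carries a large marker offset, which is the kind of situation where pooling all observations can be misleading.

```r
# Hypothetical helper: apply an estimator separately within each stratum
estimateByStratum <- function(datX, y, Strata, estimator) {
  datX <- as.matrix(datX)
  out <- rep(NA_real_, length(y))
  # split() groups the observation indices by stratum level
  for (idx in split(seq_along(y), Strata)) {
    out[idx] <- estimator(datX[idx, , drop = FALSE], y[idx])
  }
  out
}

# Toy stand-in estimator: fitted values of y regressed on the marker mean
toyEstimator <- function(X, y) {
  m <- rowMeans(X)
  unname(fitted(lm(y ~ m)))
}

set.seed(3)
n <- 60
Strata <- rep(c("batch1", "batch2"), each = n / 2)
y <- rnorm(n, mean = 50, sd = 20)
shift <- ifelse(Strata == "batch2", 100, 0)   # batch offset in the markers
datX <- sapply(1:5, function(j) y + shift + rnorm(n, sd = 5))

estStrat  <- estimateByStratum(datX, y, Strata, toyEstimator)
estPooled <- toyEstimator(datX, y)
print(c(stratified = cor(estStrat, y), pooled = cor(estPooled, y)))
```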
A list with the following components.
datEstimates |
is a data frame whose columns correspond to estimates of the true underlying trait. The number of rows equals the number of observations, i.e. the length of |
datEstimatestest |
is output only if a test data set has been specified in the argument |
datEstimates.LeaveOneOut.CV |
is output only if the argument |
SD.ytrue2 |
is a scalar. This is an estimate of the standard deviation between the estimate |
SD.ytrue3 |
is a scalar. This is an estimate of the standard deviation between |
datVariableInfo |
is a data frame that reports information for each variable (column of |
datEstimatesByStratum |
a data frame that will only be output if |
SD.ytrue2ByStratum |
a vector of length equal to the number of different levels of |
datVariableInfoByStratum |
a list whose components are matrices with variable information. Each list component reports the variable information in the stratum specified by unique(Strata). |
Steve Horvath
Klemera P, Doubal S (2006) A new approach to the concept and computation of biological age. Mechanisms of Ageing and Development 127:240-248
Cho IH, Park KS, Lim CJ (2010) An Empirical Comparative Study on Validation of Biological Age Estimation Algorithms with an Application of Work Ability Index. Mechanisms of Ageing and Development 131(2):69-78
library(WGCNA)

# observed trait
y = rnorm(1000, mean = 50, sd = 20)
# unobserved, true trait (one noise value per observation)
yTRUE = y + rnorm(1000, sd = 10)
# now we simulate surrogate markers around the true trait
datX = simulateModule(yTRUE, nGenes = 20, minCor = .4, maxCor = .9,
                      geneMeans = rnorm(20, 50, 30))
True1 = TrueTrait(datX = datX, y = y)
datTrue = True1$datEstimates

# scatterplot of each estimate against the true trait
par(mfrow = c(2, 2))
for (i in 1:dim(datTrue)[[2]]) {
  meanAbsDev = mean(abs(yTRUE - datTrue[, i]))
  verboseScatterplot(datTrue[, i], yTRUE, xlab = names(datTrue)[i],
                     main = paste(i, "MeanAbsDev=", signif(meanAbsDev, 3)))
  abline(0, 1)
}

# compare the estimated standard deviation of y.true2
True1[[2]]
# with the true SD
sqrt(var(yTRUE - datTrue$y.true2))
# compare the estimated standard deviation of y.true3
True1[[3]]
# with the true SD
sqrt(var(yTRUE - datTrue$y.true3))