Distance-based Kernel Score Test
This function tests whether a metabolite-set is differentially expressed using a distance-based kernel score test.
dscore(x, y, lower, upper, m)
x |
numeric measurements of metabolite abundance level. |
y |
0/1 response indicating whether a subject belongs to the case group or the control group. |
lower |
lower bound of the kernel parameter. |
upper |
upper bound of the kernel parameter. |
m |
number of grid points selected in the interval [lower, upper]. |
Let x be a p \times n matrix in which each column is a subject, and let y be an n \times 1 vector of 0/1 group labels. This function tests whether this p-metabolite set is differentially expressed between the two groups (more details can be found in Zhan et al. (2015)). It works as follows.
A score test can be applied when the kernel parameter ρ is known. First, fit the null logistic model logit(pr(y=1))=β_0 to obtain the estimate \hat{β_0} of β_0, and let \hat{μ_0}=invlogit(\hat{β_0}). Second, calculate the n\times n kernel matrix K(ρ) with entries K(ρ)_{ij} = k(x_i,x_j,ρ), where x_i is the ith column of x and k(\cdot) is the distance kernel function dkernel. Third, calculate the test statistic Q(ρ) as
Q(ρ)=(y-\hat{μ_0})^T K(ρ) (y-\hat{μ_0}).
A standardized version S(ρ) of Q(ρ) can be calculated as S(ρ)= [Q(ρ)-μ_{Q}]/σ_{Q}, where μ_{Q} and σ_{Q} are the mean and standard deviation of Q(ρ) under the null hypothesis. More details can be found in Liu et al. (2008).
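The three steps for a known ρ can be sketched in base R as below. This is only an illustration under stated assumptions: `toy_kernel` and `score_q` are hypothetical names, and `toy_kernel` is a Gaussian-type stand-in based on Euclidean distance, not the package's dkernel. The standardization to S(ρ) is not shown, since the expressions for μ_{Q} and σ_{Q} are given in Liu et al. (2008) rather than here.

```r
# Hypothetical stand-in kernel (an assumption, NOT the package's dkernel):
# K_ij = exp(-d_ij^2 / rho), with d_ij the Euclidean distance between
# subjects i and j, i.e. between columns i and j of x.
toy_kernel <- function(x, rho) {
  d <- as.matrix(dist(t(x)))  # dist() works on rows, so transpose x first
  exp(-d^2 / rho)
}

# Score statistic Q(rho) = (y - mu0)' K(rho) (y - mu0) for a fixed rho.
score_q <- function(x, y, rho, kernel = toy_kernel) {
  fit <- glm(y ~ 1, family = binomial())  # null model: logit(pr(y = 1)) = beta_0
  mu0 <- plogis(unname(coef(fit)[1]))     # invlogit of the estimated intercept
  r <- y - mu0                            # residuals under the null model
  K <- kernel(x, rho)                     # n x n kernel matrix
  drop(t(r) %*% K %*% r)                  # scalar quadratic form
}

# Simulated data for illustration only: p = 3 metabolites, n = 20 subjects.
set.seed(1)
x <- matrix(rnorm(3 * 20), nrow = 3)
y <- rep(c(0, 1), each = 10)
q <- score_q(x, y, rho = 5)
```

With an intercept-only model, \hat{μ_0} is simply the proportion of subjects with y = 1, so the residual vector depends only on the group sizes and all information about the metabolites enters through K(ρ).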
When the kernel parameter ρ is not known, suppose it takes values in [lower, upper]. Davies (1977) and Davies (1987) proposed a test based on the process \{S(ρ), ρ \in [lower,upper]\}. This test has a rejection region of the form \{\sup_{lower ≤ ρ ≤ upper} S(ρ) > c\}. Using this test, an upper bound for the p-value is given by:
Φ(-M)+V \exp(-\frac{1}{2}M^2)/√{8π},
where Φ(\cdot) is the cumulative distribution function of the standard normal distribution, M is the maximum of S(ρ) over the range of ρ, V=|S(ρ_1)-S(lower)|+|S(ρ_2)-S(ρ_1)|+\cdots+|S(upper)-S(ρ_m)| is the total variation of S(ρ) over the interval [lower, upper], and ρ_1,…,ρ_m are the m grid points in the interval [lower, upper].
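As a rough illustration of the bound, the sketch below evaluates it from a vector of standardized statistics ordered along the grid, that is S(lower), S(ρ_1), …, S(ρ_m), S(upper). It uses the Davies (1987) form of the bound, in which the exponent is negative, -M^2/2. The name `davies_bound` and the example values are hypothetical, and capping the result at 1 is an added convenience rather than something stated in the source.

```r
# Upper bound on the p-value from standardized statistics evaluated at
# lower, rho_1, ..., rho_m, upper (in that order).
davies_bound <- function(s) {
  m_max <- max(s)         # M: maximum of S(rho) over the grid
  v <- sum(abs(diff(s)))  # V: total variation of S(rho) along the grid
  bound <- pnorm(-m_max) + v * exp(-m_max^2 / 2) / sqrt(8 * pi)
  min(1, bound)           # assumption: cap at 1 so the result reads as a p-value
}

# Made-up S(rho) values, for illustration only.
s_grid <- c(0.8, 1.9, 2.4, 1.7, 1.1)
p_upper <- davies_bound(s_grid)
```

When S(ρ) is constant over the grid, V is zero and the bound reduces to the ordinary one-sided normal p-value Φ(-M); the second term is the price paid for searching over ρ.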
A p-value indicating whether the metabolite-set is differentially expressed between the two conditions/groups.
Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 64, 247-254.
Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74, 33-43.
Liu, D., Ghosh, D., & Lin, X. (2008). Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics, 9(1), 292.
Zhan, X., Patterson, A. D., & Ghosh, D. (2015). Kernel approaches for differential expression analysis of mass spectrometry-based metabolomics data. BMC Bioinformatics, 16(1), 77.
data(hcc)
## This metabolite-set contains the first three metabolites in the hcc dataset.
x = hcc[1:3, 3:57]
y = c(rep(0, 35), rep(1, 20))
dscore(x, y, 1, 10, 3)