Goodness-of-fit test for the Functional Linear Model with scalar response
The function flm.test tests the composite null hypothesis of
a Functional Linear Model with scalar response (FLM),
H_0: Y=<X,β>+ε,
versus a general alternative. If β=β_0 is provided, then the simple hypothesis H_0: Y=<X,β_0>+ε is tested. The testing of the null hypothesis is done by a Projected Cramer-von Mises statistic (see Details).
flm.test( X.fdata, Y, beta0.fdata = NULL, B = 5000, est.method = "pls", p = NULL, type.basis = "bspline", verbose = TRUE, plot.it = TRUE, B.plot = 100, G = 200, ... )
X.fdata |
Functional covariate for the FLM. The object must be in the class
|
Y |
Scalar response for the FLM. Must be a vector with the same number of elements
as functions are in |
beta0.fdata |
Functional parameter for the simple null hypothesis, in the |
B |
Number of bootstrap replicates to calibrate the distribution of the test statistic.
|
est.method |
Estimation method for the unknown parameter β,
only used in the composite case. Mainly, there are two options: specify the number of basis
elements for the estimated β by
|
p |
Number of elements of the basis considered. If it is not given, an optimal |
type.basis |
Type of basis used to represent the functional process. Depending on the hypothesis it will have a different interpretation:
|
verbose |
Either to show or not information about computing progress. |
plot.it |
Either to show or not a graph of the observed trajectory,
and the bootstrap trajectories under the null composite hypothesis, of the
process R_n(.) (see Details). Note that if |
B.plot |
Number of bootstrap trajectories to show in the resulting plot of the test.
As the trajectories shown are the first |
G |
Number of projections used to compute the trajectories of the process R_n(.) by Monte Carlo. |
... |
Further arguments passed to |
The Functional Linear Model with scalar response (FLM), is defined as
Y=<X,β>+ε, for a functional process
X such that E[X(t)]=0, E[X(t)ε]=0
for all t and for a scalar variable Y such that E[Y]=0.
Then, the test assumes that Y and X.fdata are centred and will automatically
center them. So, bear in mind that when you apply the test for Y and X.fdata,
actually, you are applying it to Y-mean(Y) and fdata.cen(X.fdata)$Xcen.
The test statistic corresponds to the Cramer-von Mises norm of the Residual Marked
empirical Process based on Projections R_n(u,γ) defined in
Garcia-Portugues et al. (2014).
The expression of this process in a p-truncated basis of the space L^2[0,T]
leads to the p-multivariate process R_{n,p}(u,γ^{(p)}),
whose Cramer-von Mises norm is computed.
The choice of an appropriate p to represent the functional process X,
in case that is not provided, is done via the estimation of β for the composite
hypothesis. For the simple hypothesis, as no estimation of β is done, the choice
of p depends only on the functional process X. As the result of the test may
change for different p's, we recommend to use an automatic criterion to select p
instead of provide a fixed one.
The distribution of the test statistic is approximated by a wild bootstrap resampling on the
residuals, using the golden section bootstrap.
Finally, the graph shown if plot.it=TRUE represents the observed trajectory, and the
bootstrap trajectories under the null, of the process RMPP integrated on the projections:
R_n(u) \approx \frac{1}{G} ∑_{g=1}^G R_n(u,γ_g),
where γ_g are simulated as Gaussians processes. This gives a graphical idea of how distant is the observed trajectory from the null hypothesis.
An object with class "htest" whose underlying structure is a list containing
the following components:
statistic The value of the test statistic.
boot.statistics A vector of length B with the values of the bootstrap test statistics.
p.value The p-value of the test.
method The method used.
B The number of bootstrap replicates used.
type.basis The type of basis used.
beta.est The estimated functional parameter β in the composite
hypothesis. For the simple hypothesis, the given beta0.fdata.
p The number of basis elements passed or automatically chosen.
ord The optimal order for PC and PLS given by fregre.pc.cv and fregre.pls.cv. For other methods is setted to 1:p.
data.name The character string "Y=<X,b>+e"
No NA's are allowed neither in the functional covariate nor in the scalar response.
Eduardo Garcia-Portugues. Please, report bugs and suggestions to edgarcia@est-econ.uc3m.es
Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory, 22, 1030-1051. http://dx.doi.org/10.1017/S0266466606060506
Garcia-Portugues, E., Gonzalez-Manteiga, W. and Febrero-Bande, M. (2014). A goodness–of–fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3), 761-778. http://dx.doi.org/10.1080/10618600.2013.812519
# Simulated example #
X=rproc2fdata(n=100,t=seq(0,1,l=101),sigma="OU")
beta0=fdata(mdata=cos(2*pi*seq(0,1,l=101))-(seq(0,1,l=101)-0.5)^2+
rnorm(101,sd=0.05),argvals=seq(0,1,l=101),rangeval=c(0,1))
Y=inprod.fdata(X,beta0)+rnorm(100,sd=0.1)
dev.new(width=21,height=7)
par(mfrow=c(1,3))
plot(X,main="X")
plot(beta0,main="beta0")
plot(density(Y),main="Density of Y",xlab="Y",ylab="Density")
rug(Y)
## Not run:
# Composite hypothesis: do not reject FLM
pcvm.sim=flm.test(X,Y,B=50,B.plot=50,G=100,plot.it=TRUE)
pcvm.sim
flm.test(X,Y,B=5000)
# Estimated beta
dev.new()
plot(pcvm.sim$beta.est)
# Simple hypothesis: do not reject beta=beta0
flm.test(X,Y,beta0.fdata=beta0,B=50,B.plot=50,G=100)
flm.test(X,Y,beta0.fdata=beta0,B=5000)
# AEMET dataset #
data(aemet)
# Remove the 5\
dev.new()
res.FM=depth.FM(aemet$temp,draw=TRUE)
qu=quantile(res.FM$dep,prob=0.05)
l=which(res.FM$dep<=qu)
lines(aemet$temp[l],col=3)
aemet$df$name[l]
# Data without outliers
wind.speed=apply(aemet$wind.speed$data,1,mean)[-l]
temp=aemet$temp[-l]
# Exploratory analysis: accept the FLM
pcvm.aemet=flm.test(temp,wind.speed,est.method="pls",B=100,B.plot=50,G=100)
pcvm.aemet
# Estimated beta
dev.new()
plot(pcvm.aemet$beta.est,lwd=2,col=2)
# B=5000 for more precision on calibration of the test: also accept the FLM
flm.test(temp,wind.speed,est.method="pls",B=5000)
# Simple hypothesis: rejection of beta0=0? Limiting p-value...
dat=rep(0,length(temp$argvals))
flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
rangeval=temp$rangeval),B=100)
flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
rangeval=temp$rangeval),B=5000)
# Tecator dataset #
data(tecator)
names(tecator)
absorp=tecator$absorp.fdata
ind=1:129 # or ind=1:215
x=absorp[ind,]
y=tecator$y$Fat[ind]
tt=absorp[["argvals"]]
# Exploratory analysis for composite hypothesis with automatic choose of p
pcvm.tecat=flm.test(x,y,B=100,B.plot=50,G=100)
pcvm.tecat
# B=5000 for more precision on calibration of the test: also reject the FLM
flm.test(x,y,B=5000)
# Distribution of the PCvM statistic
plot(density(pcvm.tecat$boot.statistics),lwd=2,xlim=c(0,10),
main="PCvM distribution", xlab="PCvM*",ylab="Density")
rug(pcvm.tecat$boot.statistics)
abline(v=pcvm.tecat$statistic,col=2,lwd=2)
legend("top",legend=c("PCvM observed"),lwd=2,col=2)
# Simple hypothesis: fixed p
dat=rep(0,length(x$argvals))
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
rangeval=x$rangeval),B=100,p=11)
# Simple hypothesis, automatic choose of p
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
rangeval=x$rangeval),B=100)
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
rangeval=x$rangeval),B=5000)
## End(Not run)Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.