fda.usc: flm.test – R documentation

Goodness-of-fit test for the Functional Linear Model with scalar response

Description

The function flm.test tests the composite null hypothesis of a Functional Linear Model with scalar response (FLM),

H_0: Y=<X,β>+ε,

versus a general alternative. If β=β_0 is provided, the simple hypothesis H_0: Y=<X,β_0>+ε is tested instead. The null hypothesis is tested with a Projected Cramer-von Mises statistic (see Details).

Usage

flm.test(
  X.fdata,
  Y,
  beta0.fdata = NULL,
  B = 5000,
  est.method = "pls",
  p = NULL,
  type.basis = "bspline",
  verbose = TRUE,
  plot.it = TRUE,
  B.plot = 100,
  G = 200,
  ...
)

Arguments

`X.fdata`	Functional covariate for the FLM. The object must be in the class `fdata`.
`Y`	Scalar response for the FLM. Must be a vector with the same number of elements as functions are in `X.fdata`.
`beta0.fdata`	Functional parameter for the simple null hypothesis, in the `fdata` class. The `argvals` and `rangeval` of `beta0.fdata` must be the same as those of `X.fdata`. For example, for β_0=0 (the simple null hypothesis of no interaction), one option is `beta0.fdata=fdata(mdata=rep(0,length(X.fdata$argvals)),argvals=X.fdata$argvals,rangeval=X.fdata$rangeval)`. If `beta0.fdata=NULL` (default), the function tests the composite null hypothesis.
`B`	Number of bootstrap replicates used to calibrate the distribution of the test statistic. `B=5000` replicates are recommended for carrying out the test, although for exploratory (non-inferential) analysis an acceptable, less time-consuming option is `B=500`.
`est.method`	Estimation method for the unknown parameter β, only used in the composite case. There are two main options: specify the number of basis elements for the estimated β through `p`, or let `p` be selected by a data-driven criterion (see the Details section for discussion). It must be one of the following methods:
  `"pc"`: if `p`, the number of basis elements, is given, β is estimated by `fregre.pc`. Otherwise, an optimal `p` is chosen using `fregre.pc.cv` and the `"SICc"` criterion.
  `"pls"`: if `p` is given, β is estimated by `fregre.pls`. Otherwise, an optimal `p` is chosen using `fregre.pls.cv` and the `"SICc"` criterion. This is the default, as it has been checked empirically that it provides a good balance between the performance of the test and the estimation of β.
  `"basis"`: if `p` is given, β is estimated by `fregre.basis`. Otherwise, an optimal `p` is chosen using `fregre.basis.cv` and the `"GCV.S"` criterion. In these functions, the same basis is used for the arguments `basis.x` and `basis.b`. The type of basis is given by the argument `type.basis` and must be one of those supported by `create.basis`. Further arguments for `create.basis` (except `rangeval`, which is taken from `X.fdata`) can be passed through `...`.
`p`	Number of basis elements considered. If it is not given, an optimal `p` is chosen using a specific criterion (see the `est.method` and `type.basis` arguments).
`type.basis`	Type of basis used to represent the functional process. Its interpretation depends on the hypothesis:
  Simple hypothesis. One of these options:
    `"bspline"`: if `p` is given, the functional process is expressed in a basis of `p` B-splines. If not, an optimal `p` is chosen by `optim.basis`, using the `"GCV.S"` criterion.
    `"fourier"`: if `p` is given, the functional process is expressed in a basis of `p` Fourier functions. If not, an optimal `p` is chosen by `optim.basis`, using the `"GCV.S"` criterion.
    `"pc"`: `p` must be given. Expresses the functional process in a basis of `p` PC.
    `"pls"`: `p` must be given. Expresses the functional process in a basis of `p` PLS.
    Although other bases supported by `create.basis` are also possible, `"bspline"` and `"fourier"` are recommended. Other bases may cause incompatibilities.
  Composite hypothesis. This argument is only used when `est.method="basis"`, in which case it specifies the type of basis used by the basis estimation method for the functional parameter. Again, `"bspline"` and `"fourier"` are recommended, as other bases may cause incompatibilities.
`verbose`	Whether to show information about computing progress.
`plot.it`	Whether to show a graph of the observed trajectory, and the bootstrap trajectories under the null composite hypothesis, of the process R_n(.) (see Details). Note that if `plot.it=TRUE`, the function takes more time to run.
`B.plot`	Number of bootstrap trajectories to show in the resulting plot of the test. As the trajectories shown are the first `B.plot` of `B`, `B.plot` must be less than or equal to `B`.
`G`	Number of projections used to compute the trajectories of the process R_n(.) by Monte Carlo.
`...`	Further arguments passed to `create.basis`.

Details

The Functional Linear Model with scalar response (FLM) is defined as Y=<X,β>+ε, for a functional process X such that E[X(t)]=0 and E[X(t)ε]=0 for all t, and for a scalar variable Y such that E[Y]=0. Accordingly, the test assumes that Y and X.fdata are centred and centres them automatically. Bear in mind that applying the test to Y and X.fdata actually applies it to Y-mean(Y) and fdata.cen(X.fdata)$Xcen.

The test statistic corresponds to the Cramer-von Mises norm of the Residual Marked empirical Process based on Projections (RMPP) R_n(u,γ) defined in Garcia-Portugues et al. (2014). The expression of this process in a p-truncated basis of the space L^2[0,T] leads to the p-multivariate process R_{n,p}(u,γ^{(p)}), whose Cramer-von Mises norm is computed.

If p is not provided, an appropriate p to represent the functional process X is chosen through the estimation of β for the composite hypothesis. For the simple hypothesis, as no estimation of β is done, the choice of p depends only on the functional process X. As the result of the test may change for different values of p, we recommend using an automatic criterion to select p instead of providing a fixed one.

The distribution of the test statistic is approximated by a wild bootstrap resampling on the residuals, using the golden section bootstrap.

Finally, the graph shown if plot.it=TRUE represents the observed trajectory, and the bootstrap trajectories under the null, of the RMPP process integrated on the projections:

R_n(u) ≈ (1/G) ∑_{g=1}^G R_n(u,γ_g),

where γ_g are simulated as Gaussian processes. This gives a graphical idea of how far the observed trajectory is from the null hypothesis.
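
The averaging over projections and the golden section wild bootstrap described above can be illustrated with a minimal base-R sketch. It is not the code used by flm.test. It assumes the form R_n(u,γ) = n^(-1/2) ∑_i e_i 1{<X_i,γ> ≤ u} from the cited reference, uses toy random-walk curves and placeholder residuals instead of a fitted FLM, approximates inner products by Riemann sums, and skips the re-estimation of β that the composite case would need. The helper names `rmpp`, `average_rmpp` and `golden_weights` are hypothetical.

```r
# Sketch only: base-R illustration, not the fda.usc implementation.
set.seed(1)
n <- 50                                  # number of curves
m <- 101                                 # grid points per curve
G <- 200                                 # number of random projections
tt <- seq(0, 1, length.out = m)
h <- tt[2] - tt[1]                       # grid step for Riemann-sum inner products

# Toy discretized curves (random walks), one per row, centred column-wise
X <- t(apply(matrix(rnorm(n * m, sd = sqrt(h)), n, m), 1, cumsum))
X <- sweep(X, 2, colMeans(X))

# Placeholder residuals (assumption: in practice they come from a fitted FLM)
e <- rnorm(n, sd = 0.1)
e <- e - mean(e)

# Assumed form of the process for one projection:
# R_n(u, gamma) = n^(-1/2) * sum_i e_i * 1{<X_i, gamma> <= u}
rmpp <- function(u, proj_g, e) {
  sapply(u, function(ui) sum(e[proj_g <= ui])) / sqrt(length(e))
}

# Gaussian directions gamma_g (random walks, an arbitrary choice), one per column
gammas <- apply(matrix(rnorm(m * G, sd = sqrt(h)), m, G), 2, cumsum)
proj <- (X %*% gammas) * h               # n x G matrix of <X_i, gamma_g>

# Monte Carlo average over the G projections: R_n(u) ~ (1/G) sum_g R_n(u, gamma_g)
average_rmpp <- function(e, proj, u) {
  rowMeans(sapply(seq_len(ncol(proj)), function(g) rmpp(u, proj[, g], e)))
}

# Golden section two-point weights: mean 0, variance 1
golden_weights <- function(n) {
  a <- (1 - sqrt(5)) / 2
  b <- (1 + sqrt(5)) / 2
  p_a <- (5 + sqrt(5)) / 10              # probability of drawing a
  ifelse(runif(n) < p_a, a, b)
}

u_grid <- unname(quantile(proj, probs = seq(0.05, 0.95, length.out = 40)))
Rn_obs <- average_rmpp(e, proj, u_grid)                       # observed trajectory
Rn_boot <- average_rmpp(e * golden_weights(n), proj, u_grid)  # one bootstrap trajectory
```

Plotting `Rn_obs` against `u_grid` together with several `Rn_boot` replicates gives a picture analogous in spirit to the one produced with plot.it=TRUE.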

Value

An object with class "htest" whose underlying structure is a list containing the following components:

statistic The value of the test statistic.
boot.statistics A vector of length B with the values of the bootstrap test statistics.
p.value The p-value of the test.
method The method used.
B The number of bootstrap replicates used.
type.basis The type of basis used.
beta.est The estimated functional parameter β in the composite hypothesis. For the simple hypothesis, the given beta0.fdata.
p The number of basis elements passed or automatically chosen.
ord The optimal order for PC and PLS given by fregre.pc.cv and fregre.pls.cv. For other methods it is set to 1:p.
data.name The character string "Y=<X,b>+e"
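
As a rough illustration of how the components relate, a bootstrap p-value is commonly computed as the proportion of bootstrap statistics that exceed the observed one. The sketch below assumes that convention and uses made-up numbers, not output from flm.test; the exact rule applied by the package (for example, strict versus non-strict inequality) is not stated here.

```r
# Illustrative numbers only, not results from flm.test
statistic <- 0.8
boot.statistics <- c(0.2, 0.5, 1.1, 0.3, 0.9, 0.4, 0.6, 0.1, 0.7, 1.3)

# Assumed convention: share of bootstrap statistics above the observed value
p.value <- mean(boot.statistics > statistic)
p.value
```

With a real result `res` returned by flm.test, the same quantities would be read from `res$statistic`, `res$boot.statistics` and `res$p.value`.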

Note

NAs are not allowed in either the functional covariate or the scalar response.
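
A quick pre-check along these lines can catch the problem before running a long bootstrap. The helper `check_flm_inputs` below is hypothetical (it is not part of fda.usc) and works on a plain matrix of discretized curves with one curve per row, which is assumed here to be what the `$data` component of an `fdata` object holds.

```r
# Hypothetical helper, not part of fda.usc: stop early on NAs or size mismatch
check_flm_inputs <- function(X_data, Y) {
  X_data <- as.matrix(X_data)
  if (anyNA(X_data)) stop("the functional covariate contains NA values")
  if (anyNA(Y)) stop("the scalar response contains NA values")
  if (nrow(X_data) != length(Y)) {
    stop("Y must have as many elements as there are curves in X")
  }
  invisible(TRUE)
}

# Toy inputs: 5 curves observed at 4 points, and 5 responses
X_ok <- matrix(seq_len(20), nrow = 5)
Y_ok <- c(1.2, 0.7, -0.3, 2.1, 0.0)
check_flm_inputs(X_ok, Y_ok)
```

For an `fdata` object the call would look like `check_flm_inputs(X.fdata$data, Y)`.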

Author(s)

Eduardo Garcia-Portugues. Please, report bugs and suggestions to edgarcia@est-econ.uc3m.es

References

Escanciano, J. C. (2006). A consistent diagnostic test for regression models using projections. Econometric Theory, 22, 1030-1051. http://dx.doi.org/10.1017/S0266466606060506

Garcia-Portugues, E., Gonzalez-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3), 761-778. http://dx.doi.org/10.1080/10618600.2013.812519

Examples

# Simulated example #
library(fda.usc)
tt=seq(0,1,length.out=101) # common evaluation grid
X=rproc2fdata(n=100,t=tt,sigma="OU")
beta0=fdata(mdata=cos(2*pi*tt)-(tt-0.5)^2+
            rnorm(101,sd=0.05),argvals=tt,rangeval=c(0,1))
Y=inprod.fdata(X,beta0)+rnorm(100,sd=0.1)

dev.new(width=21,height=7)
par(mfrow=c(1,3))
plot(X,main="X")
plot(beta0,main="beta0")
plot(density(Y),main="Density of Y",xlab="Y",ylab="Density")
rug(Y)

## Not run: 
# Composite hypothesis: do not reject FLM
pcvm.sim=flm.test(X,Y,B=50,B.plot=50,G=100,plot.it=TRUE)
pcvm.sim
flm.test(X,Y,B=5000)
 
# Estimated beta
dev.new()
plot(pcvm.sim$beta.est)

# Simple hypothesis: do not reject beta=beta0
flm.test(X,Y,beta0.fdata=beta0,B=50,B.plot=50,G=100)
flm.test(X,Y,beta0.fdata=beta0,B=5000) 

# AEMET dataset #
data(aemet)
# Remove the 5% of curves with the lowest depth (treated as outliers)
dev.new()
res.FM=depth.FM(aemet$temp,draw=TRUE)
qu=quantile(res.FM$dep,prob=0.05)
l=which(res.FM$dep<=qu)
lines(aemet$temp[l],col=3)
aemet$df$name[l]

# Data without outliers 
wind.speed=apply(aemet$wind.speed$data,1,mean)[-l]
temp=aemet$temp[-l]
# Exploratory analysis: do not reject the FLM
pcvm.aemet=flm.test(temp,wind.speed,est.method="pls",B=100,B.plot=50,G=100)
pcvm.aemet

# Estimated beta
dev.new()
plot(pcvm.aemet$beta.est,lwd=2,col=2)
# B=5000 for more precision on calibration of the test: also do not reject the FLM
flm.test(temp,wind.speed,est.method="pls",B=5000) 

# Simple hypothesis: rejection of beta0=0? Limiting p-value...
dat=rep(0,length(temp$argvals))
flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
                                            rangeval=temp$rangeval),B=100)
flm.test(temp,wind.speed, beta0.fdata=fdata(mdata=dat,argvals=temp$argvals,
                                            rangeval=temp$rangeval),B=5000) 
                                            
# Tecator dataset #
data(tecator)
names(tecator)
absorp=tecator$absorp.fdata
ind=1:129 # or ind=1:215
x=absorp[ind,]
y=tecator$y$Fat[ind]
tt=absorp[["argvals"]]

# Exploratory analysis for composite hypothesis with automatic choice of p
pcvm.tecat=flm.test(x,y,B=100,B.plot=50,G=100)
pcvm.tecat

# B=5000 for more precision on calibration of the test: also reject the FLM
flm.test(x,y,B=5000) 

# Distribution of the PCvM statistic
plot(density(pcvm.tecat$boot.statistics),lwd=2,xlim=c(0,10),
              main="PCvM distribution", xlab="PCvM*",ylab="Density")
rug(pcvm.tecat$boot.statistics)
abline(v=pcvm.tecat$statistic,col=2,lwd=2)
legend("top",legend=c("PCvM observed"),lwd=2,col=2)

# Simple hypothesis: fixed p
dat=rep(0,length(x$argvals))
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                               rangeval=x$rangeval),B=100,p=11)
                               
# Simple hypothesis, automatic choice of p
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                               rangeval=x$rangeval),B=100)
flm.test(x,y,beta0.fdata=fdata(mdata=dat,argvals=x$argvals,
                               rangeval=x$rangeval),B=5000)

## End(Not run)

fda.usc

Functional Data Analysis and Utilities for Statistical Computing

v2.0.2

GPL-2

Authors

Manuel Febrero Bande [aut], Manuel Oviedo de la Fuente [aut, cre], Pedro Galeano [ctb], Alicia Nieto [ctb], Eduardo Garcia-Portugues [ctb]

Initial release

2020-02-17
