Kernel feature significance
Kernel feature significance for 1- to 6-dimensional data.
kfs(x, H, h, deriv.order=2, gridsize, gridtype, xmin, xmax, supp=3.7, eval.points, binned, bgridsize, positive=FALSE, adj.positive, w, verbose=FALSE, signif.level=0.05)
x | matrix of data values
H, h | bandwidth matrix/scalar bandwidth. If these are missing, the plug-in selector Hpi(,deriv.order=2) is used by default for H, and likewise for h.
deriv.order | derivative order (scalar)
gridsize | vector of number of grid points
gridtype | not yet implemented
xmin, xmax | vector of minimum/maximum values for grid
supp | effective support for standard normal
eval.points | vector or matrix of points at which estimate is evaluated
binned | flag for binned estimation
bgridsize | vector of binning grid sizes
positive | flag if 1-d data are positive. Default is FALSE.
adj.positive | adjustment applied to positive 1-d data
w | vector of weights. Default is a vector of all ones.
verbose | flag to print out progress information. Default is FALSE.
signif.level | overall level of significance for hypothesis tests. Default is 0.05.
Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. Only the curvature test is currently implemented; the regions it identifies are also known as significant modal regions.
The hypothesis test at a grid point x is
H0(x): H f(x) < 0,
i.e. the density Hessian matrix H f(x) is negative definite (here H denotes the Hessian operator, not the bandwidth matrix). The p-values are computed for each x using the fact that the test statistic is approximately chi-squared distributed with d(d+1)/2 degrees of freedom, where d is the dimension of the data. We then use a Hochberg-type simultaneous testing procedure, based on the ordered p-values, to control the overall level of significance at signif.level. If H0(x) is rejected then x belongs to a significant modal region.
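The testing step described above can be sketched in base R. This is only an illustration under stated assumptions, not the internal code of kfs: the test statistics in stat are made-up values for five hypothetical grid points, and p.adjust(method="hochberg") stands in for the "Hochberg-type" procedure, which may differ in detail from what kfs does.

```r
# Illustrative values only: these are not output from kfs
d <- 2                                  # data dimension
df <- d * (d + 1) / 2                   # degrees of freedom, 3 when d = 2
stat <- c(0.5, 2.1, 25.0, 40.0, 1.2)    # hypothetical test statistics at 5 grid points
signif.level <- 0.05

# Upper-tail p-values under the chi-squared approximation
pval <- pchisq(stat, df = df, lower.tail = FALSE)

# Hochberg step-up adjustment over all grid points (base R stand-in)
pval.adj <- p.adjust(pval, method = "hochberg")

# TRUE marks grid points where H0(x) is rejected at the overall level
signif <- pval.adj < signif.level
print(signif)
```

With these made-up statistics only the third and fourth grid points are flagged, since their adjusted p-values fall far below signif.level while the others stay close to 1.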
The computations are based on kdde(x, deriv.order=2), so kfs inherits its behaviour from kdde.
If the bandwidth H is missing from kfs, then the default bandwidth is the plug-in selector Hpi(,deriv.order=2). Likewise for missing h.
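As a sketch of the bandwidth default, the plug-in selectors can also be called explicitly and the result passed to kfs. This assumes the ks and MASS packages are installed and uses the geyser data from MASS purely for illustration; the internal default call in kfs may pass further arguments, so the results are not guaranteed to be identical to omitting H or h.

```r
# Assumes the ks and MASS packages are installed
library(ks)
data(geyser, package = "MASS")

# Bivariate case: plug-in bandwidth matrix for second-derivative estimation
x2 <- as.matrix(geyser)
H.pi <- Hpi(x = x2, deriv.order = 2)
geyser.fs2 <- kfs(x = x2, H = H.pi)

# Univariate case: the scalar analogue
h.pi <- hpi(x = geyser$duration, deriv.order = 2)
duration.fs <- kfs(x = geyser$duration, h = h.pi)
```

Supplying the bandwidth explicitly is mainly useful when the same H or h should be reused across several calls, or when a different selector is preferred.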
The effective support, binning, grid size, grid range and positive parameters are the same as for kde.
This function is similar to the featureSignif function in the feature package, except that it accepts unconstrained bandwidth matrices.
A kernel feature significance estimate is an object of class kfs which is a list with fields
x | data points - same as input
eval.points | vector or list of points at which the estimate is evaluated
estimate | binary matrix for significant feature at eval.points
h | scalar bandwidth (1-d only)
H | bandwidth matrix
gridtype | "linear"
gridded | flag for estimation on a grid
binned | flag for binned estimation
names | variable names
w | vector of weights
deriv.order | derivative order (scalar)
deriv.ind | matrix where each row is a vector of partial derivative indices
This is the same structure as a kdde object, except that estimate is a binary matrix rather than real-valued.
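A short sketch of how the returned object might be inspected, using the field names listed above. It assumes the ks and MASS packages are installed, and it assumes that for 1-d data eval.points and estimate have the same length so that one can index the other; that correspondence is an assumption made for this example.

```r
# Assumes the ks and MASS packages are installed
library(ks)
data(geyser, package = "MASS")
geyser.fs <- kfs(geyser$duration, binned = TRUE)

class(geyser.fs)             # object class
names(geyser.fs)             # fields of the list
table(geyser.fs$estimate)    # counts of non-significant and significant grid points

# Grid points flagged as belonging to a significant modal region
# (assumes eval.points and estimate are aligned in the 1-d case)
sig.points <- geyser.fs$eval.points[geyser.fs$estimate == 1]
head(sig.points)
```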
Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.
Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.
Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.
library(ks)
library(MASS)
data(geyser)
geyser.fs <- kfs(geyser$duration, binned=TRUE)
plot(geyser.fs, xlab="duration")
## see example in ? plot.kfs