Diagnostic plot for identifying local outliers with varying size of neighborhood
Computes global and pairwise Mahalanobis distances for visualizing global and local multivariate outliers. The size of the neighborhood (number of neighbors) is varied, while the fraction of neighbors is held fixed.
locoutNeighbor(dat, X, Y, propneighb = 0.1, variant = c("dist", "knn"), usemax = 1/3, npoints = 50, chisqqu = 0.975, indices = NULL, xlab = NULL, ylab = NULL, colall = gray(0.7), colsel = 1, ...)
dat |
multivariate data set (without coordinates) |
X |
X coordinates of the data points |
Y |
Y coordinates of the data points |
propneighb |
proportion of neighbors to be included in tolerance ellipse |
variant |
search for neighbors either by Euclidean distance ("dist") or by k nearest neighbors ("knn") |
usemax |
for either variant: fraction of points (maximum distance) used for the plot |
npoints |
computation is done at most at npoints points |
chisqqu |
quantile of the chi-square distribution for splitting the plot |
indices |
if this is not NULL, these should be indices of observations to be highlighted |
xlab |
x-axis label for plot |
ylab |
y-axis label for plot |
colall |
color for lines if indices is NULL |
colsel |
color for lines if indices are selected |
... |
additional parameters for plotting |
For this diagnostic tool, the number of neighbors is varied up to a fraction usemax of the observations. The proportion propneighb (called beta) is held fixed, and for each observation the degree of isolation from a fraction 1-beta of its neighbors is computed. The neighborhood can be defined either via the Euclidean distance or via k nearest neighbors. For computational reasons, computations are carried out at no more than npoints points. The critical value for outliers is the chisqqu quantile of the chi-square distribution. Indices of observations can also be supplied through the indices argument; the corresponding lines in the plots are then highlighted.
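The quantities mentioned above can be illustrated with a small base R sketch. It is not the implementation used by locoutNeighbor: it uses simulated data, classical mean and covariance estimates in place of whatever estimates the function uses internally, and a single fixed neighborhood size. All object names (dat_sim, xy_sim, pmd2, and so on) are hypothetical, and the final summary is only one plausible reading of "isolation from a fraction 1-beta of the neighbors".

```r
# Illustrative sketch only: classical estimates and simulated data,
# hypothetical object names, not the package's internal code.
set.seed(1)
n <- 100                                    # number of observations
p <- 3                                      # number of variables
dat_sim <- matrix(rnorm(n * p), ncol = p)   # simulated multivariate data
xy_sim  <- cbind(runif(n), runif(n))        # simulated X and Y coordinates

# Global squared Mahalanobis distances and the chi-square critical value
center <- colMeans(dat_sim)
covmat <- cov(dat_sim)
md2    <- mahalanobis(dat_sim, center, covmat)
crit   <- qchisq(0.975, df = p)             # plays the role of chisqqu = 0.975
global_out <- which(md2 > crit)             # candidate global outliers

# k nearest spatial neighbors of observation i (the "knn" idea)
i <- 1
k <- 10
d_xy   <- as.matrix(dist(xy_sim))[i, ]      # Euclidean distances in coordinate space
neighb <- order(d_xy)[2:(k + 1)]            # position 1 is observation i itself

# Squared pairwise Mahalanobis distances between observation i and its neighbors
diffs <- sweep(dat_sim[neighb, , drop = FALSE], 2, dat_sim[i, ])
pmd2  <- mahalanobis(diffs, center = rep(0, p), cov = covmat)

# Assumed summary: with beta = 0.1, the pairwise distance exceeded by
# roughly a fraction 1 - beta of the neighbors
beta <- 0.1
iso  <- sort(pmd2)[ceiling(k * beta)]
```

Repeating the last two steps for a range of values of k, and for each of the selected observations, gives one curve per observation, which is the kind of display the diagnostic plot is built around.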
indices.reg |
indices of the (selected) observations being regular observations |
indices.out |
indices of the (selected) observations being global outliers
Peter Filzmoser <P.Filzmoser@tuwien.ac.at> http://cstat.tuwien.ac.at/filz/
P. Filzmoser, A. Ruiz-Gazen, and C. Thomas-Agnan: Identification of local multivariate outliers. Submitted for publication, 2012.
# use data from illustrative example in paper:
data(X)
data(Y)
data(dat)
res <- locoutNeighbor(dat, X, Y, variant = "knn", usemax = 1, chisqqu = 0.975,
                      indices = c(1, 11, 24, 36), propneighb = 0.1, npoints = 100)