NNclean

Nearest neighbor based clutter/noise detection


Description

Detects if data points are noise or part of a cluster, based on a Poisson process model.

Usage

NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
        convergence = 0.001, plot = FALSE, quiet = TRUE)

## S3 method for class 'nnclean'
print(x, ...)

Arguments

data

numerical matrix or data frame.

k

integer. Number of considered nearest neighbors per point.

distances

distance matrix object of class dist. If specified, it is used instead of computing distances from the data.

edge.correct

logical. If TRUE and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.

wrap

numerical. If edge.correct=TRUE, points in a strip of width wrap*range along the edge for each variable are candidates for being neighbors of points at the opposite edge.

convergence

numerical. Convergence criterion for EM-algorithm.

plot

logical. If TRUE, a histogram of the distances to the kth nearest neighbor is plotted together with the fit.

quiet

logical. If FALSE, the likelihood is printed during the iterations.

x

object of class nnclean.

...

necessary for print methods.

Details

The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances to the kth nearest neighbor are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. Each point is assigned to the noise or the cluster component according to its estimated a posteriori probability.
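
The model above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used inside NNclean: the helper names dknn and em_knn, the median-based starting values and the simulated distances are all hypothetical. It relies on the fact that, for a homogeneous Poisson process with intensity lambda in d dimensions, the d-th power of the distance to the kth nearest neighbor follows a Gamma distribution with shape k and rate lambda times the volume of the unit ball.

```r
set.seed(1)

# Density of the distance x to the k-th nearest neighbor under a homogeneous
# Poisson process with intensity lambda in d dimensions (hypothetical helper).
dknn <- function(x, lambda, k, d) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the d-dimensional unit ball
  # x^d is Gamma(shape = k, rate = lambda * alpha_d); d * x^(d - 1) is the Jacobian.
  dgamma(x^d, shape = k, rate = lambda * alpha_d) * d * x^(d - 1)
}

# EM for a two-component mixture of such densities (hypothetical helper).
# Component 1 = cluster (large intensity), component 2 = noise.
em_knn <- function(dist_k, k, d, tol = 1e-3, max_iter = 500) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)
  # Assumed starting values: points with small distances start in the cluster.
  w <- as.numeric(dist_k <= median(dist_k))
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # M-step: weighted maximum likelihood estimates.
    p <- mean(w)
    lambda1 <- k * sum(w) / (alpha_d * sum(w * dist_k^d))
    lambda2 <- k * sum(1 - w) / (alpha_d * sum((1 - w) * dist_k^d))
    # E-step: a posteriori probability of the cluster component.
    f1 <- p * dknn(dist_k, lambda1, k, d)
    f2 <- (1 - p) * dknn(dist_k, lambda2, k, d)
    w <- f1 / (f1 + f2)
    loglik <- sum(log(f1 + f2))
    if (abs(loglik - loglik_old) < tol * (1 + abs(loglik))) break
    loglik_old <- loglik
  }
  list(p = p, lambda1 = lambda1, lambda2 = lambda2,
       probs = w, z = as.integer(w > 0.5))
}

# Simulated k-th nearest neighbor distances: 300 cluster points (intensity 200)
# and 200 noise points (intensity 5) in two dimensions.
k <- 10
d <- 2
truth <- rep(c(1, 0), c(300, 200))
lambda_true <- ifelse(truth == 1, 200, 5)
dist_k <- rgamma(500, shape = k, rate = lambda_true * pi)^(1 / d)

fit <- em_knn(dist_k, k = k, d = d)
print(unlist(fit[c("p", "lambda1", "lambda2")]))
```

The sketch simulates the distances directly instead of simulating point patterns, which keeps it short but ignores edge effects and the dependence between neighboring points.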

Value

NNclean returns a list of class nnclean with components

z

0-1 vector with one entry per data point. 1 means cluster, 0 means noise.

probs

vector of estimated a priori probabilities for each point to belong to the cluster component.

k

see above.

lambda1

intensity parameter of cluster component.

lambda2

intensity parameter of noise component.

p

estimated probability of cluster component.

kthNND

distance to kth nearest neighbor.
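
As an illustration of what kthNND contains, the distance to the kth nearest neighbor can be computed in base R as sketched below. The helper kth_nn_dist is hypothetical and only meant to show the quantity; it is not taken from the package and ignores the edge.correct option.

```r
# Distance from each point to its k-th nearest neighbor (hypothetical helper).
kth_nn_dist <- function(data, k) {
  dm <- as.matrix(dist(data))  # full Euclidean distance matrix
  # In each sorted row the first entry is the point itself (distance 0),
  # so the k-th nearest neighbor sits at position k + 1.
  apply(dm, 1, function(row) sort(row)[k + 1])
}

# Four points on a line at positions 0, 1, 3 and 7.
pts <- matrix(c(0, 1, 3, 7), ncol = 1)
d1 <- kth_nn_dist(pts, k = 1)
d2 <- kth_nn_dist(pts, k = 2)
```

Building the full distance matrix takes memory quadratic in the number of points, so this approach is only reasonable for small data sets.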

Note

The software can be freely used for non-commercial purposes, and can be freely distributed for non-commercial purposes only.

Author(s)

R-port by Christian Hennig <christian.hennig@unibo.it> (https://www.unibo.it/sitoweb/christian.hennig/en),
original Splus package by S. Byers and A. E. Raftery.

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Examples

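library(prabclus)  # provides NNclean
library(mclust)    # provides the chevron data set
data(chevron)
# Classify points as cluster or noise using the 15th nearest neighbor
nnc <- NNclean(chevron[, 2:3], 15, plot = TRUE)
# Noise points (z = 0) in black, cluster points (z = 1) in red
plot(chevron[, 2:3], col = 1 + nnc$z)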

prabclus

Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data

Version: 2.3-2
License: GPL
Authors: Christian Hennig <christian.hennig@unibo.it>, Bernhard Hausdorf <Hausdorf@zoologie.uni-hamburg.de>
Initial release: 2020-01-06