mvBACON

BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators


Description

This function applies an outlier identification algorithm to the data in the matrix x [n x p], following the BACON outlier procedure described by Billor, Hadi and Velleman (see References).

Usage

mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.95,
        init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
        man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)

Arguments

x

numeric matrix (of dimension [n x p]); it must not contain missing values.

collect

a multiplication factor c used, when init.sel is not "manual", to define m, the size of the initial basic subset, as c * p; in practice, m <- min(p * collect, n/2).

m

integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual".

alpha

significance level for the chi-squared cutoff, used to define the next iteration's basic subset.

init.sel

character string, specifying the initial selection mode; implemented modes are:

"Mahalanobis"

based on Mahalanobis distances (default); the version V1 of the reference; affine invariant but not robust.

"dUniMedian"

based on the distances from the univariate medians; the version V2 of the reference; robust but not affine invariant.

"random"

based on a random selection, i.e., reproducible only via set.seed().

"manual"

based on manual selection; in this case, a vector man.sel containing the indices of the selected observations must be specified.

"Mahalanobis" and "dUniMedian" were proposed by Hadi and his coauthors in the reference as versions ‘V_1’ and ‘V_2’, as was "manual"; "random" is provided in order to study the behaviour of BACON.

man.sel

only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).

maxsteps

maximal number of iteration steps.

allowSingular

logical indicating whether a solution should still be sought when no matrix of rank p is found.

verbose

logical indicating if messages are printed which trace progress of the algorithm.
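The two data-driven choices of init.sel can be illustrated in base R. The following is only a rough sketch of the idea, not the code used inside mvBACON(): the helper init_subset() is hypothetical, it rounds m down with floor() as an assumption, and it does not enlarge the subset when its covariance matrix is singular.

```r
# Hypothetical helper (not part of robustX): choose an initial basic subset
# of size m = min(collect * p, n / 2) by keeping the m smallest distances.
init_subset <- function(x, collect = 4,
                        method = c("Mahalanobis", "dUniMedian")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  m <- floor(min(collect * p, n * 0.5))  # rounding down is an assumption
  d <- switch(method,
    # V1: squared Mahalanobis distances from the classical mean and covariance
    Mahalanobis = mahalanobis(x, colMeans(x), cov(x)),
    # V2: squared Euclidean distances from the coordinate-wise medians
    dUniMedian = {
      med <- apply(x, 2, median)
      rowSums(sweep(x, 2, med)^2)
    })
  sort(order(d)[seq_len(m)])  # indices of the m closest observations
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)  # n = 100, p = 2
x[1:5, ] <- x[1:5, ] + 10          # five artificial outliers
s1 <- init_subset(x, method = "Mahalanobis")
s2 <- init_subset(x, method = "dUniMedian")
```

Here both s1 and s2 hold min(4 * 2, 100 / 2) = 8 indices. A vector of this kind could also be supplied as man.sel together with init.sel = "manual".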

Value

a list with components

subset

logical vector of length n where the i-th entry is true iff the i-th observation is part of the final selection.

dis

numeric vector of length n with the (Mahalanobis) distances.

cov

p x p matrix, the corresponding robust estimate of covariance.
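How the three components relate to each other can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the helper subset_distances() is hypothetical, and it assumes that the distances are reported on the square-root scale and are computed for all n observations from the mean and covariance of the selected subset.

```r
# Hypothetical helper (not part of robustX): given a logical 'subset',
# estimate center and covariance from the selected rows only and
# compute Mahalanobis distances for every row of x.
subset_distances <- function(x, subset) {
  x <- as.matrix(x)
  xs <- x[subset, , drop = FALSE]
  center <- colMeans(xs)
  covm <- cov(xs)
  list(subset = subset,
       dis = sqrt(mahalanobis(x, center, covm)),  # length n
       cov = covm)                                # p x p
}

set.seed(2)
x <- matrix(rnorm(60), ncol = 3)  # n = 20, p = 3
keep <- rep(TRUE, 20)
keep[c(4, 9)] <- FALSE            # pretend rows 4 and 9 were nominated as outliers
res <- subset_distances(x, keep)
which(!res$subset)                # indices of the nominated outliers
```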

Author(s)

Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1. Port to R, testing, etc., by Martin Maechler.

References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298. doi: 10.1016/S0167-9473(99)00101-2

See Also

covMcd for a high-breakdown (but more computer intensive) method; BACON for a “generalization”, notably to regression.

Examples

require(robustbase) # for example data and covMcd():
## simple 2D example :
 plot(starsCYG, main = "starsCYG  data  (n=47)")
 B.st <- mvBACON(starsCYG)
 points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
 stopifnot(identical(which(!B.st$subset), c(7L,9L,11L,14L,20L,30L,34L)))
 ## finds the clear outliers (and 3 "borderline")

 ## 'coleman' from pkg 'robustbase'
 coleman.x <- data.matrix(coleman[, 1:6])
 Cc <- covMcd (coleman.x) # truly robust
 summary(Cc) # -> 6 outliers (1,3,10,12,17,18)
 Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
 Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
 stopifnot(all.equal(Cb1, Cb2))
 Cb.r <- lapply(1:20, function(i) { set.seed(i)
                     mvBACON(coleman.x, init.sel="random", verbose=FALSE) })
 nm <- names(Cb.r[[1]]); nm <- nm[nm != "steps"]
 all(eqC <- sapply(Cb.r[-1], function(CC) all.equal(CC[nm], Cb.r[[1]][nm]))) # TRUE
 ## --> BACON always  breaks down, i.e., does not see the outliers here
 ## breaks down even when manually starting with all the non-outliers:
 Cb.man <- mvBACON(coleman.x, init.sel = "manual",
                   man.sel = setdiff(1:20, c(1,3,10,12,17,18)))
 which( ! Cb.man$subset) # the outliers according to mvBACON : _none_

robustX

'eXtra' / 'eXperimental' Functionality for Robust Statistics

v1.2-4
GPL (>= 2)
Authors
Werner Stahel, Martin Maechler [aut, cre] (<https://orcid.org/0000-0002-8685-9910>) and potentially others
Initial release
2019-02-25
