BACON for Regression or Multivariate Covariance Estimation
BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust algorithm (set), with an implementation for regression or multivariate covariance estimation.
BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.
BACON(x, y = NULL, intercept = TRUE,
      m = min(collect * p, n * 0.5),
      init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
      man.sel, init.fraction = 0, collect = 4,
      alpha = 0.95, maxsteps = 100, verbose = TRUE)

## *Auxiliary* function:
.lmBACON(x, y, intercept = TRUE,
         init.dis, init.fraction = 0, collect = 4,
         alpha = 0.95, maxsteps = 100, verbose = TRUE)
x |
a multivariate matrix of dimension [n x p] considered as containing no missing values. |
y |
the response (n vector) in the case of regression, or
|
intercept |
logical indicating if an intercept has to be used for the regression. |
m |
integer in |
init.sel |
character string, specifying the initial selection
mode; see |
man.sel |
only when |
init.dis |
the distances of the x matrix used for the initial
subset determined by |
init.fraction |
if this parameter is > 0, the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen (the observations with the smallest distances) |
collect |
numeric factor chosen by the user to define the size of the initial subset (p * collect) |
alpha |
significance level. |
maxsteps |
the maximal number of iteration steps (to prevent infinite loops) |
verbose |
logical indicating if messages are printed which trace progress of the algorithm. |
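The size rules in the argument list can be illustrated with a few lines of base R. This is only a sketch of the arithmetic: n, p, and the distance vector dis are made-up stand-ins, and the package may round or validate these sizes differently.

```r
## Illustrative sizes only; n, p and dis are made-up stand-ins.
n <- 50
p <- 3
collect <- 4

## Default size of the initial basic subset, as given in the usage:
m_default <- min(collect * p, n * 0.5)    # min(12, 25)

## With init.fraction > 0, the initial subset has n * init.fraction
## observations, taken as those with the smallest distances.
init.fraction <- 0.2
set.seed(1)
dis <- rchisq(n, df = p)                  # stand-in for initial distances
size <- n * init.fraction                 # chosen so that this is a whole number
init_subset <- order(dis)[seq_len(size)]  # indices of the smallest distances
```

The values here are chosen so that n * init.fraction is a whole number; how a fractional size is handled is not covered by this sketch.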
Regarding the initial selection mode, init.sel, see its description in the mvBACON arguments list.
BACON(x,y,..) (for regression) returns a list with components
subset |
the observation indices (in |
tis |
the t[i](y[m],X[m]) of eq (6) in the reference; the clean “basic subset” in the algorithm is defined as the observations i with the smallest |t[i]|, and the t[i] can be regarded as scaled predicted errors. |
mv.dis |
the (final) discrepancies or distances of
|
mv.subset |
the “good” subset from |
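The "scaled predicted errors" reading of tis can be sketched in base R. The helper below is hypothetical and is not the package's GiveTis(). It assumes eq. (6) has the usual form: a least squares fit on the basic subset, with residuals scaled by sigma * sqrt(1 - h[i]) for observations inside the subset and by sigma * sqrt(1 + h[i]) for those outside it, where h[i] = x[i]' (X[m]' X[m])^{-1} x[i].

```r
## Hypothetical helper (an assumption, not the package's GiveTis()).
## X: design matrix including any intercept column; subset: row indices.
scaled_pred_errors <- function(X, y, subset) {
  Xm <- X[subset, , drop = FALSE]
  ym <- y[subset]
  XtX_inv <- solve(crossprod(Xm))              # (Xm' Xm)^{-1}
  beta <- XtX_inv %*% crossprod(Xm, ym)        # LS fit on the basic subset
  res <- drop(y - X %*% beta)                  # residuals for all observations
  sigma <- sqrt(sum(res[subset]^2) / (length(subset) - ncol(X)))
  h <- rowSums((X %*% XtX_inv) * X)            # x_i' (Xm' Xm)^{-1} x_i
  in_sub <- seq_len(nrow(X)) %in% subset
  res / (sigma * sqrt(ifelse(in_sub, 1 - h, 1 + h)))
}

set.seed(42)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[1:3] <- y[1:3] + 10                          # three planted outliers
X <- cbind(1, x)
basic <- 4:13                                  # made-up basic subset
tis <- scaled_pred_errors(X, y, basic)
## Observations with the smallest |t[i]| (subset size chosen arbitrarily here):
next_subset <- order(abs(tis))[seq_len(length(basic) + 1)]
```

When the subset contains every observation, this quantity reduces to the internally studentized residuals that rstandard() reports for an lm fit, which gives a quick sanity check of the sketch.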
“BACON” was also chosen in honor of Francis Bacon:
Whoever knows the ways of Nature will more easily notice her deviations;
and, on the other hand, whoever knows her deviations will more accurately
describe her ways.
Francis Bacon (1620), Novum Organum II 29.
Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1; 25.05.2001; modified six times till 17.6.2001.
Port to R, testing etc, by Martin Maechler.
Daniel Weeks (at pitt.edu) proposed a fix to a long-standing buglet in GiveTis(), which computes the t[i]; the fix was further improved by Martin Maechler for robustX version 1.2-3 (Feb. 2019).
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298. doi: 10.1016/S0167-9473(99)00101-2
mvBACON, the multivariate version of the BACON algorithm.
data(starsCYG, package = "robustbase")
## Plot simple data and fitted lines
plot(starsCYG)
lmST <- lm(log.light ~ log.Te, data = starsCYG)
abline(lmST, col = "gray") # least squares line
str(B.ST <- with(starsCYG, BACON(x = log.Te, y = log.light)))
## 'subset': A good set of points (to determine regression):
colB <- adjustcolor(2, 1/2)
points(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset,
       pch = 19, cex = 1.5, col = colB)
## A BACON-derived line:
lmB <- lm(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset)
abline(lmB, col = colB, lwd = 2)
require(robustbase)
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))
abline(RlmST, col = "blue")