robustbase: outlierStats – R documentation

outlierStats

Robust Regression Outlier Statistics

Description

Simple statistics about observations with robustness weight of almost zero for models that include factor terms. The number of rejected observations and the mean robustness weights are computed for each level of each factor included in the model.

Usage

outlierStats(object, x = object$x, control = object$control,
             epsw = control$eps.outlier, epsx = control$eps.x,
             warn.limit.reject = control$warn.limit.reject,
             warn.limit.meanrw = control$warn.limit.meanrw)

Arguments

`object`	object of class `"lmrob"`, typically the result of a call to `lmrob`.
`x`	design matrix
`control`	list as returned by `lmrob.control`.
`epsw`	limit on the robustness weight below which an observation is considered to be an outlier. Either a `numeric(1)` or a `function` that takes the number of observations as an argument.
`epsx`	limit on the absolute value of the elements of the design matrix below which an element is considered zero. Either a `numeric(1)` or a `function` that takes the maximum absolute value in the design matrix as an argument.
`warn.limit.reject`	limit on the ratio of rejected observations to observations in a level; a warning is produced if the ratio is at or above (>=) this limit. Set to `NULL` to disable the warning.
`warn.limit.meanrw`	limit on the mean robustness weight per factor level; a warning is produced if the mean is at or below (<=) this limit. Set to `NULL` to disable the warning.
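
As an illustration of the "number or function" convention described for `epsw` and `epsx`, the following base R sketch shows how such a limit could be resolved to a single number. The helper `resolve_limit` and the example rules are hypothetical and chosen only for this illustration; they are not taken from the robustbase source.

```r
## Hypothetical helper (not part of robustbase): turn a limit that is
## either a single number or a function into a single number.
resolve_limit <- function(limit, arg) {
  if (is.function(limit)) limit <- limit(arg)
  stopifnot(is.numeric(limit), length(limit) == 1L)
  limit
}

n <- 75                          # number of observations
x <- cbind(1, c(0, 1e-12, 2))    # toy design matrix

## epsw given directly as a number
epsw_fixed <- resolve_limit(1e-3, n)

## epsw given as a function of the number of observations
## (the rule 0.1 / n is an assumed example)
epsw_fun <- resolve_limit(function(nobs) 0.1 / nobs, n)

## epsx given as a function of the largest absolute design matrix entry
## (the factor 1e-6 is an assumed example)
epsx <- resolve_limit(function(maxx) 1e-6 * maxx, max(abs(x)))
```

A function-valued limit lets the threshold scale with the problem, for example a stricter outlier cutoff for larger samples.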

Details

For models that include factors, the fast S-algorithm used by lmrob can produce “bad” fits for some of the factor levels, especially if there are many levels with only a few observations. Such a “bad” fit is characterized as a fit where most of the observations in a level of a factor are rejected, i.e., are assigned robustness weights of zero or nearly zero. We call such a fit a “local exact fit”.

If a local exact fit is detected, we recommend increasing some of the control parameters of the “fast S” algorithm. As a first-aid solution in such cases, one can use setting="KS2014"; see also lmrob.control.

This function is called internally by lmrob to issue a warning if a local exact fit is detected. The output is available as ostats in objects of class "lmrob" (only if the statistic is computed).

Value

A data frame with one row for each column of the design matrix that contains any zero elements, plus a row of overall statistics. The data frame consists of the names of the coefficients in question, the number of non-zero observations in that level (N.nonzero), the number of rejected observations (N.rejected), the ratio of rejected observations to the number of observations in that level (Ratio), and the mean robustness weight of all the observations in the corresponding level (Mean.RobWeight).
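
To make the columns of this table concrete, here is a minimal base R sketch that computes the same kind of statistics from a design matrix and a vector of robustness weights. It is a simplified illustration, not the robustbase implementation: `level_stats`, the made-up weights, and the limit values of 0.5 are all assumptions for this example.

```r
## Sketch only: `level_stats` is a hypothetical helper, not robustbase code.
level_stats <- function(x, rw, epsw, epsx) {
  stopifnot(nrow(x) == length(rw))
  rejected <- abs(rw) < epsw   # observations treated as outliers
  nonzero  <- abs(x) > epsx    # design matrix entries treated as non-zero
  ## keep only the columns that contain at least one zero element
  cols <- nonzero[, colSums(nonzero) < nrow(x), drop = FALSE]
  one <- function(idx) {
    c(N.nonzero      = sum(idx),
      N.rejected     = sum(rejected[idx]),
      Ratio          = sum(rejected[idx]) / sum(idx),
      Mean.RobWeight = mean(rw[idx]))
  }
  as.data.frame(t(apply(cbind(Overall = TRUE, cols), 2, one)))
}

grp <- factor(c("a", "a", "a", "b", "b", "b"))
x   <- model.matrix(~ grp)          # columns: (Intercept), grpb
rw  <- c(1, 0.9, 0.8, 0, 0, 0.7)    # made-up robustness weights

stats <- level_stats(x, rw, epsw = 1e-3, epsx = 1e-8)

## Flag rows in the way the warning limits are described:
## ratio at or above one limit, or mean weight at or below the other.
warn.limit.reject <- 0.5            # illustrative value
warn.limit.meanrw <- 0.5            # illustrative value
flagged <- rownames(stats)[stats$Ratio >= warn.limit.reject |
                           stats$Mean.RobWeight <= warn.limit.meanrw]
```

In this toy data, two of the three observations in level `b` have weight zero, so the `grpb` row is the one that gets flagged.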

Author(s)

Manuel Koller

References

Koller, M. and Stahel, W.A. (2017) Nonsingular subsampling for regression S estimators with categorical predictors, Computational Statistics 32(2): 631–646. doi: 10.1007/s00180-016-0679-x

Examples

library(robustbase)
## artificial data example
data <- expand.grid(grp1 = letters[1:5], grp2 = letters[1:5], rep=1:3)
set.seed(101)
data$y <- c(rt(nrow(data), 1))
## compute outlier statistics for all the estimators
control <- lmrob.control(method = "SMDM",
                         compute.outlier.stats = c("S", "MM", "SMD", "SMDM"))
## warning is only issued for some seeds
set.seed(2)
fit1 <- lmrob(y ~ grp1*grp2, data, control = control)
## do as suggested:
fit2 <- lmrob(y ~ grp1*grp2, data, setting = "KS2014")

## the plot function should work for such models as well
plot(fit1)

## Not run: 
  ## access statistics:
  fit1$ostats ## SMDM
  fit1$init$ostats ## SMD
  fit1$init$init$ostats ## SM
  fit1$init$init$init.S$ostats ## S

## End(Not run)

robustbase

Basic Robust Statistics

v0.93-7

GPL (>= 2)

Authors

Martin Maechler [aut, cre] (<https://orcid.org/0000-0002-8685-9910>), Peter Rousseeuw [ctb] (Qn and Sn), Christophe Croux [ctb] (Qn and Sn), Valentin Todorov [aut] (most robust Cov), Andreas Ruckstuhl [aut] (nlrob, anova, glmrob), Matias Salibian-Barrera [aut] (lmrob orig.), Tobias Verbeke [ctb, fnd] (mc, adjbox), Manuel Koller [aut] (mc, lmrob, psi-func.), Eduardo L. T. Conceicao [aut] (MM-, tau-, CM-, and MTL- nlrob), Maria Anna di Palma [ctb] (initial version of Comedian)

Initial release

2021-01-04