mc

Medcouple, a Robust Measure of Skewness


Description

Compute the ‘medcouple’, a robust concept and estimator of skewness. The medcouple is defined as a scaled median difference of the left and right halves of a distribution, and hence, unlike the classical skewness, is not based on the third moment.
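
To make the definition concrete, here is a naive O(n^2) sketch in base R. It is not the algorithm used by mc(), which is a fast C implementation; the function name mc_naive is hypothetical, and the sign rule for values tied with the median is the usual convention for the medcouple kernel, assumed here rather than taken from this page. It also uses the ordinary median() rather than a himedian, so results may differ slightly from mc() when the number of kernel values is even.

```r
# Naive medcouple: median of the kernel h(a, b) over all pairs with
# a >= median >= b, where a and b are the centered observations.
mc_naive <- function(x) {
  x  <- sort(x, decreasing = TRUE)
  z  <- x - median(x)          # center at the median
  zp <- z[z >= 0]              # upper half (includes ties with the median)
  zm <- z[z <= 0]              # lower half (includes ties with the median)
  p  <- length(zp)
  h <- outer(seq_along(zp), seq_along(zm), function(i, j) {
    a <- zp[i]
    b <- zm[j]
    # a == b can only happen when both equal the median (a = b = 0);
    # such pairs get a sign-based value instead of 0/0.
    ifelse(a == b, sign(p + 1 - i - j), (a + b) / (a - b))
  })
  median(as.vector(h))
}

mc_naive(1:5)                  # symmetric sample
mc_naive(c(1, 2, 7, 9, 10))    # left-skewed sample
```

For the two small samples above, the sketch agrees with the values quoted in the Examples section of this page (0 and -1/3).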

Usage

mc(x, na.rm = FALSE, doReflect = (length(x) <= 100),
   doScale = TRUE,               # <- chg default to 'FALSE' ?
   eps1 = 1e-14, eps2 = 1e-15,   # << new in 0.93-2 (2018-07..)
   maxit = 100, trace.lev = 0, full.result = FALSE)

Arguments

x

a numeric vector

na.rm

logical indicating how missing values (NAs) should be dealt with.

doReflect

logical indicating if the internal MC should also be computed on the reflected sample -x, with final result (mc.(x) - mc.(-x))/2. This makes sense since the internal MC, mc.(), computes the himedian(), which can differ slightly from the median.

doScale

logical indicating if the internal algorithm should also scale the data (using the most distant value from the median which is unrobust and numerically dangerous); scaling has been the hardwired default in the original algorithm and in R's mc() till summer 2018.

eps1, eps2

tolerances in the algorithm; eps1 is used as a convergence tolerance, whereas eps2 is only used in the internal h_kern() function to prevent underflow to zero, so it could be considerably smaller. The original C code implicitly hard coded eps1 := eps2 := 1e-13; only change with care!

maxit

maximal number of iterations; typically a few should be sufficient.

trace.lev

integer specifying how much diagnostic output the algorithm (in C) should produce. No output by default, most output for trace.lev = 5.

full.result

logical indicating if the full return values (from C) should be returned as a list via attr(*, "mcComp").

Value

a number between -1 and 1, which is the medcouple, MC(x). If r <- mc(x, full.result = TRUE, ....), then attr(r, "mcComp") is a list with components

medc

the medcouple mc.(x).

medc2

the medcouple mc.(-x) if doReflect=TRUE.

eps

tolerances used.

iter,iter2

number of iterations used.

converged,converged2

logical specifying “convergence”.
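
A short sketch of how the extra components might be inspected. It assumes the robustbase package is installed (the code is skipped otherwise) and uses only the arguments and component names documented above; the variable names r and comp are just for illustration.

```r
if (requireNamespace("robustbase", quietly = TRUE)) {
  library(robustbase)
  x1 <- c(1, 2, 7, 9, 10)
  r <- mc(x1, full.result = TRUE)
  comp <- attr(r, "mcComp")    # list of details returned from C
  str(comp)
  # doReflect defaults to TRUE here (length(x1) <= 100), so both
  # mc.(x) and mc.(-x) are available and can be combined by hand:
  with(comp, (medc - medc2) / 2)
}
```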

Convergence Problems

For extreme cases there are convergence problems.

Some of them can be alleviated by “loosening” the tolerances eps1 and eps2.
For others, with peculiar values, notably many almost-ties with the median, it can help considerably to replace mc(x, *) by mc(jitter(x), *), or also just mc(signif(x), *).

Also, the algorithm not only centers the data around the median but also scales them by the extremes, which can have a negative effect: for example, when an extreme outlier is made even more extreme, the result can change in the wrong way; see the 'mc10x' example.
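
The workarounds above can be sketched as follows. The data x_ties are made up for illustration (many values within 1e-11 of the median) and are not claimed to trigger a convergence failure; the code assumes robustbase is installed and is skipped otherwise.

```r
if (requireNamespace("robustbase", quietly = TRUE)) {
  library(robustbase)
  set.seed(1)                                  # jitter() is random
  # hypothetical sample with many almost-ties at the median
  x_ties <- c(1, 3, 5 + (1:20) * 1e-12, 8, 20)
  mc_jit <- mc(jitter(x_ties))                 # break almost-ties with small noise
  mc_sig <- mc(signif(x_ties))                 # round to 6 significant digits
  # the scaling step can be switched off via the documented doScale argument
  dX10 <- function(X) c(1:5, 7, 10, 15, 25, X)
  mc_scaled   <- mc(dX10(1e30))                # default doScale = TRUE
  mc_unscaled <- mc(dX10(1e30), doScale = FALSE)
  print(c(jitter = mc_jit, signif = mc_sig,
          scaled = mc_scaled, unscaled = mc_unscaled))
}
```

Note that jitter() changes the data randomly and signif() turns almost-ties into exact ties, so both give a medcouple for slightly modified data, not for x itself.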

Author(s)

Guy Brys; modifications by Tobias Verbeke and bug fixes and extensions by Manuel Koller and Martin Maechler.

References

Brys, G., Hubert, M. and Struyf, A. (2004). A robust measure of skewness, Journal of Computational and Graphical Statistics 13 (4), 996–1017.

Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions, Computational Statistics and Data Analysis 52, 5186–5201.

See Also

Qn for a robust measure of scale (aka “dispersion”), ....

Examples

mc(1:5) # 0 for a symmetric sample

x1 <- c(1, 2, 7, 9, 10)
mc(x1) # = -1/3

data(cushny)
mc(cushny) # 0.125

stopifnot(mc(c(-20, -5, -2:2, 5, 20)) == 0,
          mc(x1, doReflect=FALSE) ==  -mc(-x1, doReflect=FALSE),
          all.equal(mc(x1, doReflect=FALSE), -1/3, tolerance = 1e-12))

## Susceptibility of the current algorithm to large outliers :
dX10 <- function(X) c(1:5,7,10,15,25, X) # generate skewed size-10 with 'X'
x <- c(10,20,30, 100^(1:20))
(mc10x <- vapply(x, function(X) mc(dX10(X)), 1))
## limit X -> Inf  should be 7/12 = 0.58333...  but that "breaks down a bit" :
plot(x, mc10x, type="b", main = "mc( c(1:5,7,10,15,25, X) )", xlab="X", log="x")

robustbase

Basic Robust Statistics

v0.93-7
GPL (>= 2)
Authors
Martin Maechler [aut, cre] (<https://orcid.org/0000-0002-8685-9910>), Peter Rousseeuw [ctb] (Qn and Sn), Christophe Croux [ctb] (Qn and Sn), Valentin Todorov [aut] (most robust Cov), Andreas Ruckstuhl [aut] (nlrob, anova, glmrob), Matias Salibian-Barrera [aut] (lmrob orig.), Tobias Verbeke [ctb, fnd] (mc, adjbox), Manuel Koller [aut] (mc, lmrob, psi-func.), Eduardo L. T. Conceicao [aut] (MM-, tau-, CM-, and MTL- nlrob), Maria Anna di Palma [ctb] (initial version of Comedian)
Initial release
2021-01-04
