Weighted Median Value
Computes a weighted median of a numeric vector.
weightedMedian(x, w = NULL, idxs = NULL, na.rm = FALSE, interpolate = is.null(ties), ties = NULL, ...)
x |
a |
w |
a vector of weights the same length as |
idxs |
A |
na.rm |
a logical value indicating whether |
interpolate |
If |
ties |
If |
... |
Not used. |
Returns a numeric
scalar.
For the n
elements x = c(x[1], x[2], ..., x[n])
with positive
weights w = c(w[1], w[2], ..., w[n])
such that sum(w) = S
, the
weighted median is defined as the element x[k]
for which the
total weight of all elements x[i] < x[k]
is less or equal to
S/2
and for which the total weight of all elements x[i] > x[k]
is less or equal to S/2
(c.f. [1]).
When using linear interpolation, the weighted mean of x[k-1]
and
x[k]
with weights S[k-1]
and S[k]
corresponding to the
cumulative weights of those two elements is used as an estimate.
If w
is missing then all elements of x
are given the same
positive weight. If all weights are zero, NA_real_
is
returned.
If one or more weights are Inf
, it is the same as these weights have
the same weight and the others have zero. This makes things easier for cases
where the weights are result of a division with zero.
If there are missing values in w
that are part of the calculation
(after subsetting and dropping missing values in x
), then the final
result is always NA
of the same type as x
.
The weighted median solves the following optimization problem:
α^* = \arg_α \min ∑_{i = 1}^{n} w_i |x_i-α|
where x = (x_1, x_2, …, x_n) are scalars and w = (w_1, w_2, …, w_n) are the corresponding "weights" for each individual x value.
Henrik Bengtsson and Ola Hossjer, Centre for Mathematical Sciences, Lund University. Thanks to Roger Koenker, Econometrics, University of Illinois, for the initial ideas.
[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, The MIT Press, Massachusetts Institute of Technology, 1989.
median
, mean
() and
weightedMean
().
x <- 1:10 n <- length(x) m1 <- median(x) # 5.5 m2 <- weightedMedian(x) # 5.5 stopifnot(identical(m1, m2)) w <- rep(1, times = n) m1 <- weightedMedian(x, w) # 5.5 (default) m2 <- weightedMedian(x, ties = "weighted") # 5.5 (default) m3 <- weightedMedian(x, ties = "min") # 5 m4 <- weightedMedian(x, ties = "max") # 6 stopifnot(identical(m1, m2)) # Pull the median towards zero w[1] <- 5 m1 <- weightedMedian(x, w) # 3.5 y <- c(rep(0, times = w[1]), x[-1]) # Only possible for integer weights m2 <- median(y) # 3.5 stopifnot(identical(m1, m2)) # Put even more weight on the zero w[1] <- 8.5 weightedMedian(x, w) # 2 # All weight on the first value w[1] <- Inf weightedMedian(x, w) # 1 # All weight on the last value w[1] <- 1 w[n] <- Inf weightedMedian(x, w) # 10 # All weights set to zero w <- rep(0, times = n) weightedMedian(x, w) # NA # Simple benchmarking bench <- function(N = 1e5, K = 10) { x <- rnorm(N) gc() t <- c() t[1] <- system.time(for (k in 1:K) median(x))[3] t[2] <- system.time(for (k in 1:K) weightedMedian(x))[3] t <- t / t[1] names(t) <- c("median", "weightedMedian") t } print(bench(N = 5, K = 100)) print(bench(N = 50, K = 100)) print(bench(N = 200, K = 100)) print(bench(N = 1000, K = 100)) print(bench(N = 10e3, K = 20)) print(bench(N = 100e3, K = 20))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.