Fast Sample Ranks
The function base::rank has several weaknesses. It is not very fast, and it cannot calculate dense ranks. It also lacks an argument for the ranking direction (although a descending ranking can be obtained by ranking the negated variable), and it cannot use additional columns to break ties.
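The base R workarounds mentioned above can be sketched as follows. This is an illustration only: the vectors score and age are made-up data, and the order() approach is one common idiom, not something prescribed by this page.

```r
# Hypothetical example data (not from the original text)
score <- c(10, 20, 10, 30, 20)
age   <- c(40, 25, 30, 50, 25)

# Descending ranks: base::rank has no 'decreasing' argument,
# so the usual workaround is to rank the negated values.
rank_desc <- rank(-score, ties.method = "min")

# Breaking ties with a second variable: order() accepts several keys,
# and inverting the resulting permutation yields "first"-style ranks.
ord <- order(score, age)
rank_two_keys <- integer(length(score))
rank_two_keys[ord] <- seq_along(score)

print(rank_desc)
print(rank_two_keys)
```

Note that the order() idiom always assigns distinct ranks, even to rows that are tied on every key, so it cannot reproduce the other ties.method variants.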
The function data.table::frankv provides a richer interface and is much faster than base::rank.
It accepts vectors, lists, data.frames or data.tables as input. In addition to the ties.method options provided by base::rank, it also provides ties.method="dense".
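A minimal sketch of calling data.table::frankv directly, assuming the data.table package is installed. The vector x is the one used in the examples of this page; the data.frame d is a made-up illustration.

```r
library(data.table)

x <- c(4, 1, 4, NA, 1, NA, 4)

# Dense ranks: tied values share a rank and no rank is skipped
dense <- frankv(x, ties.method = "dense")

# Descending order is requested via 'order = -1L' instead of negating x
desc <- frankv(x, order = -1L, ties.method = "min")

# Hypothetical two-column input: ranked by column a, ties broken by column b
d <- data.frame(a = c(1, 1, 2), b = c(3, 1, 2))
two_keys <- frankv(d, ties.method = "first")

print(dense)
print(desc)
print(two_keys)
```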
The present function Rank is merely a somewhat customized parameterization of the data.table function.
Rank(..., decreasing = FALSE, na.last = TRUE, ties.method = c("average", "first", "last", "random", "max", "min", "dense"))
... | A vector, or list with all its elements identical in length or
decreasing | An
na.last | Control treatment of
ties.method | A character string specifying how ties are treated, see
To be consistent with other data.table operations, NAs are considered identical to other NAs (and NaNs to other NaNs), unlike base::rank. Therefore, for na.last=TRUE and na.last=FALSE, NAs (and NaNs) are given identical ranks, unlike rank.
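The difference in NA handling can be seen by comparing base::rank with data.table::frankv on the same vector. This sketch assumes the data.table package is installed and calls frankv directly; since Rank is described above as a parameterization of that function, it is expected to behave the same way.

```r
library(data.table)

x <- c(4, 1, 4, NA, 1, NA, 4)

# base::rank never treats NAs as equal: with na.last = TRUE they
# receive distinct ranks in their order of appearance
base_ranks <- rank(x, na.last = TRUE)

# frankv treats NAs as ties, so they share one (averaged) rank
dt_ranks <- frankv(x, na.last = TRUE)

print(base_ranks[is.na(x)])
print(dt_ranks[is.na(x)])
```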
Rank is not limited to vectors. It accepts data.tables (and lists and data.frames) as well. It accepts unquoted column names (with names preceded by a - sign for descending order, even on character vectors), e.g., Rank(DT, a, -b, c, ties.method="first") where a, b, c are columns in DT.
In addition to the ties.method values available in base's rank, it also provides the value "dense", which returns the ranks without any gaps in the ranking. See examples.
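To illustrate what "without any gaps" means, the following base R sketch contrasts "min" ranks with an emulation of dense ranks. The values are hypothetical, and the match() idiom is only an emulation for comparison, not how the function computes dense ranks.

```r
x <- c(10, 20, 20, 40)   # hypothetical values containing one tie

# "min" ranks: the tied values share rank 2, and rank 3 is skipped
rank_min <- rank(x, ties.method = "min")

# Dense ranks emulated in base R: position among the sorted unique values
rank_dense <- match(x, sort(unique(x)))

print(rank_min)
print(rank_dense)
```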
Like forder, sorting is done in "C-locale"; in particular, this may affect how capital/lowercase letters are ranked. See Details on forder for more.
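The effect of C-locale collation on letter case can be previewed in base R, where sort() with method = "radix" also collates in the C-locale. The input below is made up, and the sketch only illustrates the ordering itself; it does not call Rank or forder.

```r
words <- c("b", "A", "a", "B")   # hypothetical input

# In the C-locale all capital letters precede all lowercase letters
c_sorted <- sort(words, method = "radix")

# Ranks implied by that ordering
c_ranks <- match(words, c_sorted)

print(c_sorted)
print(c_ranks)
```

In many other locales the default sort() would interleave the cases (for example a, A, b, B), which would lead to different ranks.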
The bit64::integer64 type is also supported.
A numeric vector of length equal to NROW(x) (unless na.last = NA, when missing values are removed). The vector is of integer type unless ties.method = "average", when it is of double type (irrespective of ties).
# on vectors
x <- c(4, 1, 4, NA, 1, NA, 4)
# NAs are considered identical (unlike base R)
# default is average
Rank(x) # na.last=TRUE
Rank(x, na.last=FALSE)

# ties.method = min
Rank(x, ties.method="min")
# ties.method = dense
Rank(x, ties.method="dense")

# on data.frame, using both columns
d.set <- data.frame(x, y=c(1, 1, 1, 0, NA, 0, 2))
Rank(d.set, na.last="keep")
Rank(d.set, ties.method="dense", na.last=NA)

# decreasing argument
Rank(d.set, decreasing=c(FALSE, TRUE), ties.method="first")