Comparing and ordering ranges
Methods for comparing and/or ordering the ranges in IPosRanges derivatives (e.g. IRanges, IPos, or NCList objects).
## match() & selfmatch()
## ---------------------

## S4 method for signature 'IPosRanges,IPosRanges'
match(x, table, nomatch=NA_integer_, incomparables=NULL,
      method=c("auto", "quick", "hash"))

## S4 method for signature 'IPosRanges'
selfmatch(x, method=c("auto", "quick", "hash"))

## order() and related methods
## ----------------------------

## S4 method for signature 'IPosRanges'
is.unsorted(x, na.rm=FALSE, strictly=FALSE)

## S4 method for signature 'IPosRanges'
order(..., na.last=TRUE, decreasing=FALSE,
      method=c("auto", "shell", "radix"))

## Generalized parallel comparison of 2 IPosRanges derivatives
## -----------------------------------------------------------

## S4 method for signature 'IPosRanges,IPosRanges'
pcompare(x, y)

rangeComparisonCodeToLetter(code)
x, table, y | IPosRanges derivatives, e.g. IRanges, IPos, or NCList objects.
nomatch | The value to be returned when no match is found. It is coerced to an integer.
incomparables | Not supported.
method | The algorithm to use. For match() and selfmatch(): one of "auto", "quick", or "hash". For order(): one of "auto", "shell", or "radix".
na.rm | Ignored.
strictly | Logical indicating whether the check should be for strictly increasing values.
... | One or more IPosRanges derivatives. The 2nd and following objects are used to break ties.
na.last | Ignored.
decreasing | TRUE or FALSE.
code | A vector of codes as returned by pcompare().
Two ranges of an IPosRanges derivative are considered equal iff they share the same start and width. duplicated() and unique() on an IPosRanges derivative conform to this definition.
Note that with this definition, 2 empty ranges are generally not equal (they need to share the same start to be considered equal). This means that, when it comes to comparing ranges, an empty range is interpreted as a position between its end and start. For example, a typical use case is the comparison of insertion points defined along a string (like a DNA sequence) and represented as empty ranges.
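The following short sketch illustrates the equality rule above on empty ranges. The object name ins and the chosen positions are made up for illustration; only functions documented on this page and the IRanges() constructor are used.

```r
library(IRanges)

## Three hypothetical insertion points represented as empty
## (zero-width) ranges; the 1st and 3rd share the same start.
ins <- IRanges(start=c(5L, 9L, 5L), width=0L)

ins[1] == ins[2]   # FALSE: same width (0) but different starts
ins[1] == ins[3]   # TRUE: same start and same width

## duplicated() and unique() follow the same definition of equality.
duplicated(ins)    # only the 3rd element repeats an earlier one
unique(ins)        # keeps the insertion points at 5 and 9
```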
The "natural order" for the elements of an IPosRanges derivative is to order them (a) first by start and (b) then by width. This way, the space of integer ranges is totally ordered.
pcompare(), ==, !=, <=, >=, < and > on IPosRanges derivatives behave according to this "natural order".
is.unsorted(), order(), sort(), and rank() on IPosRanges derivatives also behave according to this "natural order".
Finally, note that some inter range transformations like reduce or disjoin also use this "natural order" implicitly when operating on IPosRanges derivatives.
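As a rough illustration of the "natural order", the sketch below compares order() on an IRanges object with a base-R ordering of its (start, width) pairs. The data and the name by_hand are illustrative only, and the emulation is meant to show the ordering rule, not how the method is implemented internally.

```r
library(IRanges)

x <- IRanges(start=c(20L, 8L, 20L, 22L),
             width=c( 4L, 0L, 11L,  5L))

## Emulate the "natural order" by hand: first by start, then by width.
by_hand <- order(start(x), width(x))

order(x)                    # 2 1 3 4
all(order(x) == by_hand)    # TRUE

## The comparison operators follow the same rule.
x[2] < x[1]   # TRUE: smaller start
x[1] < x[3]   # TRUE: same start, smaller width
```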
pcompare(x, y):
Performs element-wise (aka "parallel") comparison of 2 IPosRanges derivatives x and y, that is, returns an integer vector where the i-th element is a code describing how x[i] is qualitatively positioned with respect to y[i].
Here is a summary of the 13 predefined codes (and their letter equivalents) and their meanings:
-6 a: x[i]: .oooo.......         6 m: x[i]: .......oooo.
      y[i]: .......oooo.              y[i]: .oooo.......

-5 b: x[i]: ..oooo......         5 l: x[i]: ......oooo..
      y[i]: ......oooo..              y[i]: ..oooo......

-4 c: x[i]: ...oooo.....         4 k: x[i]: .....oooo...
      y[i]: .....oooo...              y[i]: ...oooo.....

-3 d: x[i]: ...oooooo...         3 j: x[i]: .....oooo...
      y[i]: .....oooo...              y[i]: ...oooooo...

-2 e: x[i]: ..oooooooo..         2 i: x[i]: ....oooo....
      y[i]: ....oooo....              y[i]: ..oooooooo..

-1 f: x[i]: ...oooo.....         1 h: x[i]: ...oooooo...
      y[i]: ...oooooo...              y[i]: ...oooo.....

 0 g: x[i]: ...oooooo...
      y[i]: ...oooooo...
Note that this way of comparing ranges is a refinement over the standard ranges comparison defined by the ==, !=, <=, >=, < and > operators. In particular, a code that is < 0, = 0, or > 0 corresponds to x[i] < y[i], x[i] == y[i], or x[i] > y[i], respectively.
The pcompare method for IPosRanges derivatives is guaranteed to return predefined codes only, but methods for other objects (e.g. for GenomicRanges objects) can return non-predefined codes. As with the predefined codes, the sign of any non-predefined code must tell whether x[i] is less than, or greater than y[i].
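The relationship between the sign of a code and the standard comparison operators can be checked with a small sketch like the one below. The ranges are chosen for illustration (they mirror the first example on this page); the variable name codes is arbitrary.

```r
library(IRanges)

x <- IRanges(1:11, width=4)
y <- IRanges(6, 9)          # recycled along 'x'

codes <- pcompare(x, y)
codes

## The sign of each code agrees with the standard operators.
all((codes <  0) == (x <  y))   # TRUE
all((codes == 0) == (x == y))   # TRUE
all((codes >  0) == (x >  y))   # TRUE

## x[6] is IRanges(6, 9), i.e. identical to 'y', so its code is 0.
codes[6]
```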
rangeComparisonCodeToLetter(code):
Translates the codes returned by pcompare. The 13 predefined codes are translated as follows: -6 -> a; -5 -> b; -4 -> c; -3 -> d; -2 -> e; -1 -> f; 0 -> g; 1 -> h; 2 -> i; 3 -> j; 4 -> k; 5 -> l; 6 -> m. Any non-predefined code is translated to X. The translated codes are returned in a factor with 14 levels: a, b, ..., l, m, X.
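A minimal sketch of the translation, using a hand-written integer vector of codes (the value 9 is deliberately outside the predefined range; the variable names are illustrative):

```r
library(IRanges)

## Codes must be supplied as an integer vector.
codes <- c(-6L, -1L, 0L, 3L, 6L, 9L)   # 9 is not a predefined code

code_letters <- rangeComparisonCodeToLetter(codes)
as.character(code_letters)   # "a" "f" "g" "j" "m" "X"
nlevels(code_letters)        # 14
```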
match(x, table, nomatch=NA_integer_, method=c("auto", "quick", "hash")):
Returns an integer vector of the length of x, containing the index of the first matching range in table (or nomatch if there is no matching range) for each range in x.
selfmatch(x, method=c("auto", "quick", "hash")):
Equivalent to, but more efficient than, match(x, x, method=method).
duplicated(x, fromLast=FALSE, method=c("auto", "quick", "hash")):
Determines which elements of x are equal to elements with smaller subscripts, and returns a logical vector indicating which elements are duplicates. duplicated(x) is equivalent to, but more efficient than, duplicated(as.data.frame(x)) on an IPosRanges derivative. See duplicated in the base package for more details.
unique(x, fromLast=FALSE, method=c("auto", "quick", "hash")):
Removes duplicate ranges from x. unique(x) is equivalent to, but more efficient than, unique(as.data.frame(x)) on an IPosRanges derivative. See unique in the base package for more details.
x %in% table:
A shortcut for finding the ranges in x that match any of the ranges in table. Returns a logical vector of length equal to the number of ranges in x.
findMatches(x, table, method=c("auto", "quick", "hash")):
An enhanced version of match that returns all the matches in a Hits object.
countMatches(x, table, method=c("auto", "quick", "hash")):
Returns an integer vector of the length of x containing the number of matches in table for each element in x.
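The matching functions described above can be compared side by side in a small sketch. The ranges below and the name tbl (used instead of table to avoid masking base::table) are made up for illustration.

```r
library(IRanges)

x   <- IRanges(start=c(20L, 8L, 20L, 22L, 20L),
               width=c( 4L, 0L,  4L,  5L,  9L))
tbl <- IRanges(start=c(22L, 20L, 20L),
               width=c( 5L,  4L,  4L))

match(x, tbl)          # 2 NA 2 1 NA (index of the FIRST match only)
x %in% tbl             # TRUE FALSE TRUE TRUE FALSE
selfmatch(x)           # 1 2 1 4 5   (same as match(x, x))

## countMatches() and findMatches() report ALL matches: tbl[2] and
## tbl[3] are equal, so x[1] and x[3] each match twice.
countMatches(x, tbl)   # 2 0 2 1 0
hits <- findMatches(x, tbl)
hits
length(hits)           # 5 matches in total
```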
order(...):
Returns a permutation which rearranges its first argument (an IPosRanges derivative) into ascending order, breaking ties by further arguments (also IPosRanges derivatives).
sort(x):
Sorts x. See sort in the base package for more details.
rank(x, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min")):
Returns the sample ranks of the ranges in x. See rank in the base package for more details.
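A brief sketch of the ordering functions, including tie-breaking with a second argument to order(). The objects x and y are hypothetical; x contains a duplicated range so that the tie-breaker has something to resolve.

```r
library(IRanges)

x <- IRanges(start=c(5L, 1L, 5L), width=c(3L, 2L, 3L))  # x[1] == x[3]
y <- IRanges(start=c(9L, 9L, 2L), width=1L)             # tie-breaker

is.unsorted(x)                    # TRUE

## x[1] and x[3] tie, so their relative order is decided by 'y':
## y[3] comes before y[1].
order(x, y)                       # 2 3 1

xs <- sort(x)
is.unsorted(xs)                   # FALSE
is.unsorted(xs, strictly=TRUE)    # TRUE: the duplicate is a tie

## Ranks on tie-free input: unique(x) holds (5, width 3) and (1, width 2).
rank(unique(x), ties.method="first")   # 2 1
```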
Hervé Pagès
The IPosRanges class.
Vector-comparison in the S4Vectors package for general information about comparing, ordering, and tabulating vector-like objects.
GenomicRanges-comparison in the GenomicRanges package for comparing and ordering genomic ranges.
findOverlaps for finding overlapping ranges.
intra-range-methods and inter-range-methods for intra range and inter range transformations.
setops-methods for set operations on IRanges objects.
## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 IPosRanges
##    DERIVATIVES
## ---------------------------------------------------------------------
x0 <- IRanges(1:11, width=4)
x0
y0 <- IRanges(6, 9)
pcompare(x0, y0)
pcompare(IRanges(4:6, width=6), y0)
pcompare(IRanges(6:8, width=2), y0)
pcompare(x0, y0) < 0   # equivalent to 'x0 < y0'
pcompare(x0, y0) == 0  # equivalent to 'x0 == y0'
pcompare(x0, y0) > 0   # equivalent to 'x0 > y0'

rangeComparisonCodeToLetter(-10:10)
rangeComparisonCodeToLetter(pcompare(x0, y0))

## Handling of zero-width ranges (a.k.a. empty ranges):
x1 <- IRanges(11:17, width=0)
x1
pcompare(x1, x1[4])
pcompare(x1, IRanges(12, 15))

## Note that x1[2] and x1[6] are empty ranges on the edge of non-empty
## range IRanges(12, 15). Even though -1 and 3 could also be considered
## valid codes for describing these configurations, pcompare()
## considers x1[2] and x1[6] to be *adjacent* to IRanges(12, 15), and
## thus returns codes -5 and 5:
pcompare(x1[2], IRanges(12, 15))  # -5
pcompare(x1[6], IRanges(12, 15))  #  5

x2 <- IRanges(start=c(20L, 8L, 20L, 22L, 25L, 20L, 22L, 22L),
              width=c( 4L, 0L, 11L,  5L,  0L,  9L,  5L,  0L))
x2

which(width(x2) == 0)  # 3 empty ranges
x2[2] == x2[2]  # TRUE
x2[2] == x2[5]  # FALSE
x2 == x2[4]
x2 >= x2[3]

## ---------------------------------------------------------------------
## B. match(), selfmatch(), %in%, duplicated(), unique()
## ---------------------------------------------------------------------
table <- x2[c(2:4, 7:8)]
match(x2, table)

x2 %in% table

duplicated(x2)
unique(x2)

## ---------------------------------------------------------------------
## C. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(x2, table)
countMatches(x2, table)

x2_levels <- unique(x2)
countMatches(x2_levels, x2)

## ---------------------------------------------------------------------
## D. order() AND RELATED METHODS
## ---------------------------------------------------------------------
is.unsorted(x2)
order(x2)
sort(x2)
rank(x2, ties.method="first")