IPos objects
The IPos class is a container for storing a set of integer positions. It exists in 2 flavors: UnstitchedIPos and StitchedIPos. Each flavor uses a particular internal representation:
In an UnstitchedIPos instance the positions are stored as an integer vector.
In a StitchedIPos instance the positions are stored as an IRanges object where each range represents a run of consecutive positions (i.e. a run of positions that are adjacent and in ascending order). This storage is particularly memory-efficient when the vector of positions contains long runs of consecutive positions.
Because integer positions can be seen as integer ranges of width 1, the IPos class extends the IntegerRanges virtual class.
IPos(pos=integer(0), names=NULL, ..., stitch=NA) # constructor function
pos |
An integer or numeric vector, or an IRanges object (or other
IntegerRanges derivative). If When |
names |
A character vector or |
... |
Metadata columns to set on the IPos object. All the metadata columns must be vector-like objects of the same length as the object to construct. |
stitch |
Controls which internal representation should be used: StitchedIPos
(when When |
OTOH the memory footprint of a StitchedIPos object can vary a lot but will never be worse than that of an IRanges object. However it will reduce dramatically if the vector of positions contains long runs of consecutive positions. In the worst case scenario (i.e. when the object contains no consecutive positions) its memory footprint will be the same as that of an IRanges object.
Like for any Vector derivative, the length of an
IPos object cannot exceed .Machine$integer.max
(i.e. 2^31 on
most platforms). IPos()
will return an error if pos
contains too many positions.
An UnstitchedIPos or StitchedIPos object. If the input object pos
is itself an IPos derivative, its metadata columns are propagated.
IPos objects support the same set of getters as other IntegerRanges
derivatives (i.e. length()
, start()
, end()
,
names()
, mcols()
, etc...), plus the pos()
getter
which is equivalent to start()
and end()
.
See ?IntegerRanges
for the list of getters supported by
IntegerRanges derivatives.
IPos derivatives support the names()
, mcols()
and
metadata()
setters only.
In particular there is no pos()
setter for IPos derivatives
at the moment (although one might be added in the future).
From UnstitchedIPos to StitchedIPos and vice-versa: coercion back and
forth between UnstitchedIPos and StitchedIPos is supported via
as(x, "StitchedIPos")
and as(x, "UnstitchedIPos")
.
This is the most efficient and recommended way to switch between the
2 internal representations. Note that this switch can have dramatic
consequences on memory usage so is for advanced users only.
End users should almost never need to do this switch when following
a typical workflow.
From IntegerRanges to UnstitchedIPos, StitchedIPos, or IPos:
An IntegerRanges derivative x
in which all the ranges have
a width of 1 can be coerced to an UnstitchedIPos or StitchedIPos object
with as(x, "UnstitchedIPos")
or as(x, "StitchedIPos")
,
respectively. For convenience as(x, "IPos")
is supported and is
equivalent to as(x, "UnstitchedIPos")
.
From IPos to IRanges:
An IPos derivative x
can be coerced to an IRanges object
with as(x, "IRanges")
. However be aware that if x
is a
StitchedIPos instance, the memory footprint of the resulting object
can be thousands times (or more) than that of x
!
See "MEMORY USAGE" in the Examples section below.
From IPos to ordinary R objects:
Like with any other IntegerRanges derivative, as.character()
,
as.factor()
, and as.data.frame()
work on an IPos derivative
x
. Note however that as.data.frame(x)
returns a data frame
with a pos
column (containing pos(x)
) instead of the
start
, end
, and width
columns that one gets with other
IntegerRanges derivatives.
An IPos derivative can be subsetted exactly like an IRanges object.
IPos derivatives can be concatenated with c()
or append()
.
See ?c
in the S4Vectors package for
more information about concatenating Vector derivatives.
Like with an IRanges object, split()
and relist()
work
on an IPos derivative.
Hervé Pagès; based on ideas borrowed from Georg Stricker georg.stricker@in.tum.de and Julien Gagneur gagneur@in.tum.de
The GPos class in the GenomicRanges package for representing a set of genomic positions (i.e. genomic ranges of width 1, a.k.a. genomic loci).
The IRanges class for storing a set of integer ranges of arbitrary width.
IPosRanges-comparison for comparing and ordering integer ranges and/or positions.
findOverlaps-methods for finding overlapping integer ranges and/or positions.
intra-range-methods and inter-range-methods for intra range and inter range transformations.
coverage-methods for computing the coverage of a set of ranges and/or positions.
nearest-methods for finding the nearest integer range/position neighbor.
showClass("IPos") # shows the known subclasses ## --------------------------------------------------------------------- ## BASIC EXAMPLES ## --------------------------------------------------------------------- ## Example 1: ipos1a <- IPos(c(44:53, 5:10, 2:5)) ipos1a # unstitched length(ipos1a) pos(ipos1a) # same as 'start(ipos1a)' and 'end(ipos1a)' as.character(ipos1a) as.data.frame(ipos1a) as(ipos1a, "IRanges") as.data.frame(as(ipos1a, "IRanges")) ipos1a[9:17] ipos1b <- IPos(c(44:53, 5:10, 2:5), stitch=TRUE) ipos1b # stitched ## 'ipos1a' and 'ipos1b' are semantically equivalent, only their ## internal representations differ: all(ipos1a == ipos1b) ipos1c <- IPos(c("44-53", "5-10", "2-5")) ipos1c # stitched identical(ipos1b, ipos1c) ## Example 2: my_pos <- IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20)) ipos2 <- IPos(my_pos) ipos2 # stitched ## Example 3: ipos3A <- ipos3B <- IPos(c("1-15000", "15400-88700")) npos <- length(ipos3A) mcols(ipos3A)$sample <- Rle("sA") sA_counts <- sample(10, npos, replace=TRUE) mcols(ipos3A)$counts <- sA_counts mcols(ipos3B)$sample <- Rle("sB") sB_counts <- sample(10, npos, replace=TRUE) mcols(ipos3B)$counts <- sB_counts ipos3 <- c(ipos3A, ipos3B) ipos3 ## --------------------------------------------------------------------- ## MEMORY USAGE ## --------------------------------------------------------------------- ## Coercion to IRanges works on a StitchedIPos object... ipos4 <- IPos(c("1-125000", "135000-575000")) ir4 <- as(ipos4, "IRanges") ir4 ## ... but is generally not a good idea: object.size(ipos4) object.size(ir4) # 1652 times bigger than the StitchedIPos object! ## Shuffling the order of the positions impacts memory usage: ipos4r <- rev(ipos4) object.size(ipos4r) ipos4s <- sample(ipos4) object.size(ipos4s) ## If one anticipates a lot of shuffling of the positions, ## then an UnstitchedIPos object should be used instead: ipos4b <- as(ipos4, "UnstitchedIPos") object.size(ipos4b) # initial size is bigger than stitched version object.size(rev(ipos4b)) # size didn't change object.size(sample(ipos4b)) # size didn't change ## AN IMPORTANT NOTE: In the worst situations, IPos still performs ## as good as an IRanges object. object.size(as(ipos4r, "IRanges")) # same size as 'ipos4r' object.size(as(ipos4s, "IRanges")) # same size as 'ipos4s' ## Best case scenario is when the object is strictly sorted (i.e. ## positions are in strict ascending order). ## This can be checked with: is.unsorted(ipos4, strict=TRUE) # 'ipos4' is strictly sorted ## --------------------------------------------------------------------- ## USING MEMORY-EFFICIENT METADATA COLUMNS ## --------------------------------------------------------------------- ## In order to keep memory usage as low as possible, it is recommended ## to use a memory-efficient representation of the metadata columns that ## we want to set on the object. Rle's are particularly well suited for ## this, especially if the metadata columns contain long runs of ## identical values. This is the case for example if we want to use an ## IPos object to represent the coverage of sequencing reads along a ## chromosome. ## Example 5: library(pasillaBamSubset) library(Rsamtools) # for the BamFile() constructor function bamfile1 <- BamFile(untreated1_chr4()) bamfile2 <- BamFile(untreated3_chr4()) ipos5 <- IPos(IRanges(1, seqlengths(bamfile1)[["chr4"]])) library(GenomicAlignments) # for "coverage" method for BamFile objects cvg1 <- coverage(bamfile1)$chr4 cvg2 <- coverage(bamfile2)$chr4 mcols(ipos5) <- DataFrame(cvg1, cvg2) ipos5 object.size(ipos5) # lightweight ## Keep only the positions where coverage is at least 10 in one of the ## 2 samples: ipos5[mcols(ipos5)$cvg1 >= 10 | mcols(ipos5)$cvg2 >= 10]
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.