HDF5 dump management
A set of utilities to control the location and physical properties of automatically created HDF5 datasets.
setHDF5DumpDir(dir)
setHDF5DumpFile(filepath)
setHDF5DumpName(name)
setHDF5DumpChunkLength(length=1000000L)
setHDF5DumpChunkShape(shape="scale")
setHDF5DumpCompressionLevel(level=6L)

getHDF5DumpDir()
getHDF5DumpFile(for.use=FALSE)
getHDF5DumpName(for.use=FALSE)
getHDF5DumpChunkLength()
getHDF5DumpChunkShape()
getHDF5DumpCompressionLevel()

lsHDF5DumpFile()

showHDF5DumpLog()

## For developers:
getHDF5DumpChunkDim(dim)
appendDatasetCreationToHDF5DumpLog(filepath, name, dim, type,
                                   chunkdim, level)
dir:
    The path (as a single string) to the current HDF5 dump directory,
    that is, to the (new or existing) directory where HDF5 dump files
    with automatic names will be created. This is ignored if the user
    specified an HDF5 dump file with setHDF5DumpFile().

filepath:
    For setHDF5DumpFile: The path (as a single string) to the current
    HDF5 dump file, that is, to the (new or existing) HDF5 file where
    the next automatic HDF5 datasets will be written.
    For appendDatasetCreationToHDF5DumpLog: See the Note TO DEVELOPERS
    below.

name:
    For setHDF5DumpName: The name of the next automatic HDF5 dataset
    to be written to the current HDF5 dump file.
    For appendDatasetCreationToHDF5DumpLog: See the Note TO DEVELOPERS
    below.

length:
    The maximum length of the physical chunks of the next automatic
    HDF5 dataset to be written to the current HDF5 dump file.

shape:
    A string specifying the shape of the physical chunks of the next
    automatic HDF5 dataset to be written to the current HDF5 dump
    file. See makeCappedVolumeBox in the DelayedArray package for a
    description of the supported shapes.

level:
    For setHDF5DumpCompressionLevel: The compression level to use for
    writing automatic HDF5 datasets to disk.
    For appendDatasetCreationToHDF5DumpLog: See the Note TO DEVELOPERS
    below.

for.use:
    Whether the returned file or dataset name is for use by the caller
    or not. See below for the details.

dim:
    The dimensions of the HDF5 dataset to be written to disk, that is,
    an integer vector of length one or more giving the maximal indices
    in each dimension. See the dims argument of h5createDataset in the
    rhdf5 package.

type:
    The type (a.k.a. storage mode) of the data to be written to disk.
    Can be obtained with type() on an array-like object.

chunkdim:
    The dimensions of the chunks.
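The setter/getter pairs above can be exercised together. Below is a
minimal sketch, assuming write access to tempdir(); the "h5_dump"
subdirectory name and the specific settings are made up for
illustration:

    library(HDF5Array)

    ## Inspect the current settings for automatic HDF5 datasets.
    getHDF5DumpDir()
    getHDF5DumpChunkLength()
    getHDF5DumpChunkShape()
    getHDF5DumpCompressionLevel()

    ## Redirect future automatic dump files to a dedicated directory
    ## and adjust the physical layout of future datasets.
    setHDF5DumpDir(file.path(tempdir(), "h5_dump"))
    setHDF5DumpChunkLength(500000L)   # cap chunks at 500000 elements
    setHDF5DumpCompressionLevel(9L)   # trade speed for max compression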
Calling getHDF5DumpFile() and getHDF5DumpName() with no argument
should be informative only, i.e., it is a means for the user to know
where the next automatic HDF5 dataset will be written. Since a given
file/name combination can be used only once, the user should be
careful not to use that combination to explicitly create an HDF5
dataset, because that would get in the way of the creation of the
next automatic HDF5 dataset. See the Note TO DEVELOPERS below if you
actually need to use this file/name combination.
lsHDF5DumpFile() is just a convenience wrapper for
rhdf5::h5ls(getHDF5DumpFile()).
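For example, assuming at least one automatic dataset has been written
in the current session, these two calls should display the same
listing:

    lsHDF5DumpFile()
    rhdf5::h5ls(getHDF5DumpFile())   # what lsHDF5DumpFile() wraps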
getHDF5DumpDir returns the absolute path to the directory where HDF5
dump files with automatic names will be created. Only meaningful if
the user did NOT specify an HDF5 dump file with setHDF5DumpFile.

getHDF5DumpFile returns the absolute path to the HDF5 file where the
next automatic HDF5 dataset will be written.

getHDF5DumpName returns the name of the next automatic HDF5 dataset.

getHDF5DumpCompressionLevel returns the compression level currently
used for writing automatic HDF5 datasets to disk.

showHDF5DumpLog returns the dump log as an invisible data frame.

getHDF5DumpChunkDim returns the dimensions of the physical chunks
that will be used to write the dataset to disk.
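As a quick illustration, here is how to ask for the chunk geometry
that would be used for a dataset of made-up dimensions, given the
current chunk length and shape settings:

    getHDF5DumpChunkDim(c(45000L, 2000L))  # hypothetical 45000 x 2000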
TO DEVELOPERS:

If your application needs to write its own dataset to the HDF5 dump
then it should:

1. Get a file/name combination by calling
   getHDF5DumpFile(for.use=TRUE) and getHDF5DumpName(for.use=TRUE).

2. [OPTIONAL] Call getHDF5DumpChunkDim(dim) to get reasonable chunk
   dimensions to use for writing the dataset to disk. Or choose your
   own chunk dimensions.

3. Add an entry to the dump log by calling
   appendDatasetCreationToHDF5DumpLog. Typically, this should be done
   right after creating the dataset (e.g. with rhdf5::h5createDataset)
   and before starting to write the dataset to disk. The values passed
   to appendDatasetCreationToHDF5DumpLog via the filepath, name, dim,
   type, chunkdim, and level arguments should be those that were
   passed to rhdf5::h5createDataset via the file, dataset, dims,
   storage.mode, chunk, and level arguments, respectively.

Note that appendDatasetCreationToHDF5DumpLog uses a lock mechanism,
so it is safe to use in the context of parallel execution.

This is actually what the coercion method to HDF5Array does
internally. A sketch of this workflow is shown below.
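The following puts the three steps together. This is a minimal
sketch, not the exact code used by the coercion method; the dataset
dimensions, storage mode, and the matrix written at the end are made
up for illustration:

    library(HDF5Array)
    library(rhdf5)

    ## (1) Reserve a file/name combination.
    filepath <- getHDF5DumpFile(for.use=TRUE)
    name <- getHDF5DumpName(for.use=TRUE)

    ## (2) [OPTIONAL] Let HDF5Array suggest chunk dimensions.
    dim <- c(200L, 50L)                  # made-up dataset dimensions
    chunkdim <- getHDF5DumpChunkDim(dim)
    level <- getHDF5DumpCompressionLevel()

    ## Create the dataset...
    h5createDataset(filepath, name, dims=dim, storage.mode="double",
                    chunk=chunkdim, level=level)

    ## (3) ...then log its creation before writing to it.
    appendDatasetCreationToHDF5DumpLog(filepath, name, dim,
                                       type="double",
                                       chunkdim=chunkdim, level=level)

    h5write(matrix(rnorm(10000), nrow=200), filepath, name)
    lsHDF5DumpFile()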
See Also:

  * writeHDF5Array for writing an array-like object to an HDF5 file.

  * HDF5Array objects.

  * The h5ls function in the rhdf5 package, on which lsHDF5DumpFile
    is based.

  * makeCappedVolumeBox in the DelayedArray package.

  * type in the DelayedArray package.
Examples:

library(HDF5Array)  # also attaches the DelayedArray package

getHDF5DumpDir()
getHDF5DumpFile()

## Use setHDF5DumpFile() to change the current HDF5 dump file.
## If the specified file exists, then it must be in HDF5 format or
## an error will be raised. If it doesn't exist, then it will be
## created.
#setHDF5DumpFile("path/to/some/HDF5/file")

lsHDF5DumpFile()

a <- array(1:600, c(150, 4))
A <- as(a, "HDF5Array")
lsHDF5DumpFile()
A

b <- array(runif(6000), c(4, 2, 150))
B <- as(b, "HDF5Array")
lsHDF5DumpFile()
B

C <- (log(2 * A + 0.88) - 5)^3 * t(B[ , 1, ])
as(C, "HDF5Array")  # realize C on disk
lsHDF5DumpFile()

## Matrix multiplication is not delayed: the output matrix is realized
## block by block. The current "realization backend" controls where
## realization happens e.g. in memory if set to NULL or in an HDF5 file
## if set to "HDF5Array". See '?realize' in the DelayedArray package
## for more information about "realization backends".
setAutoRealizationBackend("HDF5Array")
m <- matrix(runif(20), nrow=4)
P <- C %*% m
lsHDF5DumpFile()

## See all the HDF5 datasets created in the current session so far:
showHDF5DumpLog()

## Wrap the call in suppressMessages() if you are only interested in
## the data frame version of the dump log:
dump_log <- suppressMessages(showHDF5DumpLog())
dump_log