
h5createDataset

Create HDF5 dataset


Description

R function to create an HDF5 dataset and define its dimensionality and compression behaviour.

Usage

h5createDataset(file, dataset,
                dims, maxdims = dims,
                storage.mode = "double", H5type = NULL,
                size = NULL, chunk = dims, fillValue,
                level = 6, filter = "gzip", shuffle = TRUE,
                native = FALSE)

Arguments

file

The filename (character) of the file in which the dataset will be located. Advanced users can instead provide an object of class H5IdComponent representing an HDF5 location identifier (file or group). See H5Fcreate, H5Fopen, H5Gcreate, H5Gopen to create an object of this kind.

dataset

Name of the dataset to be created. The name can contain group names, e.g. 'group/dataset', but the function will fail if the group does not yet exist.
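Because missing groups are not created automatically, the parent group has to exist before the dataset is created. The following is a minimal sketch, assuming the rhdf5 package is installed; the temporary file and the names "group" and "dataset" are purely illustrative.

```r
library(rhdf5)

# Temporary file used only for this illustration
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# Create the parent group first, then the dataset inside it
h5createGroup(h5file, "group")
created <- h5createDataset(h5file, "group/dataset", dims = c(5, 8),
                           storage.mode = "integer")

# List the file contents to confirm where the dataset was placed
h5ls(h5file)
```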

dims

The dimensions of the array as they will appear in the file. Note that the dimensions will appear in inverted order when the file is viewed with a C program (e.g. HDFView), because the fastest-changing dimension in R is the first one, whereas the fastest-changing dimension in C is the last one.
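The ordering difference can be illustrated in base R alone. This is only a sketch of the idea; the variable names are illustrative and nothing here touches an HDF5 file.

```r
# R stores arrays in column-major order: the first index varies fastest
m <- matrix(1:6, nrow = 2, ncol = 3)
as.vector(m)  # elements in storage order: 1 2 3 4 5 6

# Dimensions as passed to the dims argument in R
dims_in_r <- dim(m)

# The same dimensions as a C-ordered viewer would report them
dims_in_c <- rev(dims_in_r)
```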

maxdims

The maximum extension of the array. Use H5Sunlimited() to indicate an extensible dimension.
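As a rough sketch of how an extensible dimension might be used, the dataset below is created with an unlimited second dimension and later enlarged with h5set_extent from rhdf5. The file path, the dataset name "growing" and the chunk size are assumptions chosen for illustration.

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# Second dimension is extensible; chunks hold one column each
h5createDataset(h5file, "growing", dims = c(5, 2),
                maxdims = c(5, H5Sunlimited()),
                chunk = c(5, 1), storage.mode = "integer")
h5write(matrix(1:10, nrow = 5, ncol = 2), file = h5file, name = "growing")

# Enlarge the dataset to four columns and fill in the new ones
h5set_extent(h5file, "growing", c(5, 4))
h5write(matrix(11:20, nrow = 5, ncol = 2), file = h5file, name = "growing",
        index = list(NULL, 3:4))

grown <- h5read(h5file, "growing")
dim(grown)
```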

storage.mode

The storage mode of the data to be written. Can be obtained by storage.mode(mydata).
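For illustration, the storage mode of a few common R objects can be inspected in base R; mydata and mode_for_hdf5 are placeholder names.

```r
# Data intended for an HDF5 dataset
mydata <- matrix(1:40, nrow = 5, ncol = 8)

# This value could be passed as the storage.mode argument,
# together with dims = dim(mydata)
mode_for_hdf5 <- storage.mode(mydata)  # "integer"

# Storage modes of other common vector types
storage.mode(c(0.5, 1.5))   # "double"
storage.mode(c("a", "b"))   # "character"
storage.mode(c(TRUE, FALSE)) # "logical"
```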

H5type

Advanced programmers can specify the datatype of the dataset within the file. See h5const("H5T") for a list of available datatypes. If H5type is specified, the argument storage.mode is ignored. For most uses, storage.mode is the recommended way to set the type.

size

For storage.mode='character' the maximum string length has to be specified. rhdf5 writes null-padded strings by default, so the value provided here should be the length of the longest string. HDF5 then stores the strings as fixed-length character vectors. Together with compression, this should be efficient.

chunk

The chunk size used to store the dataset. It is an integer vector of the same length as dims. This argument is usually set together with a compression property (argument level).

fillValue

Standard value for filling the dataset. The storage.mode of this value has to be convertible to the dataset type by HDF5.
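A small sketch of the effect of fillValue, assuming rhdf5 is installed: only the first column is written, and the remaining elements are expected to read back as the fill value. The file path, the dataset name "filled" and the value -1L are illustrative choices.

```r
library(rhdf5)

h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# Elements that are never written should take the fill value
h5createDataset(h5file, "filled", dims = c(2, 3),
                storage.mode = "integer", fillValue = -1L)

# Write the first column only
h5write(matrix(c(10L, 20L), nrow = 2, ncol = 1), file = h5file,
        name = "filled", index = list(NULL, 1))

filled <- h5read(h5file, "filled")
filled
```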

level

The compression level used. An integer value between 0 (no compression) and 9 (highest and slowest compression).

filter

Character defining which compression filter should be applied to the chunks of the dataset. See the Details section for more information on the options that can be provided here.

shuffle

Logical defining whether the byte-shuffle algorithm should be applied to data prior to compression.

native

An object of class logical. If TRUE, array-like objects are treated as stored in HDF5 row-major rather than R column-major orientation. Using native = TRUE increases HDF5 file portability between programming languages. A file written with native = TRUE should also be read with native = TRUE.

Details

Creates a new dataset in an existing HDF5 file. The function will fail if the file doesn't exist or if a dataset with the same name already exists within the specified file.
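One way to guard against both failure cases is to check the file and its contents first. The helper below, create_dataset_if_absent, is hypothetical and not part of rhdf5; it is a sketch that relies on file.exists from base R and on h5ls and h5createFile from rhdf5.

```r
library(rhdf5)

# Hypothetical helper (not part of rhdf5): creates the file when needed
# and skips creation when a dataset with the same path already exists
create_dataset_if_absent <- function(file, dataset, ...) {
  if (!file.exists(file)) {
    h5createFile(file)
  }
  contents <- h5ls(file)
  # Build full paths such as "/A" or "/group/dataset"
  existing <- sub("^//", "/", paste(contents$group, contents$name, sep = "/"))
  target <- paste0("/", sub("^/", "", dataset))
  if (target %in% existing) {
    return(FALSE)
  }
  h5createDataset(file, dataset, ...)
}

h5file <- tempfile(fileext = ".h5")
first  <- create_dataset_if_absent(h5file, "A", dims = c(5, 8))
second <- create_dataset_if_absent(h5file, "A", dims = c(5, 8))
```

Under these assumptions, the first call creates the dataset and the second call returns FALSE without attempting to create it again.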

The filter argument can take several options, matching the compression filters distributed either with the HDF5 library in Rhdf5lib or via the rhdf5filters package. The plugins available and the corresponding values for selecting them are shown below:

zlib: Ubiquitous deflate compression algorithm used in GZIP or ZIP files. All three options below achieve the same result.
  • "GZIP"
  • "ZLIB"
  • "DEFLATE"

szip: Compression algorithm maintained by the HDF5 group.
  • "SZIP"

bzip2
  • "BZIP2"

BLOSC meta compressor: As a meta-compressor, BLOSC wraps several different compression algorithms. Each of the options below will activate a different compression filter.
  • "BLOSC_BLOSCLZ"
  • "BLOSC_LZ4"
  • "BLOSC_LZ4HC"
  • "BLOSC_SNAPPY"
  • "BLOSC_ZLIB"
  • "BLOSC_ZSTD"

Disable: It is possible to write chunks without any compression applied.
  • "NONE"
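To give a feel for the filter argument, the sketch below writes the same highly compressible matrix once with "GZIP" and once with "NONE" and compares the resulting file sizes. It assumes rhdf5 is installed; write_with_filter is a hypothetical helper, and the data, chunk size and dataset name are arbitrary choices.

```r
library(rhdf5)

# A matrix of zeros compresses very well
zeros <- matrix(0, nrow = 1000, ncol = 1000)

# Hypothetical helper: write `zeros` with the given filter, return file size
write_with_filter <- function(filter) {
  h5file <- tempfile(fileext = ".h5")
  h5createFile(h5file)
  h5createDataset(h5file, "x", dims = dim(zeros), storage.mode = "double",
                  chunk = c(100, 100), filter = filter)
  h5write(zeros, file = h5file, name = "x")
  file.size(h5file)
}

size_gzip <- write_with_filter("GZIP")
size_none <- write_with_filter("NONE")

# File sizes in bytes for the two settings
c(gzip = size_gzip, none = size_none)
```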

Value

Returns TRUE if the dataset was created successfully and FALSE otherwise.

Author(s)

Bernd Fischer, Mike L. Smith

Examples

library(rhdf5)

h5createFile("ex_createDataset.h5")

# create dataset with explicit chunking and compression level
h5createDataset("ex_createDataset.h5", "A", c(5, 8), storage.mode = "integer", chunk = c(5, 1), level = 6)

# create datasets with the default settings (chunk = dims, level = 6)
h5createDataset("ex_createDataset.h5", "B", c(5, 8), storage.mode = "integer")
h5createDataset("ex_createDataset.h5", "C", c(5, 8), storage.mode = "double")

# create a dataset of strings & define size based on longest string
ex_strings <- c('long', 'longer', 'longest')
h5createDataset("ex_createDataset.h5", "D",
    storage.mode = "character", chunk = 3, level = 6,
    dims = length(ex_strings), size = max(nchar(ex_strings)))

# write data to dataset
h5write(matrix(1:40, nrow = 5, ncol = 8), file = "ex_createDataset.h5", name = "A")
# write second column
h5write(matrix(1:5, nrow = 5, ncol = 1), file = "ex_createDataset.h5", name = "B", index = list(NULL, 2))
# write character vector
h5write(ex_strings, file = "ex_createDataset.h5", name = "D")

# print the contents of the file
h5dump("ex_createDataset.h5")

rhdf5

R Interface to HDF5

v2.34.0
Artistic-2.0
Authors
Bernd Fischer [aut], Mike Smith [aut, cre] (<https://orcid.org/0000-0002-7800-3848>), Gregoire Pau [aut], Martin Morgan [ctb], Daniel van Twisk [ctb]
