
Partial Edge Chunks
Dana Robinson
The HDF Group
Efficient Use of HDF5 With High Data Rate X-Ray Detectors
Paul Scherrer Institut
May 30-31, 2012
HDF5 Workshop at PSI
Consider the following problem…
Consider an extensible, filtered (compressed, etc.) dataset…
[Figure: the DATASET drawn as a grid of dataset elements]
…which is chunked…
[Figure: the same DATASET divided into chunks]
…and which will be
1) opened,
2) extended, and
3) closed
repeatedly.
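A partial edge chunk is a chunk that extends past the current dataset extent in at least one dimension. The C sketch below is plain C, not HDF5 library code, and `count_chunks` is a hypothetical helper. It counts how many chunks cover an N-dimensional dataset and how many of those are partial edge chunks, using the fact that a chunk is full only if it lies inside the extent in every dimension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of chunks needed to cover `extent` elements along one dimension. */
static uint64_t chunks_along(uint64_t extent, uint64_t chunk)
{
    assert(chunk > 0);
    return (extent + chunk - 1) / chunk;   /* ceiling division */
}

/* Hypothetical helper: count all chunks of an N-D chunked dataset and how
 * many of them are partial edge chunks. A chunk is full only if it lies
 * entirely inside the extent in every dimension, so the number of full
 * chunks is the product of floor(dims[i] / chunk[i]); the rest are partial. */
static void count_chunks(size_t rank, const uint64_t *dims,
                         const uint64_t *chunk,
                         uint64_t *total, uint64_t *partial)
{
    uint64_t all = 1, full = 1;

    for (size_t i = 0; i < rank; i++) {
        all  *= chunks_along(dims[i], chunk[i]);
        full *= dims[i] / chunk[i];
    }
    *total   = all;
    *partial = all - full;
}
```

For a 10 x 7 dataset with 4 x 4 chunks this gives 6 chunks, 4 of which are partial edge chunks. Extending the same dataset to 12 x 8 makes all 6 chunks full, so every extension can change which chunks are partial.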
Problem: The partial edge chunk is filtered twice per extension
[Figure: the DATASET with its partially filled edge chunk]
1) Read the chunk and uncompress it.
2) Extend the dataset.
3) Compress the chunk and write it back.
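The cost can be estimated with a toy model. The C sketch below is a hypothetical simulation, not HDF5 code: a 1-D filtered dataset with `chunk` elements per chunk grows by `step` elements on each open/extend/close cycle, and `filter_passes` counts one pass per decompression and one per compression. The `dont_filter_partial` flag models leaving partial edge chunks unfiltered until they fill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not HDF5 library code. A 1-D chunked, filtered dataset
 * starts empty and grows by `step` elements in each of `cycles`
 * open/extend/close cycles. Returns the number of filter passes performed,
 * counting one pass per decompression and one per compression. */
static uint64_t filter_passes(uint64_t chunk, uint64_t step, uint64_t cycles,
                              bool dont_filter_partial)
{
    uint64_t size = 0, passes = 0;

    assert(chunk > 0 && step > 0);
    for (uint64_t c = 0; c < cycles; c++) {
        uint64_t new_size = size + step;
        uint64_t first = size / chunk;            /* first chunk touched */
        uint64_t last  = (new_size - 1) / chunk;  /* last chunk touched  */

        for (uint64_t idx = first; idx <= last; idx++) {
            /* A touched chunk that already holds data was the partial edge chunk. */
            bool existed  = idx * chunk < size;
            bool full_now = new_size >= (idx + 1) * chunk;
            /* Partial chunks are stored filtered unless the option is set. */
            bool was_filtered = existed && !dont_filter_partial;
            bool filter_now   = full_now || !dont_filter_partial;

            if (was_filtered)
                passes++;   /* read + uncompress the old chunk */
            if (filter_now)
                passes++;   /* compress + write the updated chunk */
        }
        size = new_size;
    }
    return passes;
}
```

With 4-element chunks grown one element at a time for eight cycles, the default behaviour performs 14 filter passes for two chunks' worth of data, while leaving partial chunks unfiltered performs only 2 (one compression each time a chunk fills). When every extension adds exactly one full chunk, both modes perform the same number of passes.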
Also, as the filtered (compressed) size of the chunk changes, the chunk is relocated in the file.
[Figure: the HDF5 file at times t0, t1 and t2; the chunk moves from position p0 to p1 to p2, leaving holes where it used to be]
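The relocation in the figure can be reproduced with a deliberately simple model. In the C sketch below, `file_model` and `write_chunk` are hypothetical and do not represent HDF5's real file-space manager: space is allocated by bumping an end-of-file pointer, freed space is never reused, and the chunk is moved to a new block whenever its stored size changes, as the slide describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical file-space model, not HDF5's allocator: new blocks are taken
 * from the end of the file and freed blocks are never reused. */
typedef struct {
    uint64_t eof;    /* next free address at the end of the file     */
    uint64_t addr;   /* current address of the chunk                 */
    uint64_t size;   /* current stored (filtered) size of the chunk  */
    uint64_t holes;  /* bytes left behind in abandoned blocks        */
    unsigned moves;  /* number of times the chunk has been relocated */
} file_model;

/* Write the chunk with its new stored size. If the size is unchanged the
 * chunk is overwritten in place; otherwise it is written to a new block at
 * the end of the file and its old block becomes a hole. */
static void write_chunk(file_model *f, uint64_t new_size)
{
    assert(new_size > 0);
    if (f->size == new_size)
        return;                   /* same size: overwrite in place */
    if (f->size > 0) {            /* chunk already stored: abandon old block */
        f->holes += f->size;
        f->moves++;
    }
    f->addr = f->eof;
    f->size = new_size;
    f->eof += new_size;
}
```

Writing compressed sizes of 100, 150 and 220 bytes reproduces the figure: the chunk moves from address 0 (p0) to 100 (p1) to 250 (p2), leaving 250 bytes of holes. An unfiltered partial chunk always occupies the full chunk size, so repeated writes of the same size never move it.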
Solution: Do not filter partial edge chunks
[Figure: the DATASET with full chunks marked C (compressed) and partial edge chunks marked U (uncompressed)]
When a chunk fills, it is automatically compressed if filters are enabled.
+ Partial chunks are always the same size on the disk and do not move until full.
+ Less fragmentation.
+ No compression overhead on partial-chunk I/O.
- Possible size penalty for uncompressed edge data.
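The rule on this slide can be sketched per chunk. In the hypothetical C helpers below (not HDF5 API), a chunk whose first element is at offset `start` is a partial edge chunk if it reaches past the dataset extent in any dimension; with the option enabled, only full chunks pass through the filters. Because the test is evaluated against the current extent, a chunk that becomes full after an extension becomes eligible for compression automatically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the chunk whose first element is at `start`
 * extends past the dataset extent `dims` in any dimension. */
static bool is_partial_edge_chunk(size_t rank, const uint64_t *start,
                                  const uint64_t *chunk, const uint64_t *dims)
{
    for (size_t i = 0; i < rank; i++) {
        assert(start[i] < dims[i]);      /* chunk must overlap the dataset */
        if (start[i] + chunk[i] > dims[i])
            return true;
    }
    return false;
}

/* Hypothetical helper: should this chunk go through the filter pipeline?
 * With `dont_filter_partial` set, partial edge chunks are stored unfiltered
 * and only full chunks are filtered. */
static bool should_filter_chunk(size_t rank, const uint64_t *start,
                                const uint64_t *chunk, const uint64_t *dims,
                                bool filters_enabled, bool dont_filter_partial)
{
    if (!filters_enabled)
        return false;
    if (dont_filter_partial && is_partial_edge_chunk(rank, start, chunk, dims))
        return false;
    return true;
}
```

For example, with 4 x 4 chunks the chunk starting at (8, 4) is a partial edge chunk of a 10 x 7 dataset and is left unfiltered, but once the dataset is extended to 12 x 8 the same chunk is full and is filtered.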
Also consider the following problem…
Consider a dataset…
…which is chunked…
[Figure: a DATASET whose edge chunks extend past the dataset boundary]
The part of each edge chunk that lies outside the dataset is empty and wasted.
This space is allocated and exists on the disk.
Compression can reduce, but not eliminate, the wasted space.
Compression can have a performance penalty.
Parallel HDF5 cannot currently write compressed (filtered) data.
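The amount of allocated but empty space is easy to estimate. The C sketch below uses a hypothetical helper, `wasted_edge_bytes`, that ignores compression and file metadata and compares the bytes allocated for all chunks with the bytes that actually hold dataset elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes allocated for the chunks of an N-D dataset
 * that lie outside the dataset extent, assuming uncompressed chunks and
 * ignoring file metadata. */
static uint64_t wasted_edge_bytes(size_t rank, const uint64_t *dims,
                                  const uint64_t *chunk, uint64_t elem_size)
{
    uint64_t nchunks = 1, chunk_elems = 1, used_elems = 1;

    for (size_t i = 0; i < rank; i++) {
        assert(chunk[i] > 0);
        nchunks     *= (dims[i] + chunk[i] - 1) / chunk[i];  /* ceiling */
        chunk_elems *= chunk[i];
        used_elems  *= dims[i];
    }
    /* Allocated space minus the space occupied by real elements. */
    return (nchunks * chunk_elems - used_elems) * elem_size;
}
```

For a 10 x 7 dataset of 8-byte elements in 4 x 4 chunks, 208 of the 768 allocated bytes are empty. For a 1-D dataset of 1,000,001 four-byte elements in 1,000,000-element chunks, the single element past the chunk boundary costs 3,999,996 bytes of empty space.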
Solution: Do not store the empty space
[Figure: the DATASET with its edge chunks trimmed at the dataset boundary; the space outside the dataset is NOT stored]
+ Saves space.
- Can result in file fragmentation if the dataset is later extended.
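Storing only the part of an edge chunk that lies inside the dataset means that the chunk's stored size depends on the current extent. The hypothetical C helper below, `stored_chunk_elems`, computes that size; comparing it before and after an extension shows why a trimmed chunk must grow, and so typically be moved, when the dataset is extended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of elements actually stored for the chunk
 * whose first element is at `start`, if only the part of the chunk that
 * lies inside the dataset extent `dims` is written to the file. */
static uint64_t stored_chunk_elems(size_t rank, const uint64_t *start,
                                   const uint64_t *chunk, const uint64_t *dims)
{
    uint64_t n = 1;

    for (size_t i = 0; i < rank; i++) {
        assert(start[i] < dims[i]);            /* chunk overlaps the dataset */
        uint64_t remaining = dims[i] - start[i];
        n *= remaining < chunk[i] ? remaining : chunk[i];   /* clip to extent */
    }
    return n;
}
```

In a 10 x 7 dataset with 4 x 4 chunks, the corner chunk at (8, 4) stores 6 of its 16 elements; after the dataset is extended to 12 x 8 it needs all 16, so its stored block has to grow.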
Two New API Functions
herr_t H5Pset_edge_chunk_opts(hid_t dcpl_id, unsigned opts);
herr_t H5Pget_edge_chunk_opts(hid_t dcpl_id, unsigned *opts);
OPTIONS
H5D_STORE_PARTIAL_CHUNKS (default = disabled)
H5D_DONT_FILTER_PARTIAL_CHUNKS (default = disabled)
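The `unsigned opts` argument suggests that the two options are combined as bit flags. The C sketch below models that assumption with a stand-in property list: `sketch_dcpl`, the `SKETCH_*` constants and their values, and the two `sketch_*` functions are hypothetical, only mirror the shape of the proposed calls, and are not part of HDF5. In the released HDF5 1.10 library, the filter option is exposed as `H5Pset_chunk_opts()`/`H5Pget_chunk_opts()` with the `H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS` flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the slide's H5D_STORE_PARTIAL_CHUNKS and
 * H5D_DONT_FILTER_PARTIAL_CHUNKS; the values are chosen for this sketch. */
#define SKETCH_STORE_PARTIAL_CHUNKS       0x1u
#define SKETCH_DONT_FILTER_PARTIAL_CHUNKS 0x2u
#define SKETCH_ALL_EDGE_OPTS \
    (SKETCH_STORE_PARTIAL_CHUNKS | SKETCH_DONT_FILTER_PARTIAL_CHUNKS)

/* Stand-in for a dataset creation property list. */
typedef struct {
    unsigned edge_chunk_opts;   /* 0 = both options disabled (the default) */
} sketch_dcpl;

/* Mirrors the shape of H5Pset_edge_chunk_opts: stores the flags and
 * rejects unknown bits. Returns 0 on success, -1 on error (like herr_t). */
static int sketch_set_edge_chunk_opts(sketch_dcpl *dcpl, unsigned opts)
{
    if (dcpl == NULL || (opts & ~SKETCH_ALL_EDGE_OPTS) != 0)
        return -1;
    dcpl->edge_chunk_opts = opts;
    return 0;
}

/* Mirrors the shape of H5Pget_edge_chunk_opts. */
static int sketch_get_edge_chunk_opts(const sketch_dcpl *dcpl, unsigned *opts)
{
    if (dcpl == NULL || opts == NULL)
        return -1;
    *opts = dcpl->edge_chunk_opts;
    return 0;
}
```

Passing `SKETCH_STORE_PARTIAL_CHUNKS | SKETCH_DONT_FILTER_PARTIAL_CHUNKS` would enable both behaviours in this model; passing 0 restores the defaults.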
Final Notes
• This work requires a file format change, so it cannot appear in
HDF5 1.8.x.
• Older versions of the library will not understand either of
these options.
• Enabling/disabling filters on edge chunks should appear in
HDF5 1.10.0.
• Partial storage of partial edge chunks has not been
implemented and is currently unfunded.