File Objects
Opening & creating files
HDF5 files work generally like standard Python file objects. They support standard modes like r/w/a, and should be closed when they are no longer in use. However, there is no concept of “text” vs “binary” mode.
>>> import h5py
>>> f = h5py.File('myfile.hdf5', 'r')
The file name may be a byte string or unicode string. Valid modes are:
r
Readonly, file must exist
r+
Read/write, file must exist
w
Create file, truncate if exists
w- or x
Create file, fail if exists
a
Read/write if exists, create otherwise (default)
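The modes listed above can be exercised with a short script. This is a minimal sketch; the scratch directory and the file name modes_demo.hdf5 are placeholders chosen for illustration.

```python
import os
import tempfile

import h5py

# Hypothetical scratch location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.hdf5")

# 'w' creates the file, truncating any existing file of that name.
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=[1, 2, 3])

# 'w-' (or 'x') refuses to overwrite a file that already exists.
try:
    h5py.File(path, "w-")
    exclusive_create_failed = False
except OSError:
    exclusive_create_failed = True

# 'r+' opens an existing file for reading and writing.
with h5py.File(path, "r+") as f:
    f["y"] = 42

# 'r' opens an existing file read-only.
with h5py.File(path, "r") as f:
    names = sorted(f.keys())
    opened_mode = f.mode
```

Using the file as a context manager closes it automatically at the end of each with block.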
File drivers
HDF5 ships with a variety of different low-level drivers, which map the logical HDF5 address space to different storage mechanisms. You can specify which driver you want to use when the file is opened:
>>> f = h5py.File('myfile.hdf5', driver=<driver name>, <driver_kwds>)
For example, the HDF5 “core” driver can be used to create a purely in-memory HDF5 file, optionally written out to disk when it is closed. Here’s a list of supported drivers and their options:
- None
Strongly recommended. Use the standard HDF5 driver appropriate for the current platform. On UNIX, this is the H5FD_SEC2 driver; on Windows, it is H5FD_WINDOWS.
- ‘sec2’
Unbuffered, optimized I/O using standard POSIX functions.
- ‘stdio’
Buffered I/O using functions from stdio.h.
- ‘core’
Memory-map the entire file; all operations are performed in memory and written back out when the file is closed. Keywords:
- backing_store: If True (default), save changes to a real file when closing. If False, the file exists purely in memory and is discarded when closed.
- block_size: Increment (in bytes) by which memory is extended. Default is 64k.
- ‘family’
Store the file on disk as a series of fixed-length chunks. Useful if the file system doesn’t allow large files. Note: the filename you provide must contain a printf-style integer format code (e.g. “%d”), which will be replaced by the file sequence number. Keywords:
- memb_size: Maximum size in bytes of each file in the series (default is 2**31-1).
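As an illustration of passing driver keywords, the following sketch uses the ‘core’ driver with backing_store=False to build a file that lives only in memory. The file name is a placeholder; it is required to be non-empty even though nothing is expected to be written to disk.

```python
import os
import tempfile

import h5py

# Placeholder name; with backing_store=False it is not expected to appear on disk.
path = os.path.join(tempfile.mkdtemp(), "in_memory_only.hdf5")

# Driver keywords (here backing_store) are passed straight to h5py.File.
f = h5py.File(path, "w", driver="core", backing_store=False)
f["values"] = [1.0, 2.0, 3.0]

driver_name = f.driver
total = float(f["values"][...].sum())
f.close()

# Check whether anything was left behind on disk after closing.
file_on_disk = os.path.exists(path)
```

Setting backing_store=True instead would write the contents out to the named file when it is closed.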
Version Bounding
HDF5 has been evolving for many years now. By default, the library will write objects in the most compatible fashion possible, so that older versions will still be able to read files generated by modern programs. However, there can be performance advantages if you are willing to forgo a certain level of backwards compatibility. By using the “libver” option to File, you can specify the minimum and maximum sophistication of these structures:
>>> f = h5py.File('name.hdf5', libver='earliest') # most compatible
>>> f = h5py.File('name.hdf5', libver='latest') # most modern
Here “latest” means that HDF5 will always use the newest version of these structures without particular concern for backwards compatibility. The “earliest” option means that HDF5 will make a best effort to be backwards compatible.
The default is “earliest”.
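The effective bounds can be read back from the File.libver property. The sketch below creates one file with each setting; the file names are placeholders, and the exact strings reported for the upper bound may depend on the HDF5 version h5py was built against.

```python
import os
import tempfile

import h5py

tmpdir = tempfile.mkdtemp()  # hypothetical scratch directory

# Most compatible: older HDF5 versions should be able to read this file.
with h5py.File(os.path.join(tmpdir, "compat.hdf5"), "w", libver="earliest") as f:
    f["data"] = [1, 2, 3]
    bounds_earliest = f.libver  # 2-tuple: (low, high)

# Most modern: newest structures, at the cost of backwards compatibility.
with h5py.File(os.path.join(tmpdir, "modern.hdf5"), "w", libver="latest") as f:
    f["data"] = [1, 2, 3]
    bounds_latest = f.libver

print(bounds_earliest, bounds_latest)
```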
User block
HDF5 allows the user to insert arbitrary data at the beginning of the file,
in a reserved space called the user block. The length of the user block
must be specified when the file is created. It can be either zero
(the default) or a power of two greater than or equal to 512. You
can specify the size of the user block when creating a new file, via the
userblock_size keyword to File; the userblock size of an open file can
likewise be queried through the File.userblock_size property.
Modifying the user block on an open file is not supported; this is a limitation of the HDF5 library. However, once the file is closed you are free to read and write data at the start of the file, provided your modifications don’t leave the user block region.
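A minimal sketch of this workflow follows: reserve a 512-byte user block at creation, close the file, then write into the reserved region with ordinary Python file I/O. The file name and header bytes are made up for illustration.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "userblock_demo.hdf5")  # placeholder

# Reserve a 512-byte user block; this is only possible at creation time.
with h5py.File(path, "w", userblock_size=512) as f:
    f["data"] = [1, 2, 3]
    reserved = f.userblock_size

# With the HDF5 file closed, write into the user block using plain file I/O.
header = b"my custom header"  # must fit within the 512 reserved bytes
with open(path, "r+b") as raw:
    raw.write(header)

# The HDF5 content is untouched and the user block size can be queried.
with h5py.File(path, "r") as f:
    size_after = f.userblock_size
    data = f["data"][...].tolist()

with open(path, "rb") as raw:
    stored = raw.read(len(header))
```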
Reference
In addition to the properties and methods defined here, File objects inherit the full API of Group objects; in this case, the group in question is the root group (/) of the file.
Note
Please note that unlike Python file objects, the attribute File.name
does not refer to the file name on disk. File.name gives the HDF5
name of the root group, “/”. To access the on-disk name, use
File.filename.
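The difference between the two attributes, and the inherited Group API, can be seen in a short sketch. The file and group names here are placeholders.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "names_demo.hdf5")  # placeholder

with h5py.File(path, "w") as f:
    hdf5_name = f.name      # HDF5 name of the root group
    disk_name = f.filename  # name of the file on disk

    # File objects behave like the root group, so Group methods work directly.
    f.create_group("measurements")
    has_group = "measurements" in f
```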
class h5py.File(name, mode=None, driver=None, libver=None, userblock_size=None, swmr=False, rdcc_nslots=None, rdcc_nbytes=None, rdcc_w0=None, track_order=None, **kwds)
Represents an HDF5 file.
__init__(name, mode=None, driver=None, libver=None, userblock_size=None, swmr=False, rdcc_nslots=None, rdcc_nbytes=None, rdcc_w0=None, track_order=None, **kwds)
Create a new file object.
See the h5py user guide for a detailed explanation of the options.
- name
Name of the file on disk, or file-like object. Note: for files created with the ‘core’ driver, HDF5 still requires this be non-empty.
- mode
- r: Readonly, file must exist
- r+: Read/write, file must exist
- w: Create file, truncate if exists
- w- or x: Create file, fail if exists
- a: Read/write if exists, create otherwise (default)
- driver
Name of the driver to use. Legal values are None (default, recommended), ‘core’, ‘sec2’, ‘stdio’, ‘mpio’.
- libver
Library version bounds. Supported values: ‘earliest’, ‘v108’, ‘v110’, and ‘latest’. The ‘v108’ and ‘v110’ options can only be specified with the HDF5 1.10.2 library or later.
- userblock_size
Desired size of user block. Only allowed when creating a new file (mode w, w- or x).
- swmr
Open the file in SWMR read mode. Only used when mode = ‘r’.
- rdcc_nbytes
Total size of the raw data chunk cache in bytes. The default size is 1024**2 (1 MB) per dataset.
- rdcc_w0
The chunk preemption policy for all datasets. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower depending on how often you re-read or re-write the same data. The default value is 0.75.
- rdcc_nslots
The number of chunk slots in the raw data chunk cache for this file. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set approximately 100 times that number of chunks. The default value is 521.
- track_order
Track dataset/group/attribute creation order under root group if True. If None, use the global default h5py.get_config().track_order.
- Additional keywords
Passed on to the selected file driver.
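The sizing rules of thumb for the chunk cache can be turned into a small calculation. The sketch below is one possible way to pick values, not a recommendation from the library: the chunk shape, the 16 MiB cache size, and the next_prime helper are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import h5py


def next_prime(n):
    """Return the smallest prime >= n (hypothetical helper, trial division)."""
    candidate = max(n, 2)
    while True:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            return candidate
        candidate += 1


# Assumed workload: float64 data stored in 100 x 100 chunks.
chunk_shape = (100, 100)
chunk_bytes = int(np.prod(chunk_shape)) * np.dtype("float64").itemsize

rdcc_nbytes = 16 * 1024**2                     # assumed 16 MiB cache
chunks_in_cache = rdcc_nbytes // chunk_bytes   # chunks that fit in the cache
rdcc_nslots = next_prime(100 * chunks_in_cache)  # ~100x that count, prime

path = os.path.join(tempfile.mkdtemp(), "cache_demo.hdf5")  # placeholder

# rdcc_w0=1.0 assumes each chunk is written once and not revisited.
with h5py.File(path, "w", rdcc_nbytes=rdcc_nbytes, rdcc_w0=1.0,
               rdcc_nslots=rdcc_nslots) as f:
    dset = f.create_dataset("grid", shape=(1000, 1000), dtype="f8",
                            chunks=chunk_shape)
    dset[0:100, 0:100] = 1.0
    corner = float(dset[0, 0])
```

If the same chunks are read or written repeatedly, a lower rdcc_w0 is likely a better fit, as described for that parameter.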
File properties
- filename
File name on disk
- mode
Python mode used to open file
- driver
Low-level HDF5 file driver used to open file
- libver
File format version bounds (2-tuple: low, high)
- userblock_size
User block size (in bytes)
File methods
- close()
Close the file. All open objects become invalid.
- flush()
Tell the HDF5 library to flush its buffers.
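A brief sketch of flush() and close() follows. It relies on h5py objects being usable in a boolean context to tell whether they are still valid; the file name is a placeholder.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "close_demo.hdf5")  # placeholder

f = h5py.File(path, "w")
f["x"] = 1
dset = f["x"]

# Ask HDF5 to write its buffers out without closing the file.
f.flush()
open_before = bool(f)

# Closing the file invalidates the file and the objects opened from it.
f.close()
open_after = bool(f)
dataset_valid_after = bool(dset)
```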
Properties common to all HDF5 objects:
- file
Return a File instance associated with this object
- parent
Return the parent group of this object.
This is always equivalent to obj.file[posixpath.dirname(obj.name)]. ValueError if this object is anonymous.
- name
Return the full name of this object. None if anonymous.
- id
Low-level identifier appropriate for this object
- ref
An (opaque) HDF5 reference to this object
- attrs
Attributes attached to this object
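The common properties can be explored on a small file. In this sketch the group name, dataset name, and attribute are invented for illustration.

```python
import os
import posixpath
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "props_demo.hdf5")  # placeholder

with h5py.File(path, "w") as f:
    # Intermediate groups in the path are created automatically.
    dset = f.create_dataset("group_a/temps", data=[20.5, 21.0])
    dset.attrs["units"] = "celsius"

    full_name = dset.name
    parent_name = dset.parent.name
    same_file = dset.file == f

    # parent is equivalent to looking up the directory part of the name.
    parent_lookup = dset.file[posixpath.dirname(dset.name)].name

    # An object reference can be dereferenced by indexing the file with it.
    via_ref = f[dset.ref].name
    units = dset.attrs["units"]
```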