Lazy loading
When reading data from a data entry (using DBEntry.get or DBEntry.get_slice), all data is by default read immediately from the lowlevel Access Layer backend. This may take a long time if the data entry stores a lot of data for the requested IDS.
Instead of reading all data immediately, IMAS-Python can also lazy load the data: values are read from the backend only when you access them. This speeds up your program when you are interested in only a subset of the data stored in an IDS.
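The idea can be illustrated with a toy model. This is only a sketch of the general lazy-loading pattern, not how IMAS-Python implements it: CountingBackend and LazyRecord are hypothetical names, and the dictionary stands in for a slow storage backend.

```python
class CountingBackend:
    """Stand-in for a slow storage backend (hypothetical, not IMAS-Python)."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0  # number of times the backend was actually queried

    def read(self, name):
        self.reads += 1
        return self._data[name]


class LazyRecord:
    """Fetches a field from the backend the first time it is accessed."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so _backend and _cache never pass through here once they are set.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._cache:
            try:
                self._cache[name] = self._backend.read(name)
            except KeyError:
                raise AttributeError(name) from None
        return self._cache[name]


backend = CountingBackend(
    {"time": [0.0, 1.0, 2.0], "temperature": [10.0, 12.0, 11.0]}
)
record = LazyRecord(backend)

reads_before_access = backend.reads  # creating the record reads nothing
first_time = record.time[0]          # first access triggers one backend read
record.time                          # second access is served from the cache
reads_after_access = backend.reads
```

In this toy model the "temperature" field is never read, because nothing asks for it. That is the saving lazy loading aims for when only part of an IDS is needed.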
Enable lazy loading of data
You can enable lazy loading of data by supplying the keyword argument lazy=True
to DBEntry.get or DBEntry.get_slice. The returned IDS
object fetches data from the backend at the moment you access it.
See the example below:
import os
import matplotlib
import numpy
# To avoid possible display issues when Matplotlib uses a non-GUI backend
if "DISPLAY" not in os.environ:
    matplotlib.use("agg")
else:
    matplotlib.use("TKagg")
from matplotlib import pyplot as plt
import imas
from imas.ids_defs import MDSPLUS_BACKEND
database, pulse, run, user = "ITER", 134173, 106, "public"
data_entry = imas.DBEntry(
    MDSPLUS_BACKEND, database, pulse, run, user, data_version="3"
)
data_entry.open()
# Enable lazy loading with `lazy=True`:
core_profiles = data_entry.get("core_profiles", lazy=True)
# No data has been read from the lowlevel backend yet
# The time array is loaded only when we access it on the following lines:
time = core_profiles.time
print(f"Time has {len(time)} elements, between {time[0]} and {time[-1]}")
# Find the electron temperature at rho=0 for all time slices
electron_temperature_0 = numpy.array(
    [p1d.electrons.temperature[0] for p1d in core_profiles.profiles_1d]
)
# Plot the figure
fig, ax = plt.subplots()
ax.plot(time, electron_temperature_0)
ax.set_ylabel("$T_e$")
ax.set_xlabel("$t$")
plt.show()
In this example, using lazy loading with the MDSPLUS backend is about 12 times
faster than a regular get(). When using the HDF5 backend, lazy loading
is about 300 times faster for this example.
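The actual speed-up depends on the data entry and on how much of the IDS you access. A minimal sketch for measuring it yourself, assuming an already opened data_entry like the one in the example above; time_call and central_electron_temperature are hypothetical helper names, not part of IMAS-Python:

```python
import time


def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def central_electron_temperature(data_entry, lazy):
    """Same access pattern as the example: T_e at the first radial point."""
    core_profiles = data_entry.get("core_profiles", lazy=lazy)
    # Access the data inside the timed function: with lazy=True nothing is
    # read from the backend until the values are actually used.
    return [p1d.electrons.temperature[0] for p1d in core_profiles.profiles_1d]


# Usage with an opened DBEntry (not executed here):
# _, t_full = time_call(central_electron_temperature, data_entry, False)
# _, t_lazy = time_call(central_electron_temperature, data_entry, True)
# print(f"Lazy loading speed-up: {t_full / t_lazy:.1f}x")
```

Timing only the get() call would be misleading for the lazy case, since the reads happen later, on access. Operating-system file caching may also favour whichever variant runs second, so repeating the measurement in both orders is advisable.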
Caveats of lazy loaded IDSs
Lazy loading of data may speed up your programs, but also comes with some limitations.
- Some functionality is not implemented or works differently for lazy-loaded IDSs:

  - Iterating over non-empty nodes works differently, see the API documentation of imas.ids_structure.IDSStructure.iter_nonempty_().
  - has_value() is not implemented for lazy-loaded structure elements.
  - validate() will only validate loaded data. Additional data might be loaded from the backend to validate coordinate sizes.
  - imas.util.print_tree() will only print data that is loaded when hide_empty_nodes is True.
  - imas.util.visit_children():

    - When visit_empty is False (default), this method uses iter_nonempty_(). This raises an error for lazy-loaded IDSs, unless you set accept_lazy to True.
    - When visit_empty is True, this will iteratively load all data from the backend. This is effectively a full, but less efficient, get()/get_slice(). It will be faster if you don’t use lazy loading in this case.

  - IDS conversion through imas.convert_ids is not implemented for lazy-loaded IDSs. Note that Automatic conversion between DD versions also applies when lazy loading.
  - Lazy-loaded IDSs are read-only: setting or changing values, resizing arrays of structures, etc. is not allowed.
  - You cannot put(), put_slice() or serialize() lazy-loaded IDSs.
  - Copying lazy-loaded IDSs (through copy.deepcopy()) is not implemented.

- IMAS-Python assumes that the underlying data entry is not modified.

  When you (or another user) overwrite or add data to the same data entry, you may end up with a mix of old and new data in the lazy-loaded IDS.

- After you close the data entry, no new elements can be loaded.

      >>> core_profiles = data_entry.get("core_profiles", lazy=True)
      >>> data_entry.close()
      >>> print(core_profiles.time)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      RuntimeError: Cannot lazy load the requested data: the data entry is no longer available for reading. Hint: did you close() the DBEntry?

- Lazy loading has more overhead for reading data from the lowlevel: it is therefore more efficient to do a full get() or get_slice() when you intend to use most of the data stored in an IDS.

- When using IMAS-Python with remote data access (i.e. the UDA backend), a full get() or get_slice() may be more efficient than using lazy loading.

  It is recommended to add the parameter ;cache_mode=none [1] to the end of a UDA IMAS URI when using lazy loading: otherwise the UDA backend will still load the full IDS from the remote server.
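Two of these caveats can be turned into small habits. The following is a minimal sketch under stated assumptions: with_cache_mode_none and read_then_close are hypothetical helper names (not part of IMAS-Python), and the example URI is a made-up placeholder, not a real server or path.

```python
def with_cache_mode_none(uri):
    """Append ';cache_mode=none' to a UDA IMAS URI.

    A URI that already sets cache_mode (to any value) is returned unchanged.
    """
    if "cache_mode=" in uri:
        return uri
    return uri + ";cache_mode=none"


def read_then_close(data_entry, ids_name, reader):
    """Lazy-load an IDS, extract values with reader(ids), then close.

    reader should access everything it needs and return plain values
    (floats, lists, arrays): no new elements can be loaded after close().
    """
    ids = data_entry.get(ids_name, lazy=True)
    try:
        return reader(ids)
    finally:
        data_entry.close()


# Placeholder URI for illustration only (hypothetical host and path).
example_uri = "imas://uda.example.org/uda?path=/some/path;backend=hdf5"
uri_for_lazy_loading = with_cache_mode_none(example_uri)
```

With an opened data entry, a call such as read_then_close(data_entry, "core_profiles", lambda ids: list(ids.time)) would copy the time values out before the entry is closed.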