Working with multiple data dictionary versions

Contrary to most high-level interfaces for IMAS, IMAS-Python code is not tied to a specific version of the Data Dictionary (DD). In this lesson we will explore how IMAS-Python handles different DD versions (including development builds of the DD), and how we can convert IDSs between different versions of the Data Dictionary.

Note

Most of the time you won’t need to worry about DD versions and the default IMAS-Python behaviour should be fine.

The default Data Dictionary version

In the other training lessons, we didn’t explicitly work with Data Dictionary versions. Therefore IMAS-Python was always using the default DD version. Let’s find out what that version is:

Exercise 1: The default DD version

  1. Create an imas.IDSFactory().

  2. Print the version of the DD that is used.

  3. Create an empty IDS with this IDSFactory (any IDS is fine) and print the DD version of the IDS (see get_data_dictionary_version()). What do you notice?

  4. Create an imas.DBEntry; you may use the MEMORY_BACKEND. Print the DD version that is used. What do you notice?

import imas
from imas.util import get_data_dictionary_version

# 1. Create an IDSFactory
default_factory = imas.IDSFactory()

# 2. Print the DD version used by the IDSFactory
#
# This factory will use the default DD version, because we didn't explicitly indicate
# which version of the DD we want to use:
print("Default DD version:", default_factory.version)

# 3. Create an empty IDS
pf_active = default_factory.new("pf_active")
print("DD version used for pf_active:", get_data_dictionary_version(pf_active))
# What do you notice? This is the same version as the IDSFactory that was used to create
# it.

# 4. Create a new DBEntry
default_entry = imas.DBEntry(imas.ids_defs.MEMORY_BACKEND, "test", 0, 0)
default_entry.create()
# Alternative URI syntax when using AL5.0.0:
# default_entry = imas.DBEntry("imas:memory?path=.")
print("DD version used for the DBEntry:", get_data_dictionary_version(default_entry))
# What do you notice? It is the same default version again.

Okay, so now you know what your default DD version is. But how is it determined? IMAS-Python first checks if you have an IMAS environment loaded by checking the environment variable IMAS_VERSION. If you are on a cluster and have used module load IMAS or similar, this environment variable will indicate what data dictionary version this module is using. IMAS-Python will use that version as its default.

If the IMAS_VERSION environment variable is not set, IMAS-Python will take the newest version of the Data Dictionary that came bundled with it. Which brings us to the following topic:

Bundled Data Dictionary definitions

IMAS-Python comes bundled [1] with many versions of the Data Dictionary definitions. You can find out which versions are available by calling imas.dd_zip.dd_xml_versions().
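The selection rule described above can be summarised in a few lines of plain Python. This is only a sketch of the documented behaviour, not the actual IMAS-Python implementation: the helper names version_key and pick_default_dd_version are hypothetical, the sample version list is made up for illustration, and only plain release strings such as "3.39.0" are handled. In practice the list of bundled versions would come from imas.dd_zip.dd_xml_versions().

```python
import os


def version_key(version):
    """Turn a release string such as "3.39.0" into a sortable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pick_default_dd_version(bundled_versions, environ=os.environ):
    """Mimic the documented rule: IMAS_VERSION wins, else the newest bundled version."""
    env_version = environ.get("IMAS_VERSION")
    if env_version:
        return env_version
    # Compare numerically: as plain strings, "3.9.1" would sort after "3.39.0"
    return max(bundled_versions, key=version_key)


# Hypothetical sample list, for illustration only
sample_versions = ["3.25.0", "3.9.1", "3.39.0", "3.37.0"]

# No IMAS environment loaded: the newest bundled version is used
print(pick_default_dd_version(sample_versions, environ={}))  # 3.39.0
# IMAS environment loaded: its version takes precedence
print(pick_default_dd_version(sample_versions, environ={"IMAS_VERSION": "3.37.0"}))  # 3.37.0
```

The environ parameter is only there to make the two cases easy to try out without changing your shell environment.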

Converting an IDS between Data Dictionary versions

Newer versions of the Data Dictionary may introduce changes in IDS definitions. Some things that could change:

  • Introduce a new IDS node

  • Remove an IDS node

  • Change the data type of an IDS node

  • Rename an IDS node

IMAS-Python can convert between different versions of the DD and will migrate the data as much as possible. Let’s see how this works in the following exercise.
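Before starting the exercise, a toy model may help build intuition for what "migrating the data as much as possible" means. The sketch below is not how IMAS-Python implements conversion: the function convert_flat, the flat path-to-value dictionaries and the rename table are all hypothetical, and the node names are only loosely modelled on the pulse_schedule example used in the exercise.

```python
def convert_flat(data, renames, target_paths):
    """Toy conversion of a flat {path: value} mapping to a newer layout.

    renames maps old node names to new node names; target_paths is the set
    of paths that exist in the (made-up) target version.
    """
    converted = {}
    not_copied = []
    for path, value in data.items():
        # Apply renames segment by segment, e.g. "antenna" -> "launcher"
        new_path = "/".join(renames.get(part, part) for part in path.split("/"))
        if new_path in target_paths:
            converted[new_path] = value
        else:
            # Removed nodes have no counterpart, so their data is dropped
            not_copied.append(path)
    return converted, not_copied


old_data = {
    "ec/antenna/name": "antenna 1",
    "ec/antenna/launching_angle_pol/reference_name": "pol",
    "ec/antenna/phase/reference_name": "phase",
}
renames = {"antenna": "launcher", "launching_angle_pol": "steering_angle_pol"}
target_paths = {
    "ec/launcher/name",
    "ec/launcher/steering_angle_pol/reference_name",
}

converted, not_copied = convert_flat(old_data, renames, target_paths)
print(converted)
print(not_copied)  # ['ec/antenna/phase/reference_name']
```

Renamed nodes keep their data under the new name, removed nodes are reported as not copied, and nodes that only exist in the target version simply stay empty.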

Exercise 2: Convert an IDS between DD versions

In this exercise we will work with a really old version of the Data Dictionary for the pulse_schedule IDS, because a number of nodes in this IDS were renamed in later versions.

  1. Create an imas.IDSFactory() for DD version 3.25.0.

  2. Create a pulse_schedule IDS with this IDSFactory and verify that it is using DD version 3.25.0.

  3. Fill the IDS with some test data:

    pulse_schedule.ids_properties.homogeneous_time = \
        imas.ids_defs.IDS_TIME_MODE_HOMOGENEOUS
    pulse_schedule.ids_properties.comment = \
        "Testing renamed IDS nodes with IMAS-Python"
    pulse_schedule.time = [1., 1.1, 1.2]
    
    pulse_schedule.ec.antenna.resize(1)
    antenna = pulse_schedule.ec.antenna[0]
    antenna.name = "ec.antenna[0].name in DD 3.25.0"
    antenna.launching_angle_pol.reference_name = \
        "ec.antenna[0].launching_angle_pol.reference_name in DD 3.25.0"
    antenna.launching_angle_pol.reference.data = [2.1, 2.2, 2.3]
    antenna.launching_angle_tor.reference_name = \
        "ec.antenna[0].launching_angle_tor.reference_name in DD 3.25.0"
    antenna.launching_angle_tor.reference.data = [3.1, 3.2, 3.3]
    
    
  4. Use imas.convert_ids to convert the IDS to DD version 3.39.0. The antenna structure that we filled in the old version of the DD has since been renamed to launcher, and the launching_angle_* structures to steering_angle_*. Check that IMAS-Python has converted the data successfully (for example with imas.util.print_tree()).

  5. By default, IMAS-Python creates a shallow copy of the data, which means that the underlying data arrays are shared between the IDSs of both versions. Update the time data of the original IDS (for example: pulse_schedule.time[1] = 3) and print the time data of the converted IDS. Are they the same?

    Note

    imas.convert_ids has an optional keyword argument deep_copy. If you set this to True, the converted IDS will not share data with the original IDS.

  6. Update the ids_properties/comment in one version and print it in the other version. What do you notice?

  7. Sometimes data cannot be converted, for example when a node was added or removed, or when data types have changed. To see this, set pulse_schedule.ec.antenna[0].phase.reference_name = "Test refname" and perform the conversion to DD 3.39.0 again. What do you notice?

import imas
from imas.util import get_data_dictionary_version

# 1. Create an IDSFactory for DD 3.25.0
factory = imas.IDSFactory("3.25.0")

# 2. Create a pulse_schedule IDS
pulse_schedule = factory.new("pulse_schedule")
print(get_data_dictionary_version(pulse_schedule))  # This should print 3.25.0

# 3. Fill the IDS with some test data
pulse_schedule.ids_properties.homogeneous_time = \
    imas.ids_defs.IDS_TIME_MODE_HOMOGENEOUS
pulse_schedule.ids_properties.comment = \
    "Testing renamed IDS nodes with IMAS-Python"
pulse_schedule.time = [1., 1.1, 1.2]

pulse_schedule.ec.antenna.resize(1)
antenna = pulse_schedule.ec.antenna[0]
antenna.name = "ec.antenna[0].name in DD 3.25.0"
antenna.launching_angle_pol.reference_name = \
    "ec.antenna[0].launching_angle_pol.reference_name in DD 3.25.0"
antenna.launching_angle_pol.reference.data = [2.1, 2.2, 2.3]
antenna.launching_angle_tor.reference_name = \
    "ec.antenna[0].launching_angle_tor.reference_name in DD 3.25.0"
antenna.launching_angle_tor.reference.data = [3.1, 3.2, 3.3]

# 4. Convert the IDS from version 3.25.0 to 3.39.0
pulse_schedule_3_39 = imas.convert_ids(pulse_schedule, "3.39.0")

# Check that the data is converted
imas.util.print_tree(pulse_schedule_3_39)

# 5. Update time data
pulse_schedule.time[1] = 3
# Yes, the time array of the converted IDS is updated as well:
print(pulse_schedule_3_39.time)  # [1., 3., 1.2]

# 6. Update ids_properties/comment
pulse_schedule.ids_properties.comment = "Updated comment"
print(pulse_schedule_3_39.ids_properties.comment)
# What do you notice?
#   This prints the original value of the comment ("Testing renamed IDS
#   nodes with IMAS-Python").
# This is the same behaviour that you get when creating a shallow copy
# of a regular Python dictionary with ``copy.copy``:
import copy

dict1 = {"a list": [1, 1.1, 1.2], "a string": "Some text"}
dict2 = copy.copy(dict1)
print(dict2)  # {'a list': [1, 1.1, 1.2], 'a string': 'Some text'}
# dict2 is a shallow copy, so dict1["a list"] and dict2["a list"] are
# the exact same object, and updating it is reflected in both dicts:
dict1["a list"][1] = 3
print(dict2)  # {'a list': [1, 3, 1.2], 'a string': 'Some text'}
# Replacing a value in one dict doesn't update the other:
dict1["a string"] = "Some different text"
print(dict2)  # {'a list': [1, 3, 1.2], 'a string': 'Some text'}

# 7. Set phase.reference_name:
pulse_schedule.ec.antenna[0].phase.reference_name = "Test refname"
# And convert again
pulse_schedule_3_39 = imas.convert_ids(pulse_schedule, "3.39.0")
imas.util.print_tree(pulse_schedule_3_39)
# What do you notice?
#   Element 'ec/antenna/phase' does not exist in the target IDS. Data is not copied.
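The note in step 5 mentions the deep_copy keyword argument. Extending the dictionary analogy from the solution (an analogy only, using the standard library rather than IMAS-Python), a deep copy behaves like copy.deepcopy: the copied containers no longer share their contents with the original.

```python
import copy

original = {"a list": [1, 1.1, 1.2], "a string": "Some text"}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Modify the list in place through the original dictionary
original["a list"][1] = 3

print(shallow["a list"])  # [1, 3, 1.2] -> shares the list with the original
print(deep["a list"])     # [1, 1.1, 1.2] -> has its own, independent list
```

With IMAS-Python, the corresponding call would be imas.convert_ids(pulse_schedule, "3.39.0", deep_copy=True), as described in the note of step 5.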

Automatic conversion between DD versions

When loading data (with get() or get_slice()) or storing data (with put() or put_slice()), IMAS-Python automatically converts the data between DD versions for you. In this section we will see how that works.

The DBEntry DD version

A DBEntry object is tied to a specific version of the Data Dictionary. We have already briefly seen this in Exercise 1: The default DD version.

The DD version can be selected when constructing a new DBEntry object, through the dd_version or xml_path (see also Using custom builds of the Data Dictionary) parameters. If you provide neither, the default DD version is used.

When storing IDSs (put or put_slice), the DBEntry always converts the data to its version before writing it to the backend. When loading IDSs (get or get_slice) an option exists to disable autoconversion. Let’s see in the following two exercises how this works exactly.

Exercise 3: Automatic conversion when storing IDSs

  1. Load the training data for the core_profiles IDS. If you need a reminder of how to do this, see the following section of the basic training material: Open an IMAS database entry.

  2. Print the DD version for the loaded core_profiles IDS.

  3. Create a new DBEntry with DD version 3.37.0.

    new_entry = imas.DBEntry(
        imas.ids_defs.MEMORY_BACKEND, "test", 0, 0, dd_version="3.37.0"
    )
    
  4. Put the core_profiles IDS in the new DBEntry.

  5. Print the core_profiles.ids_properties.version_put.data_dictionary. What do you notice?

import imas
import imas.training
from imas.util import get_data_dictionary_version

# 1. Load the training data for the ``core_profiles`` IDS
entry = imas.training.get_training_db_entry()
core_profiles = entry.get("core_profiles")

# 2. Print the DD version:
print(get_data_dictionary_version(core_profiles))

# 3. Create a new DBEntry with DD version 3.37.0
new_entry = imas.DBEntry(
    imas.ids_defs.MEMORY_BACKEND, "test", 0, 0, dd_version="3.37.0"
)
new_entry.create()

# 4. Put the core_profiles IDS in the new DBEntry
new_entry.put(core_profiles)

# 5. Print version_put.data_dictionary
print(core_profiles.ids_properties.version_put.data_dictionary)
# -> 3.37.0
# What do you notice?
#   The IDS was converted to the DD version of the DBEntry (3.37.0) when writing the
#   data to the backend.

Exercise 4: Automatic conversion when loading IDSs

  1. For this exercise we will first create some test data:

    # Create an IDSFactory for DD 3.25.0
    factory = imas.IDSFactory("3.25.0")
    
    # Create a pulse_schedule IDS
    pulse_schedule = factory.new("pulse_schedule")
    
    # Fill the IDS with some test data
    pulse_schedule.ids_properties.homogeneous_time = IDS_TIME_MODE_HOMOGENEOUS
    pulse_schedule.ids_properties.comment = "Testing renamed IDS nodes with IMAS-Python"
    pulse_schedule.time = [1.0, 1.1, 1.2]
    
    pulse_schedule.ec.antenna.resize(1)
    antenna = pulse_schedule.ec.antenna[0]
    antenna.name = "ec.antenna[0].name in DD 3.25.0"
    antenna.launching_angle_pol.reference_name = (
        "ec.antenna[0].launching_angle_pol.reference_name in DD 3.25.0"
    )
    antenna.launching_angle_pol.reference.data = [2.1, 2.2, 2.3]
    antenna.launching_angle_tor.reference_name = (
        "ec.antenna[0].launching_angle_tor.reference_name in DD 3.25.0"
    )
    antenna.launching_angle_tor.reference.data = [3.1, 3.2, 3.3]
    antenna.phase.reference_name = "Phase reference name"
    
    # And store the IDS in a DBEntry using DD 3.25.0
    entry = imas.DBEntry(ASCII_BACKEND, "autoconvert", 1, 1, dd_version="3.25.0")
    entry.create()
    entry.put(pulse_schedule)
    entry.close()
    
    
  2. Reopen the DBEntry with a newer DD version (the solution below uses 3.42.0).

  3. Load the pulse_schedule IDS with get(). Print its version_put/data_dictionary and Data Dictionary version (with get_data_dictionary_version()). What do you notice?

  4. Use imas.util.print_tree to print all data in the loaded IDS. What do you notice?

  5. Repeat steps 3 and 4, but set autoconvert to False. What do you notice this time?

import imas
from imas.ids_defs import ASCII_BACKEND, IDS_TIME_MODE_HOMOGENEOUS
from imas.util import get_data_dictionary_version

# 1. Create test data
# Create an IDSFactory for DD 3.25.0
factory = imas.IDSFactory("3.25.0")

# Create a pulse_schedule IDS
pulse_schedule = factory.new("pulse_schedule")

# Fill the IDS with some test data
pulse_schedule.ids_properties.homogeneous_time = IDS_TIME_MODE_HOMOGENEOUS
pulse_schedule.ids_properties.comment = "Testing renamed IDS nodes with IMAS-Python"
pulse_schedule.time = [1.0, 1.1, 1.2]

pulse_schedule.ec.antenna.resize(1)
antenna = pulse_schedule.ec.antenna[0]
antenna.name = "ec.antenna[0].name in DD 3.25.0"
antenna.launching_angle_pol.reference_name = (
    "ec.antenna[0].launching_angle_pol.reference_name in DD 3.25.0"
)
antenna.launching_angle_pol.reference.data = [2.1, 2.2, 2.3]
antenna.launching_angle_tor.reference_name = (
    "ec.antenna[0].launching_angle_tor.reference_name in DD 3.25.0"
)
antenna.launching_angle_tor.reference.data = [3.1, 3.2, 3.3]
antenna.phase.reference_name = "Phase reference name"

# And store the IDS in a DBEntry using DD 3.25.0
entry = imas.DBEntry(ASCII_BACKEND, "autoconvert", 1, 1, dd_version="3.25.0")
entry.create()
entry.put(pulse_schedule)
entry.close()

# 2. Reopen the DBEntry with DD 3.42.0:
entry = imas.DBEntry(ASCII_BACKEND, "autoconvert", 1, 1, dd_version="3.42.0")
entry.open()

# 3. Get the pulse schedule IDS
ps_autoconvert = entry.get("pulse_schedule")

print(f"{ps_autoconvert.ids_properties.version_put.data_dictionary=!s}")
print(f"{get_data_dictionary_version(ps_autoconvert)=!s}")
# What do you notice?
#   version_put: 3.25.0
#   get_data_dictionary_version: 3.42.0 -> the IDS was automatically converted

# 4. Print the data in the loaded IDS
imas.util.print_tree(ps_autoconvert)
# What do you notice?
#   1. The antenna AoS was renamed
#   2. Several nodes no longer exist!

print()
print("Disable autoconvert:")
print("====================")
# 5. Repeat steps 3 and 4 with autoconvert disabled:
ps_noconvert = entry.get("pulse_schedule", autoconvert=False)

print(f"{ps_noconvert.ids_properties.version_put.data_dictionary=!s}")
print(f"{get_data_dictionary_version(ps_noconvert)=!s}")
# What do you notice?
#   version_put: 3.25.0
#   get_data_dictionary_version: 3.25.0 -> the IDS was not converted!

# Print the data in the loaded IDS
imas.util.print_tree(ps_noconvert)
# What do you notice?
#   All data is here exactly as it was put at the beginning of this exercise.

Use cases for disabling autoconvert

As you could see in the exercise, disabling autoconvert enables you to retrieve all data exactly as it was stored. This can be useful for use cases such as the following, especially for non-active IDSs, which may contain large changes between DD versions:

  • Interactive plotting tools

  • Exploration of all stored data in a Data Entry

  • Etc.

Caution

The convert_ids() function logs a message when data is not converted. Due to technical constraints, the autoconvert logic doesn’t log any such messages.

You can work around this by explicitly converting the IDS:

>>> # Continuing with the example from Exercise 4:
>>> ps_noconvert = entry.get("pulse_schedule", autoconvert=False)
>>> imas.convert_ids(ps_noconvert, "3.40.0")
15:32:32 INFO     Parsing data dictionary version 3.40.0 @dd_zip.py:129
15:32:32 INFO     Starting conversion of IDS pulse_schedule from version 3.25.0 to version 3.40.0. @ids_convert.py:350
15:32:32 INFO     Element 'ec/antenna/phase' does not exist in the target IDS. Data is not copied. @ids_convert.py:396
15:32:32 INFO     Element 'ec/antenna/launching_angle_pol/reference/data' does not exist in the target IDS. Data is not copied. @ids_convert.py:396
15:32:32 INFO     Element 'ec/antenna/launching_angle_tor/reference/data' does not exist in the target IDS. Data is not copied. @ids_convert.py:396
15:32:32 INFO     Conversion of IDS pulse_schedule finished. @ids_convert.py:366
<IDSToplevel (IDS:pulse_schedule)>

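If you want to act on these messages in a script instead of reading them from the terminal, one option is to collect them with Python's standard logging module. The sketch below makes two assumptions that you should check against your installation: that IMAS-Python emits these messages through the standard logging module, and that its loggers live under the name "imas". The names ListHandler and collect_logs are hypothetical helpers, and a plain logging call stands in for the conversion so that the snippet runs without IMAS-Python.

```python
import logging
from contextlib import contextmanager


class ListHandler(logging.Handler):
    """Logging handler that stores the text of each message in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


@contextmanager
def collect_logs(logger_name, level=logging.INFO):
    """Temporarily collect log messages emitted on logger_name and its children."""
    logger = logging.getLogger(logger_name)
    handler = ListHandler()
    old_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        yield handler.messages
    finally:
        # Restore the logger to its previous state
        logger.removeHandler(handler)
        logger.setLevel(old_level)


# Stand-in for a conversion call: emit a message like the ones shown above.
# The logger name "imas" is an assumption, see the text before this block.
with collect_logs("imas") as messages:
    logging.getLogger("imas.ids_convert").info(
        "Element 'ec/antenna/phase' does not exist in the target IDS. Data is not copied."
    )

not_copied = [message for message in messages if "Data is not copied" in message]
print(not_copied)
```

With IMAS-Python installed, the logging call inside the with block would be replaced by the imas.convert_ids() call from the example above.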
Using custom builds of the Data Dictionary

In the previous sections we showed how you can direct IMAS-Python to use a specific released version of the Data Dictionary definitions. Sometimes it is useful to work with unreleased (development or custom) versions of the data dictionaries as well.

Caution

Unreleased versions of the Data Dictionary should only be used for testing.

Do not use an unreleased Data Dictionary version for long-term storage: data might not be read properly in the future.

If you build the Data Dictionary, a file called IDSDef.xml is created. This file contains all IDS definitions. To work with a custom DD build, you need to point IMAS-Python to this IDSDef.xml file:

Use a custom Data Dictionary build with IMAS-Python
import imas

my_idsdef_file = "path/to/IDSDef.xml"  # Replace with the actual path

# Point IDSFactory to this path:
my_factory = imas.IDSFactory(xml_path=my_idsdef_file)
# Now you can create IDSs using your custom DD build:
my_ids = my_factory.new("...")

# If you need a DBEntry to put / get IDSs in the custom version:
my_entry = imas.DBEntry("imas:hdf5?path=my-testdb", "w", xml_path=my_idsdef_file)

Once you have created the IDSFactory and/or DBEntry pointing to your custom DD build, you can use them like you normally would.
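Because xml_path is just a file path, a typo is easy to make. As a small convenience, you could validate the path before handing it to IMAS-Python. The helper checked_idsdef_path below is hypothetical (it is not part of IMAS-Python) and only checks that the file exists; it does not verify that the file is a valid Data Dictionary build.

```python
from pathlib import Path


def checked_idsdef_path(path):
    """Return the absolute path of an existing IDSDef.xml file, or raise an error."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No Data Dictionary definition file at {resolved}")
    return str(resolved)


# Intended usage (requires IMAS-Python and a real IDSDef.xml file):
# my_factory = imas.IDSFactory(xml_path=checked_idsdef_path("path/to/IDSDef.xml"))
```

Failing early with a clear message tends to be easier to debug than an error raised later while the definitions are being parsed.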

Footnotes


Last update: 2026-01-28