Create xarray.DataArray from an IDS

Info

This lesson was written before imas.util.to_xarray() was implemented. It is retained for educational purposes; however, we recommend using imas.util.to_xarray() instead of manually creating xarray DataArrays.

See also: Convert IMAS-Python IDSs directly to Xarray Datasets.

Let’s start with an introduction to Xarray. According to its website (which also has an excellent summary of why Xarray is useful):

Quote

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.
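Here is a NumPy-only sketch of what "labels in the form of dimensions" buys you. With a bare array you must remember which axis is which. A name lookup makes the intent explicit. The helper `mean_over` and the `dims` tuple are hypothetical illustrations and are not part of Xarray or IMAS-Python. Xarray provides this behaviour natively, for example through `DataArray.mean(dim="time")`.

```python
import numpy

# Synthetic data: 3 time slices x 5 radial points (values are made up)
data = numpy.arange(15.0).reshape(3, 5)
# Hypothetical dimension labels, one per array axis
dims = ("time", "rho_tor_norm")


def mean_over(array, array_dims, dim):
    """Average `array` over the axis labelled `dim` (hypothetical helper)."""
    axis = array_dims.index(dim)  # look up the axis by name, not by position
    return array.mean(axis=axis)


# Positional version: correct only if you remember that axis 0 is time
time_average_positional = data.mean(axis=0)
# Label-based version: the intent is visible in the call itself
time_average_labelled = mean_over(data, dims, "time")

print(time_average_labelled)
```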

In this lesson, we will use the metadata from the Data Dictionary to construct a DataArray from an IDS.

Note

This section uses the Python package xarray, which can be installed by following the instructions on its website.

Exercise 1: create a DataArray for profiles_1d/temperature

  1. Load the training data for the core_profiles IDS. You can review how to do this in the basic training material, section Open an IMAS database entry.

  2. Get the average ion temperature data of the first time slice of profiles_1d.

  3. To create a DataArray from this temperature data, we need to give the following items to xarray:

    • The data itself.

    • The coordinates and their values as a Python dictionary {"coordinate_name": coordinate_value, [...]}.

    • Any additional attributes. For this example we add the units.

    • The name of the data.

    Get these values for our temperature array.

  4. Create the xarray.DataArray: xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name). Print the data array.

  5. Now we can use the xarray API. Let’s try some examples:

    1. Select all items where rho_tor_norm is between 0.4 and 0.6: temperature.sel(rho_tor_norm=slice(0.4, 0.6)).

    2. Interpolate the data to a different grid: temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11))

    3. Create a plot: temperature.plot()
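The selection in step 5.1 is label based. Xarray keeps the points whose rho_tor_norm value lies between 0.4 and 0.6 and, as in pandas, it includes both endpoints. For a monotonically increasing grid, this behaves roughly like the NumPy boolean mask below. The grid and temperature values are synthetic placeholders, not the training data.

```python
import numpy

# Synthetic radial grid and temperature profile (placeholders for the IDS data)
rho_tor_norm = numpy.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
temperature = 1000.0 * (1 - rho_tor_norm**2)

# Label-based selection keeps both endpoints: 0.4 <= rho_tor_norm <= 0.6
mask = (rho_tor_norm >= 0.4) & (rho_tor_norm <= 0.6)
selected_rho = rho_tor_norm[mask]
selected_temperature = temperature[mask]

# A half-open test (like Python's positional slicing) would drop the last point
half_open_rho = rho_tor_norm[(rho_tor_norm >= 0.4) & (rho_tor_norm < 0.6)]

print(selected_rho, half_open_rho)
```

Because the comparison is on floating-point labels, a grid value that is only approximately 0.6 (for example 0.6000000000000001) falls outside the selection.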

This exercise was created before the implementation of imas.util.to_xarray(). The original approach is available below for educational purposes.

import os

import matplotlib
# Use a non-GUI backend when no display is available, to avoid display errors
if "DISPLAY" not in os.environ:
    matplotlib.use("agg")
else:
    matplotlib.use("TkAgg")

import matplotlib.pyplot as plt
import numpy
import imas
import imas.training
import xarray

# 1. Load core_profiles IDS from training DBEntry
entry = imas.training.get_training_db_entry()
cp = entry.get("core_profiles")

# 2. Store the temperature of the first time slice
temperature = cp.profiles_1d[0].t_i_average

# 3. Get the required labels and data:
data = temperature
coordinates = {
    coordinate.metadata.name: coordinate
    for coordinate in data.coordinates
}
attributes = {"units": data.metadata.units}
name = data.metadata.name

# 4. Create the DataArray
temperature = xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name)
print(temperature)

# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (label-based slices include both endpoints):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))

# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))

# 5c. Plot
temperature.plot()
plt.show()
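Step 5b uses Xarray's default linear interpolation. For a 1D profile on a monotonically increasing grid, the computation can be sketched with numpy.interp. There is one difference to keep in mind. By default, Xarray fills target points outside the original grid with NaN, whereas numpy.interp clamps to the end values unless `left` and `right` are given. For that reason, the sketch passes NaN for both. The grid and profile are synthetic.

```python
import numpy

# Synthetic profile on an irregular grid that does not reach rho_tor_norm = 1
rho_tor_norm = numpy.array([0.0, 0.15, 0.3, 0.5, 0.7, 0.9])
temperature = 1000.0 * (1 - rho_tor_norm**2)

# Target grid used in step 5b: 0.0, 0.1, ..., 1.0
new_grid = numpy.linspace(0, 1, 11)

# Linear interpolation; points beyond 0.9 become NaN, mimicking Xarray's default
new_temperature = numpy.interp(
    new_grid, rho_tor_norm, temperature, left=numpy.nan, right=numpy.nan
)
print(new_temperature)
```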

Exercise 2: include the time axis in the DataArray

In the previous exercise we created a DataArray for a variable in one time slice of the profiles_1d array of structures. When the grid does not change over time (profiles_1d[i]/grid/rho_tor_norm is the same for every i), it can be useful to construct a 2D DataArray with a time dimension:

  1. Load the training data for the core_profiles IDS.

  2. Get the average ion temperature data of the first time slice of profiles_1d. Verify that the coordinates are the same for all time slices with numpy.allclose.

  3. Stack the data of all time slices into a single 2D array: numpy.array([arr1, arr2, ...]). Note that this introduces an extra dimension, so we now also need the time coordinate!

  4. Create the DataArray and print it.

  5. Now we can use the xarray API. Let’s try some examples:

    1. Select all items where rho_tor_norm is between 0.4 and 0.6: temperature.sel(rho_tor_norm=slice(0.4, 0.6)).

    2. Interpolate the data to a different grid: temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11))

    3. Interpolate the data to a different time base: temperature.interp(time=[10, 20])

    4. Create a 2D plot: temperature.plot(x="time", norm=matplotlib.colors.LogNorm())
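Steps 2 and 3, together with the time interpolation in step 5.3, can be sketched with plain NumPy. The three synthetic time slices below stand in for profiles_1d[i], and the times and profile values are made up. Stacking gives an array of shape (n_time, n_rho). Interpolating in time is then a linear interpolation applied independently to each radial column.

```python
import numpy

# Synthetic time base and per-slice grids (stand-ins for profiles_1d[i])
times = numpy.array([0.0, 15.0, 30.0])
grids = [numpy.linspace(0, 1, 5) for _ in times]
core_values = [1000.0, 1200.0, 900.0]
profiles = [core * (1 - grid**2) for core, grid in zip(core_values, grids)]

# Step 2: the grid must be the same (within tolerance) in every time slice
assert all(numpy.allclose(grid, grids[0]) for grid in grids)

# Step 3: stacking adds a leading time dimension
data = numpy.array(profiles)
print(data.shape)  # (3, 5)

# Step 5.3: linear interpolation in time, done separately for each radial point
new_times = numpy.array([10.0, 20.0])
interpolated = numpy.stack(
    [numpy.interp(new_times, times, data[:, j]) for j in range(data.shape[1])],
    axis=1,
)
print(interpolated.shape)  # (2, 5)
```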

This exercise was created before the implementation of imas.util.to_xarray(). The code sample below has been updated to provide two alternatives: the first is based on imas.util.to_xarray(), the second is the original, manual approach.

import os

import matplotlib

# Use a non-GUI backend when no display is available, to avoid display errors
if "DISPLAY" not in os.environ:
    matplotlib.use("agg")
else:
    matplotlib.use("TkAgg")

import matplotlib.pyplot as plt
import numpy
import imas
import imas.training
import xarray

# 1. Load core_profiles IDS from training DBEntry
entry = imas.training.get_training_db_entry()
cp = entry.get("core_profiles")

#######################################################################################
# Steps 2, 3 and 4, using imas.util.to_xarray
# Create an xarray Dataset containing t_i_average and its coordinates
xrds = imas.util.to_xarray(cp, "profiles_1d/t_i_average")
# Note that profiles_1d.grid.rho_tor_norm is a 2D coordinate: its values may be
# different at different times.
#
# Since the values at different time slices differ only minutely in this example, we'll
# rename the `profiles_1d.grid.rho_tor_norm:i` dimension to `rho_tor_norm` and set the
# values to the values of rho_tor_norm of the first time slice:
xrds = xrds.rename({"profiles_1d.grid.rho_tor_norm:i": "rho_tor_norm"}).assign_coords(
    {"rho_tor_norm": xrds["profiles_1d.grid.rho_tor_norm"].isel(time=0).data}
)

# Extract temperatures as an xarray DataArray
temperature = xrds["profiles_1d.t_i_average"]

# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (label-based slices include both endpoints):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))

# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))

# 5c. Interpolate temperature on a new time base: [10, 20]
print(temperature.interp(time=[10, 20]))

# 5d. Plot
temperature.plot(x="time", norm=matplotlib.colors.LogNorm())
plt.show()

#######################################################################################
# Alternatively, build the xarray DataArray manually:

# 2. Store the temperature of the first time slice
temperature = cp.profiles_1d[0].t_i_average

# Verify that the coordinates don't change
for p1d in cp.profiles_1d:
    assert numpy.allclose(p1d.t_i_average.coordinates[0], temperature.coordinates[0])

# 3. Get the required labels and data:
# Stack all temperature arrays into one 2D array with shape (n_time, n_rho):
data = numpy.array([p1d.t_i_average for p1d in cp.profiles_1d])
coordinates = {
    "time": cp.profiles_1d.coordinates[0],
    **{
        coordinate.metadata.name: coordinate
        for coordinate in temperature.coordinates
    }
}
attributes = {"units": temperature.metadata.units}
name = "t_i_average"

# 4. Create the DataArray
temperature = xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name)
print(temperature)

# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (label-based slices include both endpoints):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))

# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))

# 5c. Interpolate temperature on a new time base: [10, 20]
print(temperature.interp(time=[10, 20]))

# 5d. Plot
temperature.plot(x="time", norm=matplotlib.colors.LogNorm())
plt.show()
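Both alternatives rely on the radial grid being (nearly) identical in every time slice. The imas.util.to_xarray branch simply keeps the first slice's rho_tor_norm, while the manual branch checks this with numpy.allclose. Before applying either approach to other data, it may help to measure how large the variation actually is. The following is a minimal NumPy sketch, assuming the grids are already stacked into a (time, rho) array. The helper `max_grid_deviation` and the synthetic grids are illustrative only and are not IMAS-Python API.

```python
import numpy


def max_grid_deviation(grids):
    """Largest absolute difference between any time slice's grid and the first.

    `grids` has shape (n_time, n_rho) (hypothetical helper, not IMAS-Python API).
    """
    grids = numpy.asarray(grids)
    return float(numpy.max(numpy.abs(grids - grids[0])))


base = numpy.linspace(0, 1, 5)
# Synthetic grids: the second slice is shifted by a tiny amount
constant_grids = numpy.array([base, base + 1e-9, base])
# Synthetic grids: the second slice is stretched by 1 %
changing_grids = numpy.array([base, base * 1.01])

deviation = max_grid_deviation(constant_grids)
# numpy.allclose uses rtol=1e-05 and atol=1e-08 by default
constant_ok = all(numpy.allclose(g, constant_grids[0]) for g in constant_grids)
changing_ok = all(numpy.allclose(g, changing_grids[0]) for g in changing_grids)
print(deviation, constant_ok, changing_ok)
```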

Last update: 2026-01-28