Create xarray.DataArray from an IDS
Info
This lesson was written before imas.util.to_xarray() was
implemented. It is retained for educational purposes; however, we
recommend using imas.util.to_xarray() instead of manually creating
xarray DataArrays.
See also: Convert IMAS-Python IDSs directly to Xarray Datasets.
Let’s start with an introduction to Xarray. According to its website (which also has an excellent summary of why Xarray is useful):
Quote
Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multidimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.
In this lesson, we will use the metadata from the Data
Dictionary to construct a DataArray from an IDS.
Note
This section uses the Python package xarray, which can be installed by
following the instructions on its website.
Exercise 1: create a DataArray for profiles_1d/temperature
1. Load the training data for the core_profiles IDS. For a refresher on how to do this, see the following section of the basic training material: Open an IMAS database entry.
2. Get the average ion temperature data of the first time slice of profiles_1d.
3. To create a DataArray from this temperature data, we need to give the following items to xarray:
   - The data itself.
   - The coordinates and their values as a Python dictionary {"coordinate_name": coordinate_value, [...]}.
   - Any additional attributes. For this example we add the units.
   - The name of the data.
   Get these values for our temperature array.
4. Create the xarray.DataArray: xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name). Print the data array.
5. Now we can use the xarray API. Let’s try some examples:
   - Select all items where rho_tor_norm is between 0.4 and 0.6, both ends included: temperature.sel(rho_tor_norm=slice(0.4, 0.6)).
   - Interpolate the data to a different grid: temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)).
   - Create a plot: temperature.plot().
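Steps 3 and 4 can be tried out without an IDS. This is a minimal sketch, assuming synthetic values: the grid and temperature profile below are invented stand-ins for profiles_1d[0].grid.rho_tor_norm and profiles_1d[0].t_i_average, and the units string "eV" is the one the Data Dictionary uses for temperatures. Here dims is passed explicitly, whereas the solution below lets xarray take the dimension names from the keys of the coords dictionary.

```python
import numpy
import xarray

# Synthetic stand-ins for the IDS data (invented values, not training data)
rho_tor_norm = numpy.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
t_i_average = 5000.0 * (1.0 - rho_tor_norm**2) + 100.0

# Step 3: the ingredients that xarray needs
data = t_i_average
coordinates = {"rho_tor_norm": rho_tor_norm}
attributes = {"units": "eV"}
name = "t_i_average"

# Step 4: create the DataArray; dims names the single axis explicitly
temperature = xarray.DataArray(
    data, coords=coordinates, dims=["rho_tor_norm"], attrs=attributes, name=name
)
print(temperature)
```

Passing dims explicitly costs one argument and makes the intended axis order visible in the code, which becomes more useful once a second (time) dimension is added.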
This exercise was created before the implementation of
imas.util.to_xarray(). The original approach is available below
for educational purposes.
import os
import matplotlib
# To avoid possible display issues when Matplotlib uses a non-GUI backend
if "DISPLAY" not in os.environ:
    matplotlib.use("agg")
else:
    matplotlib.use("TKagg")
import matplotlib.pyplot as plt
import numpy
import imas
import imas.training
import xarray
# 1. Load core_profiles IDS from training DBEntry
entry = imas.training.get_training_db_entry()
cp = entry.get("core_profiles")
# 2. Store the temperature of the first time slice
temperature = cp.profiles_1d[0].t_i_average
# 3. Get the required labels and data:
data = temperature
coordinates = {
    coordinate.metadata.name: coordinate
    for coordinate in data.coordinates
}
attributes = {"units": data.metadata.units}
name = data.metadata.name
# 4. Create the DataArray
temperature = xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name)
print(temperature)
# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (sel includes both ends):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))
# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))
# 5c. Plot
temperature.plot()
plt.show()
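Label-based slicing with sel, as used in step 5a, includes both end points. This differs from positional slicing with isel (or plain Python slices), where the stop index is excluded. A minimal sketch on an invented grid makes the difference visible:

```python
import numpy
import xarray

# Invented grid; the values are just positions 0..6 so the result is easy to read
rho_tor_norm = numpy.array([0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0])
profile = xarray.DataArray(
    numpy.arange(rho_tor_norm.size, dtype=float),
    coords={"rho_tor_norm": rho_tor_norm},
    dims=["rho_tor_norm"],
)

# Label-based selection: both 0.4 and 0.6 are kept
by_label = profile.sel(rho_tor_norm=slice(0.4, 0.6))
print(by_label.rho_tor_norm.values)  # [0.4 0.5 0.6]

# Positional selection: index 4 (rho_tor_norm = 0.6) is excluded
by_position = profile.isel(rho_tor_norm=slice(2, 4))
print(by_position.rho_tor_norm.values)  # [0.4 0.5]
```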
Exercise 2: include the time axis in the DataArray
In the previous exercise we created a DataArray for a variable in one time slice of
the profiles_1d array of structures. When the grid does not change over time
(profiles_1d[i]/grid/rho_tor_norm is the same for every i), it can be useful to construct a 2D
DataArray that includes the time dimension:
1. Load the training data for the core_profiles IDS.
2. Get the average ion temperature data of the first time slice of profiles_1d. Verify with numpy.allclose that the coordinates are the same for all time slices.
3. Stack the data of all time slices into one 2D array: numpy.array([arr1, arr2, ...]). Note that we have now introduced an extra time coordinate!
4. Create the DataArray and print it.
5. Now we can use the xarray API. Let’s try some examples:
   - Select all items where rho_tor_norm is between 0.4 and 0.6, both ends included: temperature.sel(rho_tor_norm=slice(0.4, 0.6)).
   - Interpolate the data to a different grid: temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)).
   - Interpolate the data to a different time base: temperature.interp(time=[10, 20]).
   - Create a 2D plot: temperature.plot(x="time", norm=matplotlib.colors.LogNorm()).
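Steps 2 to 4 and the time interpolation in step 5 can be sketched without an IDS. This is a minimal sketch, assuming three made-up time slices: the times, grid and temperatures are invented stand-ins for cp.profiles_1d. The last part spells out linear interpolation in time with numpy.interp, applied to each radial point separately. For new times inside the original time range this should give the same values as temperature.interp(time=[10, 20]), whose default method is linear. Outside that range the two differ: numpy.interp repeats the end values, while xarray's interp returns NaN by default.

```python
import numpy
import xarray

# Made-up stand-ins for cp.profiles_1d: three time slices sharing one grid
times = numpy.array([0.0, 15.0, 30.0])
grids = [numpy.array([0.0, 0.5, 1.0]) for _ in times]
profiles = [
    numpy.array([1000.0, 800.0, 100.0]),   # t = 0
    numpy.array([1300.0, 950.0, 130.0]),   # t = 15
    numpy.array([1600.0, 1100.0, 160.0]),  # t = 30
]

# Step 2: only stack if every slice uses (numerically) the same grid
assert all(numpy.allclose(g, grids[0]) for g in grids)

# Step 3: numpy.array stacks the 1D profiles along a new leading time axis
data = numpy.array(profiles)  # shape (3 times, 3 radial points)

# Step 4: a 2D DataArray with explicit dimension names
temperature = xarray.DataArray(
    data,
    coords={"time": times, "rho_tor_norm": grids[0]},
    dims=["time", "rho_tor_norm"],
    attrs={"units": "eV"},
    name="t_i_average",
)

# Linear interpolation in time, one radial point (column) at a time;
# rows of the result correspond to the new times
new_times = numpy.array([10.0, 20.0])
interpolated = numpy.column_stack(
    [
        numpy.interp(new_times, times, temperature.values[:, j])
        for j in range(temperature.sizes["rho_tor_norm"])
    ]
)
print(interpolated)
```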
This exercise was created before the implementation of
imas.util.to_xarray(). The code sample below has been updated to provide
two alternatives: the first is based on imas.util.to_xarray(),
the second is the original, manual approach.
import os
import matplotlib
# To avoid possible display issues when Matplotlib uses a non-GUI backend
if "DISPLAY" not in os.environ:
    matplotlib.use("agg")
else:
    matplotlib.use("TKagg")
import matplotlib.pyplot as plt
import numpy
import imas
import imas.training
import xarray
# 1. Load core_profiles IDS from training DBEntry
entry = imas.training.get_training_db_entry()
cp = entry.get("core_profiles")
#######################################################################################
# Steps 2, 3 and 4, using imas.util.to_xarray
# Create an xarray Dataset containing t_i_average and its coordinates
xrds = imas.util.to_xarray(cp, "profiles_1d/t_i_average")
# Note that profiles_1d.grid.rho_tor_norm is a 2D coordinate: its values may be
# different at different times.
#
# Since the values at different time slices differ only minutely in this example, we'll
# rename the `profiles_1d.grid.rho_tor_norm:i` dimension to `rho_tor_norm` and set the
# values to the values of rho_tor_norm of the first time slice:
xrds = xrds.rename({"profiles_1d.grid.rho_tor_norm:i": "rho_tor_norm"}).assign_coords(
    {"rho_tor_norm": xrds["profiles_1d.grid.rho_tor_norm"].isel(time=0).data}
)
# Extract temperatures as an xarray DataArray
temperature = xrds["profiles_1d.t_i_average"]
# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (sel includes both ends):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))
# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))
# 5c. Interpolate temperature on a new time base: [10, 20]
print(temperature.interp(time=[10, 20]))
# 5d. Plot
temperature.plot(x="time", norm=matplotlib.colors.LogNorm())
plt.show()
#######################################################################################
# We can also build an xarray DataArray manually, as shown below:
# 2. Store the temperature of the first time slice
temperature = cp.profiles_1d[0].t_i_average
# Verify that the coordinates don't change
for p1d in cp.profiles_1d:
    assert numpy.allclose(p1d.t_i_average.coordinates[0], temperature.coordinates[0])
# 3. Get the required labels and data:
# Concatenate all temperature arrays:
data = numpy.array([p1d.t_i_average for p1d in cp.profiles_1d])
coordinates = {
    "time": cp.profiles_1d.coordinates[0],
    **{
        coordinate.metadata.name: coordinate
        for coordinate in temperature.coordinates
    }
}
attributes = {"units": temperature.metadata.units}
name = "t_i_average"
# 4. Create the DataArray
temperature = xarray.DataArray(data, coords=coordinates, attrs=attributes, name=name)
print(temperature)
# 5a. Select subset of temperature where 0.4 <= rho_tor_norm <= 0.6 (sel includes both ends):
print(temperature.sel(rho_tor_norm=slice(0.4, 0.6)))
# 5b. Interpolate temperature on a new grid: [0, 0.1, 0.2, ..., 0.9, 1.0]
print(temperature.interp(rho_tor_norm=numpy.linspace(0, 1, 11)))
# 5c. Interpolate temperature on a new time base: [10, 20]
print(temperature.interp(time=[10, 20]))
# 5d. Plot
temperature.plot(x="time", norm=matplotlib.colors.LogNorm())
plt.show()
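The imas.util.to_xarray branch above collapses the 2D coordinate profiles_1d.grid.rho_tor_norm onto the grid of the first time slice by renaming a dimension and assigning new coordinate values. The sketch below repeats that rename and assign_coords step on a small hypothetical Dataset that only copies the variable and dimension names from the code above; its values are invented and it is not produced by IMAS-Python. It also adds a numpy.allclose check (an addition of ours, not part of the original solution) so the collapse only happens when the grids of all time slices really agree.

```python
import numpy
import xarray

# Hypothetical Dataset mimicking the names used above (values invented,
# NOT generated by imas.util.to_xarray)
dim_i = "profiles_1d.grid.rho_tor_norm:i"
grid = numpy.array([0.0, 0.5, 1.0])
rho_2d = numpy.array([grid, grid + 1e-9])  # two time slices, nearly identical grids
t_i = numpy.array([[1000.0, 800.0, 100.0], [1100.0, 850.0, 110.0]])
xrds = xarray.Dataset(
    {"profiles_1d.t_i_average": (("time", dim_i), t_i)},
    coords={
        "time": [0.0, 1.0],
        "profiles_1d.grid.rho_tor_norm": (("time", dim_i), rho_2d),
    },
)

# Only collapse the 2D coordinate if all time slices share (nearly) the same grid
rho = xrds["profiles_1d.grid.rho_tor_norm"]
assert numpy.allclose(rho.values, rho.isel(time=0).values)

# Rename the index dimension and attach the first time slice's grid to it
xrds = xrds.rename({dim_i: "rho_tor_norm"}).assign_coords(
    {"rho_tor_norm": rho.isel(time=0).data}
)
temperature = xrds["profiles_1d.t_i_average"]
print(temperature.dims)  # ('time', 'rho_tor_norm')
```

After this step, label-based operations such as temperature.sel(rho_tor_norm=0.5) work as in the manual approach, because rho_tor_norm is now an ordinary 1D dimension coordinate.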