Description
We are currently transitioning from the `mapCube` interface to the `xmap` interface.
My goal is to use this opportunity to comply with DimensionalData.jl and the Common Data Model as much as possible.
In addition, let's make sure that this API can accommodate all the features available in numpy/xarray/dask.
Basic concept
This is analogous to NumPy's generalized universal function (gufunc) API.
We want to apply a function to elements of aligned arrays of a dataset (analogous to `xarray.Dataset`).
Dimensions may be modified, removed or added during the process.
Output element type may change.
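As a minimal sketch of the gufunc idea, assuming plain Julia arrays in place of a dataset: the kernel below only ever sees the core dimension (time) and is applied once per loop index (X, Y) via `Base.mapslices`. The names `temps` and `kernel` are illustrative, not part of any API.

```julia
# Toy cube: X (20) × Y (10) × time (12), deterministic values
temps = reshape(collect(1.0:2400.0), 20, 10, 12)

# Kernel over the core dimension only: it receives one 12-element time series
kernel(ts) = maximum(ts) - minimum(ts)

# mapslices loops over the remaining (loop) dimensions X and Y;
# the scalar result leaves a length-1 time axis, which we drop
amplitude = dropdims(mapslices(kernel, temps; dims=3); dims=3)

size(amplitude)  # (20, 10)
```

The core dimension is removed here; a kernel returning a vector instead of a scalar would replace it with a new core dimension.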
Issues
Currently, `XOutput` does not define the location and storage of the output. Abstractions are needed for file paths, object names, and sub-paths (e.g. a group within a single NetCDF file).
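One possible shape for such an abstraction, sketched with a hypothetical `OutputLocation` type and `fullkey` helper (neither exists in YAXArrays) that keep the store path, the group sub-path, and the array name separate:

```julia
# Hypothetical type: illustrates the idea only, not an existing YAXArrays API
struct OutputLocation
    path::String            # file path or object-store URL of the store
    group::Vector{String}   # sub-path inside the store, e.g. a NetCDF/Zarr group
    name::Symbol            # name of the array inside that group
end

# Flatten the location into a single "/"-separated key
fullkey(loc::OutputLocation) = join([loc.path; loc.group; String(loc.name)], "/")

loc = OutputLocation("forcing.nc", ["daily", "surface"], :temp)
fullkey(loc)  # "forcing.nc/daily/surface/temp"
```

Keeping the three parts separate lets a driver decide how to map them (a NetCDF group, a Zarr sub-store, or an object-store prefix).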
What about the slice operator `⊘`, e.g. in `xmap(one_to_many, yax_test⊘:time, output=(output_one, output_two, output_flat))`?
NumPy distinguishes between core dimensions, which are part of the function's input/output signature, and loop dimensions (all other dims present in the input cube).
So `yax_test⊘:time` means that time is a loop dimension here, right?
I'd argue for explicitly naming input and output dimensions (like the old `indims` and `outdims`). All other dims are loop dimensions anyway.
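A minimal sketch of that rule, using plain NamedTuples as hypothetical stand-ins for cube dimensions: once the input and output core dims are named explicitly, the loop dims and the output shape follow mechanically.

```julia
# Dimension names and lengths of a toy input cube
cube = (X = 20, Y = 10, time = 12)

indims = (:time,)     # core dims consumed by the kernel (like the old indims)
outdims = (band = 3,) # core dims produced by the kernel (like the old outdims)

# Every dimension not listed in indims is a loop dimension
loop = Tuple(d for d in keys(cube) if d ∉ indims)

# Output cube: loop dims (with their lengths) followed by the output core dims
outsize = merge(NamedTuple{loop}(map(d -> cube[d], loop)), outdims)
```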
If you don't want to loop over time, you would call `getindex` or reduce over time first.
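In plain-array terms (a sketch; with DimArrays one would select or reduce along the named `time` dimension instead of axis 3), both options remove time from the cube before any mapping happens:

```julia
using Statistics

temps = reshape(collect(1.0:2400.0), 20, 10, 12)    # X × Y × time

# getindex: keep a single time step, so time disappears from the cube
first_step = temps[:, :, 1]                        # X × Y

# reduce: average over time, then drop the length-1 time axis
time_mean = dropdims(mean(temps; dims=3); dims=3)  # X × Y
```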
Let's keep the syntax explicit.
I think the old `outdims`/`mapCube` interface is still fine.
We just need to make sure that it also works with data trees from the Common Data Model and with object storage.
Example
This uses simple DimArrays for demonstration.
The only differences from YAXArrays are the storage location and potential lazy evaluation.
```julia
using YAXArrays
using DimensionalData
using Statistics

elevations = DimArray(ones(20, 10), (X(1:20), Y(1:10)); name=:elevation)
temps = DimArray(ones(20, 10, 12), (X(1:20), Y(1:10), Dim{:time}(1:12)); name=:temp)
ds = DimStack(elevations, temps)

# Apply a function on each element of an array
temp_K = temps .+ 273.15  # °C -> K
elevations_km = Base.map(x -> x / 1000, elevations)

# Reduce dimensions: mean over time, then drop the length-1 time dimension
annual_mean_temp = dropdims(mean(temps; dims=:time); dims=:time)

# Reduce over X and Y while keeping them as length-1 dimensions
global_mean_temp = Base.mapslices(mean, ds[:temp]; dims=(:X, :Y))

# Apply isodd on every element, then reduce with | over time (time is kept with length 1)
annual_any_odd_temp = Base.mapreduce(isodd, |, temps; dims=:time)

# Element-wise operations on aligned dimensions of different arrays
temps .+ elevations

# Add a dimension (proposed Input/Output API)
result = xmap(
    elevations,
    input=Input(:X, :Y),
    output=Output(:X, :Y, Ti(1:24); path=tempname(), driver=:zarr)
) do xin, xout
    xout .= 42
end

# Add and remove dimensions: time is consumed, band is produced (proposed Input/Output API)
result = xmap(
    temps,
    input=Input(:X, :Y, dims(temps, :time)),
    output=Output(:X, :Y, Dim{:band}(["red", "green", "blue"]); path=tempname(), driver=:zarr, eltype=UInt8)
) do xin, xout
    xout .= 42
end
```
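To make the intended semantics of the last example concrete without the proposed API, here is a plain-Julia emulation using `Base.mapslices`: each X/Y location's 12-step time series goes in, three `UInt8` band values come out. The `bands` vector and `band_kernel` are illustrative only.

```julia
temps = ones(20, 10, 12)              # X × Y × time
bands = ["red", "green", "blue"]

# Core input dim: time (length 12); core output dim: band (length 3).
# Like `xout .= 42` above, the kernel ignores its input values.
band_kernel(ts) = fill(UInt8(42), length(bands))

# mapslices replaces the time axis with the kernel's 3-element result
result = mapslices(band_kernel, temps; dims=3)   # X × Y × band

size(result), eltype(result)  # ((20, 10, 3), UInt8)
```

The output element type follows the kernel's result, which is what the `eltype` keyword in the proposed `Output` would declare up front for lazily written outputs.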