Specify the new xmap API #515

@danlooo

We are currently transitioning from mapCube to the xmap interface.
My goal is to use this opportunity to comply with DimensionalData.jl and the Common Data Model as much as possible.
In addition, let's ensure that this API can accommodate the features available in numpy/xarray/dask.

Basic concept

This is analogous to the NumPy generalized universal function (gufunc) API.
We want to apply a function to elements of aligned arrays of a dataset (analogous to xarray.Dataset).
Dimensions may be modified, removed or added during the process.
Output element type may change.
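
To make the core/loop distinction concrete, here is a minimal sketch using only Base arrays and Statistics. The helper `apply_over_core` is hypothetical and not part of YAXArrays or the proposed xmap API. It only illustrates how a function sees the core dimensions while everything else is looped over, and how the output dimension length or element type can differ from the input.

```julia
using Statistics

# Hypothetical helper (not part of YAXArrays): apply `f` to every slice spanned
# by the core dimensions `core` and loop over all remaining dimensions.
apply_over_core(f, A::AbstractArray; core) = mapslices(f, A; dims=core)

temps = reshape(collect(1.0:24.0), 2, 3, 4)   # axes: (x, y, time)

# Core dimension: time (axis 3). Loop dimensions: x and y.
# `mean` maps each time series to a scalar, so the time axis shrinks to length 1.
mean_over_time = apply_over_core(mean, temps; core=3)

# The function returns two values per series, so the core dimension is modified:
# its length changes from 4 to 2.
range_over_time = apply_over_core(v -> [minimum(v), maximum(v)], temps; core=3)

# The function returns a Bool per series, so the output element type changes.
any_large = apply_over_core(v -> any(>(20), v), temps; core=3)
```

In an xmap-style API the same roles would be expressed by dimension names rather than axis numbers.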

Issues

Currently, XOutput does not define the location and storage of the output. Abstractions are needed for file paths, object names, and sub-paths (e.g. a group within a single NetCDF file).
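
One possible shape for such an abstraction, purely as a discussion sketch: the type `OutputLocation`, its fields, and the helper `target` are all hypothetical names and do not exist in YAXArrays.

```julia
# Hypothetical type for discussion only; none of these names exist in YAXArrays.
Base.@kwdef struct OutputLocation
    path::String                            # file path or object-store URL
    driver::Symbol = :zarr                  # storage backend, e.g. :zarr or :netcdf
    group::Union{Nothing,String} = nothing  # sub-path, e.g. a group inside a NetCDF file
    name::Symbol = :layer                   # variable name of the output array
end

# Human-readable address of the target; the group is included only when present.
function target(loc::OutputLocation)
    parts = loc.group === nothing ?
        (loc.path, String(loc.name)) :
        (loc.path, loc.group, String(loc.name))
    return join(parts, "/")
end

loc = OutputLocation(path="out.nc", driver=:netcdf, group="forecast", name=:temp)
target(loc)
```

Keeping path, group, and variable name as separate fields would let the same description address a plain file, a group inside a NetCDF file, or a key prefix in object storage.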

What about the slice ⊘ operator, e.g. in xmap(one_to_many, yax_test⊘:time, output=(output_one, output_two, output_flat))? NumPy distinguishes between core dimensions, which are part of the input/output signature, and loop dimensions, which are all other dimensions present in the input cube.
yax_test⊘:time means time is a loop dimension here, right?
I'd argue to explicitly name input and output dimensions (like the old indims and outdims). All other dims are loop dimensions anyway.
If you don't want to loop over time, you would do a getindex or reduce first.
Let's keep the syntax explicit.
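
A minimal sketch of that rule, using plain tuples of dimension names and Base arrays. The helper `loop_dims` is hypothetical and only illustrates the idea that loop dimensions are whatever is left after the input dimensions are named.

```julia
# Hypothetical helper: every dimension that is not declared as an input (core)
# dimension is treated as a loop dimension.
loop_dims(all_dims, input_dims) = Tuple(d for d in all_dims if d ∉ input_dims)

cube_dims = (:X, :Y, :time)
loop_dims(cube_dims, (:X, :Y))   # time is the only loop dimension
loop_dims(cube_dims, (:time,))   # X and Y are loop dimensions

# To avoid looping over time, remove it before mapping:
temps = ones(20, 10, 12)                          # axes: (X, Y, time)
first_step = temps[:, :, 1]                       # getindex: select one time step
time_sum = dropdims(sum(temps; dims=3); dims=3)   # reduce: collapse the time axis
```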

I think the old outdims/mapCube interface is still fine.
We just need to make sure that it will also work with data trees from the common data model and object storage.

Example

This uses simple DimArrays for demonstration.
The only differences from YAXArrays are the storage location and potential lazy evaluation.

using YAXArrays
using DimensionalData
using Statistics

elevations = DimArray(ones(X(1:20), Y(1:10)); name=:elevation)
temps = DimArray(ones(X(1:20), Y(1:10), Dim{:time}(1:12)); name=:temp)
ds = DimStack(elevations, temps)

# apply a function to each element of an array (here: convert °C to K)
temp_K = temps .+ 273.15
elevations_km = Base.map(x -> x / 1000, elevations)

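# reduce dimensions: sum over time, drop the singleton axis, divide by the number of time steps
# (a pairwise (x + y) / 2 reducer is not associative and does not yield the mean)
annual_mean_temp = dropdims(Base.reduce(+, temps; dims=:time); dims=:time) ./ length(dims(temps, :time))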

# reduce while keeping the dimensions (as singleton axes)
global_mean_temp = Base.mapslices(mean, ds.temp, dims=(:X, :Y))

# apply isodd to every element of the array, then reduce with | along time while keeping the dimensions
annual_any_odd_temp = Base.mapreduce(isodd, |, temps, dims=:time)

# element-wise operations on aligned dimensions of different arrays
temps .+ elevations

# add a dimension
result = xmap(
    elevations,
    input=Input(:X, :Y),
    output=Output(:X, :Y, Ti(1:24); path=tempname(), driver=:zarr)
) do xin, xout
    xout .= 42
end

# add and remove dimensions
result = xmap(
    temps,  # input must carry the time dimension named below; elevations has none
    input=Input(:X, :Y, dims(temps, :time)),
    output=Output(:X, :Y, Dim{:band}(["red", "green", "blue"]); path=tempname(), driver=:zarr, eltype=UInt8)
) do xin, xout
    xout .= 42
end
