Enhancing AiiDA with domain-specific language (DSL) support for intuitive scientific workflow construction #22

unkcpz · 2025-05-29T21:40:55Z

Project

AiiDA

Summary

This project will build on an existing domain-specific language (DSL) interpreter prototype, developed in Rust, and turn it into production-ready code that lets users define workflows in a new, simple, and intuitive DSL. The result will integrate with the existing infrastructure by continuing to support the legacy workflow system and by offering a smooth transition to the new syntax. In addition, the core runtime component that runs the workflow representation will be rewritten in Rust to gain native multithreading support, which is expected to deliver a significant performance boost. Finally, the project will help guide AiiDA's transition to a modern architecture, streamline its dependencies, and reduce reliance on less actively maintained Python packages.

submitter

Jusong Yu (@unkcpz)

project lead

@rabbull

Community benefit

AiiDA is a Python workflow manager for high-throughput materials simulations that records the full data provenance of each workflow, promoting reproducibility of scientific results.
As demand grows, AiiDA's scope is expanding beyond computational materials science into fields such as climate modelling and autonomous chemical synthesis.
Thus, the community is eager for a more intuitive approach to constructing workflows, supported by a high-performance, multithreaded runtime.

AiiDA will continue to serve the computational materials community as a platform for high-throughput simulations and reproducible data management.
Its plugin framework and flexible orchestration API also support new domains such as climate modelling and autonomous chemical synthesis.

To meet the needs of a growing user base, we will introduce a clearer way to build workflows.
A concise domain-specific language will let researchers describe complex, dynamic control flows with familiar programming constructs.
Built-in static checks will catch most errors before expensive execution on the remote HPC resource, so development cycles will shorten and reliability will improve.

We will also rewrite the core runtime component in Rust, using the Tokio runtime from the Rust community, to support native multithreading.
Native multithreading will increase performance for large, concurrent workloads on modern computer architectures and will allow us to remove the RabbitMQ dependency, one of the main barriers for new users getting started with AiiDA.

Amount requested

USD 8,676

Execution plan

The grant is meant to compensate @rabbull for 300 hours of work covering the implementation, testing and documentation of the proposed feature.
The 300 hours will be spread over ~5 months (20 weeks) of working time, because @rabbull will devote ~30% of his working time to the project.

The execution plan covers 20 weeks (5 months); the planned deliverables for each stage are detailed below.

(weeks 1-2) Complete log tracing for the runtime of the asynchronous Rust implementation.

The prototype Rust code supports logging, but the logs do not yet cover all key state transitions.
Asynchronous code can be hard to reason about, especially once the implementation is extended to cover more edge cases.
A standard logging approach (using tokio-rs/tracing) will speed up subsequent development as more required features are added.

The outcome of this stage is a log tracing system that covers all key state transitions in the prototype implementation, plus a dynamic tracing setup (tokio-rs/console) that makes it easier to find bugs in asynchronous code.

(weeks 3-6) Decouple DSL source resolution from runtime execution so that syntax errors in workflow construction can be spotted before runtime.

With the new DSL, workflow analysis and resolution can happen before the runtime phase, in which the syntax tree is traversed and executed.
This makes it possible to find syntax errors in workflow construction before interacting with any real remote resources.
The syntax tree parsed from the workflow source code will be traversed to raise user-facing errors.
Error handling is the key interface of a small language.
With an extra syntax-tree resolution phase before runtime, development cycles can be shortened considerably.

The outcome of this stage is a tree-traversal implementation that collects all potential errors in workflow construction.
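As a rough illustration of such a pre-runtime resolution pass, here is a minimal Python sketch. The node types (`Assign`, `Var`, `Num`, `Call`, `While`, `Break`), the `resolve` function, and the flat variable scope are all hypothetical simplifications; the proposal does not describe the prototype's actual syntax tree or its Rust implementation. The point is only that the pass collects every error in one traversal instead of stopping at the first, and never touches a remote resource.

```python
from dataclasses import dataclass

# Hypothetical, simplified syntax-tree nodes (not the prototype's real types).
@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class Call:
    func: str
    args: list

@dataclass
class Assign:
    name: str
    value: object  # an expression node

@dataclass
class While:
    cond: object
    body: list

@dataclass
class Break:
    pass


def resolve(program, known_tasks):
    """Walk the tree once and return all error messages (empty list if clean)."""
    errors = []
    defined = set()  # assumption: a single flat scope, for brevity

    def check_expr(node):
        if isinstance(node, Var):
            if node.name not in defined:
                errors.append(f"undefined variable '{node.name}'")
        elif isinstance(node, Call):
            if node.func not in known_tasks:
                errors.append(f"unknown task '{node.func}'")
            for arg in node.args:
                check_expr(arg)
        # Num literals need no checks.

    def check_block(stmts, in_loop):
        for stmt in stmts:
            if isinstance(stmt, Assign):
                check_expr(stmt.value)
                defined.add(stmt.name)
            elif isinstance(stmt, While):
                check_expr(stmt.cond)
                check_block(stmt.body, in_loop=True)
            elif isinstance(stmt, Break):
                if not in_loop:
                    errors.append("'break' outside of a loop")
            else:
                check_expr(stmt)  # bare expression statement

    check_block(program, in_loop=False)
    return errors


program = [
    Assign("x", Num(1)),
    Call("relax", [Var("structure")]),  # 'structure' is never assigned
    Break(),                            # not inside a loop
    While(Var("x"), [Call("scf", [Var("x")]), Break()]),  # 'scf' is not registered
]
errors = resolve(program, known_tasks={"relax"})
for message in errors:
    print(message)
```

All three problems are reported together, which is what lets a user fix a workflow in one pass before any job is submitted.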

(weeks 6-16) Convert the recursive tree-walking interpreter into an interpreter of linear instructions.

In the prototype, the syntax tree is traversed recursively, with statements and expressions executed through nested function calls.
To retain compatibility with AiiDA’s checkpoint mechanism, a robust way is required to serialize both the syntax-tree representation and the current execution state for later restoration.
The solution is to store the intermediate representation as a linearized sequence of instructions.
This approach not only lets checkpoints be saved using the framework's native facilities; it also brings break and continue into line with the rest of the interpreter implementation, because they become simple jump instructions within the same linear format.

The result of this stage is a refactored interpret function that replaces recursive evaluation with linear execution, and provides first-class support for the break and continue keywords.
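The idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the instruction names, the stack-machine design, the `run` function, and the use of JSON as the checkpoint format are not taken from the prototype or from AiiDA's checkpoint mechanism. The sketch shows two things: `break` and `continue` compile to plain `jump` instructions, and the whole execution state (program counter, stack, variables) is plain data that can be saved and restored mid-run.

```python
import json

# Hypothetical instruction set: push, load, store, add, lt, eq,
# jump, jump_if_false, halt. Source being modelled:
#   i = 0; total = 0
#   while i < 5:
#       i = i + 1
#       if i == 2: continue
#       if i == 4: break
#       total = total + i
PROGRAM = [
    ("push", 0), ("store", "i"),                            # 0-1
    ("push", 0), ("store", "total"),                        # 2-3
    ("load", "i"), ("push", 5), ("lt",),                    # 4-6: loop condition
    ("jump_if_false", 27),                                  # 7: leave the loop
    ("load", "i"), ("push", 1), ("add",), ("store", "i"),   # 8-11
    ("load", "i"), ("push", 2), ("eq",),                    # 12-14
    ("jump_if_false", 17),                                  # 15
    ("jump", 4),                                            # 16: `continue`
    ("load", "i"), ("push", 4), ("eq",),                    # 17-19
    ("jump_if_false", 22),                                  # 20
    ("jump", 27),                                           # 21: `break`
    ("load", "total"), ("load", "i"), ("add",), ("store", "total"),  # 22-25
    ("jump", 4),                                            # 26: next iteration
    ("halt",),                                              # 27
]


def run(program, state=None, max_steps=None):
    """Execute until halt, or until max_steps instructions have run."""
    if state is None:
        state = {"pc": 0, "stack": [], "vars": {}, "done": False}
    steps = 0
    while not state["done"] and (max_steps is None or steps < max_steps):
        op, *args = program[state["pc"]]
        state["pc"] += 1
        stack, variables = state["stack"], state["vars"]
        if op == "push":
            stack.append(args[0])
        elif op == "load":
            stack.append(variables[args[0]])
        elif op == "store":
            variables[args[0]] = stack.pop()
        elif op in ("add", "lt", "eq"):
            right, left = stack.pop(), stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "lt":
                stack.append(left < right)
            else:
                stack.append(left == right)
        elif op == "jump":
            state["pc"] = args[0]
        elif op == "jump_if_false":
            if not stack.pop():
                state["pc"] = args[0]
        elif op == "halt":
            state["done"] = True
        else:
            raise ValueError(f"unknown instruction {op!r}")
        steps += 1
    return state


full = run(PROGRAM)                                   # uninterrupted run
partial = run(PROGRAM, max_steps=20)                  # stop part-way through
checkpoint = json.dumps(partial)                      # state is plain data
resumed = run(PROGRAM, state=json.loads(checkpoint))  # restore and continue
print(full["vars"], resumed["vars"])
```

With a recursive tree walker, the equivalent state lives in the host language's call stack and cannot be serialized this directly, which is the motivation for the linear format.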

(weeks 16-18) Internal testing of the new syntax with core user groups to gather feedback for fine-tuning.

It is hard to move the community toward a new approach to constructing workflows.
The prototype syntax, which blends basic elements of Python/Julia and Rust, must be verified by users to ensure it is simple enough without losing expressiveness.
In this stage, we will work closely with core AiiDA users to obtain feedback on the new syntax and iterate on it.

The result of this stage is a finalized core syntax, including decisions such as whether to require semicolons after statements and whether to separate variable assignment from variable declaration.

(weeks 19-20) Buffer weeks for polishing the implementation.

Two weeks are reserved as a buffer for components that are not yet perfectly implemented.
If all tasks are completed on schedule, this buffer phase will be used to extend and refine the documentation.

Additional notes on the amount requested

The total cost for this project is USD 8,676.

It has been calculated based on the minimum wage standard in the City of Zurich where @rabbull is studying.

hourly rate: CHF 23.90 (Zurich minimum wage)
estimated effort: 300 hours
exchange rate: 1 CHF = 1.21 USD

Total cost = 23.90 * 300 * 1.21 = USD 8,675.70, rounded to USD 8,676
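The arithmetic can be checked with a short Python snippet. It uses only the figures stated above; the variable names are illustrative, and `Decimal` is used so the currency values are not subject to binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

hourly_rate_chf = Decimal("23.90")  # Zurich minimum wage, as stated above
hours = 300                         # estimated effort
usd_per_chf = Decimal("1.21")       # exchange rate used in the proposal

total_chf = hourly_rate_chf * hours
total_usd = total_chf * usd_per_chf
rounded_usd = total_usd.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# Average weekly load implied by the 20-week execution plan.
hours_per_week = Decimal(hours) / 20

print(total_chf, total_usd, rounded_usd, hours_per_week)
```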

Additional notes on the contributor

Zisen Liu (@rabbull) (will be responsible for leading the project's execution)

@rabbull has been working with the AiiDA team as a visiting student since 2024, focusing on the storage backend. He is a Master's student at the University of Zurich, with professional experience in software engineering and distributed systems. During our collaboration, he addressed the parity of functionality between the SQLite and PostgreSQL backends and improved the performance of batched operations interacting with the database.

unkcpz added the Awaiting approval label May 29, 2025
