Transition PyPIConGPU to Pydantic #5500

@chillenzer

Introduction

This issue tracks the status of transitioning PyPIConGPU to Pydantic. It lays out what I consider a viable path and provides a space to discuss it. (At least) @PrometheusPi, @BrianMarre, @pordyna, @ikbuibui and @mafshari64 might want to monitor its progress closely.

I have already implemented a prototype that allows formulating PyPIConGPU classes as pydantic.BaseModels, relying on BaseModel.model_dump for serialisation and removing their corresponding schemata. Applying this to two small/medium classes already yields a net -200 lines of code in the git diff against mainline/dev, and that probably includes all the infrastructural changes needed for most of the roll-out.

Probably (or at least hopefully), the proof-of-concept PRs, which implement the logic without rolling out the changes to the whole codebase, will enter mainline/dev relatively soon. Afterwards, new developments can already make use of them. The roll-out PRs will probably involve a lot of (slightly-above-)monkey work and might uncover shortcomings of the original implementations. It might be beneficial if the main PICMI developers contribute to them to become familiar with the new infrastructure. (Although, all in all, it should become simpler and more obvious, so hopefully there is more to un-learn than to learn.)

The rest of this issue will discuss motivation, scope, logical steps and plan of attack.

Motivation

The current implementation relies on schemata and code logic on various levels to validate the input (often redundantly). That is extremely verbose but, more importantly, there's no official ground truth to validate against. Also, the hand-rolled code doing that is not very user-friendly with its extremely rigid approach to type-safety.

Pydantic already offers well-tested validation and serialisation as well as all reasonable type conversion and interpretation of input out of the box. Formulating pydantic.BaseModels will pin down the schemata and easily take care of sophisticated validation in one place while being much less verbose and offering a friendlier UI.
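
As a minimal sketch of what that looks like, assuming Pydantic v2: the class `GridSketch` and its fields are made up for illustration and are not an actual PyPIConGPU class.

```python
from pydantic import BaseModel, ValidationError


class GridSketch(BaseModel):
    """Hypothetical stand-in for a PyPIConGPU class; the fields are made up."""

    cell_size: float
    cell_count: int


# Pydantic's default (lax) mode converts reasonable input,
# e.g. the string "1" becomes the int 1.
grid = GridSketch(cell_size=0.5, cell_count="1")

# Serialisation to a plain dict comes for free.
context = grid.model_dump()

# Input that cannot be interpreted as an int is rejected.
try:
    GridSketch(cell_size=0.5, cell_count="many")
    rejected = False
except ValidationError:
    rejected = True
```

The type annotations on the class are the only place where the expected structure is written down, which is what makes the model usable as the ground truth for validation.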

In a long-term scenario in which PICMI becomes the official UI and the C++ UI is no longer officially supported, we could even move validation from within the C++ code into the classes potentially removing runtime checks and/or reducing compilation times.

Scope and constraints

  • This is an internal refactoring in a production codebase, so we will attempt to keep mainline/dev functional at all times.

  • PICMI UI shouldn't change. This is an internal refactoring!

  • Mustache templates shouldn't change. This reduces the risk of changing the actual behaviour of the generated code. Warnings about missing names during rendering will tip us off concerning unintended changes in behaviour.

  • Except for potentially getting rid of RenderedObject completely and fully relying on BaseModel.model_dump at the very end, RenderedObject.get_rendering_context will still be the official protocol for generating the rendering context.
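
One way to picture the last constraint is the following sketch. `RenderedObjectSketch` is a heavily simplified, hypothetical stand-in written for this example. It is not the actual RenderedObject implementation, and the schema validation the real class performs is left out entirely.

```python
from pydantic import BaseModel


class RenderedObjectSketch:
    """Simplified, hypothetical stand-in for RenderedObject."""

    def _get_serialized(self) -> dict:
        # Assumption of this sketch: Pydantic models need no hand-written
        # serialisation and simply fall back to model_dump().
        if isinstance(self, BaseModel):
            return self.model_dump()
        raise NotImplementedError("non-Pydantic classes must override this")

    def get_rendering_context(self) -> dict:
        # Stays the single entry point for callers in both cases.
        return self._get_serialized()


class LegacySketch(RenderedObjectSketch):
    """Old style: serialisation written by hand."""

    def __init__(self, steps: int) -> None:
        self.steps = steps

    def _get_serialized(self) -> dict:
        return {"steps": self.steps}


class ModernSketch(RenderedObjectSketch, BaseModel):
    """New style: fields declared once, serialisation left to Pydantic."""

    steps: int
```

In this sketch, callers keep using `get_rendering_context()` and cannot tell which kind of class produced the context, so classes can be migrated one at a time.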

Logical steps

These steps provide one idea of how to tackle the problem in a structured way while always maintaining a consistent state throughout the codebase. They are not yet the plan of attack; see below for that.

  1. Make PICMI level capable of handling Pydantic BaseModel.

    • This means that copy_attributes must be capable of instantiating a class with a valid constructor call.
    • Everything that's not relying on copy_attributes yet must be taken care of manually.
  2. Transition PyPIConGPU from typeguard and the custom util stuff to Pydantic.

    • This implies changing all typeguard.TypeCheckErrors in the tests to pydantic.ValidationError.
    • This will change the interaction with PyPIConGPU significantly.
      • PyPIConGPU objects will have to be fully and validly initialised at construction.
      • PyPIConGPU will be much more forgiving and powerful in how the information is provided, e.g., "1" can seamlessly be cast to an int.
      • PyPIConGPU objects will not have type-safety and/or validation guarantees after construction.
        • This also makes them more flexible for debugging or advanced manipulations. The foot guns are fairly obvious in this approach, and it seems reasonable to assume that anybody deliberately changing a validly instantiated model is responsible for ensuring correctness.
  3. Make the rendering process capable of handling BaseModel.model_dump as a drop-in replacement for RenderedObject._get_serialized.

  4. Remove RenderedObject._get_serialized from the PyPIConGPU codebase.

  5. Make the rendering process capable of handling missing schemata in the schema store.

    • The most elegant way would be to have Pydantic generate the missing schemas on demand.
      • But that's very hard to achieve because, during validation, the URI of the schema is our only hint as to which class is meant. So we'd have to
        • either keep track of the mapping between URI and corresponding class somewhere
        • or generate the corresponding schemas at class (object) instantiation already.
      • But __init_subclass__ voodoo seems to not cooperate well with Pydantic, so actions at class (object) instantiation are difficult to achieve.
    • A simple way is to replace the corresponding references in the remaining on-disk schemata with wildcards {}.
      • This is fine because Pydantic has already done validation for us.
  6. Remove schemata from the PyPIConGPU codebase.

  7. Remove schema validation from PyPIConGPU and greatly simplify (or even remove) RenderedObject.

  8. Remove redundant validation and type-safety tests because Pydantic validation takes care of that.

    • At this point, the Pydantic models have become the ground truth for validation and we know that the mechanisms behind that are well-tested by others already.
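
The two options discussed under step 5 can be sketched together as follows. Everything here is hypothetical: the registry, the decorator `register_schema_uri`, the function `resolve_schema` and the example URI are made up for illustration. The only Pydantic call used is `BaseModel.model_json_schema()` (Pydantic v2). An explicit decorator is used to keep the URI-to-class mapping, precisely because it does not rely on `__init_subclass__`.

```python
from pydantic import BaseModel

# Hypothetical registry: schema URI -> model class.
_MODEL_BY_URI: dict[str, type[BaseModel]] = {}


def register_schema_uri(uri: str):
    """Record which model class a (made-up) schema URI refers to."""

    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        _MODEL_BY_URI[uri] = cls
        return cls

    return decorator


@register_schema_uri("https://example.invalid/schema/solver_sketch")
class SolverSketch(BaseModel):
    """Hypothetical model; not an actual PyPIConGPU class."""

    name: str
    order: int = 2


def resolve_schema(uri: str, on_disk: dict[str, dict]) -> dict:
    """Look up a schema, preferring whatever still exists on disk."""
    if uri in on_disk:
        return on_disk[uri]
    model = _MODEL_BY_URI.get(uri)
    if model is not None:
        # Option 1: let Pydantic generate the missing schema on demand.
        return model.model_json_schema()
    # Option 2: the wildcard {} accepts anything, relying on Pydantic
    # having validated the object already.
    return {}
```

Whether a schema generated this way matches the rendering context closely enough to be useful is not checked here. The wildcard remains the simpler route.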

How to pack this into PRs

Proof-of-concept stage

  • Proof-of-concept PR about proper Pydantic support.

    • (1. Make PICMI level capable of handling Pydantic BaseModel.)
    • (3. Make the rendering process capable of handling BaseModel.model_dump as a drop-in replacement for RenderedObject._get_serialized.)
    • Apply to some small examples and (2. Transition PyPIConGPU from typeguard and the custom util stuff to Pydantic.) on those.
    • Don't apply this broadly yet.
    • Because we still have all the schemata, we can still ensure that we're not messing things up beyond what the schemata already allow.
    • We'll try to keep the tests untouched as much as possible. Necessary changes will be
      • to use valid constructor calls instead of assignments, because PyPIConGPU objects will have to be fully and validly initialised at construction.
      • to drop some tests, because PyPIConGPU will be much more forgiving and powerful in how the information is provided, e.g., "1" can seamlessly be cast to an int.
      • to change all typeguard.TypeCheckErrors in the tests to pydantic.ValidationError.
  • Proof-of-concept PR about supporting missing schemata in PyPIConGPU.

    • (5. Make the rendering process capable of handling missing schemata in the schema store.)
    • Apply to some small examples.
    • Don't apply this broadly yet.
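
The kind of test change this implies can be sketched as follows, assuming Pydantic v2 with its default configuration. `LaserSketch`, its fields and the helper `raises_validation_error` are hypothetical and only serve the example.

```python
from pydantic import BaseModel, ValidationError


class LaserSketch(BaseModel):
    """Hypothetical model; not an actual PyPIConGPU class."""

    wavelength: float
    polarization: str = "linear"


def raises_validation_error(**kwargs) -> bool:
    """Return True if constructing a LaserSketch from kwargs is rejected."""
    try:
        LaserSketch(**kwargs)
    except ValidationError:
        return True
    return False


# One valid constructor call replaces a series of attribute assignments.
laser = LaserSketch(wavelength=8.0e-7)

# Tests that used to expect typeguard.TypeCheckError now expect
# pydantic.ValidationError, raised at construction time.
missing_rejected = raises_validation_error()
wrong_type_rejected = raises_validation_error(wavelength="red")

# With the default configuration, assignment is not validated, so this
# goes through silently and correctness is the caller's responsibility.
laser.wavelength = "red"
```

The last line illustrates why objects carry no validation guarantees after construction: the invalid value is simply stored.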

Roll-out stage

  • Roll-out PR(s) about (4. Remove RenderedObject._get_serialized from the PyPIConGPU codebase.)

    • This includes (2. Transition PyPIConGPU from typeguard and the custom util stuff to Pydantic.) throughout the full codebase.
    • This will likely uncover some tests that were using RenderedObject._get_serialized instead of RenderedObject.get_rendering_context.
    • This will likely uncover shortcomings of our initial prototypes.
    • We are still covered by our schemata here.
  • Roll-out PR(s) about (6. Remove schemata from the PyPIConGPU codebase.)

    • We still have our tests in place, so if some of them were relying on schema validation failing, we can sharpen our BaseModels to include those constraints and change the tests to use pydantic.ValidationError.
  • Roll-out PR about (7. Remove schema validation from PyPIConGPU and greatly simplify (or even remove) RenderedObject.)

  • Roll-out PR about (8. Remove redundant validation and type-safety tests because Pydantic validation takes care of that.)
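
Sharpening a BaseModel to take over a constraint that previously lived only in a schema could look like the sketch below. The class `WindowSketch`, its fields and the constraints are made up for illustration. `Field(gt=...)` and `Field(min_length=...)` are standard Pydantic constraints.

```python
from pydantic import BaseModel, Field, ValidationError


class WindowSketch(BaseModel):
    """Hypothetical model; field names and constraints are made up."""

    # Corresponds to something like {"type": "number", "exclusiveMinimum": 0}
    # in a JSON schema.
    move_point: float = Field(gt=0)
    # Corresponds to something like {"type": "string", "minLength": 1}.
    name: str = Field(min_length=1)


def is_rejected(**kwargs) -> bool:
    """Return True if constructing a WindowSketch from kwargs is rejected."""
    try:
        WindowSketch(**kwargs)
    except ValidationError:
        return True
    return False
```

A test that used to rely on schema validation failing would then check for `pydantic.ValidationError` at construction, e.g. via `is_rejected(move_point=-1.0, name="w")`.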

Labels: PICMI (pypicongpu and picmi related), refactoring (code change to improve performance or to unify a concept but does not change public API)
