Transition PyPIConGPU to Pydantic
Introduction
This issue tracks the transition of PyPIConGPU to Pydantic. It lays out what I consider a viable path and provides a space to discuss it. (At least) @PrometheusPi, @BrianMarre, @pordyna, @ikbuibui and @mafshari64 might want to closely monitor its progress.
I have already implemented a prototype that allows formulating PyPIConGPU classes as `pydantic.BaseModel`s, relying on `BaseModel.model_dump` for serialisation and removing their corresponding schemata. Applying this to 2 small/medium classes already yields a net -200 lines of code in the git diff against `mainline/dev`, and that includes probably all the infrastructural changes that most of the roll-out will need.
Probably (or at least hopefully), the proof-of-concept PRs implementing the logic but not rolling out the changes to the whole codebase will enter mainline/dev relatively soon. Afterwards, new developments can already make use of them. The roll-out PRs will probably involve a lot of (slightly-above-)monkey work and might uncover shortcomings of the original implementations. It might be beneficial if the main PICMI developers contribute to them to become familiar with the new infrastructure. (Although all-in-all it should become simpler and more obvious, so hopefully more to un-learn than to learn.)
The rest of this issue will discuss motivation, scope, logical steps and plan of attack.
Motivation
The current status relies on schemata and code logic on various levels to validate the input (often redundantly). That is extremely verbose but, more importantly, there's no official ground truth to validate against. Also, the hand-rolled validation code is not very user-friendly because of its extremely rigid approach to type-safety.
Pydantic already offers well-tested validation and serialisation as well as all reasonable type conversion and interpretation of input out of the box. Formulating `pydantic.BaseModel`s will fix the schemata and easily take care of sophisticated validation in one place while being much less verbose and offering a friendlier UI.
In a long-term scenario in which PICMI becomes the official UI and the C++ UI is no longer officially supported, we could even move validation from within the C++ code into the classes, potentially removing runtime checks and/or reducing compilation times.
Scope and constraints
- This is an internal refactoring in a production codebase, so we will attempt to keep `mainline/dev` functional at all times.
- PICMI UI shouldn't change. This is an internal refactoring!
- Mustache templates shouldn't change. So, we reduce the risk of changing the actual behaviour of the generated code. Warnings about missing names during rendering will tip us off concerning unintended changes in behaviour.
- Except for potentially getting rid of `RenderedObject` completely and fully relying on `BaseModel.model_dump` at the very end, `RenderedObject.get_rendering_context` will still be the official protocol for generating the rendering context.
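To make the last point concrete, here is a minimal sketch of how `get_rendering_context` could stay the public protocol while delegating the actual serialisation to `BaseModel.model_dump`. The class bodies are assumptions for illustration only: the `RenderedObject` below is a stripped-down stand-in, not the real PyPIConGPU class, and `Vector` and `Species` are hypothetical models.

```python
from typing import Any

from pydantic import BaseModel


class RenderedObject(BaseModel):
    """Hypothetical stand-in for PyPIConGPU's RenderedObject."""

    def get_rendering_context(self) -> dict[str, Any]:
        # Delegate serialisation to Pydantic; nested models become nested dicts.
        return self.model_dump()


class Vector(RenderedObject):
    # Hypothetical model, used only for illustration.
    x: float
    y: float


class Species(RenderedObject):
    # Hypothetical model, used only for illustration.
    name: str
    drift: Vector


species = Species(name="electron", drift=Vector(x=0.0, y=1.0))
context = species.get_rendering_context()
print(context)  # {'name': 'electron', 'drift': {'x': 0.0, 'y': 1.0}}
```

Callers keep using `get_rendering_context`, so the templates see the same kind of plain dictionary as before, while the per-class serialisation code disappears.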
Logical steps
These steps provide one idea of how to tackle the problem in a structured way while always maintaining a consistent state throughout the codebase. They are not yet the plan of attack, see below for that.
1. Make PICMI level capable of handling Pydantic `BaseModel`.
   - This means that `copy_attributes` must be capable of instantiating a class with a valid constructor call.
   - Everything that's not relying on `copy_attributes` yet must be taken care of manually.
2. Transition PyPIConGPU from `typeguard` and the custom `util` stuff to Pydantic.
   - This implies changing all `typeguard.TypeCheckError`s in the tests to `pydantic.ValidationError`.
   - This will change the interaction with PyPIConGPU significantly.
     - PyPIConGPU objects will have to be fully and validly initialised at construction.
     - PyPIConGPU will be much more forgiving and powerful in how the information is provided, e.g., `"1"` can seamlessly be cast to an `int`.
     - PyPIConGPU objects will not have type-safety and/or validation guarantees afterwards.
       - This also makes it more flexible for debugging or advanced manipulations while the foot guns are kind of obvious in this approach and it seems reasonable to assume that everybody deliberately changing a validly instantiated model is responsible for ensuring correctness.
3. Make the rendering process capable of handling `BaseModel.model_dump` as a drop-in replacement for `RenderedObject._get_serialized`.
4. Remove `RenderedObject._get_serialized` from the PyPIConGPU codebase.
5. Make the rendering process capable of handling missing schemata in the schema store.
   - The most elegant way would be to have Pydantic generate the missing schemas on demand.
     - But that's very hard to achieve because during the validation, we have the `uri` of the schema as the only hint on which class could be meant. So we'd have to
       - either keep track of the mapping between `uri` and corresponding class somewhere
       - or generate the corresponding schemas at class (object) instantiation already.
     - But `__init_subclass__` voodoo seems to not cooperate well with Pydantic such that actions at class object instantiation are difficult to achieve.
   - A simple way is to replace the corresponding references in the remaining on-disk schemata with wildcards `{}`.
     - This is fine because Pydantic has already done validation for us.
6. Remove schemata from the PyPIConGPU codebase.
7. Remove schema validation from PyPIConGPU and greatly simplify (or even remove) `RenderedObject`.
8. Remove redundant validation and type-safety tests because Pydantic validation takes care of that.
   - At this point, the Pydantic models have become the ground truth for validation and we know that the mechanisms behind that are well-tested by others already.
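The behavioural changes listed under step 2 can be illustrated with a small sketch. `Grid` and its fields are hypothetical and not taken from PyPIConGPU. The sketch assumes Pydantic v2 with default settings, i.e. lax type conversion and no `validate_assignment`.

```python
from pydantic import BaseModel, ValidationError


class Grid(BaseModel):
    # Hypothetical model, not an actual PyPIConGPU class.
    cell_cnt: int
    cell_size: float


# Forgiving input: the string "1" is converted to the int 1 at construction.
grid = Grid(cell_cnt="1", cell_size=0.5)
assert grid.cell_cnt == 1 and isinstance(grid.cell_cnt, int)

# Objects must be fully initialised at construction: a missing field raises.
missing_fields = []
try:
    Grid(cell_cnt=1)
except ValidationError as err:
    missing_fields = [error["loc"] for error in err.errors()]
print(missing_fields)  # [('cell_size',)]

# No guarantees afterwards: by default, assignments are not validated.
grid.cell_cnt = "not a number"
print(repr(grid.cell_cnt))  # 'not a number'
```

If guarantees after construction were ever wanted, Pydantic offers `validate_assignment=True` in the model configuration, but the plan above deliberately leaves that responsibility with whoever mutates the object.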
How to pack this into PRs
Proof-of-concept stage
- Proof-of-concept PR about proper Pydantic support.
  - (1. Make PICMI level capable of handling Pydantic `BaseModel`.)
  - (3. Make the rendering process capable of handling `BaseModel.model_dump` as a drop-in replacement for `RenderedObject._get_serialized`.)
  - Apply to some small examples and (2. Transition PyPIConGPU from `typeguard` and the custom `util` stuff to Pydantic.) on those.
  - Don't apply this broadly yet.
  - Because we still have all the schemata, we can still ensure that we're not messing things up beyond what the schemata already allow.
  - We'll try to keep the tests untouched as much as possible. Necessary changes will be
    - to use valid constructor calls instead of assignments because PyPIConGPU objects will have to be fully and validly initialised at construction,
    - to drop some tests because PyPIConGPU will be much more forgiving and powerful in how the information is provided, e.g., `"1"` can seamlessly be cast to an `int`,
    - to change all `typeguard.TypeCheckError`s in the tests to `pydantic.ValidationError`.
- Proof-of-concept PR about supporting missing schemata in PyPIConGPU.
  - (5. Make the rendering process capable of handling missing schemata in the schema store.)
  - Apply to some small examples.
  - Don't apply this broadly yet.
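For the second proof-of-concept PR, one of the options discussed under step 5 is to keep track of the mapping between `uri` and class. The following is only a sketch of that idea under stated assumptions: the registry, the `register_schema` decorator, `schema_for` and the URI are all hypothetical names, not existing PyPIConGPU infrastructure. `BaseModel.model_json_schema` is the Pydantic v2 call that produces a JSON schema for a model.

```python
from pydantic import BaseModel

# Hypothetical registry mapping schema URIs to model classes.
SCHEMA_REGISTRY: dict[str, type[BaseModel]] = {}


def register_schema(uri: str):
    """Hypothetical class decorator recording which model belongs to a URI."""

    def decorator(cls: type[BaseModel]) -> type[BaseModel]:
        SCHEMA_REGISTRY[uri] = cls
        return cls

    return decorator


@register_schema("https://example.invalid/schema/grid")  # made-up URI
class Grid(BaseModel):
    cell_cnt: int


def schema_for(uri: str) -> dict:
    """Generate the schema on demand, or fall back to the wildcard `{}`."""
    cls = SCHEMA_REGISTRY.get(uri)
    if cls is None:
        # An empty JSON schema accepts any value.
        return {}
    return cls.model_json_schema()


grid_schema = schema_for("https://example.invalid/schema/grid")
print(grid_schema["required"])  # ['cell_cnt']
print(schema_for("https://example.invalid/schema/unknown"))  # {}
```

An explicit decorator sidesteps `__init_subclass__` but has to be applied by hand to every class, which is part of why replacing references with the wildcard `{}` is the simpler route.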
Roll-out stage
- Roll-out PR(s) about (4. Remove `RenderedObject._get_serialized` from the PyPIConGPU codebase.)
  - This includes (2. Transition PyPIConGPU from `typeguard` and the custom `util` stuff to Pydantic.) throughout the full codebase.
  - This will likely uncover some tests that were using `RenderedObject._get_serialized` instead of `RenderedObject.get_rendering_context`.
  - This will likely uncover shortcomings of our initial prototypes.
  - We are still covered by our schemata here.
- Roll-out PR(s) about (6. Remove schemata from the PyPIConGPU codebase.)
  - We still have our tests in place, so if some of them were relying on schema validation failing, we can sharpen our `BaseModel`s to include those constraints and change the tests to use `pydantic.ValidationError`.
- Roll-out PR about (7. Remove schema validation from PyPIConGPU and greatly simplify (or even remove) `RenderedObject`.)
- Roll-out PR about (8. Remove redundant validation and type-safety tests because Pydantic validation takes care of that.)
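Sharpening a `BaseModel` and adapting a test to expect `pydantic.ValidationError` could look roughly like the sketch below. The `Grid` model, its `gt=0` constraint and the test names are hypothetical; the real constraints would be taken from whatever the removed schemata used to enforce.

```python
import unittest

from pydantic import BaseModel, Field, ValidationError


class Grid(BaseModel):
    # Hypothetical model: the constraint formerly expressed in a schema
    # (here: strictly positive) now lives on the field itself.
    cell_cnt: int = Field(gt=0)


class TestGrid(unittest.TestCase):
    def test_rejects_non_positive(self):
        # Previously this might have relied on schema validation failing.
        with self.assertRaises(ValidationError):
            Grid(cell_cnt=0)

    def test_rejects_non_numeric(self):
        # Previously this might have expected typeguard.TypeCheckError.
        with self.assertRaises(ValidationError):
            Grid(cell_cnt="many")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGrid)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests that only check that a wrongly typed value is rejected largely duplicate Pydantic's own test suite, which is the argument for dropping them in the final roll-out PR.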