Skip to content

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Aug 16, 2025

Changes proposed in this pull request

Per discussion #1926 (comment), type hinting should not be attached to external modules imported by Cantera. This PR prevents export of those modules, which should simplify tests requested for #1926.

See also #616

Checklist

  • The pull request includes a clear description of this code change
  • Commit messages have short titles and reference relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • Style & formatting of contributed code follows contributing guidelines
  • The pull request is ready for review

@ischoegl ischoegl marked this pull request as ready for review August 16, 2025 16:15
@ischoegl
Copy link
Member Author

@TimothyEDawson and @bryanwweber ... this is an implementation that should be a compromise between __all__ and @bryanwweber's comment in #1926 (comment). While this doesn't use __init__.py directly, it is the most 'surgical' placement of selective imports I could think of.

@ischoegl ischoegl requested a review from a team August 16, 2025 16:28
@TimothyEDawson
Copy link

TimothyEDawson commented Aug 16, 2025

@ischoegl I actually had started working on my own implementation a while back, sorry I hadn't had time to make a pull request. You can take a look here, maybe there are some ideas you'd like: main...TimothyEDawson:cantera:fix-python-namespace-pollution

Note that most of the "unnecessary" changes such as whitespace changes and changing single quotes to double quotes were due to my use of Ruff autoformatting. I'd probably revert those if I were to continue working on it. I hadn't removed the star imports at this point as it quickly gets out of hand, as seen in my type stub pull request! If I recall correctly, I think those are more important internally, where you want explicit imports of any classes or functions used within the file, whereas __all__ seems to take care of the external runtime usage, and explicitly importing from the originating module rather than indirectly through the cantera namespace (i.e. __init__) should be fine for type checkers and intellisense.

@ischoegl
Copy link
Member Author

ischoegl commented Aug 16, 2025

@TimothyEDawson ... ok - I didn't know. From my perspective, either of the two PRs is fine (took me about 2 hours to put together this morning, including testing/tweaking, so I'm not really vested). Regarding your alternative, @bryanwweber does not like __all__ at all, so I believe @speth will have to act as the tie breaker ... either option is 👍 with me as it's an improvement over the status quo and both should resolve the roadblock in #1926.

PS: My objective was to be minimally invasive and only take care of the external interface.

@TimothyEDawson
Copy link

@ischoegl you couldn't have known, and that's on me. I've been swamped by work so I kept putting off working on this. I had been working on building the case for __all__ and/or against star imports using practical examples of the downstream effect the potential changes would have on my workflow as an end user of Cantera. It gets a bit complicated between the various options available, but for example with the current main branch I see:

ctdot_vanilla_IDE

and with my branch using __all__ I see:

ctdot_withall_IDE

I had become stuck trying to figure out how to prevent os, sys, Path, and warnings, presumably from __init__.py, from showing up. I see you simply called del on them, which is an unexpected solution! I hope to revisit this soon. I wouldn't be opposed to your proposed changes in the interim, but I would like to try comparing and contrasting it with an __all__-based solution, and with a combination of both.

@ischoegl
Copy link
Member Author

ischoegl commented Aug 16, 2025

@TimothyEDawson ... no problem. I have no preference for __all__ or selective imports; both have the same effect on the exported symbols, although there may be other issues at play. Beyond that, I believe we're getting closer to wrapping various efforts up for the 3.2 release, which prompted my action: this PR was a low hanging fruit.

As an aside, your usage example is pretty much the GUI version of dir(ct) 😂

@TimothyEDawson
Copy link

As an aside, your usage example is pretty much the GUI version of dir(ct) 😂

Oh definitely, and I do find myself calling dir in interactive sessions often. I also had put together some equivalent images from the REPL! Fun fact, in 3.14 you can get suggestions like this:

from_cantera_with_all

@TimothyEDawson
Copy link

I reviewed this pull request and did some comparisons with my __all__-based approach, and I think we should go with this. It looks good to me as-is. The only other thing I might suggest is to also remove star imports like in composite.py, replacing from ._cantera import * with specific imports of Transport, Kinetics, PureFluid, etc., as _cantera exports quite a lot.

image

My use of __all__ doesn't really help with this because most of these are, presumably, intended to be available within _cantera. So it's not really a substitute for removing the star import. It does seem like I overestimated the benefits of __all__. It does reduce what is imported via from [module] import *, but otherwise doesn't seem to do much if anything. For example, it doesn't hide anything from intelliesense:

image

Which looks the same on your branch. It still sees other imported libraries like np, and it doesn't seem to help the REPL to autocomplete like I expected:

image

Although the behavior does change a bit on 3.14 and when working with the stub files.

@ischoegl
Copy link
Member Author

ischoegl commented Aug 18, 2025

@TimothyEDawson ... thanks for having a look at this. I eliminated a couple of internal star imports (specifically the from xyz import * variant, but left the from xyz cimport * (mostly) alone).

This still requires a review by @bryanwweber or @speth, as our workflow requires approvals from non-authors prior to merging.

Copy link

codecov bot commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.92%. Comparing base (25103c9) to head (d7faa8f).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1947   +/-   ##
=======================================
  Coverage   74.92%   74.92%           
=======================================
  Files         448      448           
  Lines       55763    55777   +14     
  Branches     9204     9204           
=======================================
+ Hits        41780    41792   +12     
- Misses      10881    10883    +2     
  Partials     3102     3102           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TimothyEDawson
Copy link

Excellent!

@speth
Copy link
Member

speth commented Aug 18, 2025

I think there is a misunderstanding about the badness of wildcard/star imports; Per PEP8, the admonition against them specifically excepts the use to "republish an internal interface as part of a public API" which is essentially what we're doing due to the hybrid Cython / pure Python nature of the Cantera package.

I'll also point out that star imports are used in a similar way in exemplary projects such as Numpy.

The one thing that has changed since our discussion in #616 is that Numpy and some other major projects do seem to have moved toward cleaning up what is exposed in their "public interface", perhaps due to the increased use of IDEs and type checkers that do rely on this sort of introspection, so I do understand the desire to achieve a similar cleanup for Cantera.

I still strongly dislike the idea of creating a requirement to maintain these extensive lists of all exported names, especially when those lists are separated from the corresponding definitions (what is this, C++? 🙄). I am certain that this will lead to confusion for anyone who tries to add a new function/class in Cantera and wonders why it's inaccessible.

Further, the vast majority of the names currently in the cantera namespace do belong there. On the main branch, excluding (_foo) or dunder (__foo__) names, I count 176 items that belong in the public interface and 17 that do not. So I think it would be much easier to work this the other direction by excluding those handful of names from the cantera namespace. There are two ways to do this:

  • Import those names privately, e.g. import sys as _sys
  • Call del on any local imports at the end of the corresponding submodule.
    We already use the first approach in a number of places, and could expand the use of that pattern. The second option might be better for heavily-used modules like Numpy.

Regarding the submodule namespaces like cantera.composite, I think these should really be regarded as an implementation detail -- the intended use (as demonstrated in every single example we provide) is to use Cantera's classes from the top-level namespace. The submodules really only exist for the sake of splitting the source into files of manageable sizes and because of the hybrid use of Cython and Python .

@ischoegl
Copy link
Member Author

ischoegl commented Aug 18, 2025

@speth ... thanks for your feedback, but I disagree: cleaning up our interfaces is a direction that is worth pursuing.

I still strongly dislike the idea of creating a requirement to maintain these extensive lists of all exported names, especially when those lists are separated from the corresponding definitions (what is this, C++? 🙄)

Since __all__ was not viewed as palatable, the selective imports are the obvious alternative. Fwiw, __all__ would keep everything in the same file ... . The _abc route does work (except for in __init__.py), but is overall a lot more intrusive; it's also beyond kludgy if you pick functions or classes from a module, as every entry would require its own from abc import xyz as _xyz). Running del is, in my honest opinion, not much different than the __all__ route in reverse.

Fwiw, I found one instance for from .thermo cimport * where the only used import was NumPy (!). To stick with the C++ analogy, any from ._cantera import * is the equivalent of including every single Cantera header in our config.h ... (obviously neither an exact nor a workable analogy)

@ischoegl ischoegl closed this Aug 18, 2025
@bryanwweber
Copy link
Member

bryanwweber commented Aug 18, 2025

I think we're all in agreement that cleaning up the exported interface is a worthwhile effort, I think the disagreement is over the tradeoffs of the different approaches.

it's also beyond kludgy if you pick functions or classes from a module, as every entry would require its own from abc import xyz as _xyz).

I don't think it's that kludgy. How many times do we have from abc import xyz, def, ghi? I think that would be the main place where it'd feel awkward, when there're multiple imports from the same package.

Running del is, in my honest opinion, not much different than the all route in reverse.

I think that's Ray's point, we have many fewer places to del than add to __all__, so del is overall easier to maintain.

Fwiw, I found one instance for from .thermo cimport * where the only used import was NumPy (!)

This sounds like a case where there were previously other imports used, but those were removed over time and never got cleaned up. Seems like an easy fix 😊

@TimothyEDawson
Copy link

TimothyEDawson commented Aug 18, 2025

I think there is a misunderstanding about the badness of wildcard/star imports; Per PEP8, the admonition against them specifically excepts the use to "republish an internal interface as part of a public API" which is essentially what we're doing due to the hybrid Cython / pure Python nature of the Cantera package.

I agree insofar as Cantera does this, i.e. within __init__.py and within _cantera.pyx. In other places this does not apply.

I am certain that this will lead to confusion for anyone who tries to add a new function/class in Cantera and wonders why it's inaccessible.

This shouldn't happen, as they wouldn't be inaccessible. Proper use of __all__ or removal of star imports only actually removes items that are being implicitly included, such as cantera.composite.YamlWriter. Items which are simply hidden from the public interface via exclusion from __all__ or by prepending an underscore are still there and are still importable, so e.g. cantera.composite.Transport couldn't go anywhere due to its usage within the file.

Further, the vast majority of the names currently in the cantera namespace do belong there.

With the exclusion of those few you mention like os, sys, np, the main Cantera namespace is certainly not the main offender here.

Regarding the submodule namespaces like cantera.composite, I think these should really be regarded as an implementation detail.

I personally prefer to import classes from where they are defined, but I accept that's a matter of taste. Though one issue which hadn't been raised here (and was the original reason I brought it up) is the impact this has on developers and type checking. For example, within composite.py the language server and type-checkers cannot tell where classes like Transport are coming from when you use star imports.

image

Meanwhile proper explicit imports help you to catch all sorts of subtle mistakes you may not have even known went against some Python standard practices.

image

Now I'm just a novice at type annotations so I am still trying to figure out best practices, but I can confidently say the star imports are a no-go if you want any hope of type annotations. Even if I were to keep everything in stub files, I would just end up overriding what's in the Python files, with even more code to maintain.

@TimothyEDawson
Copy link

Oh, one additional point regarding this:

I still strongly dislike the idea of creating a requirement to maintain these extensive lists of all exported names, especially when those lists are separated from the corresponding definitions (what is this, C++? 🙄).

Did you take a look at my __all__ implementation? Each module defines its own exports, and then _cantera.pyx and __init__.py generate their own __all__ dynamically by appending the __all__ from each module being star-imported. My linters don't like that, but it was stated to be a valid option in some PEP I read, I'll try to dig that up.

@ischoegl
Copy link
Member Author

Given what has been said, I believe that whatever we decide on needs to be in support of type hinting and language server support, as this is what new users will benefit from most, especially if they're not seasoned Python users already. If this requires a little extra work on the maintainer side, so be it.

Regarding 2 cents: outside of __init__.py, I don't think that star imports should be used (we had a related discussion in #1791). When it comes to the use of __all__ and selective imports, selective imports (as in this PR) wins. Both use lists that need maintenance, but __all__ may be a sideshow if we're selectively importing rather than importing the entire module (as in import numpy as np).

Fwiw, combining selective imports as in from abc import xyz, ddef, ghi and using the underscore to make them private is a terrible combination, as we have to pick between from abc import xyz as _xyz, ddef as _ddef, ghi as _ghi or unfold things into one import per line. But that's what would be necessary if we want to avoid star imports ... 😱

Overall, I'm sitting on the fence regarding continuing here - I provided a solution, but if the decision is to go a different direction, I'm 👌 with it.

@ischoegl
Copy link
Member Author

Fwiw, I found one instance for from .thermo cimport * where the only used import was NumPy (!)

This sounds like a case where there were previously other imports used, but those were removed over time and never got cleaned up. Seems like an easy fix 😊

It's pretty clear where the issue comes from 😄; the remedy is less straightforward: unless you selectively import, you'll never find out, as * literally imports the kitchen sink.

@speth
Copy link
Member

speth commented Aug 20, 2025

@speth ... thanks for your feedback, but I disagree: cleaning up our interfaces is a direction that is worth pursuing.

I am not saying that it isn't worth cleaning up our interface. If that was what I meant, I would not have been suggesting alternative approaches to achieving this goal.

Though one issue which hadn't been raised here (and was the original reason I brought it up) is the impact this has on developers and type checking. For example, within composite.py the language server and type-checkers cannot tell where classes like Transport are coming from when you use star imports.

For these uses within Cantera, where the classes come from the Cython module, does changing this to an explicit import even help? If I add from ._cantera import Transport, PyLance in VS Code still can't import the compiled Cython module because it's in a out-of-source build directory. So while you'll get rid of the the specific error about Transport being undefined, the IDE still won't know anything more about what Transport is. Or does the .pyi file help in this case?

I am certain that this will lead to confusion for anyone who tries to add a new function/class in Cantera and wonders why it's inaccessible.

This shouldn't happen, as they wouldn't be inaccessible. Proper use of __all__ or removal of star imports only actually removes items that are being implicitly included, such as cantera.composite.YamlWriter. Items which are simply hidden from the public interface via exclusion from __all__ or by prepending an underscore are still there and are still importable, so e.g. cantera.composite.Transport couldn't go anywhere due to its usage within the file.

I was referring to the "explicit import" approach that was implemented in this PR, which would certainly prevent accessing ct.YamlWriter if it is not imported in cantera/_cantera.pyx (which then becomes part of the cantera namespace via the one remaining star import in cantera/__init__.py.)

Regarding the submodule namespaces like cantera.composite, I think these should really be regarded as an implementation detail.

I personally prefer to import classes from where they are defined, but I accept that's a matter of taste.

This submodule structure is absolutely an implementation detail and not something that should be relied on to be stable across Cantera versions. The existence of the thermo, reactor, etc. submodules is an artifact of dividing the Cython module into multiple compilation units to reduce build times. For example, if you go back to Cantera 2.6, the origin of the Reaction class was cantera._cantera.Reaction, whereas in Cantera 3.1 it is cantera.reaction.Reaction. Nowhere in any example or in the documentation or in the test suite do we ever use or describe classes/functions as existing anywhere other than the cantera (or ct) namespace.

Given what has been said, I believe that whatever we decide on needs to be in support of type hinting and language server support, as this is what new users will benefit from most, especially if they're not seasoned Python users already.

I think this is a good point, and suggest that we focus at least for now on changes that affect the external interface and how it interacts with these features / tools. There are many internal things within the Cython module that could be changed but I don't think doing so provides any benefit to end users.

@TimothyEDawson
Copy link

For these uses within Cantera, where the classes come from the Cython module, does changing this to an explicit import even help? If I add from ._cantera import Transport, PyLance in VS Code still can't import the compiled Cython module because it's in a out-of-source build directory. So while you'll get rid of the the specific error about Transport being undefined, the IDE still won't know anything more about what Transport is. Or does the .pyi file help in this case?

Actually yes, it does! Within the .py file, the from ._cantera import Transport will detect the .pyi file and get the attributes and signatures from there. It's quite nice.

I was referring to the "explicit import" approach that was implemented in this PR, which would certainly prevent accessing ct.YamlWriter if it is not imported in cantera/_cantera.pyx (which then becomes part of the cantera namespace via the one remaining star import in cantera/__init__.py.)

Right, and presumably that's exactly what we want, no? IFF YamlWriter should be in _cantera, it should be imported from yamlwriter where it's defined (and indeed, it is). Thus __init__.py would get it from _cantera which gets it from yamlwriter, rather than relying on a difficult to follow side-effect chain where __init__.py star-imports it from composite, which star-imports it from _cantera, which star-imports it from yamlwriter (which also occurs, currently).

This submodule structure is absolutely an implementation detail and not something that should be relied on to be stable across Cantera versions.

Sure, I'm just saying that this doesn't only matter for the public interface, it also matters internally. E.g. should internal modules be relying on _cantera, as with composite, or should they import things hierarchically? I would argue the latter is much better software design, and much less confusing for developers.

One last point: I suspect the correct place to implement the explicit import lists will actually be within .pyi files for every .pyx file, as I believe these will take precedence both externally and internally. For .py files they would ideally be incorporated directly, but I suspect some of the .py files would end up retaining a .pyi file which covers their dynamically-generated-at-runtime interface. I'm still trying to determine if it would be better to merge those in to the source code or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants