Skip to content

Conversation

knotapun
Copy link

Hello, all this does is add a __slots__ entry to a few classes. this small change makes an outsized impact, reducing the size of instances dramatically, and leads the way to efficient and fully typed page extraction.

from pympler import asizeof
class Rect:
    def __init__(self, a, b, c, d):
        self.x0: float = float(a)
        self.y0: float = float(b)
        self.x1: float = float(c)
        self.y1: float = float(d)
class RectWSlot:
    __slots__ = ("x0", "y0", "x1", "y1")
    def __init__(self, a, b, c, d):
        self.x0: float = float(a)
        self.y0: float = float(b)
        self.x1: float = float(c)
        self.y1: float = float(d)
print(asizeof.asizeof(Rect(1, 2, 3, 4)))       # 624

print(asizeof.asizeof(RectWSlot(1, 2, 3, 4)))  # 160

@knotapun
Copy link
Author

I noticed the reasoning for non-typed values on the Structure of Dictionary Outputs section.

If this pull gets merged, would you accept a pull that types the dictionaries?

@JorjMcKie
Copy link
Collaborator

Thanks for the idea - and you are quite right: we should have considered doing this.
Before we accept: could you please extend this PR to the Matrix class please? The slots here would carry the names "a" through "f".

@JorjMcKie JorjMcKie self-requested a review August 15, 2025 21:31
Copy link
Collaborator

@JorjMcKie JorjMcKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please extend this idea to the remaining "geometry" class Matrix.

@knotapun
Copy link
Author

please extend this idea to the remaining "geometry" class Matrix.

Gladly! Note, this might be kind of limited in it's effectiveness because of some upstream code, namely the swig generated code. Seems easy enough to add.

@knotapun knotapun requested a review from JorjMcKie August 16, 2025 22:36
@julian-smith-artifex-com
Copy link
Collaborator

I think the test failures can be fixed by deleting IdentityMatrix's __setattr__() method. Could you try doing this in your PR?

[Presumably this was somehow forcing the creation of a __dict__ and so messing up lookups of in __slots__.]

@knotapun
Copy link
Author

Apologies, I've been trying to get a job, and I'm helping someone move. I also just found out how to run your tests locally via act, so it's a bit easier to make commits/changes.

@knotapun
Copy link
Author

I think the test failures can be fixed by deleting IdentityMatrix's __setattr__() method.

You have to be careful with that, as you don't want someone to change the identity matrix to be anything else.

@JorjMcKie
Copy link
Collaborator

JorjMcKie commented Aug 29, 2025

I haven't tested it but the only reason for the existence of pymupdf.IdentityMatrix is its immutability. This is ensured by __setattr__().
So deleting this method also deletes the reason for having IdentityMatrix at all.
I could think of a way out:
Define this object as a @property
Sorry, that will not work either!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants