Skip to content

fix overload defaults for as_index in groupby #1321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MarcoGorelli
Copy link
Member

These overloads currently show as_index: Literal[False] = ..., but that's not quite correct because the default is True

https://github.com/pandas-dev/pandas/blob/9597c0397962b228f00805e0750b91d0e5272ce9/pandas/core/frame.py#L9375

If as_index: Literal[False] were to be used instead, then mypy would complain that a non-default parameter (as_index) follows a default one (level / axis). Fixing this would require making an extra overload: one for when as_index is passed positionally and one for when it's passed by name

Given that:

Is it OK to just type if it if it's passed by name? The alternative would be...to bring the number of overloads to 24 🤯 Certainly doable, my preference would be to keep them down and encourage passing as_index by name

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Aug 13, 2025

I'm not OK with this change for a couple of reasons.

  1. If we require as_index to be a named argument, then it will be incompatible with the implementation, which allows as_index to be positional. So a change should be made in pandas first before we make the change in the stubs.
  2. I think the solution to make True the default is to flip the pairs of overloads. E.g., right now, we have:
    @overload
    def groupby(  # pyright: ignore reportOverlappingOverload
        self,
        by: Scalar,
        axis: AxisIndex | _NoDefaultDoNotUse = ...,
        level: IndexLabel | None = ...,
        as_index: Literal[True] = True,
        sort: _bool = ...,
        group_keys: _bool = ...,
        observed: _bool | _NoDefaultDoNotUse = ...,
        dropna: _bool = ...,
    ) -> DataFrameGroupBy[Scalar, Literal[True]]: ...
    @overload
    def groupby(
        self,
        by: Scalar,
        axis: AxisIndex | _NoDefaultDoNotUse = ...,
        level: IndexLabel | None = ...,
        as_index: Literal[False] = ...,
        sort: _bool = ...,
        group_keys: _bool = ...,
        observed: _bool | _NoDefaultDoNotUse = ...,
        dropna: _bool = ...,
    ) -> DataFrameGroupBy[Scalar, Literal[False]]: ...

I think the right thing to do is to flip the order:

    @overload
    def groupby(
        self,
        by: Scalar,
        axis: AxisIndex | _NoDefaultDoNotUse = ...,
        level: IndexLabel | None = ...,
        as_index: Literal[False] = False,
        sort: _bool = ...,
        group_keys: _bool = ...,
        observed: _bool | _NoDefaultDoNotUse = ...,
        dropna: _bool = ...,
    ) -> DataFrameGroupBy[Scalar, Literal[False]]: ...
    @overload
    def groupby(  # pyright: ignore reportOverlappingOverload
        self,
        by: Scalar,
        axis: AxisIndex | _NoDefaultDoNotUse = ...,
        level: IndexLabel | None = ...,
        as_index: Literal[True]= True,
        sort: _bool = ...,
        group_keys: _bool = ...,
        observed: _bool | _NoDefaultDoNotUse = ...,
        dropna: _bool = ...,
    ) -> DataFrameGroupBy[Scalar, Literal[True]]: ...

So if False is specified, it will match first. Otherwise, True is assumed.

Can you try that instead?

@MarcoGorelli
Copy link
Member Author

The issue with that is that if as_index isn't specified, then it'll match the first overload, whereas it should match the second one

Perhaps 24 overloads isn't that bad? Or should I first deprecate non-keyword arguments except level in pandas itself?

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Aug 13, 2025

The issue with that is that if as_index isn't specified, then it'll match the first overload, whereas it should match the second one

Right. I see your point.

Perhaps 24 overloads isn't that bad? Or should I first deprecate non-keyword arguments except level in pandas itself?

I'm OK with 24 overloads, although I think we should deprecate non-keyword arguments expect level in pandas itself first.

@MarcoGorelli MarcoGorelli marked this pull request as draft August 13, 2025 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants