Skip to content

refactor: #718 only drop TimestampSeries #1274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

cmp0xff
Copy link
Contributor

@cmp0xff cmp0xff commented Jul 13, 2025

@cmp0xff cmp0xff marked this pull request as ready for review July 13, 2025 07:05
@cmp0xff cmp0xff changed the title fix: #718 only drop TimestampSeries refactor: #718 only drop TimestampSeries Jul 13, 2025
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from c81cd6e to d5e1089 Compare July 16, 2025 17:54
@cmp0xff cmp0xff marked this pull request as draft July 16, 2025 17:56
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from d5e1089 to 41c7015 Compare July 16, 2025 18:19
@cmp0xff cmp0xff marked this pull request as ready for review July 16, 2025 20:11
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch 2 times, most recently from abf9147 to cbbd372 Compare July 17, 2025 15:10
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 23, 2025

@cmp0xff you have a number of PRs submitted while I was out on vacation for 2 weeks. Can you let me know which ones I should prioritize for review?

@cmp0xff
Copy link
Contributor Author

cmp0xff commented Jul 23, 2025

Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.

Series: arithmetic operations

The following two PRs are independent. They migrate test_series.py to a subfolder series, and add quite a few test_*.py files there.

DataFrame.to_dict

Index.append

Series: address #718

  1. refactor: #718 only drop TimestampSeries #1274 - this is a prerequisite for the next one.
  2. refactor: #718 also drop TimedeltaSeries #1273

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. It's a lot of good work.

Main thing - if I'm going to merge this PR, it needs to be in a state where we don't need the followup PR.

Basic rule - we don't put ignore in the tests unless we are testing that the stubs should not accept something that is invalid. You have places where you have added ignore in the tests and I won't merge that in (unless we know it is a bug in the type checker)

Comment on lines 2819 to 2821
check(assert_type(s + summer, pd.Series), pd.Series) # type: ignore[assert-type]
check(assert_type(s + df["y"], pd.Series), pd.Series) # type: ignore[assert-type]
check(assert_type(summer + summer, pd.Series), pd.Series) # type: ignore[assert-type]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't want to have ignore in the tests. Fix the types to make this work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not there anymore.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 24, 2025

Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.

I've reviewed them all, except #1273 as noted there.

Thanks for all the great work.

@cmp0xff
Copy link
Contributor Author

cmp0xff commented Jul 24, 2025

I've reviewed them all, except #1273 as noted there.

Thanks for all the great work.

Thank you very much for your quick and thorough reviews. I will be able to work on them next week.

@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from cbbd372 to ed69ec5 Compare July 28, 2025 15:05
@cmp0xff cmp0xff marked this pull request as draft July 30, 2025 07:52
@cmp0xff cmp0xff force-pushed the hotfix/cmp0xff/gh718-drop-tss branch from b095af2 to f1cf19f Compare August 4, 2025 21:36
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty close. There are 2 main issues reflected in the comments.

  1. I think the changes you made for __sub__() and possibly __mul__() and related methods to handle int, float, etc., should be a separate PR, similar to what you did for _add__() and __div__(). If we can get that working, and then do this one, it will be easier to make sure all the tests are still working right.
  2. I don't want to merge with any new ignore in the tests, because if someone pulls main, they will get something we know is broken. But I suggest a plan for handling that in the comments.

Thanks again for the great work on this.

Comment on lines 1604 to 1605
# Will be fixed after removing TimedeltaSeries, see Series.__sub__ in series.pyi
check(assert_type(ss, pd.Series), pd.Series) # type: ignore[assert-type]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm don't want to merge until we deal with this, because everything in main should pass all the tests. But here is what I'd like to do. Once I'm OK with this PR, I won't merge it, and then the TimedeltaSeries PR can merge into this branch where the line above should get fixed.

I could put this branch into the repo once approved so it becomes the new PR target of the next PR, and then once that is approved and merged, we do a new PR from that branch to main

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise, there are a few cases where the return type is Series[Timedelta], instead of the to-be-removed TimedeltaSeries. When I put any of them as TimedeltaSeries, ss cannot be properly recognised.

# is invoked, but because of how Series.dt is hooked in and that we may not know the
# type of the series, we don't know which kind of series was ...ed
# in to the dt accessor

_DTTimestampTimedeltaReturnType = TypeVar(
"_DTTimestampTimedeltaReturnType",
bound=Series | TimestampSeries | TimedeltaSeries | DatetimeIndex | TimedeltaIndex,
"_DTTimestampTimedeltaReturnType", bound=Series | DatetimeIndex | TimedeltaIndex
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why TimedeltaSeries is removed at this point. Or shouldn't it be bound=Series | Series[Timestamp] | TimedeltaSeries | DatetimeIndex | TimedeltaIndex ?

I know that Series includes the other 2, but I'd like to keep this as close as possible to what was there before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 2018 to 2040
@overload
def __sub__(self: Series[S1C], other: Series[Never]) -> Series: ...
@overload
def __sub__(
self: Series[Timestamp],
other: Timedelta | TimedeltaSeries | TimedeltaIndex | np.timedelta64,
) -> TimestampSeries: ...
self: Series[int], other: _T_COMPLEX | Sequence[_T_COMPLEX] | Series[_T_COMPLEX]
) -> Series[_T_COMPLEX]: ...
@overload
def __sub__(self: Series[int], other: np_ndarray_anyint) -> Series[int]: ...
@overload
def __sub__(self: Series[int], other: np_ndarray_float) -> Series[float]: ...
@overload
def __sub__(self: Series[int], other: np_ndarray_complex) -> Series[complex]: ...
@overload
def __sub__(
self: Series[Timedelta],
other: Timedelta | TimedeltaSeries | TimedeltaIndex | np.timedelta64,
) -> TimedeltaSeries: ...
self: Series[float],
other: int | Sequence[int] | np_ndarray_anyint | np_ndarray_float | Series[int],
) -> Series[float]: ...
@overload
def __sub__(
self: Series[float],
other: _T_COMPLEX | Sequence[_T_COMPLEX] | Series[_T_COMPLEX],
) -> Series[_T_COMPLEX]: ...
@overload
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes for __sub__() and __rsub__() and sub() should be in a separate PR like you did for add and div, with the tests like you did there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, we have merged #1311, #1312, #1314 and #1332.

@cmp0xff cmp0xff mentioned this pull request Aug 19, 2025
1 task
@cmp0xff
Copy link
Contributor Author

cmp0xff commented Aug 20, 2025

Before making this PR as "ready for review", I probably can still clean up the sub family, make some simplifications and homogenise __sub__ and sub.

Comment on lines +54 to +57
if sys.version_info >= (3, 11):
check(assert_type(r0 + left, "npt.NDArray[np.str_]"), pd.Series, str)
else:
check(assert_type(r0 + left, Any), pd.Series, str)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird. On my local machine it does not pass. In GitHub CI it has to be like this in order to pass.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to replicate the behavior locally with a python 3.10 and python 3.11 environment. I added this before the if statement:

    reveal_type(r0 + left)
    reveal_type(r0)
    reveal_type(r0.__add__(left))

With pyright and python 3.10, I get this:

  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "Any"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[int, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "Any"

With pyright and python 3.11, I get this:

  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "ndarray[tuple[Any, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[Any, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "ndarray[tuple[Any, ...], dtype[str_]]"

Note the difference in the second reveal type. In python 3.10, it has for r0 ndarray[tuple[int,...], dtype[str_]] while for python 3.11 it has ndarray[tuple[Any,...,dtype[str_]]

Anyhow, can you put a comment above the if statement that indicates why you did this (you could link to this comment).

I think this is because python 3.10 is using numpy 2.2.6 (the last numpy release that supports python 3.10), while python 3.11 is using numpy 2.3.2 .

Once pandas 3.0 is released, we're going to drop python 3.10 support, and I'll do that here as well.

@cmp0xff cmp0xff marked this pull request as ready for review August 21, 2025 10:12
@cmp0xff cmp0xff requested a review from Dr-Irv August 21, 2025 10:13
Comment on lines +173 to +174
check(assert_type(left_ts.rsub(s), pd.Series), pd.Series, pd.Timedelta)
check(assert_type(left_ts.rsub(a), pd.Series), pd.Series, pd.Timedelta)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetime - Series[Any] can either be timedelta-like or datetime-like, depending on Any. I would not give an exact type here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this file in this PR?

) -> TimestampProperties: ...
@overload
def __get__(
self, instance: Series[Timedelta], owner: Any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TimedeltaSeries here or TimedeltaSeries | Series[Timedelta] and then change in the PR that will remove TimedeltaSeries

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this test and the other tests related to Series[str], as well as any changes related to making these tests work, in a separate PR?

I'll comment on the test for now, but want to keep the PRs more focused.

Comment on lines +15 to +17
"""Testpd.Series[str]+ Python native str"""
r0 = "right"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add tests that check that things like left + 5 is caught by the type checker as invalid? (same here and in the other test funcs)

Comment on lines +149 to +153
check(assert_type(s - left_ts, pd.Series), pd.Series, pd.Timedelta)

check(assert_type(left_ts.sub(s), "TimedeltaSeries"), pd.Series, pd.Timedelta)

check(assert_type(left_ts.rsub(s), "TimedeltaSeries"), pd.Series, pd.Timedelta)
check(assert_type(left_ts.rsub(s), pd.Series), pd.Series, pd.Timedelta)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this change, but I think it would be worth having a test that is s - left_td where left_td is a series of Timedelta. That illustrates why we can't infer the subtype of the Series.

Comment on lines 2247 to +2266
def __mul__(
self, other: timedelta | Timedelta | TimedeltaSeries | np.timedelta64
self: Series[bool],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[bool], other: Series[Timedelta]) -> Series[Timedelta]: ... # type: ignore[overload-overlap]
@overload
def __mul__(
self: Series[int],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[int], other: Series[Timedelta]) -> Series[Timedelta]: ...
@overload
def __mul__(
self: Series[float],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[float], other: Series[Timedelta]) -> Series[Timedelta]: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to combine these overloads like this?

 @overload
    def __mul__(
        self: Series[bool] | Series[int] | Series[float],
        other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
    ) -> TimedeltaSeries: ...
    @overload
    def __mul__(self: Series[bool]  | Series[int] | Series[float],, other: Series[Timedelta]) -> Series[Timedelta]: ...  # type: ignore[overload-overlap]

Comment on lines +3216 to +3221
timedelta
| np.timedelta64
| np_ndarray_td
| TimedeltaIndex
| Series[Timedelta]
| TimedeltaSeries
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add bool | int | float here (and in truediv()) and add appropriate tests. (maybe in the TimedeltaSeries removal PR)

def median(
self,
self: Series[float],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self: Series[float],
self: Series[float] | Series[int]

And then for Series[bool], median() returns np.floating

Comment on lines +3975 to +3982
@overload
def to_numpy(
self,
dtype: DTypeLike | None = None,
copy: bool = False,
na_value: Scalar = ...,
**kwargs,
) -> np_1darray: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this overload because IndexOpsMixin has it. Then you can get rid of the ignores in _SeriesSubClassBase

Comment on lines +54 to +57
if sys.version_info >= (3, 11):
check(assert_type(r0 + left, "npt.NDArray[np.str_]"), pd.Series, str)
else:
check(assert_type(r0 + left, Any), pd.Series, str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to replicate the behavior locally with a python 3.10 and python 3.11 environment. I added this before the if statement:

    reveal_type(r0 + left)
    reveal_type(r0)
    reveal_type(r0.__add__(left))

With pyright and python 3.10, I get this:

  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "Any"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[int, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "Any"

With pyright and python 3.11, I get this:

  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "ndarray[tuple[Any, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[Any, ...], dtype[str_]]"
  c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "ndarray[tuple[Any, ...], dtype[str_]]"

Note the difference in the second reveal type. In python 3.10, it has for r0 ndarray[tuple[int,...], dtype[str_]] while for python 3.11 it has ndarray[tuple[Any,...,dtype[str_]]

Anyhow, can you put a comment above the if statement that indicates why you did this (you could link to this comment).

I think this is because python 3.10 is using numpy 2.2.6 (the last numpy release that supports python 3.10), while python 3.11 is using numpy 2.3.2 .

Once pandas 3.0 is released, we're going to drop python 3.10 support, and I'll do that here as well.

@cmp0xff cmp0xff marked this pull request as draft August 21, 2025 19:43
@cmp0xff cmp0xff mentioned this pull request Aug 21, 2025
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants