-
-
Notifications
You must be signed in to change notification settings - Fork 147
refactor: #718 only drop TimestampSeries #1274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c81cd6e
to
d5e1089
Compare
d5e1089
to
41c7015
Compare
abf9147
to
cbbd372
Compare
@cmp0xff you have a number of PRs submitted while I was out on vacation for 2 weeks. Can you let me know which ones I should prioritize for review? |
Hi @Dr-Irv, I hope you had a nice vacation. My pull requests are categorised below. Each category is independent, but those in a higher position have a slightly higher priority in my opinion.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. It's a lot of good work.
Main thing - if I'm going to merge this PR, it needs to be in a state where we don't need the followup PR.
Basic rule - we don't put ignore
in the tests unless we are testing that the stubs should not accept something that is invalid. You have places where you have added ignore
in the tests and I won't merge that in (unless we know it is a bug in the type checker)
tests/test_frame.py
Outdated
check(assert_type(s + summer, pd.Series), pd.Series) # type: ignore[assert-type] | ||
check(assert_type(s + df["y"], pd.Series), pd.Series) # type: ignore[assert-type] | ||
check(assert_type(summer + summer, pd.Series), pd.Series) # type: ignore[assert-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't want to have ignore
in the tests. Fix the types to make this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not there anymore.
Thank you very much for your quick and thorough reviews. I will be able to work on them next week. |
cbbd372
to
ed69ec5
Compare
b095af2
to
f1cf19f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty close. There are 2 main issues reflected in the comments.
- I think the changes you made for
__sub__()
and possibly__mul__()
and related methods to handle int, float, etc., should be a separate PR, similar to what you did for_add__()
and__div__()
. If we can get that working, and then do this one, it will be easier to make sure all the tests are still working right. - I don't want to merge with any new
ignore
in the tests, because if someone pullsmain
, they will get something we know is broken. But I suggest a plan for handling that in the comments.
Thanks again for the great work on this.
tests/series/test_series.py
Outdated
# Will be fixed after removing TimedeltaSeries, see Series.__sub__ in series.pyi | ||
check(assert_type(ss, pd.Series), pd.Series) # type: ignore[assert-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm don't want to merge until we deal with this, because everything in main
should pass all the tests. But here is what I'd like to do. Once I'm OK with this PR, I won't merge it, and then the TimedeltaSeries
PR can merge into this branch where the line above should get fixed.
I could put this branch into the repo once approved so it becomes the new PR target of the next PR, and then once that is approved and merged, we do a new PR from that branch to main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more precise, there are a few cases where the return type is Series[Timedelta]
, instead of the to-be-removed TimedeltaSeries
. When I put any of them as TimedeltaSeries
, ss
cannot be properly recognised.
# is invoked, but because of how Series.dt is hooked in and that we may not know the | ||
# type of the series, we don't know which kind of series was ...ed | ||
# in to the dt accessor | ||
|
||
_DTTimestampTimedeltaReturnType = TypeVar( | ||
"_DTTimestampTimedeltaReturnType", | ||
bound=Series | TimestampSeries | TimedeltaSeries | DatetimeIndex | TimedeltaIndex, | ||
"_DTTimestampTimedeltaReturnType", bound=Series | DatetimeIndex | TimedeltaIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear why TimedeltaSeries
is removed at this point. Or shouldn't it be bound=Series | Series[Timestamp] | TimedeltaSeries | DatetimeIndex | TimedeltaIndex
?
I know that Series
includes the other 2, but I'd like to keep this as close as possible to what was there before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas-stubs/core/series.pyi
Outdated
@overload | ||
def __sub__(self: Series[S1C], other: Series[Never]) -> Series: ... | ||
@overload | ||
def __sub__( | ||
self: Series[Timestamp], | ||
other: Timedelta | TimedeltaSeries | TimedeltaIndex | np.timedelta64, | ||
) -> TimestampSeries: ... | ||
self: Series[int], other: _T_COMPLEX | Sequence[_T_COMPLEX] | Series[_T_COMPLEX] | ||
) -> Series[_T_COMPLEX]: ... | ||
@overload | ||
def __sub__(self: Series[int], other: np_ndarray_anyint) -> Series[int]: ... | ||
@overload | ||
def __sub__(self: Series[int], other: np_ndarray_float) -> Series[float]: ... | ||
@overload | ||
def __sub__(self: Series[int], other: np_ndarray_complex) -> Series[complex]: ... | ||
@overload | ||
def __sub__( | ||
self: Series[Timedelta], | ||
other: Timedelta | TimedeltaSeries | TimedeltaIndex | np.timedelta64, | ||
) -> TimedeltaSeries: ... | ||
self: Series[float], | ||
other: int | Sequence[int] | np_ndarray_anyint | np_ndarray_float | Series[int], | ||
) -> Series[float]: ... | ||
@overload | ||
def __sub__( | ||
self: Series[float], | ||
other: _T_COMPLEX | Sequence[_T_COMPLEX] | Series[_T_COMPLEX], | ||
) -> Series[_T_COMPLEX]: ... | ||
@overload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes for __sub__()
and __rsub__()
and sub()
should be in a separate PR like you did for add
and div
, with the tests like you did there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before making this PR as "ready for review", I probably can still clean up the |
if sys.version_info >= (3, 11): | ||
check(assert_type(r0 + left, "npt.NDArray[np.str_]"), pd.Series, str) | ||
else: | ||
check(assert_type(r0 + left, Any), pd.Series, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is weird. On my local machine it does not pass. In GitHub CI it has to be like this in order to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to replicate the behavior locally with a python 3.10 and python 3.11 environment. I added this before the if
statement:
reveal_type(r0 + left)
reveal_type(r0)
reveal_type(r0.__add__(left))
With pyright
and python 3.10, I get this:
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "Any"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[int, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "Any"
With pyright
and python 3.11, I get this:
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "ndarray[tuple[Any, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[Any, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "ndarray[tuple[Any, ...], dtype[str_]]"
Note the difference in the second reveal type. In python 3.10, it has for r0
ndarray[tuple[int,...], dtype[str_]]
while for python 3.11 it has ndarray[tuple[Any,...,dtype[str_]]
Anyhow, can you put a comment above the if
statement that indicates why you did this (you could link to this comment).
I think this is because python 3.10 is using numpy 2.2.6 (the last numpy release that supports python 3.10), while python 3.11 is using numpy 2.3.2 .
Once pandas 3.0 is released, we're going to drop python 3.10 support, and I'll do that here as well.
check(assert_type(left_ts.rsub(s), pd.Series), pd.Series, pd.Timedelta) | ||
check(assert_type(left_ts.rsub(a), pd.Series), pd.Series, pd.Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datetime - Series[Any]
can either be timedelta
-like or datetime
-like, depending on Any
. I would not give an exact type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this file in this PR?
) -> TimestampProperties: ... | ||
@overload | ||
def __get__( | ||
self, instance: Series[Timedelta], owner: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TimedeltaSeries
here or TimedeltaSeries | Series[Timedelta]
and then change in the PR that will remove TimedeltaSeries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this test and the other tests related to Series[str]
, as well as any changes related to making these tests work, in a separate PR?
I'll comment on the test for now, but want to keep the PRs more focused.
"""Testpd.Series[str]+ Python native str""" | ||
r0 = "right" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add tests that check that things like left + 5
is caught by the type checker as invalid? (same here and in the other test funcs)
check(assert_type(s - left_ts, pd.Series), pd.Series, pd.Timedelta) | ||
|
||
check(assert_type(left_ts.sub(s), "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
|
||
check(assert_type(left_ts.rsub(s), "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
check(assert_type(left_ts.rsub(s), pd.Series), pd.Series, pd.Timedelta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this change, but I think it would be worth having a test that is s - left_td
where left_td
is a series of Timedelta
. That illustrates why we can't infer the subtype of the Series
.
def __mul__( | ||
self, other: timedelta | Timedelta | TimedeltaSeries | np.timedelta64 | ||
self: Series[bool], | ||
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries, | ||
) -> TimedeltaSeries: ... | ||
@overload | ||
def __mul__(self: Series[bool], other: Series[Timedelta]) -> Series[Timedelta]: ... # type: ignore[overload-overlap] | ||
@overload | ||
def __mul__( | ||
self: Series[int], | ||
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries, | ||
) -> TimedeltaSeries: ... | ||
@overload | ||
def __mul__(self: Series[int], other: Series[Timedelta]) -> Series[Timedelta]: ... | ||
@overload | ||
def __mul__( | ||
self: Series[float], | ||
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries, | ||
) -> TimedeltaSeries: ... | ||
@overload | ||
def __mul__(self: Series[float], other: Series[Timedelta]) -> Series[Timedelta]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to combine these overloads like this?
@overload
def __mul__(
self: Series[bool] | Series[int] | Series[float],
other: timedelta | np.timedelta64 | np_ndarray_td | TimedeltaSeries,
) -> TimedeltaSeries: ...
@overload
def __mul__(self: Series[bool] | Series[int] | Series[float],, other: Series[Timedelta]) -> Series[Timedelta]: ... # type: ignore[overload-overlap]
timedelta | ||
| np.timedelta64 | ||
| np_ndarray_td | ||
| TimedeltaIndex | ||
| Series[Timedelta] | ||
| TimedeltaSeries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add bool | int | float
here (and in truediv()
) and add appropriate tests. (maybe in the TimedeltaSeries
removal PR)
def median( | ||
self, | ||
self: Series[float], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self: Series[float], | |
self: Series[float] | Series[int] |
And then for Series[bool]
, median()
returns np.floating
@overload | ||
def to_numpy( | ||
self, | ||
dtype: DTypeLike | None = None, | ||
copy: bool = False, | ||
na_value: Scalar = ..., | ||
**kwargs, | ||
) -> np_1darray: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this overload because IndexOpsMixin
has it. Then you can get rid of the ignore
s in _SeriesSubClassBase
if sys.version_info >= (3, 11): | ||
check(assert_type(r0 + left, "npt.NDArray[np.str_]"), pd.Series, str) | ||
else: | ||
check(assert_type(r0 + left, Any), pd.Series, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to replicate the behavior locally with a python 3.10 and python 3.11 environment. I added this before the if
statement:
reveal_type(r0 + left)
reveal_type(r0)
reveal_type(r0.__add__(left))
With pyright
and python 3.10, I get this:
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "Any"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[int, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "Any"
With pyright
and python 3.11, I get this:
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:54:17 - information: Type of "r0 + left" is "ndarray[tuple[Any, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:55:17 - information: Type of "r0" is "ndarray[tuple[Any, ...], dtype[str_]]"
c:\Code\pandas-stubs\tests\series\arithmetic\str\test_add.py:56:17 - information: Type of "r0.__add__(left)" is "ndarray[tuple[Any, ...], dtype[str_]]"
Note the difference in the second reveal type. In python 3.10, it has for r0
ndarray[tuple[int,...], dtype[str_]]
while for python 3.11 it has ndarray[tuple[Any,...,dtype[str_]]
Anyhow, can you put a comment above the if
statement that indicates why you did this (you could link to this comment).
I think this is because python 3.10 is using numpy 2.2.6 (the last numpy release that supports python 3.10), while python 3.11 is using numpy 2.3.2 .
Once pandas 3.0 is released, we're going to drop python 3.10 support, and I'll do that here as well.
TimestampSeries
,TimedeltaSeries
, etc. can be removed #718assert_type()
to assert the type of any return value