Update to use pandas v2.* #932
Merged
34 commits
7b850ca  updates for pandas 2.2  (jpn--)
002604d  pytables 3.9  (jpn--)
8819b8c  input checker message failbacks  (jpn--)
be5c024  fix veh type categoricals  (jpn--)
98bc2e4  restore original pandas read_csv NaNs  (jpn--)
5beffda  is_monotonic_increasing  (jpn--)
9b67fec  fix disagg acc sorting  (jpn--)
234a420  drop unused indexes  (jpn--)
58003ed  update pipeline ref  (jpn--)
012e92e  temporarily disable sharrow in vehicle alloc  (jpn--)
c6975a4  fix dtype problem  (jpn--)
2a899e5  ensure MAX index does not overflow  (jpn--)
a752ea4  sort on join to preserve index ordering from old pandas  (jpn--)
543b19a  local compute test simplifies debugging  (jpn--)
8ed8fb9  Merge branch 'main' into depend-pandas-2  (jpn--)
50c9f6d  more robust conversion to pyarrow  (jpn--)
a393dbd  Merge branch 'main' into depend-pandas-2  (jpn--)
c06d737  Merge branch 'main' into depend-pandas-2  (jpn--)
cd38e57  Merge branch 'main' into pandas-2  (jpn--)
4f89ef6  rewrite df.eval to fast_eval  (jpn--)
59872fc  change xarray pin  (jpn--)
5191684  fix zarr pin  (jpn--)
8091bd5  update numpy and dask pins  (jpn--)
cf9fb21  wrap raw fast_eval in pd.Series  (jpn--)
7becbca  don't skip sharrow in veh alloc  (jpn--)
8be8e0d  rebuild ref pipeline  (jpn--)
804e780  Merge commit 'c59dc4cdf66e3f53816b00ca28fdbc2ca4fd0c8a' into pandas-2  (jpn--)
b019c4b  make fast_eval more robust  (jpn--)
501e249  revise external targets  (jpn--)
a0b3c27  prefer public API  (jpn--)
560db6b  Merge branch 'main' into pandas-2  (jpn--)
61a97b1  Update activitysim-dev-base.yml  (jpn--)
4b4906e  add note about why fast_eval exists and how to undo it  (jpn--)
d780be6  Merge branch 'main' into pandas-2  (jpn--)
@@ -0,0 +1,106 @@
from __future__ import annotations

from typing import TYPE_CHECKING, Any

import pandas as pd
from pandas import eval as _eval

if TYPE_CHECKING:
    from collections.abc import Hashable, Iterator, Mapping, Sequence

    from pandas._typing import ArrayLike


def _get_cleaned_column_resolvers(
    df: pd.DataFrame, raw: bool = True
) -> dict[Hashable, ArrayLike | pd.Series]:
    """
    Return the special character free column resolvers of a dataframe.

    Column names with special characters are 'cleaned up' so that they can
    be referred to by backtick quoting.
    Used in :meth:`DataFrame.eval`.
    """
    from pandas import Series
    from pandas.core.computation.parsing import clean_column_name

    if isinstance(df, pd.Series):
        return {clean_column_name(df.name): df}

    # CHANGED FROM PANDAS: do not even convert the arrays to pd.Series, just
    # give the raw arrays to the compute engine. This is potentially a breaking
    # change if any of the operations in the eval string require a pd.Series.
    if raw:
        # Performance tradeoff: in the dict below, we iterate over `df.items`,
        # which yields tuples of (column_name, data as pd.Series). This is marginally
        # slower than iterating over `df.columns` and `df._iter_column_arrays()`,
        # but the latter is not in Pandas' public API, and may be removed in the future.
        return {
            clean_column_name(k): v for k, v in df.items() if not isinstance(k, int)
        }

    # CHANGED FROM PANDAS: do not call df.dtypes inside the dict comprehension loop
    # This update has been made in https://github.com/pandas-dev/pandas/pull/59573,
    # but appears not to have been released yet as of pandas 2.2.3
    dtypes = df.dtypes

    return {
        clean_column_name(k): Series(
            v, copy=False, index=df.index, name=k, dtype=dtypes[k]
        ).__finalize__(df)
        for k, v in zip(df.columns, df._iter_column_arrays())
        if not isinstance(k, int)
    }


def fast_eval(df: pd.DataFrame, expr: str, **kwargs) -> Any | None:
    """
    Evaluate a string describing operations on DataFrame columns.

    Operates on columns only, not specific rows or elements. This allows
    `eval` to run arbitrary code, which can make you vulnerable to code
    injection if you pass user input to this function.

    This function is a wrapper that replaces :meth:`~pandas.DataFrame.eval`
    with a more efficient version than in the default pandas library (as
    of pandas 2.2.3). It is recommended to use this function instead of
    :meth:`~pandas.DataFrame.eval` for better performance. However, if you
    encounter issues with this function, you can switch back to the default
    pandas eval by changing the function call from `fast_eval(df, ...)` to
    `df.eval(...)`.

    Parameters
    ----------
    expr : str
        The expression string to evaluate.
    **kwargs
        See the documentation for :meth:`~pandas.DataFrame.eval` for complete
        details on the keyword arguments accepted.

    Returns
    -------
    ndarray, scalar, or pandas object
        The result of the evaluation.
    """

    inplace = False
    kwargs["level"] = kwargs.pop("level", 0) + 1
    index_resolvers = df._get_index_resolvers()
    column_resolvers = _get_cleaned_column_resolvers(df)
    resolvers = column_resolvers, index_resolvers
    if "target" not in kwargs:
        kwargs["target"] = df
    kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers

    try:
        return pd.Series(
            _eval(expr, inplace=inplace, **kwargs), index=df.index, name=expr
        ).__finalize__(df)
    except Exception as e:
        # Initially assume that the exception is caused by the potentially
        # breaking change in _get_cleaned_column_resolvers, and try again
        # TODO: what kind of exception should be caught here so it is less broad
        column_resolvers = _get_cleaned_column_resolvers(df, raw=False)
        resolvers = column_resolvers, index_resolvers
        kwargs["resolvers"] = kwargs["resolvers"][:-2] + resolvers
        return _eval(expr, inplace=inplace, **kwargs)
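
For readers who want to try the resolver mechanism without the private pandas helpers used in the diff, the following is a minimal sketch and not the code in this PR. It assumes every column name is a valid Python identifier (so no `clean_column_name` step is needed) and relies only on the public `resolvers` argument of `pd.eval`. The name `eval_with_fallback` is hypothetical and exists only for illustration.

```python
import pandas as pd


def eval_with_fallback(df: pd.DataFrame, expr: str) -> pd.Series:
    """Hypothetical helper: evaluate `expr` against the columns of `df`."""
    # Assumption: every column name is a valid Python identifier, so the
    # names can serve as resolver keys without any cleaning step.
    column_resolvers = dict(df.items())  # {column name: column as pd.Series}
    try:
        result = pd.eval(expr, resolvers=(column_resolvers,))
        # Re-attach the frame's index and label the result with the expression.
        return pd.Series(result, index=df.index, name=expr)
    except Exception:
        # Deliberately broad, like the TODO in the diff: on any failure,
        # fall back to stock pandas behaviour.
        return df.eval(expr)


df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})
out = eval_with_fallback(df, "a + b * 2")
print(out)
```

As in the diff, the fast path is tried first and the stock `df.eval` call is the escape hatch, so a caller can swap between the two without changing the expression string.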