Expression profiling (legacy mode) #936


Open
wants to merge 28 commits into main

Conversation

@jpn-- (Member) commented Apr 20, 2025

Addresses #906. This pull request introduces a new performance profiling feature for expression evaluation in the ActivitySim framework. The feature allows developers to track and log the runtime of individual expressions, providing insights into potential bottlenecks in complex models. Key changes include the integration of a performance timer, updates to various core functions to support profiling, and new configuration settings for controlling profiling behavior.

Performance Profiling Feature

  • Integration of Performance Timer:

    • Added the EvalTiming class to measure and log the execution time of expressions. The timer is initialized conditionally based on the new expression_profile setting. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)
  • Expression Timing Wrappers:

    • Wrapped key expression evaluations with performance_timer.time_expression to measure the execution time of individual expressions. (activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)
  • Log Writing:

    • Added functionality to write performance logs to a file after the evaluation process is complete. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)

Configuration Updates

  • New Profiling Settings:

    • Introduced expression_profile and expression_profile_cutoff settings to enable/disable profiling globally or for specific components, and to filter out expressions based on runtime thresholds. (activitysim/core/configuration/top.py)
  • Documentation for Profiling:

    • Added docstrings to explain the purpose and usage of the new profiling settings and performance log file. (activitysim/core/configuration/base.py, activitysim/core/configuration/top.py)
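For reference, enabling these in a model's settings file might look like the sketch below; the setting names come from this PR, but the cutoff value and its units are illustrative assumptions, not documented values:

```yaml
# settings.yaml (illustrative sketch, not the PR's documented defaults)
expression_profile: True           # enable expression profiling
expression_profile_cutoff: 0.001   # filter out expressions faster than this threshold
```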

Core Function Enhancements

  • Trace Label Support:

    • Extended several core functions to accept a trace_label parameter for more granular logging and profiling. (activitysim/core/simulate.py)
  • Performance Reporting:

    • Added a summary report generation step to analyze and output the slowest expressions across all models. (activitysim/cli/run.py)

This enhancement is primarily aimed at developers and advanced users who need to optimize model performance. While it introduces some overhead during execution, it provides valuable insights for debugging and improving complex expressions.
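The PR's actual implementation lives in the files listed above; as a rough sketch of how an EvalTiming-style wrapper can work, consider the following. The cutoff filtering, write_log method, and accumulation details here are illustrative assumptions, not the PR's code:

```python
import time
from contextlib import contextmanager


class EvalTiming:
    """Sketch of a per-expression timer (illustrative, not the PR's code)."""

    def __init__(self, cutoff_secs=0.0):
        # Expressions faster than the cutoff are dropped from the log,
        # mirroring the expression_profile_cutoff idea described above.
        self.cutoff_secs = cutoff_secs
        self.timings = {}

    @contextmanager
    def time_expression(self, expression):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            if elapsed >= self.cutoff_secs:
                self.timings[expression] = self.timings.get(expression, 0.0) + elapsed

    def write_log(self, path):
        # One line per expression, slowest first.
        with open(path, "w") as f:
            for expr, secs in sorted(self.timings.items(), key=lambda kv: -kv[1]):
                f.write(f"{secs:.6f}\t{expr}\n")


timer = EvalTiming()
with timer.time_expression("od_skims['DIST']"):
    pass  # evaluation of the expression would happen here
```

Wrapping each expression in a context manager like this keeps the timing logic out of the evaluation code itself, at the cost of the per-expression overhead discussed later in this thread.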

@jpn-- jpn-- requested a review from Copilot April 20, 2025 20:51
Copilot AI left a comment

Pull Request Overview

This PR implements a new performance profiling feature that enables detailed tracking and logging of expression evaluation runtimes in ActivitySim.

  • Integrated a performance timer via the EvalTiming class.
  • Extended core functions to wrap expression evaluations with performance measurement.
  • Added new configuration settings and documentation to support and explain expression profiling.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Summary per file:

  • docs/users-guide/performance/index.md: Added a link to the new Expression Profiling section.
  • docs/users-guide/performance/expr-profiling.md: Created documentation explaining expression profiling.
  • activitysim/core/simulate.py: Integrated performance timing in expression evaluation.
  • activitysim/core/interaction_simulate.py: Enabled profiling timer for interaction simulations.
  • activitysim/core/expressions.py: Passed trace_label parameter to compute_columns.
  • activitysim/core/configuration/top.py: Added new profiling settings and descriptive docstrings.
  • activitysim/core/configuration/base.py: Added a performance_log configuration field.
  • activitysim/core/assign.py: Wrapped several expression evaluations with profiling.
  • activitysim/cli/run.py: Generated reporting on profiling summary data.
  • activitysim/abm/models/summarize.py: Integrated performance timing in summarizing expressions.
Comments suppressed due to low confidence (1)

activitysim/core/assign.py:250

  • The variable 'trace_label' is used to create the performance log filename but is not defined in the current scope. Consider updating the function signature to accept a 'trace_label' parameter or use an alternative identifier available in context.
if state.settings.expression_profile:

@jpn-- (Member, Author) commented Apr 21, 2025

Example outputs from the SANDAG test model are linked here. Note: to see the results, you'll need to click the first links or download the HTML files; the content is rendered as embedded JavaScript, and the OneDrive preview probably won't show anything.

@jpn-- jpn-- requested a review from dhensle April 24, 2025 16:13
@dhensle (Contributor) left a comment

Tried running this on the SEMCOG model, but ran into a crash when trying to create the final summaries:

(screenshot: crash traceback)


Profiling expressions also adds some overhead to the model run, so users should
be careful about using the profiler in production runs. It is recommended to turn
off the profiler in production runs, and only use it for debugging and development.
Do we have a rough estimate for this to add to the docs? For example, just say that for the full SANDAG model, this increased the runtime by XX amount.

@jpn-- jpn-- requested review from stefancoe and bwentl July 17, 2025 18:16
@stefancoe (Contributor):
In order to test this, do I need to add any other settings beyond:
expression_profile: True

Thanks!

@stefancoe (Contributor):
Ignore my question above, I figured out why I was not seeing any outputs.

@bwentl (Contributor) commented Jul 24, 2025

@jpn--, I ran into the following error when testing this branch in mc logsum in school location:
(screenshot: error traceback)

Our model has been tested to run on asim v1.3.4 and v1.4.0.

Any suggestions? Let me know if you need anything from me to diagnose this issue.

@stefancoe (Contributor):
I ran into the same issue that @dhensle describes above.

@bwentl (Contributor) commented Jul 24, 2025

@jpn--, thanks for the suggestion at our meeting today. It turns out my previous error was indeed due to the file path being too long. I moved my working folder to C:/model and it finished successfully.

A solution to the path issue, which could also make the expression profiling more user friendly, is to move all the logs into one csv file. That way, we can filter, sort, and review which expressions in which models are taking the most time, similar to memory_profile.csv.

One potential issue with exporting the timing and expression to csv is that many expressions contain commas, potentially causing messy formatting. We could clean up the commas in the expressions, replacing them with other characters like _ or something similar, as a workaround. To avoid confusion as to which line and expression the timing is for, we could also export the row number and source model in addition to the actual expression.

Here is an example output to illustrate what the csv file could look like:

time_µsec,component,row_number,expression
0,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,1,trips.tour_id.map(tours.number_of_participants)
21463,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,2,od_skims['DIST']
4999,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,3,np.where((reindex(persons.income_ df.person_id)>=150000)_5_reindex(persons.income_segment_ df.person_id))
...
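For what it's worth, Python's csv module already quotes fields that contain commas, so rewriting commas to _ may not be necessary. A minimal sketch, using made-up rows in the shape proposed above (the np.where expression with its commas restored is a hypothetical reconstruction, not the actual model spec):

```python
import csv
import io

# Hypothetical rows in the proposed csv shape; the last expression
# deliberately contains commas.
rows = [
    ("time_usec", "component", "row_number", "expression"),
    (21463, "trip_matrices.preprocessor", 2, "od_skims['DIST']"),
    (4999, "trip_matrices.preprocessor", 3,
     "np.where(reindex(persons.income, df.person_id) >= 150000, 5, "
     "reindex(persons.income_segment, df.person_id))"),
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # fields containing commas are quoted automatically

# Reading it back recovers the expressions intact, commas and all.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
```

With default QUOTE_MINIMAL quoting, the round trip is lossless, so downstream tools like pandas or Excel can still filter and sort the rows.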

@jpn-- (Member, Author) commented Jul 25, 2025

@stefancoe @dhensle I have found and fixed the problem you observed, which turned out to be a Windows thing. I added a GitHub Actions test that runs the expression profiling on a Windows server to ensure it stays fixed.

@jpn-- (Member, Author) commented Jul 25, 2025

@bwentl I like your idea, although it's not a very simple change to implement. Due to potential preprocessing of spec files to remove zero-result rows, we cannot infer the SPEC file row number from the expressions available at the point the profiling occurs. I think it's more reliable to just keep the strings exactly as they are evaluated. I'll open an issue to consider this for the future. In the meantime I added a warning message: if a log file write fails, it will suggest the user check the file path length.

@stefancoe (Contributor):
@jpn-- On the call I reported that I was still seeing the issue that @dhensle posted above. I wondered if it had to do with using the same output directory as my previous tests, so I deleted all the files in the output directory and re-ran. This solved the issue. So the old files, perhaps not encoded as UTF-8, must have been my problem.

@dhensle (Contributor) commented Jul 31, 2025

@jpn-- @stefancoe I can confirm the same behavior -- it crashed with the same error if the old files existed, but it works with a clean output folder.

However, this test does uncover an issue that the log files from previous runs are still in the expr-performance output folder and new logs are added with an appended "-1" to the file name. This results in the output timing statements actually counting double if a run is already in the output folder!

I think the functionality should be updated to delete old timing files before writing new ones.

@jpn-- (Member, Author) commented Aug 1, 2025

I think the functionality should be updated to delete old timing files before writing new ones.

I imagine users potentially want to keep older expression timing results around to compare before-and-after when applying some treatment to change the results, so I am hesitant to just wipe them automatically. How about if I collect results into a timestamped subdirectory of the expression profiling? So instead of writing to ./log/expr-performance/component.whatever.log we would write to ./log/expr-performance/20250731-2359/component.whatever.log. The end of run summaries would then also go into ./log/expr-performance/20250731-2359 and not double count or overwrite anything.
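The proposed layout could be sketched as follows; the profiling_dir helper and its signature are hypothetical, but the directory names follow the comment above:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def profiling_dir(log_root: str, run_stamp: Optional[str] = None) -> Path:
    """Return a per-run log directory like log/expr-performance/20250731-2359."""
    # In a multiprocess run, the primary process would create the stamp once
    # and pass it to every subprocess so all logs share one directory.
    stamp = run_stamp or datetime.now().strftime("%Y%m%d-%H%M")
    return Path(log_root) / "expr-performance" / stamp


d = profiling_dir("./log", run_stamp="20250731-2359")
```

Each run then gets its own subdirectory, so old timing files are never appended to or double counted, and before-and-after comparisons stay available.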

@stefancoe (Contributor):
How about if I collect results into a timestamped subdirectory of the expression profiling?

Sounds good to me. Saves the user from having to generate multiple output directories when testing/comparing different expressions.

@dhensle (Contributor) commented Aug 1, 2025

How about if I collect results into a timestamped subdirectory of the expression profiling?

Sounds good to me too.

@jpn-- jpn-- requested a review from dhensle August 5, 2025 01:11
@jpn-- (Member, Author) commented Aug 5, 2025

Should now be working as described above, with a timestamped log file.

@dhensle (Contributor) commented Aug 6, 2025

I can confirm the timestamped outputs as described above for me. I have no issues with the code changes either. I think the only outstanding issue is the additional runtime question above.

@JoeJimFlood (Contributor) commented Aug 7, 2025

I was finally able to get around to testing it on my end. It seemed to work just fine, though I did get the screenshotted error at the end. I'm not sure if that's related (and all of the outputs were written). I have attached one of the outputs in case anyone would like to review it to confirm that it looks correct.
(screenshot: ErrorAtEnd)

@jpn-- (Member, Author) commented Aug 7, 2025

@JoeJimFlood

  1. I will look into that error message, and see if I can fix it.
  2. Runtime on this was 7078 seconds. How long does it take to run an otherwise identical run on the same server, but without the expression timing enabled?

@stefancoe (Contributor):
@jpn-- I can also do a runtime test. My tests so far have been with our small set of inputs, which runs in about 12 minutes, single process. I did do a single process run with a 10 percent sample using the full inputs. I can do this with expression profiling turned off for a comparison. We usually run 100 percent samples with MP, but I imagine that should be off for this test?

I tested the timestamped output folder and it works as expected.

@JoeJimFlood (Contributor):
@jpn-- I just reran the test on the same server using the same environment but setting expression_profile: False

  1. That error didn't happen this time. If I recall correctly I've seen it before within the past month or so. There's something about errors like this one and the one seen in Overflow error when coalescing memory logs #959 that's a bit concerning to me when using the model in application. The crashes stop the entire modeling system despite the fact that all of the outputs from ActivitySim are created.
  2. 6283 seconds

I also shared the results with @bhargavasana, and he agreed with me that it would be more useful if the process name were included in the folder name and not just the timestamp for when it was created. It took me a few tries to find the file I shared in my last comment.
(screenshot: profiling output folders)

@jpn-- (Member, Author) commented Aug 7, 2025

Oh, we should not have all those timestamped directories... that's a bug I will fix. There should only be a single timestamp for the entire multiprocessor run, stamped at the beginning of that run by the primary process. All the subprocesses should write into subdirectories of that one timestamped directory.

As for the error: it's clearly an error in the tool that merges the logs into a pair of html analysis outputs, the component and subcomponent reports. These are two output files in addition to all the other outputs you expect from an ActivitySim run, and I am 99% sure your first attempt did not produce these two final summary files. It didn't crash for the second run because it didn't run the expression profiling summary step where the error lives.

And: users should never have expression profiling enabled in "application" or "production" mode. This is a developer's tool, and your results show us that it costs about 13% more runtime overall. Only developers who will actually look at the performance results and try to tune the expressions to run faster should be using it.
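The roughly 13% figure follows directly from the two timings reported above:

```python
# Timings reported earlier in this thread (seconds, same server and inputs)
with_profiling = 7078     # expression_profile: True
without_profiling = 6283  # expression_profile: False

overhead_pct = 100 * (with_profiling - without_profiling) / without_profiling
# overhead_pct is about 12.65, i.e. roughly 13% extra runtime
```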

@JoeJimFlood (Contributor):
Ah, I had misread the error message. About a month ago we had an error in application when the pipeline files were being written and I didn't look very closely and assumed it was the same thing.

@JoeJimFlood (Contributor):
@jpn-- Did the naming of the files work as intended?
(screenshot: log file listing)

@jpn-- (Member, Author) commented Aug 16, 2025

@JoeJimFlood Yes, your most recent message looks correct.
@dhensle Since you previously requested changes, I am blocked from merging this until you approve it.

@dhensle (Contributor) commented Aug 18, 2025

@jpn-- I think the updated text from Joe's full-scale SANDAG testing still needs to be added.
