Expression profiling (legacy mode) #936


Open
wants to merge 28 commits into main

Conversation

@jpn-- (Member) commented Apr 20, 2025

Addresses #906. This pull request introduces a new performance profiling feature for expression evaluation in the ActivitySim framework. The feature allows developers to track and log the runtime of individual expressions, providing insights into potential bottlenecks in complex models. Key changes include the integration of a performance timer, updates to various core functions to support profiling, and new configuration settings for controlling profiling behavior.

Performance Profiling Feature

  • Integration of Performance Timer:

    • Added the EvalTiming class to measure and log the execution time of expressions. The timer is initialized conditionally based on the new expression_profile setting. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)
  • Expression Timing Wrappers:

    • Wrapped key expression evaluations with performance_timer.time_expression to measure the execution time of individual expressions. (activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)
  • Log Writing:

    • Added functionality to write performance logs to a file after the evaluation process is complete. (activitysim/abm/models/summarize.py, activitysim/core/assign.py, activitysim/core/interaction_simulate.py, activitysim/core/simulate.py)

Configuration Updates

  • New Profiling Settings:

    • Introduced expression_profile and expression_profile_cutoff settings to enable/disable profiling globally or for specific components, and to filter out expressions based on runtime thresholds. (activitysim/core/configuration/top.py)
  • Documentation for Profiling:

    • Added docstrings to explain the purpose and usage of the new profiling settings and performance log file. (activitysim/core/configuration/base.py, activitysim/core/configuration/top.py)
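For reference, enabling these in a model's settings file might look like the sketch below; the setting names come from this PR, but the cutoff value and its units are illustrative assumptions, not documented values:

```yaml
# settings.yaml (illustrative sketch, not the PR's documented defaults)
expression_profile: True           # enable expression profiling
expression_profile_cutoff: 0.001   # filter out expressions faster than this threshold
```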

Core Function Enhancements

  • Trace Label Support:

    • Extended several core functions to accept a trace_label parameter for more granular logging and profiling. (activitysim/core/simulate.py)
  • Performance Reporting:

    • Added a summary report generation step to analyze and output the slowest expressions across all models. (activitysim/cli/run.py)

This enhancement is primarily aimed at developers and advanced users who need to optimize model performance. While it introduces some overhead during execution, it provides valuable insights for debugging and improving complex expressions.
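The PR's actual implementation lives in the files listed above; as a rough sketch of how an EvalTiming-style wrapper can work, consider the following. The cutoff filtering, write_log method, and accumulation details here are illustrative assumptions, not the PR's code:

```python
import time
from contextlib import contextmanager


class EvalTiming:
    """Sketch of a per-expression timer (illustrative, not the PR's code)."""

    def __init__(self, cutoff_secs=0.0):
        # Expressions faster than the cutoff are dropped from the log,
        # mirroring the expression_profile_cutoff idea described above.
        self.cutoff_secs = cutoff_secs
        self.timings = {}

    @contextmanager
    def time_expression(self, expression):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            if elapsed >= self.cutoff_secs:
                self.timings[expression] = self.timings.get(expression, 0.0) + elapsed

    def write_log(self, path):
        # One line per expression, slowest first.
        with open(path, "w") as f:
            for expr, secs in sorted(self.timings.items(), key=lambda kv: -kv[1]):
                f.write(f"{secs:.6f}\t{expr}\n")


timer = EvalTiming()
with timer.time_expression("od_skims['DIST']"):
    pass  # evaluation of the expression would happen here
```

Wrapping each expression in a context manager like this keeps the timing logic out of the evaluation code itself, at the cost of the per-expression overhead discussed later in this thread.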

@jpn-- jpn-- requested a review from Copilot April 20, 2025 20:51
Copilot AI left a comment

Pull Request Overview

This PR implements a new performance profiling feature that enables detailed tracking and logging of expression evaluation runtimes in ActivitySim.

  • Integrated a performance timer via the EvalTiming class.
  • Extended core functions to wrap expression evaluations with performance measurement.
  • Added new configuration settings and documentation to support and explain expression profiling.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Summary per file:

  • docs/users-guide/performance/index.md: Added a link to the new Expression Profiling section.
  • docs/users-guide/performance/expr-profiling.md: Created documentation explaining expression profiling.
  • activitysim/core/simulate.py: Integrated performance timing in expression evaluation.
  • activitysim/core/interaction_simulate.py: Enabled profiling timer for interaction simulations.
  • activitysim/core/expressions.py: Passed trace_label parameter to compute_columns.
  • activitysim/core/configuration/top.py: Added new profiling settings and descriptive docstrings.
  • activitysim/core/configuration/base.py: Added a performance_log configuration field.
  • activitysim/core/assign.py: Wrapped several expression evaluations with profiling.
  • activitysim/cli/run.py: Generated reporting on profiling summary data.
  • activitysim/abm/models/summarize.py: Integrated performance timing in summarizing expressions.
Comments suppressed due to low confidence (1)

activitysim/core/assign.py:250

  • The variable 'trace_label' is used to create the performance log filename but is not defined in the current scope. Consider updating the function signature to accept a 'trace_label' parameter or use an alternative identifier available in context.
if state.settings.expression_profile:

@jpn-- (Member, Author) commented Apr 21, 2025

Example outputs from the SANDAG test model are linked here. Note: to see the results, you'll need to click the first links or download the HTML files; the content is rendered as embedded JavaScript, and the OneDrive preview probably won't show anything.

@jpn-- jpn-- requested a review from dhensle April 24, 2025 16:13
@dhensle (Contributor) left a comment

Tried running this on the SEMCOG model, but ran into a crash when trying to create the final summaries:

(screenshot: crash traceback)


Profiling expressions also adds some overhead to the model run, so users should
be careful about using the profiler in production runs. It is recommended to turn
off the profiler in production runs, and only use it for debugging and development.
Do we have a rough estimate for this to add to the docs? For example, just say that for the full SANDAG model, this increased the runtime by XX amount.

@jpn-- jpn-- requested review from stefancoe and bwentl July 17, 2025 18:16
@stefancoe (Contributor):
In order to test this, do I need to add any other settings beyond:
expression_profile: True

Thanks!

@stefancoe (Contributor):
Ignore my question above, I figured out why I was not seeing any outputs.

@bwentl (Contributor) commented Jul 24, 2025

@jpn--, I ran into the following error when testing this branch in mc logsum in school location:
(screenshot: error traceback)

Our model has been tested to run on asim v1.3.4 and v1.4.0.

Any suggestions? Let me know if you need anything from me to diagnose this issue.

@stefancoe (Contributor):
I ran into the same issue that @dhensle describes above.

@bwentl (Contributor) commented Jul 24, 2025

@jpn--, thanks for the suggestion at our meeting today. It turns out my previous error was indeed due to the file path being too long. I moved my working folder to C:/model and it finished successfully.

A solution to the path issue, which could also make the expression profiling more user friendly, is to move all the logs into one csv file. That way, we can filter, sort, and review which expressions in which models are taking the most time, similar to memory_profile.csv.

One potential issue with exporting the timing and expression to csv is that many expressions contain commas, potentially causing messy formatting. We could clean up the commas in the expressions, replacing them with other characters like _ or something similar, as a workaround. To avoid confusion as to which line and expression the timing is for, we could also export the row number and source model in addition to the actual expression.

Here is an example output to illustrate what the csv file could look like:

time_µsec,component,row_number,expression
0,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,1,trips.tour_id.map(tours.number_of_participants)
21463,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,2,od_skims['DIST']
4999,trip_matrices.preprocessor.write_trip_matrices_annotate_trips_preprocessor,3,np.where((reindex(persons.income_ df.person_id)>=150000)_5_reindex(persons.income_segment_ df.person_id))
...
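For what it's worth, Python's csv module already quotes fields that contain commas, so rewriting commas to _ may not be necessary. A minimal sketch, using made-up rows in the shape proposed above (the np.where expression with its commas restored is a hypothetical reconstruction, not the actual model spec):

```python
import csv
import io

# Hypothetical rows in the proposed csv shape; the last expression
# deliberately contains commas.
rows = [
    ("time_usec", "component", "row_number", "expression"),
    (21463, "trip_matrices.preprocessor", 2, "od_skims['DIST']"),
    (4999, "trip_matrices.preprocessor", 3,
     "np.where(reindex(persons.income, df.person_id) >= 150000, 5, "
     "reindex(persons.income_segment, df.person_id))"),
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)  # fields containing commas are quoted automatically

# Reading it back recovers the expressions intact, commas and all.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
```

With default QUOTE_MINIMAL quoting, the round trip is lossless, so downstream tools like pandas or Excel can still filter and sort the rows.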

@jpn-- (Member, Author) commented Jul 25, 2025

@stefancoe @dhensle I have found and fixed the problem you observed, which turned out to be a Windows thing. I added a GitHub Actions test that runs the expression profiling on a Windows server to ensure it stays fixed.

@jpn-- (Member, Author) commented Jul 25, 2025

@bwentl I like your idea, although it's not a very simple change to implement. Due to potential preprocessing of spec files to remove zero-result rows, we cannot infer the SPEC file row number from the expressions available at the point the profiling occurs. I think it's more reliable to just keep the strings exactly as they are evaluated. I'll open an issue to consider this for the future. In the meantime I added a warning message: if a log file write fails, it will suggest the user check the file path length.

@stefancoe (Contributor):
@jpn-- On the call I reported that I was still seeing the issue that @dhensle posted above. I wondered if it had to do with using the same output directory as my previous tests, so I deleted all the files in the output directory and re-ran. This solved the issue. So the old files, perhaps not encoded as UTF-8, must have been my problem.

@dhensle (Contributor) commented Jul 31, 2025

@jpn-- @stefancoe I can confirm the same behavior -- it crashed with the same error if the old files existed, but it works with a clean output folder.

However, this test does uncover an issue that the log files from previous runs are still in the expr-performance output folder and new logs are added with an appended "-1" to the file name. This results in the output timing statements actually counting double if a run is already in the output folder!

I think the functionality should be updated to delete old timing files before writing new ones.

@jpn-- (Member, Author) commented Aug 1, 2025

I think the functionality should be updated to delete old timing files before writing new ones.

I imagine users potentially want to keep older expression timing results around to compare before-and-after when applying some treatment to change the results, so I am hesitant to just wipe them automatically. How about if I collect results into a timestamped subdirectory of the expression profiling? So instead of writing to ./log/expr-performance/component.whatever.log we would write to ./log/expr-performance/20250731-2359/component.whatever.log. The end of run summaries would then also go into ./log/expr-performance/20250731-2359 and not double count or overwrite anything.
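The proposed layout could be sketched as follows; the profiling_dir helper and its signature are hypothetical, but the directory names follow the comment above:

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def profiling_dir(log_root: str, run_stamp: Optional[str] = None) -> Path:
    """Return a per-run log directory like log/expr-performance/20250731-2359."""
    # In a multiprocess run, the primary process would create the stamp once
    # and pass it to every subprocess so all logs share one directory.
    stamp = run_stamp or datetime.now().strftime("%Y%m%d-%H%M")
    return Path(log_root) / "expr-performance" / stamp


d = profiling_dir("./log", run_stamp="20250731-2359")
```

Each run then gets its own subdirectory, so old timing files are never appended to or double counted, and before-and-after comparisons stay available.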

@stefancoe (Contributor):
How about if I collect results into a timestamped subdirectory of the expression profiling?

Sounds good to me. Saves the user from having to generate multiple output directories when testing/comparing different expressions.

@dhensle (Contributor) commented Aug 1, 2025

How about if I collect results into a timestamped subdirectory of the expression profiling?

Sounds good to me too.

@jpn-- jpn-- requested a review from dhensle August 5, 2025 01:11
@jpn-- (Member, Author) commented Aug 5, 2025

Should now be working as described above, with a timestamped log file.

@dhensle (Contributor) commented Aug 6, 2025

I can confirm the timestamped outputs as described above for me. I have no issues with the code changes either. I think the only outstanding issue is the additional runtime question above.

@JoeJimFlood (Contributor) commented Aug 7, 2025

I was finally able to get around to testing it on my end. It seemed to work just fine, though I did get the screenshotted error at the end. I'm not sure if that's related (and all of the outputs were written). I have attached one of the outputs in case anyone would like to review it to confirm that it looks correct.
(screenshot: ErrorAtEnd)

@jpn-- (Member, Author) commented Aug 7, 2025

@JoeJimFlood

  1. I will look into that error message, and see if I can fix it.
  2. Runtime on this was 7078 seconds. How long does it take to run an otherwise identical run on the same server, but without the expression timing enabled?

@stefancoe (Contributor):
@jpn-- I can also do a runtime test. My tests so far have been with our small set of inputs, which runs in about 12 minutes, single process. I did do a single process run with a 10 percent sample using the full inputs. I can do this with expression profiling turned off for a comparison. We usually run 100 percent samples with MP, but I imagine that should be off for this test?

I tested the timestamped output folder and it works as expected.

@JoeJimFlood (Contributor):
@jpn-- I just reran the test on the same server using the same environment but setting expression_profile: False

  1. That error didn't happen this time. If I recall correctly I've seen it before within the past month or so. There's something about errors like this one and the one seen in Overflow error when coalescing memory logs #959 that's a bit concerning to me when using the model in application. The crashes stop the entire modeling system despite the fact that all of the outputs from ActivitySim are created.
  2. 6283 seconds

I also shared the results with @bhargavasana, and he agreed with me that it would be more useful if the process name were included in the folder name and not just the timestamp for when it was created. It took me a few tries to find the file I shared in my last comment.
(screenshot: profiling output folders)

@jpn-- (Member, Author) commented Aug 7, 2025

Oh, we should not have all those timestamped directories... that's a bug I will fix. There should only be a single timestamp for the entire multiprocessor run, stamped at the beginning of that run by the primary process. All the subprocesses should write into subdirectories of that one timestamped directory.

As for the error: it's clearly an error in the tool that merges the logs into a pair of html analysis outputs, the component and subcomponent reports. These are two output files in addition to all the other outputs you expect from an ActivitySim run, and I am 99% sure your first attempt did not produce these two final summary files. It didn't crash for the second run because it didn't run the expression profiling summary step where the error lives.

And: users should never have expression profiling enabled in "application" or "production" mode. This is a developer's tool, and your results show us that it costs about 13% more runtime overall. Only developers who will actually look at the performance results and try to tune the expressions to run faster should be using it.
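The roughly 13% figure follows directly from the two timings reported above:

```python
# Timings reported earlier in this thread (seconds, same server and inputs)
with_profiling = 7078     # expression_profile: True
without_profiling = 6283  # expression_profile: False

overhead_pct = 100 * (with_profiling - without_profiling) / without_profiling
# overhead_pct is about 12.65, i.e. roughly 13% extra runtime
```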

@JoeJimFlood (Contributor):
Ah, I had misread the error message. About a month ago we had an error in application when the pipeline files were being written and I didn't look very closely and assumed it was the same thing.

@JoeJimFlood (Contributor):
@jpn-- Did the naming of the files work as intended?
(screenshot: log file listing)

@jpn-- (Member, Author) commented Aug 16, 2025

@JoeJimFlood Yes, your most recent message looks correct.
@dhensle Since you previously requested changes, I am blocked from merging this until you approve it.

@dhensle (Contributor) commented Aug 18, 2025

@jpn-- I think the updated text from Joe's full-scale SANDAG testing still needs to be added.
