-
Notifications
You must be signed in to change notification settings - Fork 47
TBB update #410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
TBB update #410
Conversation
Move found Callback
Make nodes with changed major allele set available to larch through the callback
Hi @ognian-, I have a test case for you: I'm getting a SEGV when trying to add one sample to the public SARS-CoV-2 tree, which has 8,379,881 samples/leaves. Here's how to reproduce the error (path to usher-sampled may need to be tweaked):
The -T is just to be polite to my coworkers on a shared server, I still get the SEGV with other reasonable values like 8 or 50. (With
Then it gets a SEGV with a stack that's over 100,000 calls deep. The stack starts like this:
Then the pattern of #6 - #17 repeats until almost the bottom of the very deep stack, which ends like this:
(112218 - 6) / 12 = 9351 recursions(?) deep which seems like more than a simple recursive traversal of the tree should require -- I would expect a few hundred maybe. But then I don't know how TBB works. Hopefully it will make more sense to you! |
@AngieHinrichs Thanks for providing the test case! I've pushed a fix for the infinite recursion. |
Thanks @ognian- , it's working now! I ran your branch's usher-sampled and matOptimize on the SARS-CoV-2 full tree files (17 million sequences). They both came up with comparable results to the current version. matOptimize has about the same runtime, but for some reason usher-sampled is about 5x slower (it takes about 70 minutes with the current version, but 383 - 386 minutes with the changes). It's still good enough to maintain a daily update of the SARS-CoV-2 tree, but it would be nice to see if that can be improved without too much effort. I will look into compiling in profiler support (although I don't know cmake very well) to see if I can get more details. (There's always "poor man's profiler" too 🙂) |
@AngieHinrichs Such slowdown is not acceptable, thanks for letting me know! I'll get back to you with an update when I gather some insight. |
At least part of the problem may be that I use IIRC I'm using such a small batch size because I used to occasionally hit apparent deadlock in usher-sampled, and Cheng suggested that I use a smaller batch size. That, and reducing the number of threads, seemed to help. I hardly ever see usher-sampled lock up for the SARS-CoV-2 build, but it happens (maybe one in 20 or 50 runs?) even with -T 16 for my nightly builds for viruses with a lot fewer available sequences. So the smaller batch size didn't seem to cause too much overhead with the current version of usher-sampled, but with the newer TBB, I guess the overhead is a lot more. If the changes make it so that usher-sampled can't deadlock any more, that would be great! Re: profiling -- I tried adding these 3 lines at various stages of CMakeLists.txt:
(as seen on stackoverflow -- but I kept getting errors from various stages of the compile about needing -fPIE instead of -fPIC, and it looked like the .c compiler commands were not using CMAKE_CXX_FLAGS anyway (a completely different set of options were used). I lost patience with cmake and tried "poor man's profiler", i.e. running in gdb, hitting ctrl-C and looking at the stack. Every time (25 out of 25 tries) the stack was this:
-- although maybe running something highly parallel in gdb and hitting ctrl-C would always hit the main thread anyway?? Dunno. Just thought I'd include that in case it rings a bell. |
@AngieHinrichs Thanks for the details! When you compare to the pre-TBB update build that runs in 70 minutes, is that from the latest main branch or some previous release tag? I've based the TBB branch off current main. |
latest main branch. But the speed of usher-sampled has been stable (on my server anyway) for a long time. |
I found a separate bug, that triggers an assert fail only in Debug builds:
This is the relevant stack trace:
|
Hi @ognian-, thanks for your ongoing work on this! Here is a smallish performance test:
Ony my server, that takes around 15s for the main branch usher-sampled and 17-19s with the changes. |
Reducing the |
And just to clarify, regarding the Debug-only assertion failure that you found: do we need to fix that or did you fix it? |
Regarding the bug in debug mode: now with the small tree I was able to progress on it, found a total of 4 asserts in main_mapper.cpp that fail, and if commented out the run finishes normally to success. I'll keep on investigating. |
I ran this branch's matOptimize on the pre-optimization full SARS-CoV-2 tree from 2025-08-25. It ran without errors in 107.5 minutes and produced a tree with almost identical I will try usher-sampled next. |
Sorry about the delay, I am too easily distracted. I tried the matsengrp branch usher-sampled on the full tree of 17 million sequences and it got good results (similar output about the same samples, similar |
This issue of usher-sampled relative slowness came up on July 28th, let me try running again with |
With |
With the latest changes, the performance with |
Great improvement with taskflow, now it takes only 91 minutes with |
This reverts commit db2839c.
This set of changes updates the TBB dependency version to latest stable (oneTBB 2022.2.0). It is also introducing the Move_Found_Callback from Larch.
Do not merge until all executables are properly tested!