
Conversation

@ognian- (Collaborator) commented Jul 21, 2025

This set of changes updates the TBB dependency to the latest stable version (oneTBB 2022.2.0). It also introduces the Move_Found_Callback from Larch.

Do not merge until all executables are properly tested!

@AngieHinrichs (Contributor)

Hi @ognian-, I have a test case for you: I'm getting a SEGV when trying to add one sample to the public SARS-CoV-2 tree, which has 8,379,881 samples/leaves. Here's how to reproduce the error (path to usher-sampled may need to be tweaked):

curl -O https://hgwdev.gi.ucsc.edu/~angie/PP765594.1.mpl
curl -O https://hgwdev.gi.ucsc.edu/~angie/NC_045512.2.fa
curl -O https://hgdownload.gi.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/2025/07/22/public-2025-07-22.all.masked.pb.gz

gdb --args ~/github/matsengrp/usher/build/usher-sampled \
    -T 16 \
    -i public-2025-07-22.all.masked.pb.gz \
    --diff PP765594.1.mpl \
    --ref NC_045512.2.fa \
    -o test.pb.gz

The -T 16 is just to be polite to my coworkers on a shared server; I still get the SEGV with other reasonable values like 8 or 50. (With unreasonably low values like -T 2 or 5, there is a different error in tbb/arena_slot.cpp.) Anyway, when run, usher-sampled seems to initialize as usual, and its stderr output gets this far:

Sample start idx 10168852, end index 10168853
main dfs size 5858841
===================Node ID so far 10168853 ===================output size 29904, nuc size 29904, pos_mutated size 29904
Found 1 missing samples.

Adding missing samples to the tree.
sorting done

Then it gets a SEGV with a stack that's over 100,000 calls deep. The stack starts like this:

#0  0x00005555555c5fc3 in std::__new_allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > >::allocate (
    this=0x7fffe510f020, __n=1) at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/new_allocator.h:126
#1  std::allocator_traits<std::allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > > >::allocate (
    __n=1, __a=...) at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/alloc_traits.h:482
#2  std::_Vector_base<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > >, std::allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > > >::_M_allocate (__n=1, this=0x7fffe510f020)
    at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/stl_vector.h:381
#3  std::_Vector_base<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > >, std::allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > > >::_M_create_storage (__n=1, this=0x7fffe510f020)
    at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/stl_vector.h:398
#4  std::_Vector_base<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > >, std::allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > > >::_Vector_base (__a=..., __n=1, this=0x7fffe510f020)
    at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/stl_vector.h:335
#5  std::vector<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > >, std::allocator<std::unordered_map<int, unsigned char, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, unsigned char> > > > >::vector (__a=..., __n=1, this=0x7fffe510f020)
    at /hive/users/angie/miniconda3/envs/usher/lib/gcc/x86_64-conda-linux-gnu/13.3.0/include/c++/bits/stl_vector.h:557
#6  assign_descendant_possible_muts_recursive (root=0x5556dc450980, output=std::unordered_map with 0 elements, tg=...)
    at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/sampler.cpp:49
#7  0x00005555555c65b4 in operator() (__closure=0x7fff9b839860) at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/sampler.cpp:61
#8  tbb::detail::d2::(anonymous namespace)::task_ptr_or_nullptr<const assign_descendant_possible_muts_recursive(Mutation_Annotated_Tree::Node*, std::unordered_map<int, unsigned char>&, tbb::detail::d2::task_group&)::<lambda()>&> (f=...) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:131
#9  tbb::detail::d2::function_task<assign_descendant_possible_muts_recursive(Mutation_Annotated_Tree::Node*, std::unordered_map<int, unsigned char>&, tbb::detail::d2::task_group&)::<lambda()> >::execute(tbb::detail::d1::execution_data &) (this=0x7fff9b839800, ed=...)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:89
#10 0x00007ffff7d03b0b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7fff9b839800, this=<optimized out>)
    at build/_deps/onetbb-src/src/tbb/task_dispatcher.h:334
#11 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=<optimized out>) at build/_deps/onetbb-src/src/tbb/task_dispatcher.h:470
#12 tbb::detail::r1::task_dispatcher::execute_and_wait (t=t@entry=0x0, wait_ctx=..., w_ctx=...) at build/_deps/onetbb-src/src/tbb/task_dispatcher.cpp:168
#13 0x00007ffff7d04151 in tbb::detail::r1::wait (wait_ctx=..., w_ctx=...) at build/_deps/onetbb-src/src/tbb/task_dispatcher.cpp:126
#14 0x00005555555c6371 in tbb::detail::d1::wait (ctx=..., wait_ctx=...)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/detail/_task.h:267
#15 tbb::detail::d2::task_group_base::wait()::{lambda()#1}::operator()() const (__closure=<optimized out>)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:554
#16 tbb::detail::d0::try_call_proxy<tbb::detail::d2::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d2::task_group_base::wait()::{lambda()#2}>(tbb::detail::d2::task_group_base::wait()::{lambda()#2}) (on_completion_body=..., this=<optimized out>)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/detail/_template_helpers.h:234
#17 tbb::detail::d2::task_group_base::wait (this=0x7fffffffbaf0) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:555

Then the pattern of #6 - #17 repeats until almost the bottom of the very deep stack, which ends like this:

#112218 assign_descendant_possible_muts_recursive (root=0x55565e3f3720, output=std::unordered_map with 0 elements, tg=...)
    at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/sampler.cpp:66
#112219 0x00005555555c65b4 in operator() (__closure=0x7fffa0490760) at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/sampler.cpp:61
#112220 tbb::detail::d2::(anonymous namespace)::task_ptr_or_nullptr<const assign_descendant_possible_muts_recursive(Mutation_Annotated_Tree::Node*, std::unordered_map<int, unsigned char>&, tbb::detail::d2::task_group&)::<lambda()>&> (f=...) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:131
#112221 tbb::detail::d2::function_task<assign_descendant_possible_muts_recursive(Mutation_Annotated_Tree::Node*, std::unordered_map<int, unsigned char>&, tbb::detail::d2::task_group&)::<lambda()> >::execute(tbb::detail::d1::execution_data &) (this=0x7fffa0490700, ed=...)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/task_group.h:89
#112222 0x00007ffff7d069aa in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=<optimized out>, waiter=..., this=0x7fffec6b8980)
    at build/_deps/onetbb-src/src/tbb/task_dispatcher.h:334
#112223 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fffec6b8980)
    at build/_deps/onetbb-src/src/tbb/task_dispatcher.h:470
#112224 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at build/_deps/onetbb-src/src/tbb/arena.cpp:215
#112225 tbb::detail::r1::thread_dispatcher_client::process (td=..., this=<optimized out>) at build/_deps/onetbb-src/src/tbb/thread_dispatcher_client.h:41
#112226 tbb::detail::r1::thread_dispatcher::process (this=<optimized out>, j=...) at build/_deps/onetbb-src/src/tbb/thread_dispatcher.cpp:195
#112227 0x00007ffff7cfe5dc in tbb::detail::r1::rml::private_worker::run (this=0x7fffec6a3f80) at build/_deps/onetbb-src/src/tbb/private_server.cpp:271
#112228 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fffec6a3f80) at build/_deps/onetbb-src/src/tbb/private_server.cpp:221
#112229 0x00007ffff7089d22 in start_thread (arg=<optimized out>) at pthread_create.c:443
#112230 0x00007ffff710ed40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

That's (112218 - 6) / 12 = 9351 recursions(?) deep, which seems like more than a simple recursive traversal of the tree should require -- I would expect a few hundred, maybe. But then I don't know how TBB works. Hopefully it will make more sense to you!

@ognian- (Collaborator, Author) commented Jul 25, 2025

@AngieHinrichs Thanks for providing the test case! I've pushed a fix for the infinite recursion.

@AngieHinrichs (Contributor)

Thanks @ognian-, it's working now! I ran your branch's usher-sampled and matOptimize on the SARS-CoV-2 full tree files (17 million sequences). They both produced results comparable to the current version's. matOptimize has about the same runtime, but for some reason usher-sampled is about 5x slower (about 70 minutes with the current version, but 383-386 minutes with the changes). It's still good enough to maintain a daily update of the SARS-CoV-2 tree, but it would be nice to see if that can be improved without too much effort. I will look into compiling in profiler support (although I don't know cmake very well) to see if I can get more details. (There's always "poor man's profiler" too 🙂)

@ognian- (Collaborator, Author) commented Jul 28, 2025

@AngieHinrichs Such a slowdown is not acceptable, thanks for letting me know! I'll get back to you with an update when I gather some insight.

@AngieHinrichs (Contributor)

At least part of the problem may be that I use --batch_size_per_process 10. I just tried running with 100 instead of 10, and it took only 95 minutes.

IIRC I'm using such a small batch size because I used to occasionally hit apparent deadlock in usher-sampled, and Cheng suggested that I use a smaller batch size. That, and reducing the number of threads, seemed to help. I hardly ever see usher-sampled lock up for the SARS-CoV-2 build, but it happens (maybe one in 20 or 50 runs?) even with -T 16 for my nightly builds for viruses with a lot fewer available sequences. So the smaller batch size didn't seem to cause too much overhead with the current version of usher-sampled, but with the newer TBB, I guess the overhead is a lot more.

If the changes make it so that usher-sampled can't deadlock any more, that would be great!

Re: profiling -- I tried adding these 3 lines at various stages of CMakeLists.txt:

SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pg")
SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pg")
SET(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -pg")

(as seen on Stack Overflow) -- but I kept getting errors from various stages of the compile about needing -fPIE instead of -fPIC, and it looked like the .c compiler commands were not using CMAKE_CXX_FLAGS anyway (a completely different set of options was used). I lost patience with cmake and tried the "poor man's profiler", i.e. running in gdb, hitting ctrl-C and looking at the stack. Every time (25 out of 25 tries) the stack was this:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ffff7cef1bb in tbb::detail::r1::futex_wait (comparand=2, futex=0x7fffffffaff8) at build/_deps/onetbb-src/src/tbb/semaphore.h:100
#2  tbb::detail::r1::binary_semaphore::P (this=0x7fffffffaff8) at build/_deps/onetbb-src/src/tbb/semaphore.h:261
#3  tbb::detail::r1::sleep_node<unsigned long>::wait (this=0x7fffffffafd0) at build/_deps/onetbb-src/src/tbb/concurrent_monitor.h:170
#4  tbb::detail::r1::concurrent_monitor_base<unsigned long>::commit_wait (this=<optimized out>, node=...) at build/_deps/onetbb-src/src/tbb/concurrent_monitor.h:232
#5  tbb::detail::r1::concurrent_monitor_base<unsigned long>::commit_wait (node=..., this=0x7fffd5dc3c28) at build/_deps/onetbb-src/src/tbb/concurrent_monitor.h:228
#6  tbb::detail::r1::concurrent_monitor_base<unsigned long>::wait<tbb::detail::r1::sleep_node<long unsigned int>, tbb::detail::r1::wait_bounded_queue_monitor(concurrent_monitor*, std::size_t, std::ptrdiff_t, tbb::detail::d1::delegate_base&)::<lambda()> > (node=..., pred=..., 
    this=0x7fffd5dc3c28) at build/_deps/onetbb-src/src/tbb/concurrent_monitor.h:262
#7  tbb::detail::r1::wait_bounded_queue_monitor (monitors=<optimized out>, monitor_tag=monitor_tag@entry=1, target=target@entry=22, predicate=warning: RTTI symbol not found for class 'tbb::detail::d1::delegated_function<tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#1}>'
...) at build/_deps/onetbb-src/src/tbb/concurrent_bounded_queue.cpp:57
#8  0x00005555555b815a in tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_wait<tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#1}>(tbb::detail::r1::concurrent_monitor*, unsigned long, long, tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#1}) (pred=..., target=22, monitor_tag=1, monitors=<optimized out>, this=0x7fffffffb2a0)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/concurrent_queue.h:329
#9  tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#2}::operator()() const (__closure=<optimized out>) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/concurrent_queue.h:621
#10 tbb::detail::d0::try_call_proxy<tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#2}>::on_exception<tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#3}>(tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop(void*)::{lambda()#3}) (on_exception_body=..., this=<optimized out>)
    at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/detail/_template_helpers.h:227
#11 tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::internal_pop (dst=<synthetic pointer>, this=0x7fffffffb2a0) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/concurrent_queue.h:622
#12 tbb::detail::d2::concurrent_bounded_queue<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*, tbb::detail::d1::cache_aligned_allocator<std::tuple<std::vector<Main_Tree_Target, std::allocator<Main_Tree_Target> >, Sample_Muts*, bool>*> >::pop (result=<synthetic pointer>: <optimized out>, this=0x7fffffffb2a0) at /cluster/home/angie/github/matsengrp/usher/build/_deps/onetbb-src/src/tbb/../../include/tbb/../oneapi/tbb/concurrent_queue.h:499
#13 place_sample_thread (sample_start_idx=<optimized out>, do_print=<optimized out>, printer_node=..., multi_processing=<optimized out>, max_uncertainty=<optimized out>, max_parsimony=<optimized out>, dry_run=false, parsimony_increase_threshold=<optimized out>, curr_idx=..., 
    stop=..., all_size=<optimized out>, retry_queue=..., found_place_queue=..., send_queue=..., deleted_nodes=std::vector of length 5, capacity 47665 = {...}, main_tree=..., start_idx=<optimized out>)
    at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/place_sample.cpp:460
#14 place_sample_leader (sample_to_place=std::vector of length 47665, capacity 65536 = {...}, main_tree=..., batch_size=batch_size@entry=10, curr_idx=..., parsimony_increase_threshold=<optimized out>, dry_run=dry_run@entry=false, placement_stats_file=<optimized out>, 
    max_parsimony=<optimized out>, max_uncertainty=<optimized out>, low_confidence_samples=..., samples_clade=..., sample_start_idx=<optimized out>, idx_map=<optimized out>, do_print=<optimized out>)
    at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/place_sample.cpp:793
#15 0x0000555555582fc1 in leader_thread (options=..., batch_size_per_process=<optimized out>) at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/driver/main.cpp:350
#16 main (argc=<optimized out>, argv=<optimized out>) at /cluster/home/angie/github/matsengrp/usher/src/usher-sampled/driver/main.cpp:512

-- although maybe running something highly parallel in gdb and hitting ctrl-C would always hit the main thread anyway?? Dunno. Just thought I'd include that in case it rings a bell.
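One way around the -fPIE/-fPIC trouble may be to pass the flags at configure time in a fresh build directory, so they reach the C compiler, the C++ compiler, and the linker alike (a hypothetical invocation, untested against usher's build; -no-pie is included because gprof's -pg instrumentation typically wants a non-PIE executable):

```shell
# Pass -pg to both compilers and the linker at configure time instead of
# editing CMakeLists.txt; a fresh build directory avoids stale cache values.
mkdir build-prof && cd build-prof
cmake .. \
  -DCMAKE_C_FLAGS="-pg" \
  -DCMAKE_CXX_FLAGS="-pg" \
  -DCMAKE_EXE_LINKER_FLAGS="-pg -no-pie" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j8
```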

@ognian- (Collaborator, Author) commented Jul 29, 2025

@AngieHinrichs Thanks for the details! When you compare to the pre-TBB update build that runs in 70 minutes, is that from the latest main branch or some previous release tag? I've based the TBB branch off current main.

@AngieHinrichs (Contributor)

When you compare to the pre-TBB update build that runs in 70 minutes, is that from the latest main branch or some previous release tag?

The latest main branch. But the speed of usher-sampled has been stable (on my server anyway) for a long time.

@ognian- (Collaborator, Author) commented Jul 31, 2025

I found a separate bug that triggers an assertion failure only in Debug builds:

Assertion `shared_mutations.back().get_descendant_mut() & shared_mutations.back().get_mut_one_hot()' failed.

This is the relevant stack trace:

Down_Sibling_Hook::both at src/usher-sampled/main_mapper.cpp:206
Combine_Hook<Down_Decendant_Hook, Down_Sibling_Hook>::both at src/usher-sampled/main_mapper.cpp:37
generic_merge<Combine_Hook<Down_Decendant_Hook, Down_Sibling_Hook>>  at src/usher-sampled/main_mapper.cpp:282
search_serial at src/usher-sampled/main_mapper.cpp:346
search_serial at src/usher-sampled/main_mapper.cpp:362
main_tree_search_recursive at src/usher-sampled/main_mapper.cpp:389
main_tree_search_recursive at src/usher-sampled/main_mapper.cpp:422
place_main_tree at src/usher-sampled/main_mapper.cpp:468
find_place at src/usher-sampled/place_sample_shared.cpp:178
Finder::operator() at src/usher-sampled/place_sample.cpp:685

@AngieHinrichs (Contributor)

Hi @ognian-, thanks for your ongoing work on this! Here is a smallish performance test:

curl -O https://raw.githubusercontent.com/cov-lineages/pangolin-data/refs/heads/main/pangolin_data/data/lineageTree.pb
curl -O https://hgwdev.gi.ucsc.edu/~angie/test.1000.mpl
# Same NC_045512.2.fa as before

usher-sampled -T 8 \
    -i lineageTree.pb \
    --diff test.1000.mpl \
    --ref NC_045512.2.fa \
    --optimization_radius 0 \
    --batch_size_per_process 10 \
    -o test.pb.gz \
    >& test.log

On my server, that takes around 15s for the main branch usher-sampled and 17-19s with the changes.

@AngieHinrichs (Contributor)

Reducing the --batch_size_per_process to 5 increases the difference between the times: ~17s vs. 26-29s.

@AngieHinrichs (Contributor)

And just to clarify, regarding the Debug-only assertion failure that you found: do we need to fix that or did you fix it?

@ognian- (Collaborator, Author) commented Jul 31, 2025

Regarding the bug in debug mode: with the small tree I was able to make progress on it. I found a total of 4 asserts in main_mapper.cpp that fail, and if they are commented out, the run finishes normally. I'll keep investigating.

@AngieHinrichs (Contributor)

I ran this branch's matOptimize on the pre-optimization full SARS-CoV-2 tree from 2025-08-25. It ran without errors in 107.5 minutes and produced a tree with almost identical matUtils summary stats to the daily build (main branch, ~102 minutes) matOptimize run. Awesome work @ognian- ! 🎉

I will try usher-sampled next.

@AngieHinrichs (Contributor) commented Sep 2, 2025

Sorry about the delay, I am too easily distracted. I tried the matsengrp branch usher-sampled on the full tree of 17 million sequences and it got good results (similar output about the same samples, similar matUtils summary -i output), but it took 396 minutes while the main branch usher-sampled took 67 minutes. It is slower, but the daily build will still work.

@AngieHinrichs (Contributor)

This issue of usher-sampled's relative slowness came up on July 28th. Let me try running again with --batch_size_per_process 100 instead of 10; stay tuned.

@AngieHinrichs (Contributor)

With --batch_size_per_process 100 instead of 10, usher-sampled took 95 minutes, much better.

@AngieHinrichs (Contributor) commented Sep 25, 2025

With the latest changes, the performance with --batch_size_per_process 100 has stayed the same at 95 minutes (92 minutes in a second run), while the performance with --batch_size_per_process 10 has improved a lot, from 396 minutes to 164 minutes.

@yatisht marked this pull request as ready for review October 7, 2025 19:54
@AngieHinrichs (Contributor)

Great improvement with Taskflow: now it takes only 91 minutes with --batch_size_per_process 10! (And the stdout log of sample placements and the matUtils summary results on the resulting tree look good.)
