-
Notifications
You must be signed in to change notification settings - Fork 77
Fix mutation ordering #3257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix mutation ordering #3257
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3257 +/- ##
==========================================
+ Coverage 89.73% 89.75% +0.01%
==========================================
Files 29 29
Lines 32544 32499 -45
Branches 5958 5949 -9
==========================================
- Hits 29203 29169 -34
+ Misses 1894 1888 -6
+ Partials 1447 1442 -5
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
What's the difference now between sort_mutations and sort_canonical? Would a simpler way of thinking about this be to say that we're changing things by making |
The only difference now is that canonical has a sort by node id, which I think is still needed in the case of nodes with equal times? |
Let's just use sort_canonical for now, and see how we get on with test suite breakage? I don't see much point in having another stricter sort if we're already changing the semantics. |
Sorting is now identical to canonical - only a couple of python tests failing. Will fix those and add some new tests for the new sort order. |
I've fixed up the tests - the only one I wasn't sure about was |
Looks great - I think we need @petrelharp's input on the haplotype extender stuff |
Whoops - I missed two topology tests. These got a bit more involved - I've had to switch from checking for exact mutational equality to mutational functional equivalence in the simplify tests. |
b5df665
to
189e790
Compare
I think this is ready for final review. I've added @hyanwong's initial issue failing code as a test. |
Hm - we should benchmark the change? - recall that Re: the extend haplotypes - thanks for digging into that! For reference, here's the error:
The error is fine, though - it's because the tables end up in different sorted order, since the node below a mutation can change after extending haplotypes, and hence the mutation's position in the table after the new sorting. |
On already sorted tables (French-Canadian dataset) this branch adds about 40% to the sort time. I can do some more digging with shuffled tables if you think it is worth it. |
I should really just try it with SLiM. I can just copy over the code in this branch to try it, right? |
Yep, no changes to external interfaces. Just copying |
I'm getting no measurable* difference with this change in SLiM's behavior with some fairly mutation-sort-intensive sims. So I think we're good. *(with admittedly not a terribly sensitive measurement) |
3104619
to
0bb646d
Compare
0bb646d
to
84a2f4f
Compare
Rebased and squashed - I think we just need your green tick here @jeromekelleher |
Addressing #3253
This PR changes the mutation sort order from
site->time->ID
tosite->time->node_time->num_descendants->ID
This is required as if mutations have unknown times, and no parent info, we need to use the node time to have parents before children. For mutations at the same node, we assume the existing order is parents before children. We need to use num descendants to use parent info if it is present for mutations at the same node.
I've updated a few of the breaking Python tests to give a flavour of the impact. This PR still needs tests for the new order, and the rest of the broken tests fixing, but thought this was a good point for feedback.