-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Reimplement DestinationPropagation according to live ranges. #145541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
8402e97
to
da2cabf
Compare
This comment was marked as outdated.
This comment was marked as outdated.
da2cabf
to
d0d5f20
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
[EXPERIMENT] Reimplement DestinationPropagation according to live ranges.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (9d4b3bf): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -2.8%, secondary -2.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.6%, secondary -7.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.1%, secondary -0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 470.373s -> 469.514s (-0.18%) |
139b66c
to
da90180
Compare
// We also ensure that operands read by terminators conflict with writes by that terminator. | ||
// For instance a function call may read args after having written to the destination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually stricter than necessary. It's only Move
operands that conflict with the destination place because they effectively "donate" the place to the callee. Copy
and Const
operands don't conflict because they are copied to a new place before the call.
I also don't think any other terminators have this issue, it is specific to Call
terminators (#71117).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have 5 terminators that write to memory: Call, TailCall, Drop, Yield, InlineAsm. Yield terminators are forbidden at this stage.
However, InlineAsm has this issue too. And for both copies and moves.
About the Copy vs Move, I wonder what's the proper solution:
- match on terminators? -> I'd rather avoid if I can
- how to implement move-to-copy conversion later to allow more dest-prop
Could we push this question to a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure the move behavior only applies to Call
and not other terminators. I've posted a proposal to clarify this in MIR.
}); | ||
} | ||
#[derive(Debug)] | ||
struct RelevantLocals { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic around tracking candidates could be vastly simplified by using a disjoint-set data structure, but this can be done later in a separate PR.
da90180
to
35399f0
Compare
The latest force-push removes the useless commit from the other PR to simplify the history. This analysis is strictly less powerful than the current algorithm. This PR forbids to merge locals that are live at the same point because of extra reads. I'm not sure it's important, but this makes DestinationPropagation and GVN adversarial in some ways. GVN attempts to make SSA locals with the longest live range possible. DestinationPropagation will prefer clean moves from a local to another. |
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
cc @oli-obk as you were involved in reviewing the other PR. |
I'm struggling to picture an example that was accepted by the previous algorithm but not by this one. Could you give an example? |
For instance :
Where we may want to merge _0 and _1. |
compiler/rustc_index/src/interval.rs
Outdated
// Just extend the first range. | ||
*first_start = point; | ||
} else { | ||
self.map.insert(0, (point, point)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that this is O(n^2). It is possible to make this O(n) by building the intervals in reverse order and then reversing the whole array at the end. Actually even the final reversal isn't strictly needed if you take the reversed order into account when unioning rows and checking for disjointness.
35399f0
to
2b59dab
Compare
This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
☔ The latest upstream changes (presumably #145773) made this pull request unmergeable. Please resolve the merge conflicts. |
This PR reimplements DestinationPropagation as a problem of merging live-ranges of locals. We merge locals that have disjoint live-ranges. This allows merging several locals in the same round by updating live range information.
Live ranges are mainly computed using the
MaybeLiveLocals
analysis. The subtlety is that we split each statement and terminator in 2 positions. The first position is the regular statement. The second position is a shadow, which is always more live. It encodes partial writes and dead writes as a local being live for half a statement. This half statement ensures that writes conflict with another local's writes and regular liveness.r? @Amanieu