Allow folding icmp eq (add X, C2), C when there is more than one use when we can compute the range #144566
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: AZero13 (AZero13)

Changes: This fold is valid because the pattern really is just:

  // ((X + Y) u< X) ? -1 : (X + Y) --> uadd.sat(X, Y)

Full diff: https://github.com/llvm/llvm-project/pull/144566.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 73ba0f78e8053..d733a37efd68e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -1013,12 +1013,19 @@ static Value *canonicalizeSaturatedAdd(ICmpInst *Cmp, Value *TVal, Value *FVal,
   // uge -1 is canonicalized to eq -1 and requires special handling
   // (a == -1) ? -1 : a + 1 -> uadd.sat(a, 1)
+  // ult 1 is canonicalized to eq 0
+  // (a + 1 == 0) ? -1 : a + 1 -> uadd.sat(a, 1)
   if (Pred == ICmpInst::ICMP_EQ) {
     if (match(FVal, m_Add(m_Specific(Cmp0), m_One())) &&
-        match(Cmp1, m_AllOnes())) {
+        match(Cmp1, m_AllOnes()))
       return Builder.CreateBinaryIntrinsic(
           Intrinsic::uadd_sat, Cmp0, ConstantInt::get(Cmp0->getType(), 1));
-    }
+
+    if (match(Cmp1, m_Zero()) && match(Cmp0, m_Add(m_Value(X), m_One())) &&
+        match(FVal, m_Add(m_Specific(X), m_One())))
+      return Builder.CreateBinaryIntrinsic(Intrinsic::uadd_sat, X,
+                                           ConstantInt::get(X->getType(), 1));
+
     return nullptr;
   }
diff --git a/llvm/test/Transforms/InstCombine/saturating-add-sub.ll b/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
index cfd679c0cc592..392551defb7ef 100644
--- a/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
+++ b/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
@@ -2350,4 +2350,15 @@ define i8 @fold_add_umax_to_usub_multiuse(i8 %a) {
   ret i8 %sel
 }
+define i32 @add_check_zero(i32 %num) {
+; CHECK-LABEL: @add_check_zero(
+; CHECK-NEXT:    [[COND:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[ADD:%.*]], i32 1)
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+  %add = add i32 %num, 1
+  %cmp = icmp eq i32 %add, 0
+  %cond = select i1 %cmp, i32 -1, i32 %add
+  ret i32 %cond
+}
+
 declare void @usei8(i8)
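For context, the comments in the diff point at why the extra case is needed: InstCombine canonicalizes the compare (a + 1) u< 1 into (a + 1) == 0, so the generic u<-based saturated-add matcher no longer sees the pattern. A minimal IR sketch of that chain (function and value names are illustrative, not from the patch):

    define i32 @sat_inc(i32 %a) {
      %add = add i32 %a, 1
      %ov  = icmp ult i32 %add, 1        ; overflow check: (a + 1) u< 1 ...
      ; ... which InstCombine canonicalizes to: icmp eq i32 %add, 0
      %res = select i1 %ov, i32 -1, i32 %add
      ret i32 %res                       ; with the patch this becomes @llvm.uadd.sat.i32(%a, 1)
    }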
Force-pushed from c00d050 to 978106e
I don't think this is the right place to fix this. We need to rewrite … to …. This is similar to …, which is handled by ….
… folds fine, but when there are multiple uses it is not simplified, because the a + 7 is reused.
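A hypothetical sketch of the multi-use situation being described (the a + 7 shape comes from the comment above; everything else is illustrative):

    declare void @use(i8)

    define i1 @multi_use(i8 %a) {
      %add = add i8 %a, 7
      call void @use(i8 %add)        ; the extra use of %add defeats the one-use restriction
      %cmp = icmp eq i8 %add, 14     ; would otherwise fold to: icmp eq i8 %a, 7
      ret i1 %cmp
    }

Note that the rewritten compare would not reference %add at all, so folding it does not increase the instruction count even when %add sticks around.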
I don't understand why we're willing to fold an …
Beats me, this is just my observation.
There's a FIXME for it in ….
Oh, I did a different thing: // (X + C1) == C --> X == C - C1
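A concrete instance of that fold with hypothetical constants C1 = 7 and C = 10, written Alive2-style:

    define i1 @src(i8 %x) {
      %add = add i8 %x, 7
      %cmp = icmp eq i8 %add, 10
      ret i1 %cmp
    }

    ; folds to:
    define i1 @tgt(i8 %x) {
      %cmp = icmp eq i8 %x, 3        ; C - C1 = 10 - 7 = 3
      ret i1 %cmp
    }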
@topperc yeah, this is going to require a lot of fixes. Can I just go back to what I did at the start? This change is otherwise going to affect 50 test files, many of which regressed and should be addressed in another PR.
Force-pushed from 978106e to 8a12f91
This patch gets what you want with only 1 test update. So it seems like only handling the …
✅ With the latest revision this PR passed the C/C++ code formatter.
…when we can compute the range

If there are multiple uses of an add, we can fold a comparison anyway if we can compute a constant range for it, which should happen in cases such as a saturating add.
@topperc this is having too many side effects. I think we should do what I originally did for now, and then I will invest in working on a proper fix, because this is not going to be easily done in one PR.
Ok. Is there a known workload that benefits from this?
Actually, @topperc, we can merge this, but it is not obvious what the code you modified is doing with the equality. I will address that once I do the multiple PRs that tackle the root cause, since this cannot be done in one PR.
X86, RISCV, ARM, and AArch64 all generate the same code for these two IR functions. Do I need a larger test case to show improved code generation?
It's more about the IR optimization, not just codegen.
But in all honesty, I think your solution as it stands is good for now, and I will work on a proper one. The side effects of your approach are not that bad: at worst, AArch64 may end up with an extra cmp because it sometimes won't emit branch-if-zero instructions. I will address those separately, since I feel the codegen lowering should handle this anyway.
If there are multiple uses of an add, we can fold a comparison anyway if we can compute a constant range for it, which should happen in cases such as a saturating add.
Alive2: https://alive2.llvm.org/ce/z/Y4eHav
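As an illustration of the range-based reasoning (a hypothetical example, not taken from the patch or the Alive2 link above): when a constant range can be computed for the add, the compare can fold even though the add has other uses.

    declare void @use(i32)

    define i1 @range_fold(i32 %num) {
      %pos = and i32 %num, 2147483647   ; %pos is in [0, 0x7fffffff]
      %add = add i32 %pos, 1            ; so %add is in [1, 0x80000000] and can never be 0
      call void @use(i32 %add)          ; %add has multiple uses
      %cmp = icmp eq i32 %add, 0        ; still folds to false from the computed range
      ret i1 %cmp
    }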