Allow folding icmp eq (add X, C2), C when there is more than one use when we can compute the range #144566
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: AZero13 (AZero13)

Changes: This fold is valid because the pattern really is just:

  // ((X + Y) u< X) ? -1 : (X + Y) --> uadd.sat(X, Y)

Full diff: https://github.com/llvm/llvm-project/pull/144566.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
index 73ba0f78e8053..d733a37efd68e 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
@@ -1013,12 +1013,19 @@ static Value *canonicalizeSaturatedAdd(ICmpInst *Cmp, Value *TVal, Value *FVal,
   // uge -1 is canonicalized to eq -1 and requires special handling
   // (a == -1) ? -1 : a + 1 -> uadd.sat(a, 1)
+  // ult 1 is canonicalized to eq 0
+  // (a + 1 == 0) ? -1 : a + 1 -> uadd.sat(a, 1)
   if (Pred == ICmpInst::ICMP_EQ) {
     if (match(FVal, m_Add(m_Specific(Cmp0), m_One())) &&
-        match(Cmp1, m_AllOnes())) {
+        match(Cmp1, m_AllOnes()))
       return Builder.CreateBinaryIntrinsic(
           Intrinsic::uadd_sat, Cmp0, ConstantInt::get(Cmp0->getType(), 1));
-    }
+
+    if (match(Cmp1, m_Zero()) && match(Cmp0, m_Add(m_Value(X), m_One())) &&
+        match(FVal, m_Add(m_Specific(X), m_One())))
+      return Builder.CreateBinaryIntrinsic(Intrinsic::uadd_sat, X,
+                                           ConstantInt::get(X->getType(), 1));
+
     return nullptr;
   }
diff --git a/llvm/test/Transforms/InstCombine/saturating-add-sub.ll b/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
index cfd679c0cc592..392551defb7ef 100644
--- a/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
+++ b/llvm/test/Transforms/InstCombine/saturating-add-sub.ll
@@ -2350,4 +2350,15 @@ define i8 @fold_add_umax_to_usub_multiuse(i8 %a) {
   ret i8 %sel
 }
+define i32 @add_check_zero(i32 %num) {
+; CHECK-LABEL: @add_check_zero(
+; CHECK-NEXT:    [[COND:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[ADD:%.*]], i32 1)
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+  %add = add i32 %num, 1
+  %cmp = icmp eq i32 %add, 0
+  %cond = select i1 %cmp, i32 -1, i32 %add
+  ret i32 %cond
+}
+
 declare void @usei8(i8)
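For context, the comments in the diff point at why the extra case is needed: InstCombine canonicalizes the compare (a + 1) u< 1 into (a + 1) == 0, so the generic u<-based saturated-add matcher no longer sees the pattern. A minimal IR sketch of that chain (function and value names are illustrative, not from the patch):

    define i32 @sat_inc(i32 %a) {
      %add = add i32 %a, 1
      %ov  = icmp ult i32 %add, 1        ; overflow check: (a + 1) u< 1 ...
      ; ... which InstCombine canonicalizes to: icmp eq i32 %add, 0
      %res = select i1 %ov, i32 -1, i32 %add
      ret i32 %res                       ; with the patch this becomes @llvm.uadd.sat.i32(%a, 1)
    }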
Force-pushed from c00d050 to 978106e
I don't think this is the right place to fix this. We need to rewrite … to …. This is similar to …, which is handled by ….
… folds fine, but when there are multiple uses it is not simplified, because the a + 7 is reused.
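A hypothetical sketch of the multi-use situation being described (the a + 7 shape comes from the comment above; everything else is illustrative):

    declare void @use(i8)

    define i1 @multi_use(i8 %a) {
      %add = add i8 %a, 7
      call void @use(i8 %add)        ; the extra use of %add defeats the one-use restriction
      %cmp = icmp eq i8 %add, 14     ; would otherwise fold to: icmp eq i8 %a, 7
      ret i1 %cmp
    }

Note that the rewritten compare would not reference %add at all, so folding it does not increase the instruction count even when %add sticks around.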
I don't understand why we're willing to fold an …
Beats me, this is just my observation.
There's a FIXME for it in ….
Oh, I did a different thing: // (X + C1) == C --> X == C - C1
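A concrete instance of that fold with hypothetical constants C1 = 7 and C = 10, written Alive2-style:

    define i1 @src(i8 %x) {
      %add = add i8 %x, 7
      %cmp = icmp eq i8 %add, 10
      ret i1 %cmp
    }

    ; folds to:
    define i1 @tgt(i8 %x) {
      %cmp = icmp eq i8 %x, 3        ; C - C1 = 10 - 7 = 3
      ret i1 %cmp
    }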
@topperc yeah, this is going to require a lot of fixes. Can I just go back to what I did at the start? This change is otherwise going to affect 50 test files, many of which regressed and should be addressed in another PR.
Force-pushed from 978106e to 8a12f91
This patch gets what you want with only 1 test update. So it seems like only handling the …
✅ With the latest revision this PR passed the C/C++ code formatter.
…when we can compute the range

If there are multiple uses of an add, we can fold a comparison anyway if we can compute a constant range for it, which should happen in cases such as a saturating add.
@topperc this is having too many side effects. I think we should do what I originally did for now, and then I will invest in working on a proper fix, because this is not going to be easily done in one PR.
Ok. Is there a known workload that benefits from this?
Actually, @topperc, we can merge this, but it is not obvious what the code you modified is doing with the equality. I will address that once I do the multiple PRs that tackle the root cause, since this cannot be done in one PR.
X86, RISCV, ARM, and AArch64 all generate the same code for these two IR functions. Do I need a larger test case to show improved code generation?
It's more about the IR optimization, not just codegen.
But in all honesty, I think your solution as it stands is good for now, and I will work on a proper one. The side effects of your approach are not that bad: at worst, AArch64 may end up with an extra cmp because it sometimes won't emit branch-if-zero instructions. I will address those separately, since I feel the codegen lowering should handle this anyway.
If there are multiple uses of an add, we can fold a comparison anyway if we can compute a constant range for it, which should happen in cases such as a saturating add.
Alive2: https://alive2.llvm.org/ce/z/Y4eHav
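As an illustration of the range-based reasoning (a hypothetical example, not taken from the patch or the Alive2 link above): when a constant range can be computed for the add, the compare can fold even though the add has other uses.

    declare void @use(i32)

    define i1 @range_fold(i32 %num) {
      %pos = and i32 %num, 2147483647   ; %pos is in [0, 0x7fffffff]
      %add = add i32 %pos, 1            ; so %add is in [1, 0x80000000] and can never be 0
      call void @use(i32 %add)          ; %add has multiple uses
      %cmp = icmp eq i32 %add, 0        ; still folds to false from the computed range
      ret i1 %cmp
    }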