Skip to content

[VPlan] Materialize constant vector trip counts before final opts. #142309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Jun 1, 2025

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.

@llvmbot
Copy link
Member

llvmbot commented Jun 1, 2025

@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-backend-systemz

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.


Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
  • (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
  • (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
  • (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
  • (modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
  • (modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
  • (modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
  • (modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
  • (modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
  • (modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
  • (modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 1, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.


Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
  • (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
  • (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
  • (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
  • (modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
  • (modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
  • (modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
  • (modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
  • (modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
  • (modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
  • (modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 1, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Florian Hahn (fhahn)

Changes

Materialize constant vector trip counts before ::execute, if the trip count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle block.

For now this is also only done when not vectorizing the epilogue, as the simplification complicates stitching the 2 plans together.


Patch is 624.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142309.diff

165 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/invariant-replicate-region.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+16-21)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+5-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-constant-ops.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-metadata.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AMDGPU/packed-math.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-predselect.ll (+6-8)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/pointer_iv.ll (+24-24)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+42-42)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse-output.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+17-17)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vf-will-not-generate-any-vector-insts.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/addressing.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-constant-known-via-scev.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-cost.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaving.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+51-51)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+45-45)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr109581-unused-blend.ll (+4-18)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr131359-dead-for-splice.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-uniform-call.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll (+27-31)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-interleaved-accesses-gap.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/blend-in-header.ll (+12-16)
  • (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder-infer-correct-gepty.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/constantfolder.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/debugloc-optimize-vfuf-term.ll (+13-13)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+32-32)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+39-46)
  • (modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-multiple-uses-in-same-instruction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+62-68)
  • (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/interleave-with-i65-induction.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+20-24)
  • (modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+121-74)
  • (modified) llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-neg-off.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+28-29)
  • (modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+23-24)
  • (modified) llvm/test/Transforms/LoopVectorize/phi-cost.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/pr44488-predication.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr50686.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/pr55167-fold-tail-live-out.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/pr58811-scev-expansion.ll (+9-12)
  • (modified) llvm/test/Transforms/LoopVectorize/pr66616.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-min-max.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+18-18)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/remarks-reduction-inloop.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reverse_induction.ll (+18-21)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-check.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-difference-simplifications.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-exit-phi-invalidation.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction-start-value-may-be-undef-or-poison.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/select-reduction.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/single_early_exit_with_outer_loop.ll (+3-9)
  • (modified) llvm/test/Transforms/LoopVectorize/tail-folding-optimize-vector-induction-width.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-extended-icmps.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-loads-p16.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-reductions.ll (+6-7)
  • (modified) llvm/test/Transforms/LoopVectorize/trunc-shifts.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/uitofp-preserve-nneg.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+38-38)
  • (modified) llvm/test/Transforms/LoopVectorize/unused-blend-mask-for-first-operand.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-outside-iv-users.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-gep-all-indices-invariant.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/widen-intrinsic.ll (+2-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index e9ace195684b3..67062f36c6986 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7560,6 +7560,7 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   VPlanTransforms::runPass(VPlanTransforms::materializeBroadcasts, BestVPlan);
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
   VPlanTransforms::simplifyRecipes(BestVPlan, *Legal->getWidestInductionType());
+  VPlanTransforms::removeBranchOnConst(BestVPlan);
   VPlanTransforms::narrowInterleaveGroups(
       BestVPlan, BestVF,
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector));
@@ -10536,6 +10537,25 @@ bool LoopVectorizePass::processLoop(Loop *L) {
         InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width,
                                VF.MinProfitableTripCount, IC, &CM, BFI, PSI,
                                Checks, BestPlan);
+
+        // Materialize vector trip counts for constants early if it can simply
+        // be computed as (Original TC / VF * UF) * VF * UF.
+        if (BestPlan.hasScalarTail() &&
+            !CM.requiresScalarEpilogue(VF.Width.isVector())) {
+          VPValue *TC = BestPlan.getTripCount();
+          if (TC->isLiveIn()) {
+            ScalarEvolution &SE = *PSE.getSE();
+            auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
+            const SCEV *VFxUF =
+                SE.getElementCount(TCScev->getType(), VF.Width * IC);
+            auto VecTCScev =
+                SE.getMulExpr(SE.getUDivExpr(TCScev, VFxUF), VFxUF);
+            if (auto *NewC = dyn_cast<SCEVConstant>(VecTCScev))
+              BestPlan.getVectorTripCount().setUnderlyingValue(
+                  NewC->getValue());
+          }
+        }
+
         LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
         ++LoopsVectorized;
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 165b57c87beb1..9a3a08a3e06a9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -941,7 +941,8 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
     BackedgeTakenCount->setUnderlyingValue(TCMO);
   }
 
-  VectorTripCount.setUnderlyingValue(VectorTripCountV);
+  if (!VectorTripCount.getUnderlyingValue())
+    VectorTripCount.setUnderlyingValue(VectorTripCountV);
 
   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index b214498229e52..d5f4807b0dfe7 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1842,9 +1842,7 @@ void VPlanTransforms::truncateToMinimalBitwidths(
   }
 }
 
-/// Remove BranchOnCond recipes with true or false conditions together with
-/// removing dead edges to their successors.
-static void removeBranchOnConst(VPlan &Plan) {
+void VPlanTransforms::removeBranchOnConst(VPlan &Plan) {
   using namespace llvm::VPlanPatternMatch;
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
            vp_depth_first_shallow(Plan.getEntry()))) {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 34e2de4eb3b74..653eeebe1531f 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -202,6 +202,10 @@ struct VPlanTransforms {
   /// CanonicalIVTy as type for all un-typed live-ins in VPTypeAnalysis.
   static void simplifyRecipes(VPlan &Plan, Type &CanonicalIVTy);
 
+  /// Remove BranchOnCond recipes with true or false conditions together with
+  /// removing dead edges to their successors.
+  static void removeBranchOnConst(VPlan &Plan);
+
   /// If there's a single exit block, optimize its phi recipes that use exiting
   /// IV values by feeding them precomputed end values instead, possibly taken
   /// one step backwards.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..ff3d43eb2d52f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -368,7 +368,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK-NEXT:    [[TMP71:%.*]] = icmp eq i32 [[INDEX_NEXT]], 96
 ; CHECK-NEXT:    br i1 [[TMP71]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 96, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -388,7 +388,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[LOOP_LATCH]]:
 ; CHECK-NEXT:    [[IV_NEXT]] = add i32 [[IV1]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 100
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..1cc4af7c4a3dd 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -32,7 +32,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    br i1 [[TMP14]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <2 x i64> [[WIDE_LOAD1]], i32 1
-; CHECK-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[SCALAR_PH]]
 ; CHECK:       [[SCALAR_PH]]:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    [[SCALAR_RECUR_INIT:%.*]] = phi i64 [ [[VECTOR_RECUR_EXTRACT]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -47,7 +47,7 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    store i64 [[OR]], ptr [[GEP_DST]], align 8
 ; CHECK-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 100
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret void
 ;
@@ -86,9 +86,9 @@ define void @powi_call(ptr %P) {
 ; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index f36161703dba5..35ad81fce40ad 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -680,7 +680,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], %[[MIDDLE_BLOCK]] ], [ [[DST]], %[[ENTRY]] ]
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL1:%.*]] = phi i64 [ 512, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
@@ -696,7 +696,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[PTR_IV_NEXT]] = getelementptr i8, ptr [[PTR_IV]], i64 8
 ; DEFAULT-NEXT:    [[IV_CLAMP:%.*]] = and i64 [[IV]], 4294967294
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_CLAMP]], 512
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]], !llvm.loop [[LOOP23:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
@@ -1492,7 +1492,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 16
 ; DEFAULT-NEXT:    br i1 [[TMP3]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP28:![0-9]+]]
 ; DEFAULT:       [[MIDDLE_BLOCK]]:
-; DEFAULT-NEXT:    br i1 false, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; DEFAULT-NEXT:    br label %[[SCALAR_PH]]
 ; DEFAULT:       [[SCALAR_PH]]:
 ; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 16, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; DEFAULT-NEXT:    br label %[[LOOP_HEADER:.*]]
@@ -1506,7 +1506,7 @@ define void @redundant_branch_and_tail_folding(ptr %dst, i1 %c) {
 ; DEFAULT-NEXT:    [[T:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
 ; DEFAULT-NEXT:    store i32 [[T]], ptr [[DST]], align 4
 ; DEFAULT-NEXT:    [[EC:%.*]] = icmp eq i64 [[IV_NEXT]], 21
-; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
+; DEFAULT-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP_HEADER]], !llvm.loop [[LOOP29:![0-9]+]]
 ; DEFAULT:       [[EXIT]]:
 ; DEFAULT-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 73ef8534bb66c..06e6306da2368 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -469,7 +469,7 @@ define void @old_and_new_size_equalko(ptr noalias %src, ptr noalias %dst) {
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP14:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT:    br label [[EXIT:%.*]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..6adb5470e1dc4 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -72,9 +72,9 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP34:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
 ; CHECK-NEXT:    br i1 [[TMP34]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_HEADER:.*]]
 ; CHECK:       [[LOOP_HEADER]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_LATCH:.*]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
index 403a5f186a8d6..98e52098d9766 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll
@@ -40,9 +40,9 @@ define void @fmin32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -121,9 +121,9 @@ define void @fmax32(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -202,9 +202,9 @@ define void @fmin64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -283,9 +283,9 @@ define void @fmax64(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP13]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -364,9 +364,9 @@ define void @fmin16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
@@ -445,9 +445,9 @@ define void @fmax16(ptr noundef readonly captures(none) %input1, ptr noundef rea
 ; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
index 095ac222e1789..0214c4188314f 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll
@@ -19,11 +19,11 @@ define double @test_reduction_costs() {
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
 ; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ [[TMP1]], %[[MIDDLE_BLOCK]] ], [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX2:%.*]] = phi double [ 0.000000e+00, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP_1:.*]]
 ; CHECK:       [[LOOP_1]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP_1]] ]
@@ -103,11 +103,11 @@ define void @test_iv_cost(ptr %ptr.start, i8 %a, i64 %b) {
 ; CHECK-NEXT:    [[IND_END5:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[N_VEC3]]
 ; CHECK-NEXT:    br label %[[VEC_EPILOG_VECTOR_BODY:.*]]
 ; CHECK:       [[VEC_EPILOG_VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
-; CHECK-NEXT:    [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[PTR_START]], i64 [[INDEX]]
+; CHECK-NEXT:    [[INDEX4:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT10:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[NEXT_...
[truncated]

auto *TCScev = SE.getSCEV(TC->getLiveInIRValue());
const SCEV *VFxUF =
SE.getElementCount(TCScev->getType(), VF.Width * IC);
auto VecTCScev =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused - even before this patch surely the vector trip count must have already been set to (OriginalTC / (VF * VF)) * (VF * UF), otherwise the vectorised code wouldn't even work? I just wonder if you can simply check that the existing vector trip count is already a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially VectorTripCount is an opaque live-in VPValue and we can only compute it once we picked VF and UF. It is set just before executing the VPlan in VPlan::prepareToExecute, so it's available during codegen, but not during earlier VPlan transforms.

With this patch, it now gets set slightly earlier if it is constant so VPlan transform can use it for optimizations.

I updated VPlan::prepareToExecute to also assert the value set earlier matches.

fhahn added 2 commits June 9, 2025 22:56
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as
the simplification complicates stitching the 2 plans together.
@fhahn fhahn force-pushed the vplan-materialize-constant-vector-tc branch from 0129d1d to 4122565 Compare June 9, 2025 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants