Skip to content

Conversation

agaudreault
Copy link
Member

@agaudreault agaudreault commented Aug 12, 2025

Closes argoproj/argo-cd#23226

Description

The new finalizer on hooks allow us to observe the hook state before Kubernetes tries to delete it. Multiple issue where found causing the finalizers not to be properly removed on hooks managed by the sync.

  • When using SyncFail phase hooks, if any hook fail to apply (sometimes happens with Replace=true or bad manifest), the finalizers will not be removed and the sync will be failed with ongoing running tasks. This is a problem in the gitops-engine logic.
  • When any resource fail to apply during a phase, the finalizers wont be removed from the existing hooks of that phase
  • When there is a "sync wave hook" error, the finalizers wont be removed from the existing hooks of that phase

Changes

  • ensure hook finalizers are always removed so hooks don’t get stuck by removing finalizers on any existing hook task before creating/starting a new one, and ensure the finalizer is removed before deleting BeforeHookCreation hooks.
  • Don’t preemptively fail a sync if failure hooks fail to apply. Wait for the failure hooks to complete before Failing. Otherwise, Argo CD will immediately retry while the SyncFail hooks are running.
  • Terminate ongoing hooks on errors/terminate
  • addresses a race between BeforeHookCreation and Replace=true.
  • Clarify docs
  • Add test for all finalizers termination case

Tests

  • Test sync all Delete policies & finalizers removed
  • Test sync existing hook with finalizers succeeds
  • Test sync running + terminate remove all finalizers and running hooks
  • Test sync fail + syncFailTasks
  • Test sync fail + syncFailTasks fail
  • test syncWaveHook fails with syncFailTasks
  • test race condition with BeforeHookCreation and Replace=true

Copy link

codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 73.20574% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.32%. Comparing base (8849c3f) to head (ef00d91).
⚠️ Report is 57 commits behind head on master.

Files with missing lines Patch % Lines
pkg/sync/sync_context.go 79.27% 30 Missing and 10 partials ⚠️
...kg/utils/kube/kubetest/mock_resource_operations.go 0.00% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #754      +/-   ##
==========================================
- Coverage   54.26%   48.32%   -5.95%     
==========================================
  Files          64       64              
  Lines        6164     6639     +475     
==========================================
- Hits         3345     3208     -137     
- Misses       2549     3170     +621     
+ Partials      270      261       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Alexandre Gaudreault <[email protected]>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move functions importing sync/common to a helper since they cannot be in utils/testing.
sync/common imports utils/kube for a type, utils/kube tests use utils/testing which would use sync/common creating a cycle.

@@ -439,7 +457,7 @@ func (sc *syncContext) Sync() {
// No need to perform a dry-run on the namespace creation, because if it fails we stop anyway
sc.log.WithValues("task", nsCreateTask).Info("Creating namespace")
if sc.runTasks(nsSyncTasks, false) == failed {
sc.setOperationFailed(syncTasks{}, nsSyncTasks, "the namespace failed to apply")
sc.executeSyncFailPhase(syncTasks{}, nsSyncTasks, "the namespace failed to apply")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setOperationFailed was renamed to executeSyncFailPhase

} else {
sc.setResourceResult(task, "", operationState, message)
sc.setResourceResult(task, task.syncStatus, operationState, message)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always preserve the current syncStatus in resource results instead of deleting it.

// delete all completed hooks which have appropriate delete policy
sc.deleteHooks(hooksPendingDeletionSuccessful)
sc.setOperationPhase(common.OperationSucceeded, "successfully synced (all tasks run)")
} else {
sc.setRunningPhase(remainingTasks, false)
sc.setRunningPhase(tasks, false)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed the incorrect message. When we are running, we care about the current running task, not the remaining tasks that are still pending.

Signed-off-by: Alexandre Gaudreault <[email protected]>
}
}

func (sc *syncContext) GetState() (common.OperationPhase, string, []common.ResourceSyncResult) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved GetState(), no changes

Signed-off-by: Alexandre Gaudreault <[email protected]>
terminateSuccessful := true
sc.log.V(1).Info("terminating")
tasks, _ := sc.getSyncTasks()
for _, task := range tasks {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of terminateHooksPreemptively now

@@ -687,40 +793,16 @@ func (sc *syncContext) removeHookFinalizer(task *syncTask) error {
})
}

func (sc *syncContext) updateResource(task *syncTask) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

@agaudreault agaudreault marked this pull request as ready for review August 20, 2025 21:04
@agaudreault agaudreault requested a review from a team as a code owner August 20, 2025 21:04
Co-authored-by: Dan Garfield <[email protected]>
Copy link

sonarqubecloud bot commented Sep 2, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArgoCD fails on hook-finalizer (after upgrade to 3.x)
2 participants