sweep: fix expected spending events being missed #10060

yyforyongyu · 2025-07-09T12:34:28Z

Fixes #10051. What happened there was:

A sweeping tx was created during a restart, which put CommitmentAnchor and CommitmentTimeLock in the same group even though the anchor had already been spent.
This is usually fine, as we would detect that the anchor input is spent and retry sweeping the to_local output.

However, from the logs there, the spending event was not notified quickly enough here, causing us to think the anchor input had not been spent,

lnd/sweep/fee_bumper.go

Lines 1433 to 1453 in ea32aac

    
    	// Do a non-blocking read to see if the output has been spent.
    	select {
    	case spend, ok := <-spendEvent.Spend:
    		if !ok {
    			log.Debugf("Spend ntfn for %v canceled", op)

    			continue
    		}

    		spendingTx := spend.SpendingTx

    		log.Debugf("Detected spent of input=%v in tx=%v", op,
    			spendingTx.TxHash())

    		spentInputs[op] = spendingTx

    	// Move to the next input.
    	default:
    		log.Tracef("Input %v not spent yet", op)
    	}
    }

We now fix it by calling GetUtxo first to check whether a given input is spent. If it is, we then do a blocking read on the spend notification to receive the spending event.
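A minimal sketch of this flow, under simplifying assumptions: `outpoint`, `tx`, `chain`, `fakeChain`, and `spentInputs` are hypothetical names made up for illustration, not lnd's actual types or the code in this PR. It only shows the ordering: look up the UTXO set first, and subscribe plus block only for inputs that are no longer in it.

```go
package main

import "fmt"

// outpoint and tx are hypothetical stand-ins for an outpoint and the
// transaction that spends it.
type outpoint string
type tx string

// chain is a hypothetical, simplified interface for illustration only.
type chain interface {
	// inUtxoSet reports whether op is still in the confirmed UTXO set.
	inUtxoSet(op outpoint) (bool, error)

	// spendChan returns a channel that delivers the tx spending op.
	spendChan(op outpoint) (<-chan tx, error)
}

// spentInputs returns the spending tx for every input in ops that is no
// longer in the UTXO set.
func spentInputs(c chain, ops []outpoint) (map[outpoint]tx, error) {
	spent := make(map[outpoint]tx)

	for _, op := range ops {
		// Check the UTXO set first. An unspent input needs no
		// subscription at all.
		unspent, err := c.inUtxoSet(op)
		if err != nil {
			return nil, err
		}
		if unspent {
			continue
		}

		ch, err := c.spendChan(op)
		if err != nil {
			return nil, err
		}

		// Blocking read: the input is known to be spent, so a spend
		// event is expected rather than merely possible.
		spendingTx, ok := <-ch
		if !ok {
			// The notification was canceled.
			continue
		}
		spent[op] = spendingTx
	}

	return spent, nil
}

// fakeChain maps each spent outpoint to its spending tx. Any outpoint
// missing from the map is treated as unspent.
type fakeChain map[outpoint]tx

func (f fakeChain) inUtxoSet(op outpoint) (bool, error) {
	_, isSpent := f[op]
	return !isSpent, nil
}

func (f fakeChain) spendChan(op outpoint) (<-chan tx, error) {
	ch := make(chan tx, 1)
	if spendingTx, ok := f[op]; ok {
		ch <- spendingTx
	}
	return ch, nil
}

func main() {
	c := fakeChain{"anchor:0": "remote-anchor-sweep"}

	spent, err := spentInputs(c, []outpoint{"anchor:0", "to_local:1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(spent), spent["anchor:0"])
}
```

The sketch blocks without any bound on the channel read, which is only safe because the fake always delivers an event for a spent input. Against a real notifier the wait would need a bound.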

gemini-code-assist · 2025-07-09T12:36:14Z

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

yyforyongyu · 2025-07-09T12:44:28Z

/gemini summary

gemini-code-assist · 2025-07-09T12:44:55Z

Summary of Changes

This pull request aims to resolve a critical issue in the sweep module where expected spending events for transaction inputs could be missed, particularly after a restart. The core change involves enhancing the input spending detection mechanism by first querying the blockchain for UTXO status. If an input is found to be already spent, the system now expects an immediate notification, improving the reliability of the sweeping process and preventing erroneous retries.

Highlights

Spending Event Reliability: Introduced a proactive check using GetUtxo to determine if a sweep input is already spent before registering for a spend notification. This addresses a race condition where spending events might be missed during restarts, leading to incorrect assumptions about input status.
Immediate Notification Expectation: Modified the getSpentInputs logic to expect an immediate spending event notification if GetUtxo indicates an input is already spent. A timeout has been added to log an error if this expectation is not met, highlighting potential issues with the notification system.
Dependency Injection: Added ChainIO (an abstraction for blockchain queries) to TxPublisherConfig, allowing the fee_bumper to directly query UTXO status.
Test Coverage & Fixes: Updated various test cases in fee_bumper_test.go to mock the new GetUtxo behavior, ensuring the logic for handling spent and unspent inputs is correctly tested. Several minor typos in comments and test function names were also corrected.

Changelog

sweep/fee_bumper.go
- Added time package import and defined spentNotificationTimeout constant.
- Corrected typo from sentinalEvent to sentinelEvent.
- Introduced ChainIO lnwallet.BlockChainIO to TxPublisherConfig for blockchain query capabilities.
- Implemented a GetUtxo check for each input in getSpentInputs to determine if it's already spent before registering a spend notification.
- Modified the select statement for spend events to perform a blocking read with a timeout, expecting immediate notification if GetUtxo indicates the input is spent.
- Corrected various minor typos in comments (e.g., differetly to differently, suing to using).
sweep/fee_bumper_test.go
- Imported lnmock package for mocking chain operations.
- Updated mockers struct and createTestPublisher to include and initialize MockChain for ChainIO.
- Corrected typo from sentinalEvent to sentinelEvent in TestBumpResultValidate.
- Renamed test functions TestCreateAnPublishFail to TestCreateAndPublishFail and TestCreateAnPublishSuccess to TestCreateAndPublishSuccess for consistency.
- Adjusted TestProcessRecordsInitial and TestProcessRecordsFeeBump to mock GetUtxo returning an unspent UTXO, skipping spend notification registration.
- Adjusted TestProcessRecordsInitialSpent, TestProcessRecordsConfirmed, and TestProcessRecordsSpent to mock GetUtxo returning a spent UTXO, proceeding with spend notification registration.
- Updated TestHasInputsSpent to include GetUtxo mocks for various input states and changed SignDesc mock expectation from Once() to Twice() for one input due to the new GetUtxo call.
- Corrected various minor typos in comments (e.g., alaways to always).

Activity

The Gemini bot encountered an error while attempting to create a summary (Comment 1).
The author, @yyforyongyu, re-triggered the Gemini bot for a summary (Comment 2).
There has been no human review activity or comments on this pull request yet.

yyforyongyu · 2025-07-09T12:49:21Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a fix to prevent missing expected spending events by checking UTXOs before registering spend notifications. The changes include adding a GetUtxo call, handling its results, and updating tests. I've provided feedback on potential panic scenarios and logging improvements.

sweep/fee_bumper.go

sweep/fee_bumper_test.go

sweep/fee_bumper.go

yyforyongyu · 2025-07-09T18:40:10Z

/gemini review

gemini-code-assist

Code Review

This pull request fixes a race condition where a spent input could be missed by the sweeper. The use of GetUtxo to verify the spent status is a good solution, and the test updates are comprehensive. The timeout handling in getSpentInputs can be improved for better correctness and resilience.

sweep/fee_bumper.go

yyforyongyu · 2025-07-09T18:44:50Z

/gemini review

gemini-code-assist

Code Review

This pull request addresses a bug where spending events for swept inputs could be missed, by first checking the UTXO status with GetUtxo. The logic is sound and the implementation is robust, including a timeout for spend notifications. The test suite has been updated accordingly. My main feedback is to adjust the log severity for certain error conditions from Critical to Warn to better reflect their nature and avoid unnecessary operational alerts.

sweep/fee_bumper.go

We now first check whether a given input has been found in the confirmed utxo set or not. When it's found, we can skip waiting for its spending notification.

This commit makes the read of spend events blocking. This is needed to make sure we won't miss a spend event for a spent input. Since a spend event is returned immediately when an input is already spent, this read doesn't actually block: by this point, we know for sure via the `GetUtxo` check that the input has been spent.
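As a rough illustration of a blocking read that is still bounded by a timeout (the PR summary mentions a `spentNotificationTimeout` constant), here is a small sketch. `waitSpend` and the string-typed channel are hypothetical and stand in for the real spend event types.

```go
package main

import (
	"fmt"
	"time"
)

// waitSpend blocks until a spend arrives on ch, ch is closed, or timeout
// elapses. The bool is false when no spend was received.
func waitSpend(ch <-chan string, timeout time.Duration) (string, bool) {
	select {
	case txid, ok := <-ch:
		// ok is false if the notification channel was closed.
		return txid, ok

	case <-time.After(timeout):
		// No event arrived in time; the caller decides how to react.
		return "", false
	}
}

func main() {
	// A spend that is already queued is returned without waiting.
	ready := make(chan string, 1)
	ready <- "spending-txid"
	fmt.Println(waitSpend(ready, time.Second))

	// With no event, the read gives up after the timeout.
	pending := make(chan string)
	fmt.Println(waitSpend(pending, 50*time.Millisecond))
}
```

Compared with the `default` branch in the original non-blocking read, the `time.After` case gives a late event a window to arrive before the input is treated as not yet spent.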

Abdulkbk

Nice, had an initial pass and left some questions

Abdulkbk · 2025-07-11T16:39:35Z

sweep/fee_bumper.go

@@ -1415,6 +1420,38 @@ func (t *TxPublisher) getSpentInputs(
 				"%v", op, heightHint)
 		}

+		// Check whether the input has been spent or not.


So the GetUtxo call is probably just added here to save us time? I ask because I noticed RegisterSpendNtfn also checks this internally.

Yeah, correct. It creates a shortcut here so we don't need to make unnecessary subscriptions. We only attempt to subscribe for spending when we know the input is not in the utxo set, which means either the input has been spent or it's an orphan.

Abdulkbk · 2025-07-11T17:41:36Z

sweep/fee_bumper.go

@@ -1424,7 +1461,7 @@ func (t *TxPublisher) getSpentInputs(
 			log.Criticalf("Failed to register spend ntfn for "+
 				"input=%v: %v", op, err)

-			return nil
+			return spentInputs


So initially we returned nil, and looking at the two places this method is used, there is a check on the length of what was returned: `if len(spends) == 0 {`. That would have caused lnd to panic, right?

A follow-up question is: what happens when we have multiple inputs (I guess that's a possibility), and one fails? Does that affect where we call the method since no error will be returned, and the only check I see is for the length of the returned result?

Returning nil here returns a nil map, which is the zero value of the map type, so calling len on it returns 0 and won't panic.
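A small standalone example of that Go behavior; `failedLookup` is a hypothetical function, not part of lnd:

```go
package main

import "fmt"

// failedLookup mimics a function that returns nil for its map result on an
// error path.
func failedLookup() map[string]int {
	return nil
}

func main() {
	spends := failedLookup()

	// len and lookups are safe on a nil map.
	fmt.Println(spends == nil, len(spends))

	_, ok := spends["missing"]
	fmt.Println(ok)

	// Only writes panic on a nil map, e.g. spends["k"] = 1.
}
```

So a caller that only checks `len(spends) == 0` or reads from the map behaves the same for a nil map as for an empty one.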

what happens when we have multiple inputs (I guess that's a possibility), and one fails?

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.

Ah, I now understand that Criticalf sends a shutdown request after logging the error.

bitromortac

Looks good on initial pass 🙏

bitromortac · 2025-07-22T07:44:44Z

sweep/fee_bumper_test.go

+	m.chainIO.On("GetUtxo",
+		&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
+		mock.Anything,
+	).Return(&wire.TxOut{}, nil).Once()

 	// Create a monitor record that's not confirmed. We know it's not
 	// confirmed because the `SpendEvent` is empty.


is this comment now misleading?

bitromortac · 2025-07-22T07:51:38Z

sweep/fee_bumper_test.go

+	m.chainIO.On("GetUtxo",
+		&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
+		mock.Anything,
+	).Return(nil, nil).Once()


would it be useful to also return btcwallet.ErrOutputSpent for more realistic testing?

bitromortac · 2025-07-23T12:51:05Z

sweep/fee_bumper.go

+		// is spent or not. A better approach is to implement a new
+		// synchronous method to check for spending, which should be
+		// attempted when implementing SQL into btcwallet.
+		case <-time.After(spentNotificationTimeout):


I think the assumption here is not quite right. A spend event from RegisterSpendNtfn may arrive only much later, since it may be doing a historical rescan for the output. That rescan runs from the current height back to the height hint, which can take a long time if the node was offline for some time and a force close happened in between. The same holds for the call in monitorSpend; I'm not sure whether a long delay between publish and spend notification is problematic for the sweeper.

Why do we need the spending transactions here? It looks like they are only used for logging/sanity checks, right? The docstring on r.spentInputs also seems misleading, because all the spends may have come from the sweep transaction, I think.

yyforyongyu added this to the v0.20.0 milestone Jul 9, 2025

yyforyongyu self-assigned this Jul 9, 2025

yyforyongyu added the bug fix label Jul 9, 2025

yyforyongyu added this to lnd v0.20 Jul 9, 2025

gemini-code-assist bot reviewed Jul 9, 2025

saubyk moved this to In progress in lnd v0.20 Jul 9, 2025

saubyk added the utxo sweeping label Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch from 2f4e850 to 46496a2 Compare July 9, 2025 18:38

gemini-code-assist bot reviewed Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch from 46496a2 to 9120d9c Compare July 9, 2025 18:44

gemini-code-assist bot reviewed Jul 9, 2025

yyforyongyu force-pushed the fix-sweep branch 3 times, most recently from d15e867 to f652051 Compare July 10, 2025 00:49

yyforyongyu added 5 commits July 11, 2025 10:11

sweep: check utxo set before registering for spending notification

940d317

We now first check whether a given input has been found in the confirmed utxo set or not. When it's found, we can skip waiting for its spending notification.

sweep: fix unit tests

96a6857

sweep: fix typos

5502d91

sweep: add timeout when waiting for spending event

ea6c132

yyforyongyu force-pushed the fix-sweep branch from f652051 to ea6c132 Compare July 11, 2025 07:11

Abdulkbk reviewed Jul 11, 2025

yyforyongyu mentioned this pull request Jul 21, 2025

[bug]: pendingsweeps empty after #10053

Open

bitromortac reviewed Jul 22, 2025

saubyk linked an issue Jul 22, 2025 that may be closed by this pull request

[bug]: pendingsweeps empty after #10053

Open

saubyk removed this from lnd v0.20 Jul 22, 2025

saubyk requested a review from Roasbeef July 22, 2025 16:55

bitromortac reviewed Jul 23, 2025
