sweep: fix expected spending events being missed #10060

Open
wants to merge 5 commits into
base: master
from

yyforyongyu
Member

Fixes issue #10051. What happened there was:

  • A sweeping tx was created during a restart, which put CommitmentAnchor and CommitmentTimeLock in the same group even though the anchor had already been spent.
  • This is usually fine, as we would detect that the anchor input is spent and retry sweeping the to_local output.

However, according to the logs there, the spend event was not delivered quickly enough here, causing us to conclude that the anchor input had not been spent:

lnd/sweep/fee_bumper.go

Lines 1433 to 1453 in ea32aac

	// Do a non-blocking read to see if the output has been spent.
	select {
	case spend, ok := <-spendEvent.Spend:
		if !ok {
			log.Debugf("Spend ntfn for %v canceled", op)
			continue
		}
		spendingTx := spend.SpendingTx
		log.Debugf("Detected spent of input=%v in tx=%v", op,
			spendingTx.TxHash())
		spentInputs[op] = spendingTx
	// Move to the next input.
	default:
		log.Tracef("Input %v not spent yet", op)
	}
}

The fix is to call GetUtxo first to check whether a given input has been spent. If it has, we then do a blocking read on the spend notification to receive the spend event.

@yyforyongyu yyforyongyu added this to the v0.20.0 milestone Jul 9, 2025
@yyforyongyu yyforyongyu self-assigned this Jul 9, 2025

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@yyforyongyu
Member Author

/gemini summary

Summary of Changes

This pull request aims to resolve a critical issue in the sweep module where expected spending events for transaction inputs could be missed, particularly after a restart. The core change involves enhancing the input spending detection mechanism by first querying the blockchain for UTXO status. If an input is found to be already spent, the system now expects an immediate notification, improving the reliability of the sweeping process and preventing erroneous retries.

Highlights

  • Spending Event Reliability: Introduced a proactive check using GetUtxo to determine if a sweep input is already spent before registering for a spend notification. This addresses a race condition where spending events might be missed during restarts, leading to incorrect assumptions about input status.
  • Immediate Notification Expectation: Modified the getSpentInputs logic to expect an immediate spending event notification if GetUtxo indicates an input is already spent. A timeout has been added to log an error if this expectation is not met, highlighting potential issues with the notification system.
  • Dependency Injection: Added ChainIO (an abstraction for blockchain queries) to TxPublisherConfig, allowing the fee_bumper to directly query UTXO status.
  • Test Coverage & Fixes: Updated various test cases in fee_bumper_test.go to mock the new GetUtxo behavior, ensuring the logic for handling spent and unspent inputs is correctly tested. Several minor typos in comments and test function names were also corrected.
Changelog
  • sweep/fee_bumper.go
    • Added time package import and defined spentNotificationTimeout constant.
    • Corrected typo from sentinalEvent to sentinelEvent.
    • Introduced ChainIO lnwallet.BlockChainIO to TxPublisherConfig for blockchain query capabilities.
    • Implemented a GetUtxo check for each input in getSpentInputs to determine if it's already spent before registering a spend notification.
    • Modified the select statement for spend events to perform a blocking read with a timeout, expecting immediate notification if GetUtxo indicates the input is spent.
    • Corrected various minor typos in comments (e.g., differetly to differently, suing to using).
  • sweep/fee_bumper_test.go
    • Imported lnmock package for mocking chain operations.
    • Updated mockers struct and createTestPublisher to include and initialize MockChain for ChainIO.
    • Corrected typo from sentinalEvent to sentinelEvent in TestBumpResultValidate.
    • Renamed test functions TestCreateAnPublishFail to TestCreateAndPublishFail and TestCreateAnPublishSuccess to TestCreateAndPublishSuccess for consistency.
    • Adjusted TestProcessRecordsInitial and TestProcessRecordsFeeBump to mock GetUtxo returning an unspent UTXO, skipping spend notification registration.
    • Adjusted TestProcessRecordsInitialSpent, TestProcessRecordsConfirmed, and TestProcessRecordsSpent to mock GetUtxo returning a spent UTXO, proceeding with spend notification registration.
    • Updated TestHasInputsSpent to include GetUtxo mocks for various input states and changed SignDesc mock expectation from Once() to Twice() for one input due to the new GetUtxo call.
    • Corrected various minor typos in comments (e.g., alaways to always).
Activity
  • The Gemini bot encountered an error while attempting to create a summary (Comment 1).
  • The author, @yyforyongyu, re-triggered the Gemini bot for a summary (Comment 2).
  • There has been no human review activity or comments on this pull request yet.

@yyforyongyu
Member Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fix to prevent missing expected spending events by checking UTXOs before registering spend notifications. The changes include adding a GetUtxo call, handling its results, and updating tests. I've provided feedback on potential panic scenarios and logging improvements.

@saubyk saubyk moved this to In progress in lnd v0.20 Jul 9, 2025
@yyforyongyu
Member Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a race condition where a spent input could be missed by the sweeper. The use of GetUtxo to verify the spent status is a good solution, and the test updates are comprehensive. The timeout handling in getSpentInputs can be improved for better correctness and resilience.

@yyforyongyu
Member Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug where spending events for swept inputs could be missed, by first checking the UTXO status with GetUtxo. The logic is sound and the implementation is robust, including a timeout for spend notifications. The test suite has been updated accordingly. My main feedback is to adjust the log severity for certain error conditions from Critical to Warn to better reflect their nature and avoid unnecessary operational alerts.

@yyforyongyu yyforyongyu force-pushed the fix-sweep branch 3 times, most recently from d15e867 to f652051 Compare July 10, 2025 00:49
We now first check whether a given input is found in the confirmed
UTXO set. When it is, we can skip waiting for its spend notification.

This commit makes the read of spend events blocking. This is needed to
make sure we don't miss the spend event for a spent input. Because a
spend event is returned immediately once an input has been spent, this
read doesn't actually block: by this point, the `GetUtxo` check has
already told us the input has been spent.
Contributor

@Abdulkbk Abdulkbk left a comment

Nice, had an initial pass and left some questions

@@ -1415,6 +1420,38 @@ func (t *TxPublisher) getSpentInputs(
"%v", op, heightHint)
}

// Check whether the input has been spent or not.
Contributor

So the GetUtxo call is probably just added here to save us time? I ask because I noticed RegisterSpendNtfn also checks this internally.

Member Author

Yeah, correct: it creates a shortcut here so we don't make unnecessary subscriptions. We only attempt to subscribe for the spend when we know the input is not in the UTXO set, which means either the input has been spent or it's an orphan.

@@ -1424,7 +1461,7 @@ func (t *TxPublisher) getSpentInputs(
log.Criticalf("Failed to register spend ntfn for "+
"input=%v: %v", op, err)

- return nil
+ return spentInputs
Contributor

So initially we returned nil, and in the two places this method is used, there is a check on the length of what was returned: if len(spends) == 0 {. That would have caused lnd to panic, right?

A follow-up question: what happens when we have multiple inputs (I guess that's a possibility), and one fails? Does that affect the call sites, since no error is returned and the only check I see is on the length of the returned result?

Member Author

Returning nil here returns a nil map, which is the zero value of the map type, so calling len on it returns 0 and won't panic.

what happens when we have multiple inputs (I guess that's a possibility), and one fails?

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.
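
On the nil return value discussed above, a small standalone Go example may help. The function names are hypothetical and unrelated to lnd; the example relies only on standard Go semantics: reading from, ranging over, or taking len of a nil map is safe, while writing to one panics.

```go
package main

import "fmt"

// spentOrNil is a hypothetical function that returns a nil map on failure.
func spentOrNil(fail bool) map[string]string {
	if fail {
		return nil
	}
	return map[string]string{"anchor:1": "spending-tx"}
}

// writePanics reports whether assigning into m panics.
func writePanics(m map[string]string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["k"] = "v"
	return false
}

func main() {
	spends := spentOrNil(true)

	// len and lookups on a nil map are safe.
	fmt.Println(spends == nil, len(spends)) // true 0
	_, found := spends["anchor:1"]
	fmt.Println(found) // false

	// Only writes to a nil map panic.
	fmt.Println(writePanics(spends)) // true
}
```

So a caller that only checks len(spends) == 0 behaves the same whether it receives nil or an empty map.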

Contributor

What do you mean one fails? If there's a failure here, then we'd shut down lnd due to Criticalf.

Ah, I now understand that Criticalf sends a shutdown request after logging the error.

Collaborator

@bitromortac bitromortac left a comment

Looks good on initial pass 🙏

m.chainIO.On("GetUtxo",
	&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
	mock.Anything,
).Return(&wire.TxOut{}, nil).Once()

// Create a monitor record that's not confirmed. We know it's not
// confirmed because the `SpendEvent` is empty.
Collaborator

is this comment now misleading?

m.chainIO.On("GetUtxo",
	&op, inp.SignDesc().Output.PkScript, inp.HeightHint(),
	mock.Anything,
).Return(nil, nil).Once()
Collaborator

would it be useful to also return btcwallet.ErrOutputSpent for more realistic testing?

@saubyk saubyk linked an issue Jul 22, 2025 that may be closed by this pull request
@saubyk saubyk removed this from lnd v0.20 Jul 22, 2025
@saubyk saubyk requested a review from Roasbeef July 22, 2025 16:55
Development

Successfully merging this pull request may close these issues.

[bug]: pendingsweeps empty after
4 participants