Skip to content

Conversation

benthecarman
Copy link
Contributor

Closes #3526

For a marginal increase in privacy, we can randomize the inputs from the OutputSweeper. Since we don't depend on rand to randomize the order I just put the elements into a hashset, then back into a Vec. This should give us enough randomness without having to introduce a new dep or make the OutputSweeper depend on the EntropySource.

For a marginal increase in privacy, we can randomize the inputs from the
`OutputSweeper`. Since we don't depend on `rand` to randomize the order
I just put the elements into a hashset, then back into a Vec. This
should give us enough randomness without having to introduce a new dep
or make the `OutputSweeper` depend on the `EntropySource`.
@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Aug 29, 2025

I've assigned @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from tnull August 29, 2025 00:11
Copy link

codecov bot commented Aug 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.74%. Comparing base (a9bbb24) to head (8a84ed0).
⚠️ Report is 40 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4033   +/-   ##
=======================================
  Coverage   88.74%   88.74%           
=======================================
  Files         176      176           
  Lines      128845   128847    +2     
  Branches   128845   128847    +2     
=======================================
+ Hits       114340   114342    +2     
+ Misses      11912    11909    -3     
- Partials     2593     2596    +3     
Flag Coverage Δ
fuzzing 21.93% <0.00%> (-0.01%) ⬇️
tests 88.57% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT

Copy link
Member

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was digging into the difference between using EntropySource-based shuffling versus the current Vec → HashSet → Vec approach, and it seems like the latter might not be the best fit for what we’re aiming to do.

On Randomness:

  • It’s not truly random. There’s no guarantee that all permutations are equally likely, which matters if the goal is to introduce meaningful entropy (for example, for privacy).
  • Since the order comes from internal hashing behavior, there’s no way to seed or reproduce the order for testing.
  • It replaces a well-understood and tested shuffling algorithm with a non-deterministic, opaque alternative. This might behave unpredictably across platforms or versions, which could potentially introduce subtle bugs over time.

If randomness does matter here, maybe it’s worth considering introducing EntropySource into OutputSweeper as a longer-term solution.

On Deduplication:

Also wondering: is it worth introducing an additional allocation (via HashSet) just to remove duplicates? If duplicates are actually a problem, maybe we can address that more explicitly, especially considering the relatively small size of the input in most cases.

Happy to discuss further if I’m missing context. Just wanted to flag these points as they came up while reviewing.

@ldk-reviews-bot
Copy link

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with previous reviewers that we should probably just take on an EntropySource argument to OutputSweeper, especially since we're gonna need it when doing 'proper' batch randomization going forward.

@TheBlueMatt
Copy link
Collaborator

TheBlueMatt commented Sep 1, 2025

I disagree here, I think using the default HashMap randomization is perfectly acceptable for randomizing order (when its not critical). It may not be perfect to pay the allocation tax, though.

To correct the original misconceptions:

It’s not truly random. There’s no guarantee that all permutations are equally likely, which matters if the goal is to introduce meaningful entropy (for example, for privacy).

It may not be guaranteed, no, but it is in practice. In practice the values are SipHash-1-3 of the values, seeded with an actual random input from getrandom. That should make everything equally likely, as far as we're concerned.

especially since we're gonna need it when doing 'proper' batch randomization going forward.

Wait, what is "proper" batch randomization?

@tnull
Copy link
Contributor

tnull commented Sep 2, 2025

Wait, what is "proper" batch randomization?

Currently, OutputSweeper in each broadcast round still attempts to batch all tracked outputs, which has some potential for all sweeps getting stuck if there's at least one unspendable output in there. I still want to implement randomized subset sampling, possibly tracking when which output spend was attempted last, to over time guarantee we'd eventually spend all spendable outputs, even if there are some tracked that are currently unspendable.

@TheBlueMatt
Copy link
Collaborator

I don't see why that needs a "real" RNG either? Just a counter and hash would suffice for that (given the behavior is, first, "include all", then do random inclusion sets).

@tnull
Copy link
Contributor

tnull commented Sep 4, 2025

I don't see why that needs a "real" RNG either? Just a counter and hash would suffice for that (given the behavior is, first, "include all", then do random inclusion sets).

Yeah, we don't necessarily need a 'real' RNG for that, that's true.

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gonna move forward with this, we can still improve things further when revisiting batch randomization.

@tnull tnull merged commit ec54ea0 into lightningdevkit:main Sep 4, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OutputSweeper sweeps in historical order
6 participants