Conversation

hanabi1224 (Contributor)

Benchmarks

➜  hamt git:(master) cargo bench
...
HAMT bulk insert (no flush)
                        time:   [7.0920 µs 7.1164 µs 7.1423 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild

HAMT bulk insert with flushing and loading
                        time:   [2.0349 ms 2.0411 ms 2.0475 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

HAMT deleting all nodes time:   [106.66 µs 107.20 µs 107.73 µs]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

HAMT for_each function  time:   [102.49 µs 102.76 µs 103.08 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

➜  hamt git:(hm/hamt-cacheless-iter) cargo bench
...
HAMT bulk insert (no flush)
                        time:   [7.0839 µs 7.1202 µs 7.1733 µs]
                        change: [-0.3421% +0.2993% +1.0198%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  7 (7.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

HAMT bulk insert with flushing and loading
                        time:   [1.9895 ms 1.9947 ms 2.0004 ms]
                        change: [-2.6603% -2.2702% -1.8510%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

HAMT deleting all nodes time:   [104.54 µs 104.86 µs 105.21 µs]
                        change: [-1.9633% -1.4104% -0.8534%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

HAMT for_each function  time:   [100.01 µs 100.33 µs 100.69 µs]
                        change: [-2.8585% -2.3226% -1.7285%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

codecov-commenter commented Sep 2, 2025

Codecov Report

❌ Patch coverage is 88.69565% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.61%. Comparing base (fe4d5c1) to head (e5c9ea9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ipld/hamt/src/iter.rs | 87.09% | 12 Missing ⚠️ |
| ipld/hamt/src/pointer.rs | 88.88% | 1 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #2215      +/-   ##
==========================================
+ Coverage   77.56%   77.61%   +0.04%     
==========================================
  Files         147      147              
  Lines       15789    15872      +83     
==========================================
+ Hits        12247    12319      +72     
- Misses       3542     3553      +11     
| Files with missing lines | Coverage Δ |
|---|---|
| ipld/hamt/src/hamt.rs | 97.16% <100.00%> (+0.01%) ⬆️ |
| ipld/hamt/src/lib.rs | 100.00% <ø> (ø) |
| ipld/hamt/src/node.rs | 91.46% <100.00%> (+0.16%) ⬆️ |
| ipld/hamt/src/pointer.rs | 84.49% <88.88%> (-0.51%) ⬇️ |
| ipld/hamt/src/iter.rs | 89.31% <87.09%> (-3.00%) ⬇️ |

hanabi1224 force-pushed the hm/hamt-cacheless-iter branch from 494060d to dad03d3 on September 2, 2025 at 03:20
@@ -1,7 +1,7 @@
 [package]
 name = "fvm_ipld_hamt"
 description = "Sharded IPLD HashMap implementation."
-version = "0.10.4"
+version = "0.11.0"
A Member commented on this diff:
do this as a separate PR

rvagg (Member) commented Sep 2, 2025

"bulk insert" - how big are we talking here? if you make it huge you ought to see a larger difference in the iteration I think, although this really does come down to memory retention and management, so measuring that would be more interesting if that were at all possible

rvagg (Member) commented Sep 2, 2025

It'd be better if we didn't go breaking the existing API. Can you not fairly easily take a similar approach to #2189 and add a new iteration method for this?

hanabi1224 (Contributor, Author) commented Sep 2, 2025

> It'd be better if we didn't go breaking the existing API. Can you not fairly easily take a similar approach to #2189 and add a new iteration method for this?

Hey @rvagg, thanks for your quick review! I will open another PR implementing fn for_each_cacheless to avoid making breaking API changes. I will keep this PR as a draft, since this approach makes Forest integration much easier, with minimal (almost drop-in) changes.

I'd like to discuss whether we plan to adopt this PR in the next minor version release and then deprecate fn for_each_cacheless (and likewise in fvm_ipld_amt) once the performance benefits are well tested and confirmed. Adopting the new fn for_each_cacheless API in Forest requires a lot of changes, and the current cache-aware fn for_each does not seem beneficial, to my understanding. cc @LesnyRumcajs
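
For readers following the thread, here is a self-contained toy sketch of the trade-off being discussed. None of this is the fvm_ipld_hamt implementation; the types and names are invented stand-ins. The point is only that a cached traversal retains every node it loads, while a cacheless one drops each subtree as soon as it has been visited:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

type Cid = u64; // stand-in for a real content identifier

#[derive(Clone)]
enum Node {
    Leaf(String, i64), // key, value
    Branch(Vec<Cid>),  // links to child nodes
}

// Stand-in for a blockstore keyed by CID.
struct Store(HashMap<Cid, Node>);

impl Store {
    fn get(&self, cid: Cid) -> Node {
        self.0[&cid].clone()
    }
}

// Cacheless: each child is loaded, visited, and dropped in turn,
// so peak memory tracks tree depth, not tree size.
fn for_each_cacheless(store: &Store, root: Cid, f: &mut impl FnMut(&str, i64)) {
    match store.get(root) {
        Node::Leaf(k, v) => f(&k, v),
        Node::Branch(children) => {
            for child in children {
                for_each_cacheless(store, child, &mut *f);
            }
        }
    }
}

// Cached: loaded nodes are memoized, which helps later random access
// but retains the whole visited tree during iterate-once workloads.
fn for_each_cached(
    store: &Store,
    cache: &RefCell<HashMap<Cid, Node>>,
    root: Cid,
    f: &mut impl FnMut(&str, i64),
) {
    let node = cache
        .borrow_mut()
        .entry(root)
        .or_insert_with(|| store.get(root))
        .clone();
    match node {
        Node::Leaf(k, v) => f(&k, v),
        Node::Branch(children) => {
            for child in children {
                for_each_cached(store, cache, child, &mut *f);
            }
        }
    }
}

fn main() {
    let mut blocks = HashMap::new();
    blocks.insert(1, Node::Branch(vec![2, 3]));
    blocks.insert(2, Node::Leaf("a".to_string(), 1));
    blocks.insert(3, Node::Leaf("b".to_string(), 2));
    let store = Store(blocks);

    let mut sum = 0;
    for_each_cacheless(&store, 1, &mut |_k, v| sum += v);
    assert_eq!(sum, 3);

    let cache = RefCell::new(HashMap::new());
    let mut sum = 0;
    for_each_cached(&store, &cache, 1, &mut |_k, v| sum += v);
    assert_eq!(sum, 3);
    assert_eq!(cache.borrow().len(), 3); // entire visited tree retained
}
```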

LesnyRumcajs (Contributor)

> Adopting the new fn for_each_cacheless API in Forest requires a lot of changes, and the current cache-aware fn for_each does not seem beneficial, to my understanding

It might not be beneficial in the current Forest logic, but some users might depend on the caching for their use cases, so this is a breaking change. Refer to Hyrum's Law.

We might think about switching the backend to a cacheless one eventually, but we'd need to ensure the benefits significantly outweigh the potential risks. I'm especially concerned about potential performance degradation in builtin-actors on mainnet (though other consumers of the fvm_ipld_* crates are also important).

rvagg (Member) commented Sep 3, 2025

This is going to have to be case-by-case, which is why it would be nice to have an opt-in form of this. I did originally imagine a constructor option that would give you a HAMT that either cached or didn't, but this is much more a question of idiomatic Rust and Rust ergonomics, which I'm not necessarily the best person to give advice on.

But I do want to do this for Go too, and I know its utility is mainly in certain areas, particularly where we know we're iterating over large data and we know we're only iterating. Migrations are the big one, for Go at least: we iterate through actors, we don't need random access, and we drop the structure on the floor when we're done, so the cache just gets in the way. I'm sure there are a number of APIs we serve where it's pure, straight iteration and this would help too. But in the case where you instantiate one of these things and pass it around for general use (get, set, iterate), a cache could be quite helpful.
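
A minimal sketch of what that opt-in could look like at construction time. All of the names below are invented for illustration; the real crate's configuration surface may differ:

```rust
/// Hypothetical policy choice: keep today's node cache, or drop nodes
/// once visited (suited to iterate-once workloads such as migrations).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CachePolicy {
    Cached,
    Cacheless,
}

/// Hypothetical construction-time configuration.
pub struct Config {
    pub cache_policy: CachePolicy,
    pub bit_width: u32,
}

impl Default for Config {
    fn default() -> Self {
        // Defaulting to Cached keeps existing behaviour, so nothing breaks.
        Config { cache_policy: CachePolicy::Cached, bit_width: 8 }
    }
}

fn main() {
    let general_use = Config::default();
    let migration = Config { cache_policy: CachePolicy::Cacheless, ..Config::default() };
    assert_eq!(general_use.cache_policy, CachePolicy::Cached);
    assert_eq!(migration.cache_policy, CachePolicy::Cacheless);
}
```

This shape would leave the existing for_each signature untouched while letting iterate-only callers opt out of retention up front.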

hanabi1224 changed the title from "feat: cacheless hamt iteration" to "[Testing Only] cacheless hamt iteration" on Sep 4, 2025