[refactor][ml] Replace cache eviction algorithm with centralized removal queue and job #24363
base: master
Conversation
```java
@Slf4j
class RangeCache {
    private final ConcurrentNavigableMap<Position, RangeCacheEntryWrapper> entries;
    private final RangeCacheRemovalQueue removalQueue;
```
Is removal O(1)? Do we need to scan items in the queue to remove an item?
Should we use some basic LRU??
Yes, it's O(1). There's no need to do any scanning; eviction always happens in FIFO order.
Cached entries might also be evicted in other ways (say, when a ledger is removed), but that doesn't impact the removal queue. When a cached entry has already been evicted, its "empty" wrapper remains in the queue and gets removed once queue processing reaches it through expiration or size-based eviction. In other words, there's no scanning needed to remove entries from the queue when EntryCache instances are cleared.
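The FIFO behavior described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: `RemovalQueueSketch` and `EntryWrapper` are invented names, and a plain `ArrayDeque` stands in for the lock-free MPSC queue used in the real implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch: wrappers enter the queue in insertion order; when an
// entry is evicted through another path, its wrapper is merely emptied in
// place and is never searched for in the queue.
class RemovalQueueSketch {
    static final class EntryWrapper {
        long size;            // 0 means the entry was already evicted elsewhere
        final long insertTs;
        EntryWrapper(long size, long insertTs) { this.size = size; this.insertTs = insertTs; }
        boolean isEmpty() { return size == 0; }
    }

    private final Queue<EntryWrapper> queue = new ArrayDeque<>(); // stand-in for an MPSC queue

    void add(EntryWrapper w) { queue.offer(w); }                  // O(1) append at tail

    // Evict from the head until at least `sizeToFree` bytes are reclaimed.
    // "Empty" wrappers are simply dropped as they surface at the head; no
    // scanning is ever required.
    long evictBySize(long sizeToFree) {
        long freed = 0;
        while (freed < sizeToFree && !queue.isEmpty()) {
            EntryWrapper head = queue.poll();                     // O(1) removal at head
            if (!head.isEmpty()) {
                freed += head.size;                               // release the cached entry
            }
        }
        return freed;
    }
}
```

The key point the sketch shows: both insertion and removal only touch the ends of the queue, so the cost per operation is constant regardless of queue length.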
1. What happens if some entries are reused repeatedly? Do we move them to the tail of the removal queue? Shouldn't we deprioritize evicting those entries? (e.g. large fan-out cases, subscriptions with many consumers)
2. Don't we still need to scan the entire list when all items' timestamps are less than the threshold? (we don't do an early stop here)
3. Is this cache per ledger? If yes, why not a single global cache shared by all ledgers?
4. Do we need to worry about the size limit of the removal queue from too many empty wrapper entries?
> 2. Don't we still need to scan the entire list when all items' timestamps are less than the threshold? (we don't do an early stop here)

There's no need to scan the entire queue currently. For future work with different thresholds, there could be multiple phases, in a somewhat similar way to how garbage collection algorithms work.
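The early-stop property follows from FIFO ordering: insertion timestamps are non-decreasing from head to tail, so an expiration pass can terminate at the first still-fresh entry. A minimal sketch with illustrative names, using a plain `ArrayDeque` instead of the actual queue class:

```java
import java.util.ArrayDeque;

// Illustrative sketch of why no full scan is needed: because the queue is
// FIFO, timestamps are monotonically non-decreasing from head to tail, so
// expiration can stop at the first entry that is still fresh.
class TimestampEvictionSketch {
    record TimestampedEntry(long insertTimeNanos, long size) {}

    static long evictExpired(ArrayDeque<TimestampedEntry> fifo, long thresholdNanos) {
        long freed = 0;
        TimestampedEntry head;
        // Early stop: the first non-expired head entry ends the pass.
        while ((head = fifo.peekFirst()) != null && head.insertTimeNanos() < thresholdNanos) {
            fifo.pollFirst();
            freed += head.size();
        }
        return freed;
    }
}
```

In the degenerate case where every entry is older than the threshold, the pass does dequeue every entry, but each one is evicted anyway, so no work is wasted on entries that stay cached.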
> 3. Is this cache per ledger? If yes, why not a single global cache shared by all ledgers?

This PR doesn't change that. I think per-ledger caches are an efficient solution for the broker use cases, and I don't think a single global cache would bring any benefits over the current solution.
This PR addresses the problems in the existing implementation around evicting expired entries and keeping the total global cache size under a limit. That has been a problem with the individual caches, but after this PR it is solved.
> 4. Do we need to worry about the size limit of the removal queue from too many empty wrapper entries?

The RangeCacheEntryWrapper currently causes an overhead of about 100 bytes per entry. With 100,000 entries in the queue, that would be about 10 MB of heap, which shouldn't be a problem.
More details about RangeCacheEntryWrapper and its background:
The wrapper entries are recycled, so they don't add GC overhead. These wrappers are already used in the RangeCache to ensure consistency while using StampedLocks to avoid synchronization and blocking. The wrapper solution was added to RangeCache in #22789, with follow-ups in #22814, #22818, #23903 and #23955. Without the wrapper, there were issues where an entry got evicted and recycled while it was still being used. This is explained in those PR descriptions, and there are also test cases that reproduce the problems.
The design of the RangeCache is intentionally "racy" so that we don't need to synchronize and block; StampedLock provides the necessary consistency together with the reference counting. The RangeCache's EntryWrapper is also the foundation for this removal queue work, since it makes it possible to keep separate EntryCache instances per ledger while centralizing expiration and size-based eviction in the queue.
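The StampedLock-plus-wrapper pattern mentioned above can be sketched roughly like this. All names and fields are illustrative simplifications; the real `RangeCacheEntryWrapper` also handles reference counting and recycling:

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch of the "racy but consistent" wrapper idea: readers take
// an optimistic StampedLock read, copy the field they need, then validate;
// writers (eviction) take the write lock to clear the wrapper in place.
class WrapperSketch {
    private final StampedLock lock = new StampedLock();
    private Object value;   // the cached entry payload; null once evicted
    private long size;

    void set(Object value, long size) {
        long stamp = lock.writeLock();
        try { this.value = value; this.size = size; } finally { lock.unlockWrite(stamp); }
    }

    // Returns the value if the optimistic snapshot was consistent, otherwise
    // retries under a full read lock. No blocking on the fast path.
    Object getValue() {
        long stamp = lock.tryOptimisticRead();
        Object snapshot = value;
        if (!lock.validate(stamp)) {            // a writer raced us; fall back
            stamp = lock.readLock();
            try { snapshot = value; } finally { lock.unlockRead(stamp); }
        }
        return snapshot;
    }

    // Eviction empties the wrapper in place; the wrapper object itself stays
    // in the removal queue until it is dequeued.
    void clear() {
        long stamp = lock.writeLock();
        try { this.value = null; this.size = 0; } finally { lock.unlockWrite(stamp); }
    }
}
```

The design choice here is that readers never block writers and vice versa on the common path; the validate-and-retry fallback only triggers when a write actually raced the read.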
Thanks for the good questions @heesung-sn
Thank you for answering my questions. This PR and its direction look good to me, together with the planned future improvements.
But I guess this requires more discussion in the Pulsar community, too.
Motivation
This PR fixes fundamental inefficiencies and correctness issues in the current Pulsar broker entry cache eviction algorithm. The current implementation has flawed size-based eviction that doesn't remove the oldest entries and incorrect timestamp-based eviction with high CPU overhead. These fixes ensure that size-based eviction properly removes the oldest entries and timestamp-based eviction works correctly. Additionally, this PR serves as a foundation for future improvements to efficiently handle catch-up reads and Key_Shared subscription scenarios.
Mailing list discussion about this PR: https://lists.apache.org/thread/ddzzc17b0c218ozq9tx0r3rx5sgljfb0
Problems with the Current Broker Entry Cache Implementation
Size-Based Eviction doesn't remove oldest entries: The existing `EntryCacheDefaultEvictionPolicy` uses an algorithm for keeping the cache size under the limit, but it cannot guarantee removal of the oldest entries from the cache. The algorithm only considers the largest individual caches for eviction (controlled by `PercentOfSizeToConsiderForEviction`, default `0.5`), regardless of entry age.
Inefficient and Incorrect Timestamp-Based Eviction: The current timestamp eviction has both performance and correctness issues: the eviction check runs on a very frequent schedule (`managedLedgerCacheEvictionIntervalMs=10`) - 100 times per second!
Limited Cache Scope: The original `RangeCache` was designed for tailing reads. Later changes added support for backlogged cursors, but the eviction algorithms weren't updated to handle mixed read patterns effectively.
Unnecessary Complexity: Generic type parameters in `RangeCache` add complexity without providing value, as the cache is only used for entry storage.
Modifications
1. Centralized Removal Queue (`RangeCacheRemovalQueue`): uses an `MpscUnboundedArrayQueue` to maintain entry insertion order.
2. Simplified Cache Implementation: removes the generic type parameters from `RangeCache` to reduce complexity.
3. Foundation for Future Improvements
The existing broker cache has limitations in handling catch-up reads (for example, eviction driven by the mark-delete position when `cacheEvictionByMarkDeletedPosition=true`).
This refactoring prepares the cache system for future improvements, such as efficiently handling catch-up reads and Key_Shared subscription scenarios.
Algorithm Comparison
Before (EntryCacheDefaultEvictionPolicy)
Size-Based Eviction
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/EntryCacheDefaultEvictionPolicy.java (lines 41 to 93 in 82237d3)
Timestamp eviction (scheduled every 10 ms, `managedLedgerCacheEvictionIntervalMs=10`)
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerFactoryImpl.java (lines 306 to 317 in eccc6b6)
After (RangeCacheRemovalQueue)
https://github.com/apache/pulsar/blob/b72bc4ff3aa5c9c45d9233d2d000429b3cf0ce1a/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/cache/RangeCacheRemovalQueue.java
Note: there's a single shared removal queue for all ManagedLedgerImpl instances, so the eviction check runs in one place instead of separately in each instance.
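As a rough illustration of the centralized design (invented names and simplified bookkeeping; the real implementation uses a JCTools `MpscUnboundedArrayQueue` and more involved size accounting), a single periodic job can serve both expiration and size-based eviction from one shared queue:

```java
import java.util.ArrayDeque;

// Illustrative sketch: one queue shared by all per-ledger caches, processed
// by a single job that handles both expiration and size-based eviction in
// FIFO order.
class CentralEvictionJobSketch {
    record Cached(long insertTimeMillis, long size, boolean alreadyEvicted) {}

    private final ArrayDeque<Cached> sharedQueue = new ArrayDeque<>(); // one queue for all ledgers

    void track(Cached c) { sharedQueue.addLast(c); }

    // One pass: drop expired entries first (early stop thanks to FIFO order),
    // then keep evicting oldest-first until the cache fits under the limit.
    long runOnce(long nowMillis, long ttlMillis, long currentSize, long maxSize) {
        long freed = 0;
        Cached head;
        while ((head = sharedQueue.peekFirst()) != null
                && (nowMillis - head.insertTimeMillis() > ttlMillis   // expired
                    || currentSize - freed > maxSize)) {              // over the size limit
            sharedQueue.pollFirst();
            if (!head.alreadyEvicted()) {
                freed += head.size();                                 // "empty" wrappers free nothing
            }
        }
        return freed;
    }
}
```

Because every cache feeds the same queue, the oldest entry across all ledgers is always at the head, which is what makes global oldest-first eviction possible without scanning the individual caches.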
Verifying this change
This change is already covered by existing tests:
- `RangeCacheTest` validates the new removal queue functionality
- `EntryCacheManagerTest` verifies that eviction behavior remains correct
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete