[SwiftLexicalLookup] Unqualified lookup caching #3068
Conversation
@swift-ci Please test
Without diving too deeply into the details: I am a little concerned about the cache eviction behavior and the fact that you need to manually call `evictEntriesWithoutHit` (which incidentally doesn’t seem to be called in this PR or swiftlang/swift#81209), and I think it’s easy for clients to forget to call it. Does this more complex cache eviction policy provide significant benefits over a simple LRU cache that keeps, say, 100 cache entries? We could share the `LRUCache` type that we currently have in SwiftCompilerPluginMessageHandling for that. Curious to hear your opinion.
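For context, a fixed-capacity LRU cache evicts the least recently used entry whenever an insertion would exceed its capacity. Below is a minimal sketch of that policy; the names are illustrative, and the real `LRUCache` appears to use a doubly linked list for O(1) updates (hence the `_Node` references in the diff further down) rather than this simpler O(n) array bookkeeping:

```swift
// Minimal sketch of the fixed-capacity LRU policy being suggested.
// Illustrative only; not the actual LRUCache implementation.
final class SimpleLRUCache<Key: Hashable, Value> {
  private let capacity: Int
  private var storage: [Key: Value] = [:]
  private var usageOrder: [Key] = []  // least recently used key first

  init(capacity: Int) {
    precondition(capacity > 0, "capacity must be positive")
    self.capacity = capacity
  }

  subscript(key: Key) -> Value? {
    get {
      guard let value = storage[key] else { return nil }
      markUsed(key)
      return value
    }
    set {
      if let index = usageOrder.firstIndex(of: key) {
        usageOrder.remove(at: index)
      }
      guard let newValue else {
        storage[key] = nil
        return
      }
      storage[key] = newValue
      usageOrder.append(key)
      // Evict the least recently used entry once over capacity.
      if usageOrder.count > capacity {
        storage[usageOrder.removeFirst()] = nil
      }
    }
  }

  private func markUsed(_ key: Key) {
    if let index = usageOrder.firstIndex(of: key) {
      usageOrder.remove(at: index)
      usageOrder.append(key)  // most recently used last
    }
  }
}
```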
```swift
/// memory accesses could take longer, slowing down the eviction process. That's why the `drop` value
/// could be fine-tuned to maximize the performance given file size,
/// number of lookups, and amount of available memory.
public init(drop: Int = 0) {
```
I’m not a fan of the `drop` naming here. I don’t have a better suggestion yet, maybe I’ll come up with one.
Yes, I agree it is a bit ambiguous. What about `skip`?
Hi Alex, thank you for the suggestions and sorry for the late reply. I got quite busy with school. Thank you for pointing out the missing `evictEntriesWithoutHit` calls.

The current implementation assumes subsequent lookups happen in close proximity to the previous lookup, e.g. in the compiler in a single top-to-bottom scan (the best case). The algorithm follows the intuition that for any (close) subsequent lookup, we shouldn’t recompute more than one scope. In a top-to-bottom scan, by maintaining one path to the root, we always have a guaranteed cache hit in the first common ancestor.

I think a sufficiently big LRU cache would have similar behavior, but it would require more memory than this approach and not provide additional speedup. I’ve also noticed that growing the cache too big leads to diminishing returns, I suppose because less of the data structure can remain cached in memory. I attach below a sketch I used when pitching the idea to @DougGregor that visualizes an optimal top-to-bottom scan. In each step, blue represents the contents of the cache, red represents evicted entries, and green arrows point at the lookup position.

I think SwiftLexicalLookup could still benefit from an LRU cache though. The current implementation lacks the ability to arbitrarily look up previously evaluated names without reevaluating a great part of the syntax tree below. What if we kept the optimal, small cache from the current implementation for subsequent lookups and maintained a large LRU cache for symbols/leaves that would fill up alongside it? This way we would get the best of both worlds without blowing up the size of the LRU cache with intermediate scope results. What do you think about this idea?
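To make that policy concrete, here is a hedged sketch of the hit-then-evict idea described above; the type and member names are hypothetical simplifications, not the actual `LookupCache` implementation:

```swift
// Simplified model of the described eviction policy; illustrative only,
// not the actual LookupCache implementation.
struct ScopeResultCache<ScopeID: Hashable, Result> {
  private var entries: [ScopeID: Result] = [:]
  private var hits: Set<ScopeID> = []

  /// Returns the cached result for a scope, computing and storing it on a miss.
  /// Every access marks the entry as hit so it survives the next eviction.
  mutating func result(for scope: ScopeID, compute: () -> Result) -> Result {
    if let cached = entries[scope] {
      hits.insert(scope)
      return cached
    }
    let computed = compute()
    entries[scope] = computed
    hits.insert(scope)
    return computed
  }

  /// Drops entries that were not hit since the last eviction and resets hit
  /// tracking. Called after every lookup in a top-to-bottom scan, this keeps
  /// exactly one leaf-to-root path of scope results cached, so the next
  /// nearby lookup is guaranteed a hit at the first common ancestor.
  mutating func evictEntriesWithoutHit() {
    entries = entries.filter { hits.contains($0.key) }
    hits.removeAll()
  }
}
```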
Would it be possible to use an LRU cache and provide an eviction method that can be called to clean up the cache when we know that some parts of it are no longer relevant (what you described in the sketch above)? That way we would get reasonable out-of-the-box behavior and wouldn’t have an ever-growing cache, but would also keep the ability to keep the cache size low in cases where the client (here the compiler) cares about it and knows the access patterns.
Ah yes, that’s a very good idea to have an upper bound for the size of the cache. I hadn’t thought about it. I’ll try to look into how to extend `LRUCache`.
We should hoist it up. We could put it into a new module or just stick it in the swift-syntax module with package-level access.
@swift-ci Please test
@swift-ci Please test Windows Platform
Thanks for addressing my review comments. I just had a chance to look at the PR again and left a few comments inline.
Sources/SwiftSyntax/LRUCache.swift (Outdated)
```diff
@@ -33,12 +32,14 @@ public class LRUCache<Key: Hashable, Value> {
   private unowned var tail: _Node?

   public let capacity: Int
+  public private(set) var keysInCache: Set<Key>
```
As Alex suggests, I'm not sure this addition is needed. But even if we want this API, I don't think this `Set` storage is necessary. This is essentially `table.keys`, except `Dictionary.Keys` is not a `Set`. I think something like this should be fine:

```swift
package var keys: some Collection<Key> {
  table.keys
}
```

The caller can create a `Set` from this.
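For example, assuming a cache instance named `cache`, a caller that needs set semantics can materialize one on demand:

```swift
// Hypothetical call site: build a Set only where set semantics are required.
let keysInCache = Set(cache.keys)
```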
Yes, I think this is a much better idea that avoids redundancy. Thanks!
…wing cache. Refactoring.
One more comment about the automatic cache eviction, otherwise LGTM.
```swift
if hits.count > capacity * 2 {
  hits.removeAll()
}
```
This doesn’t seem right to me. Once `hits.count > capacity * 2`, we pretend that there were no hits. Shouldn’t we be clearing `ancestorResultsCache` and `sequentialResultsCache` in that case?
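A hedged sketch of what that suggestion could look like, assuming the two result dictionaries named in the comment live alongside `hits`:

```swift
// Sketch of the suggested behavior: once hit tracking overflows, clear the
// cached results together with the hit set so the bookkeeping stays consistent.
if hits.count > capacity * 2 {
  ancestorResultsCache.removeAll()
  sequentialResultsCache.removeAll()
  hits.removeAll()
}
```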
This PR introduces optional caching support to SwiftLexicalLookup. In order to use it, clients can pass an instance of `LookupCache` as a parameter to the lookup function. `LookupCache` keeps track of cache member hits. In order to prevent the cache from taking too much memory, clients can call the `LookupCache.evictEntriesWithoutHit` function to remove members without a hit and reset the hit property for the remaining members. Calling this function every time after lookup effectively maintains one path from a leaf to the root of the scope tree in cache.

Clients can also optionally set the `drop` value passed to `LookupCache.init`, which can be fine-tuned to maximize performance given file size, number of lookups, and the amount of available memory.
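A hedged usage sketch of the workflow the description outlines; `performLookup`, `lookupPositions`, and `handle` are hypothetical stand-ins, since only `LookupCache.init(drop:)` and `evictEntriesWithoutHit` are named above:

```swift
// Hypothetical client loop following the description above; how the cache is
// threaded into the lookup call is assumed for illustration.
let cache = LookupCache(drop: 0)

for position in lookupPositions {
  // Pass the cache into lookup so intermediate scope results are reused.
  let results = performLookup(at: position, cache: cache)
  handle(results)
  // Evict entries without a hit after each lookup, keeping one leaf-to-root
  // path of scope results cached for the next (nearby) lookup.
  cache.evictEntriesWithoutHit()
}
```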