Speed up (filtered) KNN queries for flat vector fields #130251
For dense vector fields using the `flat` index, we already know a brute-force search will be used, so there's no need to go through the codec's approximate KNN logic. This change skips that step and builds the brute-force query directly, making things faster and simpler.

I tested this on a setup with **10 million random vectors**, each with **1596 dimensions** and **17,500 partitions**, using the `random_vector` track. The results:

### Performance Comparison

| Metric | Before | After | Change |
| ----------------- | --------- | ---------- | --------- |
| **Throughput** | 221 ops/s | 2762 ops/s | 🟢 +1149% |
| **Latency (p50)** | 29.2 ms | 1.6 ms | 🔻 -94.4% |
| **Latency (p99)** | 81.6 ms | 3.5 ms | 🔻 -95.7% |

Filtered KNN queries on flat vectors are now over 10x faster on my laptop!
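For readers unfamiliar with the flat path: since there is no graph to traverse, a filtered KNN search over a flat index reduces to scoring every document that passes the filter and keeping the top `k`. The following is a minimal, self-contained sketch of that idea in plain Java (the array layout, `filter` representation, and dot-product scoring are illustrative assumptions, not the actual Elasticsearch code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Conceptual sketch of brute-force filtered KNN over a flat index:
// score every filtered doc, keep the k best in a min-heap.
public class BruteForceKnn {

    record ScoredDoc(int docId, float score) {}

    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // `vectors[doc]` is the stored vector for doc id `doc`; `filter[doc]`
    // marks docs that pass the pre-filter. Both names are hypothetical.
    static List<ScoredDoc> search(float[][] vectors, boolean[] filter, float[] query, int k) {
        // Min-heap holds the k best-scoring docs seen so far.
        PriorityQueue<ScoredDoc> heap =
            new PriorityQueue<>(Comparator.comparingDouble(ScoredDoc::score));
        for (int doc = 0; doc < vectors.length; doc++) {
            if (!filter[doc]) {
                continue; // brute force visits every doc but scores only filtered ones
            }
            float score = dotProduct(vectors[doc], query);
            if (heap.size() < k) {
                heap.add(new ScoredDoc(doc, score));
            } else if (score > heap.peek().score()) {
                heap.poll();
                heap.add(new ScoredDoc(doc, score));
            }
        }
        List<ScoredDoc> results = new ArrayList<>(heap);
        results.sort(Comparator.comparingDouble(ScoredDoc::score).reversed());
        return results;
    }
}
```

The point of the change is that when the index is flat, this exhaustive scan *is* the search, so routing through the codec's approximate-KNN entry point only adds overhead.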
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @jimczi, I've created a changelog YAML for you.
I am loving these numbers. Thank you for digging into this!
```java
Query knnQuery;
if (indexOptions != null && indexOptions.isFlat()) {
    knnQuery = filter == null
        ? createExactKnnBitQuery(queryVector)
        : new BooleanQuery.Builder().add(createExactKnnBitQuery(queryVector), BooleanClause.Occur.SHOULD)
            .add(filter, BooleanClause.Occur.FILTER)
            .build();
    if (parentFilter != null) {
        knnQuery = new ToParentBlockJoinQuery(knnQuery, parentFilter, ScoreMode.Max);
    }
} else {
    knnQuery = parentFilter != null
        ? new ESDiversifyingChildrenByteKnnVectorQuery(name(), queryVector, filter, k, numCands, parentFilter, searchStrategy)
        : new ESKnnByteVectorQuery(name(), queryVector, k, numCands, filter, searchStrategy);
}
```
This logic makes me think that `indexOptions` should satisfy some interfaces for creating queries... but I think that refactor can happen later.
```java
// Retrieve top rescoreK documents from the inner query
var topDocs = searcher.search(innerQuery, rescoreK);
vectorOperations = topDocs.totalHits.value();

// Retrieve top k documents from the top rescoreK query
var topDocsQuery = new KnnScoreDocQuery(topDocs.scoreDocs, searcher.getIndexReader());
var valueSource = new VectorSimilarityFloatValueSource(fieldName, floatTarget, vectorSimilarityFunction);
var rescoreQuery = new FunctionScoreQuery(topDocsQuery, valueSource);
var rescoreTopDocs = searcher.search(rescoreQuery.rewrite(searcher), k);
return new KnnScoreDocQuery(rescoreTopDocs.scoreDocs, searcher.getIndexReader());
```
We need this because the exact queries don't actually apply their own `k` limit; instead we must apply it with the searcher, right?
Just wanting to clarify: this looks very similar to what it was before, other than forcing `rescoreK` in the searcher.
Yep, it's a two-step reduction, since the first query is not a top-k query.
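To make the two-step reduction concrete, here is a minimal sketch in plain Java using precomputed score arrays (the names `approxScores`/`exactScores` and the array-based representation are illustrative assumptions, not the PR's Lucene-based code): first take `rescoreK` candidates by the cheap score, then re-rank only those candidates with the exact similarity and keep the final top `k`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Sketch of two-phase rescoring: the inner query is not a top-k query,
// so we reduce in two steps (top rescoreK, then exact top k).
public class TwoStepRescore {

    record Hit(int docId, float score) {}

    static List<Hit> rescore(float[] approxScores, float[] exactScores, int rescoreK, int k) {
        // Step 1: top rescoreK doc ids by the cheap/approximate score.
        List<Integer> candidates = IntStream.range(0, approxScores.length)
            .boxed()
            .sorted(Comparator.comparingDouble((Integer d) -> approxScores[d]).reversed())
            .limit(rescoreK)
            .toList();
        // Step 2: re-score only those candidates exactly and keep the top k.
        return candidates.stream()
            .map(d -> new Hit(d, exactScores[d]))
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(k)
            .toList();
    }
}
```

Note the trade-off this implies: a doc with a high exact score can still be missed if the approximate phase prunes it before rescoring, which is exactly why `rescoreK` is chosen larger than `k`.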
```java
if (parentFilter != null) {
    knnQuery = new ToParentBlockJoinQuery(knnQuery, parentFilter, ScoreMode.Max);
}
```
🤔 I gotta think about this one... `ESDiversifyingChildrenByteKnnVectorQuery` actually returns the child docs so that nested queries work both in top-level knn and when under a nested vector query.
If no tests fail here, I am thinking we don't actually have coverage for top-level and query-level nested knn over flat indices.
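For context on what the block-join wrapping with `ScoreMode.Max` does conceptually: in Lucene's block layout, each parent doc immediately follows its children, and with `ScoreMode.Max` the parent is scored with the maximum of its children's scores. A hedged sketch of that aggregation over a hypothetical flat-array layout (not Lucene's actual bitset-based implementation):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of ScoreMode.Max parent/child score aggregation.
// Docs are laid out block by block as child*, parent; `isParent[doc]`
// marks parent docs, and parents' own slots in `scores` are unused.
public class ParentMaxJoin {

    // Returns {parentDocId, maxChildScore} pairs, one per parent block.
    // A parent with no children gets Float.NEGATIVE_INFINITY here.
    static List<float[]> joinToParents(float[] scores, boolean[] isParent) {
        List<float[]> parents = new ArrayList<>();
        float max = Float.NEGATIVE_INFINITY;
        for (int doc = 0; doc < scores.length; doc++) {
            if (isParent[doc]) {
                parents.add(new float[] { doc, max });
                max = Float.NEGATIVE_INFINITY; // reset for the next block
            } else {
                max = Math.max(max, scores[doc]);
            }
        }
        return parents;
    }
}
```

This is also why the coverage concern above matters: a query that returns parent docs (as the join does) behaves differently from one that returns the matching child docs, and only tests exercising both shapes over flat indices would catch a mix-up.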
I am writing up some yaml tests now to add to main to cover flat & nested
Right, good catch. I pushed a fix to remain at the nested level, but I realise now that rescoring in this mode is different: we'll rescore non-diversified children, so we need to apply collapsing or something similar. I'll think more about it while you're adding more tests, thanks!