
[FEATURE] Improving Lucene Engine Query Performance by reducing number of times a single Lucene k-NN query gets executed #2115

Closed
navneet1v opened this issue Sep 18, 2024 · 16 comments · Fixed by #2305
Assignees

Comments

@navneet1v
Collaborator

navneet1v commented Sep 18, 2024

Description

Currently, Lucene engine k-NN queries are executed during the rewrite phase and not in the Weight class. On a recent deep-dive we observed that a query's rewrite function can be called multiple times in the overall search flow.

Please check this code trace showing rewrite running before the start of the fetch phase.

  1. Transport action registering fetchphase: https://github.com/opensearch-project/OpenSearch/blob/f67ed1796749376401c5cc617eff[…]n/java/org/opensearch/action/search/SearchTransportService.java
  2. Search Context is built: https://github.com/opensearch-project/OpenSearch/blob/8148e4d295397d4f5b50b85f4dc3[…]6/server/src/main/java/org/opensearch/search/SearchService.java
  3. In building the searchcontext we parse the source again: https://github.com/opensearch-project/OpenSearch/blob/8148e4d295397d4f5b50b85f4dc3[…]6/server/src/main/java/org/opensearch/search/SearchService.java (this is already done in query phase)
  4. During the create context we do the preProcess: https://github.com/opensearch-project/OpenSearch/blob/8148e4d295397d4f5b50b85f4dc3[…]6/server/src/main/java/org/opensearch/search/SearchService.java
  5. In preprocess we do context preprocess: https://github.com/opensearch-project/OpenSearch/blob/4035db48c6963e46909f28ba8552[…]erver/src/main/java/org/opensearch/search/query/QueryPhase.java
  6. And then we do rewrite again: https://github.com/opensearch-project/OpenSearch/blob/4f97fd3d5588f9be52bee37d2c51[…]r/src/main/java/org/opensearch/search/DefaultSearchContext.java

The same was observed in the flame graphs: when we have more than one shard, the rewrite on the query is called again during the fetch phase. This leads to the Lucene engine k-NN query running more than once and adds latency.

Flame Graph
Lucene Query_2shards_1M_128D

Number of shards: 2
KNN engine: Lucene
Dataset: 1M 128D sift.
Tool used: OSB
Docker image: opensearchstaging/opensearch:2.17.0.10284
Heap: 16GB
RAM: 64GB
Cores: 16
Search JFR: search.jfr.zip
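The double execution described above can be modeled without Lucene at all. This is a hypothetical, self-contained sketch (all names are illustrative): the expensive k-NN search lives inside rewrite(), so every phase that rewrites the query repeats the work.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RewriteCountDemo {
    /** Stand-in for a Lucene query whose rewrite() runs the k-NN search. */
    static class FakeKnnQuery {
        final AtomicInteger knnSearches = new AtomicInteger();

        FakeKnnQuery rewrite() {
            // the expensive HNSW traversal happens here in the real query
            knnSearches.incrementAndGet();
            return this;
        }
    }

    public static void main(String[] args) {
        FakeKnnQuery q = new FakeKnnQuery();
        q.rewrite(); // query phase: DefaultSearchContext preProcess rewrites the query
        q.rewrite(); // fetch phase (>1 shard): the context is rebuilt and rewrite runs again
        System.out.println(q.knnSearches.get()); // 2: the k-NN search ran twice
    }
}
```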

Why we need Query and its rewrite in fetch phase

In a quick scan of the OpenSearch core code in the fetch phase, I found use cases that might require running the query rewrite again and then using the result during the fetch phase. Below are the references where the query that was rewritten in the DefaultSearchContext is added to the FetchPhaseSearchContext during the fetch phase and used.
Explain sub-phase: this is used to provide the explanation of why a particular result matched the query.
PercolateQuery highlighting: not sure what this query type does, but it uses the visitor pattern of the Query (Lucene interface) during highlighting.
Inner hits: this is used to get the child hits when there is a parent/child relationship between fields. Not sure about this use case as it performs additional logic on the query; this needs more deep-dive.

Possible Solution

Solution 1

One solution we can explore is wrapping all the Lucene queries in another query class, say LuceneEngineKNNQuery, with a class member holding the actual Lucene query. When createWeight is called, we first rewrite the wrapped query and then create the weight (and scorer) on top of the rewritten query. This ensures the Lucene k-NN query is executed only once.

Sample Code:

public class LuceneEngineKNNQuery extends Query {

    // The actual Lucene query: KnnFloatVectorQuery, KnnByteVectorQuery,
    // DiversifyingChildrenByteKnnVectorQuery, etc.
    private final Query luceneQuery;

    public LuceneEngineKNNQuery(Query luceneQuery) {
        this.luceneQuery = luceneQuery;
    }

    @Override
    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
        // rewrite the Lucene query; this is where the k-NN search actually runs
        Query docAndScoreQuery = searcher.rewrite(luceneQuery);
        // now delegate weight creation to the rewritten query
        return docAndScoreQuery.createWeight(searcher, scoreMode, boost);
    }

    @Override
    public String toString(String field) {
        return luceneQuery.toString(field);
    }

    @Override
    public void visit(QueryVisitor visitor) {
        luceneQuery.visit(visitor);
    }

    @Override
    public boolean equals(Object obj) {
        // a constant-false equals would break reflexivity and query caching;
        // delegate to the wrapped query instead
        return sameClassAs(obj) && luceneQuery.equals(((LuceneEngineKNNQuery) obj).luceneQuery);
    }

    @Override
    public int hashCode() {
        return luceneQuery.hashCode();
    }
}

Solution 2

Another solution we can implement is caching the SearchContext at the shard level; when the fetch phase is executed, we reuse the same SearchContext so that we don't need to rewrite the queries.
Another approach is to defer the rewrite and make it lazy, so that only the fetch pre-processors that need the rewrite perform it; once it is done by one fetch processor, none of the others need to run the rewrite again, because the query has already changed by then.
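The lazy-rewrite idea can be sketched as a simple memoizing wrapper. This is a hypothetical, self-contained illustration (none of these names are from the OpenSearch codebase; a String stands in for the rewritten Query): however many fetch pre-processors ask for the rewritten query, the expensive rewrite runs at most once.

```java
import java.util.function.Supplier;

public class LazyRewriteDemo {
    static class LazyRewrittenQuery {
        private final Supplier<String> expensiveRewrite; // stand-in for the real rewrite
        private String rewritten;                        // cached rewritten query
        private int rewriteCalls;

        LazyRewrittenQuery(Supplier<String> expensiveRewrite) {
            this.expensiveRewrite = expensiveRewrite;
        }

        synchronized String rewriteOnce() {
            if (rewritten == null) {
                rewriteCalls++;
                rewritten = expensiveRewrite.get();
            }
            return rewritten;
        }

        int rewriteCalls() { return rewriteCalls; }
    }

    public static void main(String[] args) {
        LazyRewrittenQuery q = new LazyRewrittenQuery(() -> "DocAndScoreQuery");
        // explain sub-phase, highlighting, and inner hits could all ask for
        // the rewritten query, but the underlying rewrite runs only once
        q.rewriteOnce();
        q.rewriteOnce();
        q.rewriteOnce();
        System.out.println(q.rewriteCalls()); // 1
    }
}
```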

Pros and Cons

Both solutions have their own pros and cons:

  1. Solution 1 solves the problem without making drastic changes in core, but may create problems if Lucene query execution ever moves away from rewrite.
  2. Solution 2: implementing a shard-level SearchContext cache may be tricky, as it can increase heap usage and the blast radius is unknown. But this solution would ensure that the rewrite happens only once per query for the whole search request at the OpenSearch level.
@jmazanec15
Member

I'm in favor of solution (1) over solution (2). I cannot think of a major advantage of doing rewrite vs createWeight, unless there is some kind of benefit around caching - but I cannot think of any.

@navneet1v
Collaborator Author

I am also in favor of 1, but I was wondering whether fixing it on the core side would help us, hence I added solution 2. Since the change in core is not simple and might impact latency if not implemented correctly, I think we should go with solution 1. But I would like to hear more from other maintainers.

@luyuncheng , @vamshin , @heemin32

@heemin32
Collaborator

How about caching the Faiss search result for a short time? Do we know whether the query is from the same request, using something like a query UUID? The blast radius could be smaller than caching the SearchContext.

@navneet1v
Collaborator Author

@heemin32 this is not for the Faiss engine, this is for the Lucene engine. Also, could you elaborate on how caching would work in the case of Lucene?

@heemin32
Collaborator

@heemin32 this is not for the Faiss engine, this is for the Lucene engine. Also, could you elaborate on how caching would work in the case of Lucene?

Are you saying that for Faiss, the query happens only one time?
I thought it could be the same for Faiss. Maybe my understanding is limited to innerHits. For innerHits, the query gets executed 1 + n (number of returned items) times, which is also true for Faiss.

Hmm. But for the n executions, because each is filtered to its single parent, I guess exact search will be hit and caching might not help here.

@navneet1v
Collaborator Author

navneet1v commented Sep 19, 2024

@heemin32 I didn't mention Faiss anywhere. This double query execution happens for Lucene because the Lucene query is executed during rewrite, and, as the links in the description show, query rewrite happens in both the query and fetch phases (for more than one shard). Hence the extra latency.

This issue doesn't talk about extra latency during inner hits.

@luyuncheng
Collaborator

luyuncheng commented Sep 24, 2024

I am also in favor of 1! It looks good to me. @navneet1v

I thought it could be the same for Faiss.

@heemin32 @navneet1v I think it would not happen with the native k-NN query. The rewrite comes from AbstractKnnVectorQuery in the Lucene engine, and it does the rewrite in pre-process in order to reduce per-shard query work; but we skip this in KNNQuery.

@luyuncheng
Collaborator

I'm in favor of solution (1) over solution (2). I cannot think of a major advantage of doing rewrite vs createWeight, unless there is some kind of benefit around caching - but I cannot think of any.

@jmazanec15 I think the major advantage is that when there are no hits in the knnQuery (e.g., with a specific min_score), it can skip the shard. But I prefer to skip the rewrite for the majority of scenarios,

so solution 1 is better for me. @navneet1v

@navneet1v
Collaborator Author

@heemin32 @navneet1v I think it would not happen with the native k-NN query. The rewrite comes from AbstractKnnVectorQuery in the Lucene engine, and it does the rewrite in pre-process in order to reduce per-shard query work; but we skip this in KNNQuery.

Yes, that's correct. The issue we are discussing in this GitHub issue doesn't happen for native engines.

@navneet1v
Collaborator Author

@luyuncheng thanks for putting up the thoughts. @junqiu-lei will be working on the fix. I think we should be able to fix this before the 2.18 release.

@vamshin vamshin added the Roadmap:Vector Database/GenAI Project-wide roadmap label label Sep 27, 2024
@vamshin vamshin moved this from Backlog (Hot) to 2.18.0 in Vector Search RoadMap Sep 27, 2024
@vamshin vamshin added v2.19.0 and removed v2.18.0 labels Oct 18, 2024
@vamshin vamshin moved this from 2.18.0 to 2.19.0 in Vector Search RoadMap Oct 18, 2024
@vamshin vamshin assigned kotwanikunal and unassigned junqiu-lei Oct 31, 2024
@heemin32
Collaborator

heemin32 commented Nov 7, 2024

With option 1, we could use inheritance instead of delegation, allowing us to inherit all other methods unchanged.

public class LuceneEngineKNNQuery extends KnnFloatVectorQuery {
    @Override
    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {
        // rewrite this query; the inherited rewrite() runs the k-NN search
        Query docAndScoreQuery = searcher.rewrite(this);
        // now delegate weight creation to the rewritten query
        return docAndScoreQuery.createWeight(searcher, scoreMode, boost);
    }
}

// and similar subclasses for the other query types:
public class LuceneEngineByteKNNQuery extends KnnByteVectorQuery {...}
public class LuceneEngineDiversifyingFloatKNNQuery extends DiversifyingChildrenFloatKnnVectorQuery {...}
public class LuceneEngineDiversifyingByteKNNQuery extends DiversifyingChildrenByteKnnVectorQuery {...}

@navneet1v
Collaborator Author

@heemin32 I would prefer delegation/composition here over inheritance, so that we can avoid creating new queries in OpenSearch whenever Lucene adds a new query.

@shatejas
Collaborator

@kotwanikunal One of the approaches I was thinking of here is to unify NativeEngineKnnVectorQuery and the LuceneEngineKNNQuery mentioned in the solution.

I see a few benefits if we are able to pull it off:

  • Code deduplication and better maintainability.
  • Rescoring support for Lucene engine without any additional costs.

One of the approaches is to change KNNQuery to hold a generic Query. This will allow us to hold both KNNQuery and Lucene queries.

There are some challenges though:

  • The Lucene query uses rewrite while KNNQuery uses searchLeaf.
  • The Lucene queries' rewrite method manages parallel execution.

It's worth looking into whether there is a solution around these challenges.
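One way the unification could be sketched is with a common delegate abstraction that hides the two execution models behind a single entry point. This is a hypothetical, self-contained illustration; all the names below are made up, and Strings stand in for the real Weight/Query machinery.

```java
public class UnifiedKnnQueryDemo {

    /** Common abstraction: each engine knows how to produce its results. */
    interface KnnDelegate {
        String search();
    }

    /** Lucene-style delegate: the k-NN search happens during rewrite. */
    static class LuceneDelegate implements KnnDelegate {
        public String search() {
            // rewrite() would run here and yield a DocAndScoreQuery
            return "lucene:rewrite";
        }
    }

    /** Native-style delegate: the k-NN search happens per leaf/segment. */
    static class NativeDelegate implements KnnDelegate {
        public String search() {
            // searchLeaf() would run here for each segment
            return "native:searchLeaf";
        }
    }

    /** The unified query only sees the common abstraction. */
    static class UnifiedKnnQuery {
        private final KnnDelegate delegate;
        UnifiedKnnQuery(KnnDelegate delegate) { this.delegate = delegate; }
        String createWeight() { return delegate.search(); }
    }

    public static void main(String[] args) {
        System.out.println(new UnifiedKnnQuery(new LuceneDelegate()).createWeight());
        System.out.println(new UnifiedKnnQuery(new NativeDelegate()).createWeight());
    }
}
```

With this shape, shared concerns such as rescoring could live in the unified wrapper rather than being duplicated per engine, though the parallel-execution handling inside Lucene's rewrite would still need a home.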

@navneet1v
Collaborator Author

navneet1v commented Dec 3, 2024

@kotwanikunal I saw the PR #2305 and I am excited to see benchmarking results with that change.

On the note of unification, we can do this as a two-step process. Unification is always good and helps reduce a lot of code branches, but it should not spill over and delay this fix for the Lucene engine.

@shatejas on this

Rescoring support for Lucene engine without any additional costs.

I am not sure we would be able to add rescoring support just like that in the Lucene engine. The reason is that Lucene currently uses the FlatVectors as the vectors for the HNSW graph, so when we try to access the flat vectors via the codec, it will return the same quantized vectors and not full-precision vectors. I see that in the BQ support for Lucene they are trying to access floatVectorValues via the codec (apache/lucene#13651), but since that is still in PR I cannot say when it will be available. Please correct me if there is something I am missing.

@kotwanikunal
Member

(Quoting @navneet1v's comment above.)

That sounds like a good plan. I prioritized getting through the benchmarks and new flame graphs. Added them here: #2305 (comment)

@kotwanikunal
Member

kotwanikunal commented Jan 7, 2025

The change was merged into 2.x on 12/11/2024: 8daedac

On the benchmarking dashboard, we can see that the latency for 2.19 has dropped in line with the merge.

Dashboards: https://opensearch.org/benchmarks/ -> Vectorsearch-lucene-Cohere-1m-768D (Start date: Dec 3, 2024 @ 19:19:19.215)


8 participants