[FEATURE] Use IndexInput to load the graph files for Native Index #1951
Comments
I really like this idea. I'm wondering how we can minimize the overhead of copying between the native and Java layers, though.
Yes, that is an interesting question. I think it's a one-time operation during graph loads. But yes, we can think about it more once we have a working solution.
I will work on a few plausible solutions for this, along with a problem definition, so we can start discussions quickly.
The initial RFC for this feature has been added here: #2033
Closing this GH issue, as the feature is now merged and is being released with 2.18.
Description
Currently, the k-NN plugin for native engines (Faiss and Nmslib) creates a separate graph file (in the codec) to build and store the k-NN index at the segment level. This file is tracked by Lucene for a segment, but when reading the file the k-NN plugin relies on FSDirectory to get the full path of the segment-level k-NN index and then uses the native libraries' APIs to load the index into memory.
The above behavior causes a few problems:
Solution
The solution I am proposing is that, rather than relying on the path of the file, the k-NN plugin should use IndexInput to read the file. This new reading behavior also needs to be integrated with the Faiss and Nmslib libraries. Faiss provides an interface, IOReader, which can be used to load the contents of the file. If the k-NN plugin implements this interface and uses IndexInput underneath to read the file, the problems mentioned above are avoided.
A deep dive I did suggests that IndexInput provides a way to read bytes, and Faiss just asks for n bytes any time it wants to read anything.
Ref: https://github.com/facebookresearch/faiss/blob/df0dea6c6d8951056763dc03528b3973c6ba26e2/faiss/impl/index_read.cpp#L531
Ref: https://github.com/facebookresearch/faiss/blob/c0052c15336a57f7068a7d098d5ce5b6234a2d70/faiss/impl/io_macros.h#L17-L28
Ref Lucene: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/DataInput.java#L58
I have not done any deep-dive on Nmslib.
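To make the idea concrete, here is a minimal Java-side sketch of the read callback, under stated assumptions. It assumes the native reader asks for data fread-style (a buffer, an item size, and an item count) and expects the number of whole items read back. `ByteSource`, `ArraySource`, and `NativeReadCallback` are hypothetical names for illustration only. `ByteSource` stands in for the small part of Lucene's IndexInput that is needed (`readBytes`, `length`, and the current position), so the sketch runs without Lucene or JNI on the classpath.

```java
public class IndexInputReaderSketch {

    /** Hypothetical stand-in for the subset of Lucene's IndexInput used here. */
    interface ByteSource {
        long length();
        long position();
        // Mirrors the shape of DataInput.readBytes(byte[] b, int offset, int len).
        void readBytes(byte[] dst, int offset, int len);
    }

    /** In-memory source so the sketch is runnable on its own. */
    static final class ArraySource implements ByteSource {
        private final byte[] data;
        private int pos;

        ArraySource(byte[] data) {
            this.data = data;
        }

        public long length() {
            return data.length;
        }

        public long position() {
            return pos;
        }

        public void readBytes(byte[] dst, int offset, int len) {
            System.arraycopy(data, pos, dst, offset, len);
            pos += len;
        }
    }

    /** Serves "give me nitems items of size bytes" requests from a ByteSource. */
    static final class NativeReadCallback {
        private final ByteSource in;

        NativeReadCallback(ByteSource in) {
            this.in = in;
        }

        /** Returns the number of whole items copied into dst (fread semantics). */
        int read(byte[] dst, int size, int nitems) {
            if (size <= 0 || nitems <= 0) {
                return 0;
            }
            long remaining = in.length() - in.position();
            // Never read past the end of the source or past the end of dst.
            long items = Math.min(nitems, remaining / size);
            items = Math.min(items, dst.length / size);
            in.readBytes(dst, 0, (int) (items * size));
            return (int) items;
        }
    }

    public static void main(String[] args) {
        ByteSource source = new ArraySource(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        NativeReadCallback callback = new NativeReadCallback(source);
        byte[] buffer = new byte[8];
        int items = callback.read(buffer, 4, 2);
        System.out.println("items read: " + items + ", position: " + source.position());
    }
}
```

In a real integration the buffer would have to cross the JNI boundary on each request, which is where the copying overhead raised in the comments would show up; this sketch does not address that.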
Indexing
I also see that during indexing, when writing the native index file, we use FSDirectory. If we make similar changes for writing the native index file, we can remove the FSDirectory dependency from the write path too. Ref:
k-NN/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java
Line 121 in ca5e483
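A write-side counterpart could look like the following sketch, again under assumptions. It assumes the native writer hands over data fwrite-style (a buffer, an item size, and an item count) and expects the number of items written back. `ByteSink`, `ArraySink`, and `NativeWriteCallback` are hypothetical names for illustration. `ByteSink` stands in for the `writeBytes(byte[], int, int)` method that Lucene's IndexOutput inherits from DataOutput, and the in-memory sink exists only so the sketch runs without Lucene.

```java
import java.io.ByteArrayOutputStream;

public class IndexOutputWriterSketch {

    /** Hypothetical stand-in for the subset of Lucene's IndexOutput used here. */
    interface ByteSink {
        // Mirrors the shape of DataOutput.writeBytes(byte[] b, int offset, int length).
        void writeBytes(byte[] src, int offset, int len);
    }

    /** In-memory sink so the sketch is runnable on its own. */
    static final class ArraySink implements ByteSink {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        public void writeBytes(byte[] src, int offset, int len) {
            out.write(src, offset, len);
        }

        byte[] toByteArray() {
            return out.toByteArray();
        }
    }

    /** Forwards "here are nitems items of size bytes" requests to a ByteSink. */
    static final class NativeWriteCallback {
        private final ByteSink out;

        NativeWriteCallback(ByteSink out) {
            this.out = out;
        }

        /** Returns the number of whole items written (fwrite semantics). */
        int write(byte[] src, int size, int nitems) {
            if (size <= 0 || nitems <= 0) {
                return 0;
            }
            // Only write whole items that are actually present in src.
            int items = Math.min(nitems, src.length / size);
            out.writeBytes(src, 0, items * size);
            return items;
        }
    }

    public static void main(String[] args) {
        ArraySink sink = new ArraySink();
        NativeWriteCallback callback = new NativeWriteCallback(sink);
        byte[] chunk = {1, 2, 3, 4, 5, 6, 7, 8};
        int items = callback.write(chunk, 4, 2);
        System.out.println("items written: " + items + ", bytes in sink: " + sink.toByteArray().length);
    }
}
```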