Add cosine similarity support for faiss engine #2376
Adding additional unit and integration tests for radial search. Will mark it as ready once I add those tests.
Thanks @VijayanB - completed a first pass review
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
src/main/java/org/opensearch/knn/index/engine/AbstractKNNMethod.java
src/main/java/org/opensearch/knn/index/engine/KNNLibraryIndexingContext.java
src/main/java/org/opensearch/knn/index/mapper/ModelFieldMapper.java
src/main/java/org/opensearch/knn/index/mapper/VectorTransformer.java
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine) {
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine, SpaceType spaceType) {

// Cosine similarity is supported as Inner product by FAISS by normalizing input vector, hence, we have to normalize
Can we move this check out of this class? This class is already very crowded and I want to avoid adding more checks around engines. Instead, could we investigate either adding it to https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/engine/KNNLibrarySearchContext.java and/or adding a method in KNNVectorFieldType (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldType.java#L85) that says "should normalize query" or, better yet, transformQuery?
Moved it to KNNVectorFieldType
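For readers following the thread, here is a minimal sketch of what a field-type-level `transformQuery` hook could look like. The names `SketchFieldType`, `SketchEngine`, and `SketchSpace` are hypothetical stand-ins and not the plugin's actual classes, and rejecting zero vectors is an assumption made for the sketch, not something this PR states.

```java
import java.util.Arrays;

// Hypothetical stand-ins for the plugin's engine and space type enums.
enum SketchEngine { FAISS, LUCENE }
enum SketchSpace { COSINE, INNER_PRODUCT, L2 }

// Hypothetical field type that decides whether a query vector needs normalizing,
// so the query builder does not have to check engine or space type itself.
final class SketchFieldType {
    private final SketchEngine engine;
    private final SketchSpace space;

    SketchFieldType(SketchEngine engine, SketchSpace space) {
        this.engine = engine;
        this.space = space;
    }

    // Returns a copy of the query, L2-normalized only for FAISS with cosine.
    float[] transformQuery(float[] query) {
        float[] copy = Arrays.copyOf(query, query.length);
        if (engine != SketchEngine.FAISS || space != SketchSpace.COSINE) {
            return copy;
        }
        double sumSquares = 0.0;
        for (float v : copy) {
            sumSquares += (double) v * v;
        }
        if (sumSquares == 0.0) {
            // Assumption: a zero vector has no direction, so it is rejected here.
            throw new IllegalArgumentException("zero vector cannot be normalized");
        }
        float norm = (float) Math.sqrt(sumSquares);
        for (int i = 0; i < copy.length; i++) {
            copy[i] /= norm;
        }
        return copy;
    }
}
```

With this shape the caller only writes `fieldType.transformQuery(vector)` and stays free of engine-specific branches, which seems to be the intent of the review comment.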
@@ -700,7 +703,8 @@ protected void parseCreateField(ParseContext context, int dimension, VectorDataT
}
final float[] array = floatsArrayOptional.get();
getVectorValidator().validateVector(array);
context.doc().addAll(getFieldsForFloatVector(array));
final float[] transformedArray = getVectorTransformer().transform(array);
Will this be called before the per-dimension processor too? What should the contract around these two be? I'm wondering if we even need the per-dimension processor, or if we can wrap it in this new full-vector transform.
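To make the question concrete, here is a rough sketch of how a per-dimension step could be wrapped in a full-vector transform so that the ordering between the two is explicit. `SketchVectorTransformer` and `PerDimension` are hypothetical names for illustration; the PR's actual `VectorTransformer` and per-dimension processor may look different.

```java
// Hypothetical full-vector transform; the PR's actual VectorTransformer may differ.
interface SketchVectorTransformer {
    float[] transform(float[] vector);

    // Hypothetical per-dimension step, standing in for the per-dimension processor.
    interface PerDimension {
        float apply(float value);
    }

    // Lifts a per-dimension step into a full-vector transform (returns a new array).
    static SketchVectorTransformer perDimension(PerDimension step) {
        return vector -> {
            float[] out = new float[vector.length];
            for (int i = 0; i < vector.length; i++) {
                out[i] = step.apply(vector[i]);
            }
            return out;
        };
    }

    // Runs this transform first, then the given one, making the order explicit.
    default SketchVectorTransformer andThen(SketchVectorTransformer next) {
        return vector -> next.transform(this.transform(vector));
    }
}
```

Under this sketch, a chain such as `SketchVectorTransformer.perDimension(v -> v * 0.5f).andThen(normalizer)` would state the contract in one place, where `normalizer` is any other `SketchVectorTransformer`.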
FAISS engine doesn't support cosine similarity natively. However, we can use inner product to achieve the same result, because when vectors are normalized, inner product is the same as cosine similarity. Hence, before ingestion and before search, normalize the input vector and add it to the faiss index with inner product as the space type. Since we store normalized vectors in segments, the source can be used to get the original vectors. By saving normalized vectors, we don't have to normalize whenever segments are merged. This keeps force merge time and search latency competitive, at the cost of additional latency during indexing (a one-time normalization). We also support radial search for cosine similarity.
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Description
FAISS engine doesn't support cosine similarity natively. However, we can use inner product to achieve the same result, because when vectors are normalized, inner product is the same as cosine similarity. Hence, before ingestion, we normalize the input vector and add it to the faiss index with inner product as the space type, and before search, we normalize the query vector if the space type is cosine and the engine is faiss.
Since we store normalized vectors in segments, we don't have to normalize whenever segments are merged. This keeps force merge time and search latency competitive, at the cost of additional latency during indexing (a one-time normalization). To avoid this additional latency, customers can normalize their data set themselves and create the index with the inner product space type.
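The equivalence this relies on can be checked with a small standalone sketch. It is only an illustration of the math, not the plugin's code, and the class and method names are made up for the example.

```java
// Illustrative only: shows that the inner product of L2-normalized vectors
// matches the cosine similarity of the original vectors.
final class CosineViaInnerProduct {
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity computed directly from the unnormalized vectors.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Returns a new vector scaled to unit L2 norm (assumes a non-zero input).
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dot(v, v));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] doc = {1f, 2f, 3f};
        float[] query = {4f, -5f, 6f};
        // The two printed values should agree up to floating-point rounding.
        System.out.println(cosine(doc, query));
        System.out.println(dot(normalize(doc), normalize(query)));
    }
}
```

A data set that has been passed through something like `normalize` ahead of time can be indexed with inner product directly, which is the client-side option mentioned above.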
This also adds support for radial search, with both max distance and min score.
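As a rough illustration of why radial search carries over to normalized vectors, the sketch below converts a max-distance threshold into an inner product threshold. It assumes cosine distance is defined as 1 minus cosine similarity; the class and method names are hypothetical, and the min score case is left out because it depends on the plugin's score formula, which is not shown in this PR text.

```java
// Illustrative sketch; names are hypothetical and not taken from the plugin.
final class RadialThresholdSketch {
    // Assumption: cosine distance = 1 - cosine similarity, so it lies in [0, 2].
    static double minInnerProductFor(double maxCosineDistance) {
        if (maxCosineDistance < 0.0 || maxCosineDistance > 2.0) {
            throw new IllegalArgumentException("cosine distance must be in [0, 2]");
        }
        return 1.0 - maxCosineDistance;
    }

    // Both inputs are assumed to be unit vectors, so their inner product is the cosine.
    static boolean withinRadius(float[] unitQuery, float[] unitDoc, double maxCosineDistance) {
        double innerProduct = 0.0;
        for (int i = 0; i < unitQuery.length; i++) {
            innerProduct += (double) unitQuery[i] * unitDoc[i];
        }
        return innerProduct >= minInnerProductFor(maxCosineDistance);
    }
}
```

Under that assumption, a distance bound on the original vectors becomes a lower bound on the inner product of the stored normalized vectors.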
Related Issues
#2242
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.