Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cosine similarity support for faiss engine #2376

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

VijayanB
Copy link
Member

@VijayanB VijayanB commented Jan 9, 2025

Description

FAISS engine doesn't support cosine similarity natively. However we can use inner product to achieve the same, because, when vectors are normalized then inner product will be same as cosine similarity. Hence, before ingestion, normalize the input vector, and add it to faiss index with type as inner product, and, before search, normalize query vector if space type is cosine and engine is faiss.

Since we will be storing normalized vector in segments, we don't have to normalize whenever segments are merged. This will keep force merge time and search at competitive, provided we will face additional latency during indexing (one time where we normalize). To avoid this additional latency, customers can normalize their data set and create inner product.

This also adds support to radial search, for both max distance and min score.

Related Issues

#2242

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB VijayanB force-pushed the faiss-cosine branch 3 times, most recently from d6f16a1 to 6671e64 Compare January 9, 2025 07:43
@VijayanB
Copy link
Member Author

VijayanB commented Jan 9, 2025

Adding additional unit and integration test for radial search. Will mark it as ready once i add those tests

Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @VijayanB - completed a first pass review

src/main/java/org/opensearch/knn/index/SpaceType.java Outdated Show resolved Hide resolved
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine) {
private float[] getVectorForCreatingQueryRequest(VectorDataType vectorDataType, KNNEngine knnEngine, SpaceType spaceType) {

// Cosine similarity is supported as Inner product by FAISS by normalizing input vector, hence, we have to normalize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this check out of this class? This class is already very crowded and I want to avoid adding more checks around engines. Instead, could we investigate either adding it to https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/engine/KNNLibrarySearchContext.java and/or adding a method in KNNVectorFieldType (https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldType.java#L85) that says "should normalize query" or, better yet, transformQuery?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to KNNVectorFieldType

@@ -700,7 +703,8 @@ protected void parseCreateField(ParseContext context, int dimension, VectorDataT
}
final float[] array = floatsArrayOptional.get();
getVectorValidator().validateVector(array);
context.doc().addAll(getFieldsForFloatVector(array));
final float[] transformedArray = getVectorTransformer().transform(array);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be called before the per-dimension processor too? What should contract around these 2 be? Im wondering if we even need the per-dimension or if we can wrap that in this new full vector transform.

FAISS engine doesn't support cosine similarity natively.
However we can use inner product to achieve the same, because,
when vectors are normalized then inner product will be same
as cosine similarity. Hence, before ingestion and perform search,
normalize the input vector and add it to faiss index with type
as inner product.

Since we will be storing normalized vector in segments, to get
actual vectors, source can be used. By saving as normalized vector,
we don't have to normalize whenever segments are merged. This will
keep force merge time and search at competitive, provided we will
face additional latency during indexing (one time where we normalize).

We also support radial search for cosine similarity.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB force-pushed the faiss-cosine branch 5 times, most recently from ff72fe5 to 5063334 Compare January 14, 2025 22:31
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants