
Use one formula to calculate cosine similarity #2357

Merged
VijayanB merged 3 commits into opensearch-project:main from the update-score branch on Jan 6, 2025


@VijayanB (Member) commented Dec 26, 2024

Description

Currently, cosine similarity scores are calculated differently depending on the search type: script score, approximate search, and exact search each use a different formula to convert distance into a cosine similarity score that aligns with the OpenSearch score definition. To keep scoring consistent, all search types will use a single definition: the one Lucene uses as its standard definition of cosine similarity.
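As a worked reference, Lucene's cosine similarity score rescales the raw cosine from [-1, 1] into [0, 1]. Assuming an engine reports cosine distance as d = 1 - cos (our assumption here for nmslib's cosinesimil space), the same score can be recovered directly from that distance:

```latex
\cos(q, v) = \frac{q \cdot v}{\lVert q \rVert \, \lVert v \rVert}, \qquad
\text{score}(q, v) = \frac{1 + \cos(q, v)}{2} = \frac{2 - d}{2}, \quad d = 1 - \cos(q, v)
```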

What changed?

Going forward, the nmslib engine with the cosine similarity space type will use the new formula to compute similarity (the same formula that exact search and Lucene use to calculate cosine similarity between the query vector and the input vector).
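A minimal sketch of how an approximate-search path could translate an nmslib cosine distance into the Lucene-style score, assuming nmslib reports d = 1 - cos. The class and method names (`NmslibCosineScore`, `distanceToScore`) are hypothetical and are not the plugin's actual code:

```java
// Hypothetical sketch: nmslib "cosinesimil" distance -> Lucene-style cosine score.
// Assumption: the engine reports cosine distance as d = 1 - cos(q, v).
final class NmslibCosineScore {

    private NmslibCosineScore() {}

    /**
     * Converts cosine distance d = 1 - cos into (1 + cos) / 2, i.e. (2 - d) / 2.
     * The clamp at 0 guards against floating-point drift pushing d slightly above 2.
     */
    static float distanceToScore(float distance) {
        return Math.max((2.0f - distance) / 2.0f, 0.0f);
    }

    public static void main(String[] args) {
        float[] cosines = {1.0f, 0.5f, 0.0f, -1.0f};
        for (float cos : cosines) {
            float distance = 1.0f - cos; // what the engine is assumed to return
            System.out.printf("cos=%.2f distance=%.2f score=%.2f%n",
                    cos, distance, distanceToScore(distance));
        }
    }
}
```

Mathematically the score already lies in [0, 1]; the clamp is only a defensive design choice.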

During script scoring (Painless and k-NN script score), the cosinesimil method will also use the new formula to calculate similarity between the query vector and the input vector.
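For the script-scoring path, the same definition can be applied directly to the raw vectors. This is a sketch that assumes script scoring adopts the (1 + cos) / 2 definition described above; `CosineScriptScore`, `cosine`, and `score` are hypothetical names, not the plugin's API:

```java
// Hypothetical sketch: Lucene-style cosine score computed from raw vectors,
// as an exact-search or script-score path might do it.
final class CosineScriptScore {

    private CosineScriptScore() {}

    /** Raw cosine similarity of two equal-length, non-zero vectors. */
    static float cosine(float[] query, float[] input) {
        if (query.length != input.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, queryNorm = 0.0, inputNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += (double) query[i] * input[i];
            queryNorm += (double) query[i] * query[i];
            inputNorm += (double) input[i] * input[i];
        }
        if (queryNorm == 0.0 || inputNorm == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for zero vectors");
        }
        return (float) (dot / Math.sqrt(queryNorm * inputNorm));
    }

    /** Maps cos in [-1, 1] to a score in [0, 1]: (1 + cos) / 2. */
    static float score(float[] query, float[] input) {
        return (1.0f + cosine(query, input)) / 2.0f;
    }

    public static void main(String[] args) {
        float[] query = {1.0f, 0.0f};
        System.out.println(score(query, new float[] {1.0f, 0.0f}));  // same direction: prints 1.0
        System.out.println(score(query, new float[] {0.0f, 1.0f}));  // orthogonal: prints 0.5
        System.out.println(score(query, new float[] {-2.0f, 0.0f})); // opposite: prints 0.0
    }
}
```

Accumulating in `double` before the final cast is a choice to reduce rounding error on high-dimensional `float` vectors.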

Other engines are not affected by this change.

Related Issues

#2319

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@VijayanB force-pushed the update-score branch 3 times, most recently from f14147b to 6e7c88e December 27, 2024 21:55
@VijayanB changed the title from "Update cosine score translation for nmslib" to "Use one formula to calculate cosine similarity" Dec 27, 2024
@VijayanB marked this pull request as ready for review December 27, 2024 22:28
@VijayanB force-pushed the update-score branch 2 times, most recently from 37d132a to cc49de3 December 27, 2024 22:41
@VijayanB (Member Author) commented:

This looks like a flaky test failure, since the same test passes on Linux and Mac. Created GH issue #2358.

@jmazanec15 (Member) left a comment:

Thanks @VijayanB. Can you summarize the impact of this change in the description? For instance, this only impacts nmslib, correct?

CHANGELOG.md (outdated, resolved review thread)
src/main/java/org/opensearch/knn/index/SpaceType.java (outdated, resolved review thread)
@VijayanB force-pushed the update-score branch 2 times, most recently from a4eb621 to 424765e December 30, 2024 19:51
@VijayanB requested a review from jmazanec15 December 30, 2024 19:51
@VijayanB force-pushed the update-score branch 2 times, most recently from d4cbc14 to 6597eb1 December 30, 2024 20:00
@VijayanB self-assigned this Dec 30, 2024
@navneet1v (Collaborator) previously approved these changes Jan 3, 2025 and left a comment:

Will you be adding the BWC test for this change after this PR?

@VijayanB (Member Author) commented Jan 3, 2025:

Will you be adding the BWC test for this change after this PR?

@navneet1v Yes. I will add it in the next PR.

Currently, cosine similarity scores are calculated differently across search types:
for example, script score, approximate search, and exact search each use a different
formula to convert distance to a cosine similarity score aligned with the OpenSearch
score. To keep scoring consistent, we will use one definition, the one used
by Lucene as its standard definition of cosine similarity, for all search types.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB (Member Author) commented Jan 3, 2025:

Rebased to resolve a CHANGELOG.md file conflict.

@VijayanB requested a review from navneet1v January 3, 2025 23:51
@VijayanB merged commit 84cfa8e into opensearch-project:main Jan 6, 2025
32 checks passed
@opensearch-trigger-bot (Contributor) commented:

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2357-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 84cfa8e00ac04197cc98a5f32379637730cdb979
# Push it to GitHub
git push --set-upstream origin backport/backport-2357-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2357-to-2.x.

Gankris96 pushed a commit to Gankris96/k-NN that referenced this pull request Jan 8, 2025
* Have one score definition for cosinesimilarity

Currently, cosine similarity scores are calculated differently across search types:
for example, script score, approximate search, and exact search each use a different
formula to convert distance to a cosine similarity score aligned with the OpenSearch
score. To keep scoring consistent, we will use one definition, the one used
by Lucene as its standard definition of cosine similarity, for all search types.

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* update test

Signed-off-by: Vijayan Balasubramanian <[email protected]>

* add version check

Signed-off-by: Vijayan Balasubramanian <[email protected]>

---------

Signed-off-by: Vijayan Balasubramanian <[email protected]>
VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Jan 9, 2025
VijayanB added a commit to VijayanB/k-NN-2 that referenced this pull request Jan 9, 2025
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 9, 2025
(cherry picked from commit 84cfa8e)
owenhalpert pushed a commit to owenhalpert/k-NN that referenced this pull request Jan 9, 2025
VijayanB added a commit that referenced this pull request Jan 12, 2025
(cherry picked from commit 84cfa8e)
VijayanB added a commit that referenced this pull request Jan 15, 2025
(cherry picked from commit 84cfa8e)
VijayanB added a commit that referenced this pull request Jan 15, 2025
(cherry picked from commit 84cfa8e)
VijayanB added a commit that referenced this pull request Jan 15, 2025
(cherry picked from commit 84cfa8e)

Co-authored-by: Vijayan Balasubramanian <[email protected]>
@VijayanB deleted the update-score branch January 15, 2025 16:50