Use one formula to calculate cosine similarity #2357
Looks like a flaky test failure, since the same test succeeds on Linux and Mac. Create GH issue
thanks @VijayanB - can you summarize impact of change in description. For instance, this only impacts nmslib, correct?
src/main/java/org/opensearch/knn/plugin/script/KNNScoringSpace.java
Will you be adding the BWC test for this change after this PR?
@navneet1v Yes, I will add it in the next PR.
Currently we have different score calculations for cosine similarity. For example, script score, approximate search, and exact search each use a different formula to convert distance to a cosine similarity that is aligned with the OpenSearch score. To keep it consistent, we will use one definition for all search types: the one Lucene uses as its standard definition of cosine similarity. Signed-off-by: Vijayan Balasubramanian <[email protected]>
Rebased to resolve the CHANGE.MD file conflict.
The backport to
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2357-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 84cfa8e00ac04197cc98a5f32379637730cdb979
# Push it to GitHub
git push --set-upstream origin backport/backport-2357-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the
Description
Currently we have different score calculations for cosine similarity.
For example, script score, approximate search, and exact search each use a different formula to convert distance to cosine similarity in order to align with the OpenSearch score definition. To keep it consistent, we will use one definition for all search types:
the one Lucene uses as its standard definition of cosine similarity.
What changed?
Going forward, the nmslib engine with cosine similarity as the space type will use the new formula to define similarity (the same way exact search and Lucene calculate cosine similarity between the query and the input vector).
During script scoring (Painless and k-NN script score), the cosinesimil method will also use the new formula to calculate similarity between the query and the input vector.
Other engines are not affected by this change.
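For readers who want the formula spelled out, here is a minimal sketch of the unified score. It assumes the "Lucene definition" refers to Lucene's `VectorSimilarityFunction.COSINE`, which maps a raw cosine in [-1, 1] to a score of (1 + cosine) / 2. The class and method names (`CosineScoreSketch`, `score`, `scoreFromDistance`) are hypothetical and are not taken from this PR, and the distance-based variant assumes the engine reports a cosine distance d = 1 - cosine.

```java
// Illustrative sketch only; names are hypothetical and not part of the k-NN plugin.
public final class CosineScoreSketch {

    private CosineScoreSketch() {}

    /** Raw cosine similarity in [-1, 1]. Assumes equal-length, non-zero vectors. */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }

    /** Lucene-style score: (1 + cosine) / 2, clamped at 0, so results lie in [0, 1]. */
    static float score(float[] query, float[] doc) {
        return Math.max((1f + cosine(query, doc)) / 2f, 0f);
    }

    /**
     * Same score starting from a cosine distance d = 1 - cosine
     * (assumed to be what the engine returns): (2 - d) / 2.
     */
    static float scoreFromDistance(float distance) {
        return Math.max((2f - distance) / 2f, 0f);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] doc = {0f, 1f};
        System.out.println("score = " + score(query, doc));
        System.out.println("score from distance = " + scoreFromDistance(1f - cosine(query, doc)));
    }
}
```

With one definition, the vector-based and distance-based paths give the same score for the same pair of vectors, which is the consistency the description asks for across search types.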
Related Issues
#2319
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.