[BUG] Different score translations for ann search and exact search #2319

wdongyu · 2024-12-11T13:12:15Z

What is the bug?
After the setting index.knn.advanced.approximate_threshold was introduced in #2188, a single shard can contain two segments where only one has a graph. Suppose segment_1 includes a graph and segment_2 does not, and both contain the same vector X.

When we search with a query vector Q, we get two different scores for the same vector X, because the score translations for ANN search and exact search are slightly different. For example, with the cosine metric, ANN search returns the score 1 / (2 - cos(Q, X)), but exact search returns (1 + cos(Q, X)) / 2.
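To make the gap concrete, here is a minimal sketch of the two translations exactly as written above. The function names `ann_score` and `exact_score` are hypothetical and chosen for illustration; they are not the names used in the k-NN plugin source.

```python
def ann_score(cos_sim: float) -> float:
    """Translation the issue reports for the ANN (graph) path: 1 / (2 - cos)."""
    return 1.0 / (2.0 - cos_sim)


def exact_score(cos_sim: float) -> float:
    """Translation the issue reports for the exact-search path: (1 + cos) / 2."""
    return (1.0 + cos_sim) / 2.0


if __name__ == "__main__":
    # Sweep a few cosine values to see where the two translations diverge.
    for cos_sim in (-1.0, -0.5, 0.0, 0.5, 0.9, 1.0):
        a, e = ann_score(cos_sim), exact_score(cos_sim)
        print(f"cos={cos_sim:+.1f}  ann={a:.6f}  exact={e:.6f}  diff={e - a:+.6f}")
```

Setting 1 / (2 - c) equal to (1 + c) / 2 reduces to c(1 - c) = 0, so the two formulas agree only at c = 0 and c = 1. For any cosine strictly between 0 and 1, the exact-search formula yields the higher score.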

How can one reproduce the bug?
Steps to reproduce the behavior:

Create an index with nmslib and cosine distance, and set index.knn.advanced.approximate_threshold to -1 so that it never builds a graph:

PUT test_nmslib_cosine
{
  "settings": {
    "index.knn": true,
    "index.knn.advanced.approximate_threshold": "-1",
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "engine": "nmslib",
          "space_type": "cosinesimil",
          "name": "hnsw"
        }
      }
    }
  }
}

Ingest a doc with the vector [0.6, 0.8]:

POST test_nmslib_cosine/_doc/1?refresh
{
  "target_field": [0.6, 0.8] 
}

Update index.knn.advanced.approximate_threshold to 1 so that it always builds a graph:

PUT test_nmslib_cosine/_settings
{
  "index.knn.advanced.approximate_threshold": "1"
}

Ingest another doc with the same vector [0.6, 0.8]:

POST test_nmslib_cosine/_doc/2?refresh
{
  "target_field": [0.6, 0.8] 
}

Search the data:

POST test_nmslib_cosine/_search
{
  "query": {
    "knn": {
      "target_field": {
        "vector": [1, 2],
        "k": 10
      }
    }
  }
}

Get the result:

"hits": [
      {
        "_index": "test_nmslib_cosine",
        "_id": "1",
        "_score": 0.99193496,    // -> score = (1 + cos([1, 2], [0.6, 0.8])) / 2
        "_source": {
          "target_field": [
            0.6,
            0.8
          ]
        }
      },
      {
        "_index": "test_nmslib_cosine",
        "_id": "2",
        "_score": 0.98412585,  // -> score = 1 / (2 - cos([1, 2], [0.6, 0.8]))
        "_source": {
          "target_field": [
            0.6,
            0.8
          ]
        }
      }
    ]
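The two `_score` values above can be checked by hand. The sketch below recomputes them in plain Python from the query vector [1, 2] and the stored vector [0.6, 0.8], using the two formulas quoted in the result comments. The helper name `cosine_similarity` is hypothetical, and the tolerance is an assumption that allows for the single-precision rounding visible in the reported scores.

```python
import math


def cosine_similarity(q, x):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(q, x))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_x = math.sqrt(sum(b * b for b in x))
    return dot / (norm_q * norm_x)


query = [1.0, 2.0]
stored = [0.6, 0.8]
cos_sim = cosine_similarity(query, stored)

# Doc 1 (segment without a graph, exact search): (1 + cos) / 2
exact = (1.0 + cos_sim) / 2.0
# Doc 2 (segment with a graph, ANN search): 1 / (2 - cos)
ann = 1.0 / (2.0 - cos_sim)

# Scores reported in the issue; compared with a loose tolerance because
# the reported values appear to be rounded to single precision.
assert math.isclose(exact, 0.99193496, abs_tol=1e-6)
assert math.isclose(ann, 0.98412585, abs_tol=1e-6)
print(f"cos={cos_sim:.8f} exact={exact:.8f} ann={ann:.8f}")
```

Both assertions pass, which confirms that the identical vector is scored by a different formula depending on which segment it lives in.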

What is the expected behavior?
The same query and data vector should get a consistent score, regardless of whether the segment is searched with ANN search or exact search.

What is your host/environment?

OS: Any OS
Version 2.18.0

vamshin · 2024-12-11T19:31:13Z

@wdongyu good catch! If we make the score calculation consistent, that should fix the problem.

navneet1v · 2024-12-11T23:48:25Z

@wdongyu thanks for reporting this issue. I think this happens because exact search uses the Lucene-based score translation, while the native libraries use a different one. @VijayanB please take a look at this, and let's ensure that we use a consistent score calculation.

wdongyu · 2025-01-09T04:21:17Z

@VijayanB Thanks for the fix. I noticed that you mention only the nmslib engine is affected by this change. But the faiss engine also uses the scoreTranslation function, so why is it not affected?

wdongyu · 2025-01-09T15:16:44Z

My mistake, faiss doesn't support cosine similarity yet. But once it does, we can also keep scores consistent across all engines that use cosine similarity, right?

VijayanB · 2025-01-15T05:08:44Z

Yes.

VijayanB · 2025-01-15T05:09:13Z

I will close this issue once I update the documentation.

wdongyu added bug Something isn't working untriaged labels Dec 11, 2024

wdongyu changed the title ~~[BUG] Difference score translations for ann search and exact search~~ [BUG] Different score translations for ann search and exact search Dec 11, 2024

vamshin removed the untriaged label Dec 11, 2024

vamshin added this to Vector Search RoadMap Dec 11, 2024

github-project-automation bot moved this to Backlog in Vector Search RoadMap Dec 11, 2024

vamshin moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Dec 11, 2024

navneet1v assigned VijayanB Dec 11, 2024

navneet1v moved this from Backlog (Hot) to 2.19.0 in Vector Search RoadMap Dec 24, 2024

navneet1v added the v2.19.0 label Dec 24, 2024

VijayanB mentioned this issue Dec 26, 2024

Use one formula to calculate cosine similarity #2357

Merged

5 tasks
