Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByField Rerank Improvements #969

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

brianf-aws
Copy link
Contributor

@brianf-aws brianf-aws commented Oct 30, 2024

Description

Fixes two logging issues

  1. (for logging Id instead of docid) Uses id instead of DocId for tracking. Id is generated by OpenSearch but docId can be humanly generated/omitted making it a bad metric to report on.
  2. (for explicitly calling out the type) Now instead of stating why a field does not work A explicit type is provided so as to inform the user better why it can't be used. This is accomplished via getClass().getSimpleName()

Adds Numerical String Parsing

  1. Now the processor can distinguish strings like "2.3" as a numerical value.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuye-aws
Copy link
Member

yuye-aws commented Nov 4, 2024

2. (for explicitly calling out the type) The previous logging used docId to be able to tell the user where in the search hit the error occured. The problem is that docId is not guaranteed in every document however _id is.Instead this commit makes sure that it uses getId() instead of docID(). Previously if you had indexed your document and then used docId it would return -1 when it had no direct document field.

Thanks for creating this PR. Can you provide a few examples so that I can better understand the nuance between getId() and docID()?

@martin-gaievski
Copy link
Member

can you please check that this will work when the _source is disabled, just want to make sure docId is not affected but that setting

@martin-gaievski
Copy link
Member

@brianf-aws is this still work in progress?

@brianf-aws
Copy link
Contributor Author

Hey @martin-gaievski I paused on this mainly because we mentioned this was not urgent. I am not sure what time I should prioritize this, what are your thoughts?

@martin-gaievski
Copy link
Member

I don't think it's high priority, how about we close the PR for now, and then reopen once you have bandwidth for it? Let me know if you have objects/concerns

@brianf-aws
Copy link
Contributor Author

Sounds good, thanks for understanding

@brianf-aws brianf-aws closed this Nov 22, 2024
@brianf-aws
Copy link
Contributor Author

Hey everyone I want to reopen this PR and work on it, I should have more bandwidth with the holidays coming up.

@brianf-aws brianf-aws reopened this Dec 13, 2024
@brianf-aws brianf-aws requested a review from minalsha as a code owner December 13, 2024 18:01
@martin-gaievski
Copy link
Member

Good to see this reopened @brianf-aws, please let us know once it's ready for review

@yuye-aws
Copy link
Member

@brianf-aws Is there any update? We can help you if you came across some challenges.

@brianf-aws brianf-aws changed the title fix: better exception message ByFieldRerank Improvements Dec 27, 2024
@brianf-aws brianf-aws changed the title ByFieldRerank Improvements ByField Rerank Improvements Dec 27, 2024
@brianf-aws
Copy link
Contributor Author

brianf-aws commented Dec 27, 2024

Apologies for the delay I introduced a new commit that addresses previous review comments and adds String parsing. Thank you for your patience :)

@yuye-aws, @martin-gaievski do you mind taking a look?

@brianf-aws brianf-aws requested a review from yuye-aws December 27, 2024 22:11
@yuye-aws
Copy link
Member

yuye-aws commented Jan 2, 2025

All my comments have been resolved. Rerunning the failed CI tests now...

@yuye-aws
Copy link
Member

yuye-aws commented Jan 2, 2025

CI failed due to flakey tests. This PR is fine.

@yuye-aws
Copy link
Member

yuye-aws commented Jan 3, 2025

@martin-gaievski
Copy link
Member

@martin-gaievski Do you have any hint to fix the flakey test: https://github.com/opensearch-project/neural-search/actions/runs/12521109826/job/35052143851?pr=969

we need a deep dive for that, three are multiple issues opened in relation to fixing flaky tests

@martin-gaievski
Copy link
Member

@brianf-aws can you rebase on the latest main please? There were multiple PRs merged recently to neural, having latest code will help RCA on the issue with tests

Previously used docId for saying where the problem is, there is no gauarantee that this field exists instead _id is always guaranteed to be there. Also logging of the mapping was provided because it may be misleading when a user has a number as a string. It can be confusing for a user to something like [3.3] is not a number when the reality is that it was sent like "3.3" thus the type of the value is provided to help users understand better

Signed-off-by: Brian Flores <[email protected]>
Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me after rebase

@yuye-aws
Copy link
Member

Pinging @martin-gaievski to review again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants