Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle create index with batch FlintJob #2734

Merged
merged 8 commits into from
Jun 11, 2024

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Jun 10, 2024

Description

Use batch FlintJob to handle create index query. It's already used for streaming query. This PR makes creating auto_refresh=false index use FlintJob as well.
This would resolve the bug where dropping a manual refresh index could result in cancellation of the FlintREPL job that executed the create index statement.

Alongside this change, also make the RefreshQueryHandler borrow from lease, instead of BatchQueryHandler borrowing, because the concurrent refresh job limit shouldn't apply to batch jobs in general. The behavior for refresh query would remain the same.
Making the change in this same PR because prior to this PR, there is no query that can be used to test BatchQueryHandler.

Tests

Added unit tests that cover the code path for dispatching to IndexDMLHandler, and create index queries.

End to end test

  • create index auto_refresh=false -> drop index
    • index create should be a batch FlintJob
    • drop index should
      • not cancel any job if the job running the create statement is completed
      • cancel the batch job if it isn't completed yet
  • create index auto_refresh=true -> alter index auto_refresh=false -> drop index
    • index create should be a streaming FlintJob
    • alter index should cancel the streaming job
    • drop index should not further cancel any job
  • create index auto_refresh=false -> alter index auto_refresh=true -> drop index
    • index create should be a batch FlintJob
    • alter index should start a streaming job
    • drop index should cancel the streaming job

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@seankao-az seankao-az self-assigned this Jun 11, 2024
@seankao-az seankao-az added bug Something isn't working backport 2.x labels Jun 11, 2024
Copy link

codecov bot commented Jun 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.47%. Comparing base (cd95130) to head (4e05e17).
Report is 1 commits behind head on main.

Current head 4e05e17 differs from pull request most recent head 20ae2b0

Please upload reports for the commit 20ae2b0 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2734   +/-   ##
=========================================
  Coverage     95.47%   95.47%           
- Complexity     5193     5197    +4     
=========================================
  Files           509      509           
  Lines         14637    14639    +2     
  Branches        982      983    +1     
=========================================
+ Hits          13974    13976    +2     
  Misses          641      641           
  Partials         22       22           
Flag Coverage Δ
sql-engine 95.47% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@seankao-az seankao-az force-pushed the batch-job-for-create branch from 4e05e17 to 3204307 Compare June 11, 2024 01:46
@@ -59,6 +60,8 @@ public String cancelJob(AsyncQueryJobMetadata asyncQueryJobMetadata) {
@Override
public DispatchQueryResponse submit(
DispatchQueryRequest dispatchQueryRequest, DispatchQueryContext context) {
leaseManager.borrow(new LeaseRequest(JobType.BATCH, dispatchQueryRequest.getDatasource()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this warrant any change in documentation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're moving this from parent class? The new create index job doesn't need this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behavior doesn't change for refresh query handler.
also according to https://github.com/opensearch-project/sql/blob/main/docs/user/admin/settings.rst#pluginsqueryexecutionenginesparkrefresh_joblimit
it's intended to affect refresh job only

Copy link
Collaborator Author

@seankao-az seankao-az Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new create index job doesn't need this?

DDL shouldn't be affected by the concurrent limit. Without the change this following test would fail:

public void concurrentRefreshJobLimitNotAppliedToDDL() {

@vamsimanohar
Copy link
Member

Have we tested this on a working stack?

@seankao-az
Copy link
Collaborator Author

Have we tested this on a working stack?

yes in my test domain

dai-chen
dai-chen previously approved these changes Jun 11, 2024
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, the current approach is make sure cancel job run won't happen on REPL job. I think we can discuss later how to make index operation more clear. For example, instead of separating EMR-S job, should we avoiding cancel logic for index not in refreshing state at all?

ykmr1224
ykmr1224 previously approved these changes Jun 11, 2024
Copy link
Collaborator

@ykmr1224 ykmr1224 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor comments. Those can be addressed in separate PR.

Comment on lines +414 to +417
HashMap<String, String> tags = new HashMap<>();
tags.put(DATASOURCE_TAG_KEY, "my_glue");
tags.put(CLUSTER_NAME_TAG_KEY, TEST_CLUSTER_NAME);
tags.put(JOB_TYPE_TAG_KEY, JobType.BATCH.getText());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider using ImmutableMap.of (can be future refactoring)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole file use this same pattern. Can be a separate refactor PR for this for next release

Comment on lines +424 to +428
new HashMap<>() {
{
put(FLINT_INDEX_STORE_AWSREGION_KEY, "eu-west-1");
}
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same as above

@@ -79,6 +79,9 @@ private AsyncQueryHandler getQueryHandlerForFlintExtensionQuery(
return queryHandlerFactory.getIndexDMLHandler();
} else if (isEligibleForStreamingQuery(indexQueryDetails)) {
return queryHandlerFactory.getStreamingQueryHandler();
} else if (IndexQueryActionType.CREATE.equals(indexQueryDetails.getIndexQueryActionType())) {
// create should be handled by batch handler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment might be redundant. If we keep this, should we add more context ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add a oneliner why this should be the case

Signed-off-by: Sean Kao <[email protected]>
@seankao-az seankao-az dismissed stale reviews from ykmr1224 and dai-chen via 20ae2b0 June 11, 2024 22:20
@seankao-az seankao-az merged commit b959039 into opensearch-project:main Jun 11, 2024
20 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 11, 2024
* update grammar file

Signed-off-by: Sean Kao <[email protected]>

* batch job for create manual refresh index

Signed-off-by: Sean Kao <[email protected]>

* dispatcher test for index dml query

Signed-off-by: Sean Kao <[email protected]>

* borrow lease for refresh query, not batch

Signed-off-by: Sean Kao <[email protected]>

* spotlessApply

Signed-off-by: Sean Kao <[email protected]>

* add release note

Signed-off-by: Sean Kao <[email protected]>

* update comment

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
(cherry picked from commit b959039)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
seankao-az pushed a commit that referenced this pull request Jun 11, 2024
* update grammar file



* batch job for create manual refresh index



* dispatcher test for index dml query



* borrow lease for refresh query, not batch



* spotlessApply



* add release note



* update comment



---------


(cherry picked from commit b959039)

Signed-off-by: Sean Kao <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ykmr1224 pushed a commit to ykmr1224/sql that referenced this pull request Aug 17, 2024
* update grammar file

Signed-off-by: Sean Kao <[email protected]>

* batch job for create manual refresh index

Signed-off-by: Sean Kao <[email protected]>

* dispatcher test for index dml query

Signed-off-by: Sean Kao <[email protected]>

* borrow lease for refresh query, not batch

Signed-off-by: Sean Kao <[email protected]>

* spotlessApply

Signed-off-by: Sean Kao <[email protected]>

* add release note

Signed-off-by: Sean Kao <[email protected]>

* update comment

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>

(cherry picked from commit b959039)
jzonthemtn pushed a commit to jzonthemtn/sql that referenced this pull request Aug 28, 2024
* update grammar file

Signed-off-by: Sean Kao <[email protected]>

* batch job for create manual refresh index

Signed-off-by: Sean Kao <[email protected]>

* dispatcher test for index dml query

Signed-off-by: Sean Kao <[email protected]>

* borrow lease for refresh query, not batch

Signed-off-by: Sean Kao <[email protected]>

* spotlessApply

Signed-off-by: Sean Kao <[email protected]>

* add release note

Signed-off-by: Sean Kao <[email protected]>

* update comment

Signed-off-by: Sean Kao <[email protected]>

---------

Signed-off-by: Sean Kao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants