Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poc: scroll query request #693

Closed
wants to merge 1 commit into from
Closed

poc: scroll query request #693

wants to merge 1 commit into from

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Jul 15, 2022

Signed-off-by: Sean Kao [email protected]

Description

PoC for SQL scroll query request in new query engine

$ curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
  "fetch_size": 5, "query": "select DestAirportID from opensearch_dashboards_sample_data_flights"
}'

{
  "schema": [
    {
      "name": "DestAirportID",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "SYD"
    ],
    [
      "VE05"
    ],
    [
      "VE05"
    ],
    [
      "TV01"
    ],
    [
      "XIY"
    ]
  ],
  "cursor": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFkJFRmh0MlQzUlBhTUhLVE8wdEFMa2cAAAAAAAAABRZsZ213bFFWdlNscTVhYUJKS2JCUjB3",
  "total": 5,
  "size": 5,
  "status": 200
}

Cursor request isn't supported yet. The operator for sending OpenSearch request is done, but I haven't figured out how to invoke it. Example cursor request:

$ curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"cursor": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFkJFRmh0MlQzUlBhTUhLVE8wdEFMa2cAAAAAAAAABRZsZ213bFFWdlNscTVhYUJKS2JCUjB3"
}'

Issues Resolved

#656

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sean Kao <[email protected]>
@@ -32,6 +32,14 @@ public void open() {
getChild().forEach(PhysicalPlan::open);
}

public String getCursor() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now only the OpenSearchIndexScan, which is the leaf of the AST, has the cursor from the response. This is a workaround to get that cursor when reading the results.

@@ -29,6 +30,8 @@ public interface Table {
*/
PhysicalPlan implement(LogicalPlan plan);
Copy link
Collaborator Author

@seankao-az seankao-az Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want to get rid of this eventually.

Copy link
Member

@vamsimanohar vamsimanohar Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason?..is there a better way of doing this? Has anything been proposed on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I was referring to the line above that doesn't use the PlanContext. All implement calls should use PlanContext.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the current implement only accepts logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If it seems so, I'm wondering why we still need the original method?

@codecov-commenter
Copy link

Codecov Report

Merging #693 (67b3415) into main (8d9b459) will decrease coverage by 31.98%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##               main     #693       +/-   ##
=============================================
- Coverage     94.74%   62.76%   -31.99%     
=============================================
  Files           283       10      -273     
  Lines          7676      658     -7018     
  Branches        561      119      -442     
=============================================
- Hits           7273      413     -6860     
+ Misses          349      192      -157     
+ Partials         54       53        -1     
Flag Coverage Δ
query-workbench 62.76% <ø> (ø)
sql-engine ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../main/java/org/opensearch/sql/planner/Planner.java
.../opensearch/sql/planner/physical/PhysicalPlan.java
...rc/main/java/org/opensearch/sql/storage/Table.java
...opensearch/executor/OpenSearchExecutionEngine.java
...search/executor/protector/ResourceMonitorPlan.java
...ch/sql/opensearch/response/OpenSearchResponse.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...ch/sql/opensearch/storage/OpenSearchIndexScan.java
...ensearch/storage/system/OpenSearchSystemIndex.java
.../opensearch/sql/protocol/response/QueryResult.java
... and 263 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d9b459...67b3415. Read the comment docs.

public SearchRequest searchRequest() {
return new SearchRequest()
.indices(indexName.getIndexNames())
.scroll(DEFAULT_SCROLL_TIMEOUT)
Copy link
Collaborator Author

@seankao-az seankao-az Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want this to be configurable. In OpenSearch Setting? In Request body with fetch_size?

this.request = new OpenSearchQueryRequest(indexName,
settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
if (fetchSize > 0) {
this.request = new OpenSearchScrollQueryRequest(indexName, fetchSize, exprValueFactory);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the ScrollQuery request kicks in

String cursor = response.getCursor();
if (cursor != null) {
json.cursor(cursor);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar should be done for all formatters

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that cursor should only work for JSON formatter. For others, such as raw, csv etc, we may just need to document the limitation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

if (cursor != null) {
json.cursor(cursor);
}

// Populate other fields
json.total(response.size())
.size(response.size())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Total should reference the number of results in the table, while size should be the number of results in this response

@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l
result.add(plan.next());
}

QueryResponse response = new QueryResponse(physicalPlan.schema(), result);
QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor());
Copy link
Collaborator Author

@seankao-az seankao-az Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels wrong, feels out of place.

The original design for PhysicalPlan assumes that the only results meaningful when executing it, is the data rows it fetched (and filtered, sorted, projected, etc.), hence the plan.next() syntax above makes sense.
Now there's other thing (the cursor) that we also need from the execution result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is strange to have getCursor added to base class and force each physical operator to implement it. We need to figure out better solution..

private String scrollId;

/** Search request source builder. */
private final SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor request doesn't need this, but OpenSearchRequest requires this

LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery()));
PlanContext planContext = new PlanContext();
planContext.setFetchSize(request.getFetchSize());
plan = sqlService.plan(logicalPlan, planContext);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar should be done for PPL

plan = sqlService.plan(
sqlService.analyze(
sqlService.parse(request.getQuery())));
LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery()));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we may want to also carry planContext when generating the logical plan. Example use case: ML command can set the plan context to specify it needs pagination. Then when building the physical plan, the Planner knows to use scroll instead of regular query.

this.indexName = indexName;
this.sourceBuilder = new SearchSourceBuilder();
sourceBuilder.from(0);
sourceBuilder.size(size);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conflicts with LIMIT
We may want to forbid mix usage of LIMIT and scroll

@@ -116,10 +138,6 @@ private boolean isOnlySupportedFieldInPayload() {
return SUPPORTED_FIELDS.containsAll(jsonContent.keySet());
}

private boolean isFetchSizeZeroIfPresent() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch_size must be 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This was for the new SQL engine. It used to can't handle scroll request and should redirect it to the legacy engine.
Now I remove this limitation so that the new engine handles scroll requests.

@@ -32,6 +32,9 @@ public class QueryResult implements Iterable<Object[]> {
*/
private final Collection<ExprValue> exprValues;

@Getter
private final String cursor;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make sure backwards compatible with legacy cursor implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

@@ -32,6 +32,14 @@ public void open() {
getChild().forEach(PhysicalPlan::open);
}

public String getCursor() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It breaks PhysicalPlan interface. Does it means our existing PhysicalPlan abstraction is not fit into cursor.

@EqualsAndHashCode
@Getter
@ToString
public class OpenSearchScrollQueryRequest implements OpenSearchRequest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it cover aggregation pagination also?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't

@@ -48,6 +50,11 @@ public PhysicalPlan implement(LogicalPlan plan) {
return plan.accept(new OpenSearchSystemIndexDefaultImplementor(), null);
}

@Override
public PhysicalPlan implement(LogicalPlan plan, PlanContext planContext) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For schema describe, there's no use of PlanContext (for now)

Comment on lines +38 to +39
} catch (IndexOutOfBoundsException e) {
return getCursor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not a good idea to use try-catch as control flow in Java

@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l
result.add(plan.next());
}

QueryResponse response = new QueryResponse(physicalPlan.schema(), result);
QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is strange to have getCursor added to base class and force each physical operator to implement it. We need to figure out better solution..

@@ -29,6 +30,8 @@ public interface Table {
*/
PhysicalPlan implement(LogicalPlan plan);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the current implement only accepts logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If it seems so, I'm wondering why we still need the original method?

@EqualsAndHashCode
@Getter
@ToString
public class OpenSearchScrollCursorRequest implements OpenSearchRequest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between this new class and existing OpenSearchScrollRequest? Can we reuse and modify the existing one?

And same question for OpenSearchScrollQueryRequest below. Wondering why we need two new classes for cursor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing OpenSearchScrollRequest is stateful, and assumes its search will be called multiple times. I used it in another PR.
In this PR, though, the initial scroll query and the subsequent cursor queries happen at different requests. The OpenSearchScrollQueryRequest is for invoking the initial scroll query, which specifies both the fetch_size and query. OpenSearchScrollCursorRequest uses a cursor to fetch the next page.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that in order to paginate some PPL queries, we indeed need to store some states, so using the existing OpenSearchScrollRequest makes more sense than adding these two new classes.

String cursor = response.getCursor();
if (cursor != null) {
json.cursor(cursor);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that cursor should only work for JSON formatter. For others, such as raw, csv etc, we may just need to document the limitation.

@seankao-az seankao-az changed the title scroll query request poc: scroll query request Aug 2, 2022
@seankao-az seankao-az closed this Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants