poc: scroll query request #693

seankao-az · 2022-07-15T16:50:40Z

Signed-off-by: Sean Kao [email protected]

Description

PoC for SQL scroll query request in new query engine

$ curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
  "fetch_size": 5, "query": "select DestAirportID from opensearch_dashboards_sample_data_flights"
}'

{
  "schema": [
    {
      "name": "DestAirportID",
      "type": "keyword"
    }
  ],
  "datarows": [
    [
      "SYD"
    ],
    [
      "VE05"
    ],
    [
      "VE05"
    ],
    [
      "TV01"
    ],
    [
      "XIY"
    ]
  ],
  "cursor": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFkJFRmh0MlQzUlBhTUhLVE8wdEFMa2cAAAAAAAAABRZsZ213bFFWdlNscTVhYUJKS2JCUjB3",
  "total": 5,
  "size": 5,
  "status": 200
}

Cursor requests aren't supported yet. The operator for sending the OpenSearch request is done, but I haven't figured out how to invoke it. Example cursor request:

$ curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{
"cursor": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFkJFRmh0MlQzUlBhTUhLVE8wdEFMa2cAAAAAAAAABRZsZ213bFFWdlNscTVhYUJKS2JCUjB3"
}'
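The intended client flow (an initial request carrying fetch_size and query, then follow-up requests carrying only the cursor) can be sketched as below. This is a minimal illustration, not code from this PR: `Page`, `CursorPagingSketch`, and `fetchAll` are hypothetical names, and the sketch assumes the response omits the cursor on the last page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical page model: rows plus a cursor that is null on the last page (an assumption). */
final class Page {
  final List<String> rows;
  final String cursor;

  Page(List<String> rows, String cursor) {
    this.rows = rows;
    this.cursor = cursor;
  }
}

final class CursorPagingSketch {
  private CursorPagingSketch() {}

  /**
   * Collects every row by following cursors.
   * firstPage stands in for the response to the fetch_size + query request;
   * nextPage stands in for a follow-up request that carries only the cursor.
   */
  static List<String> fetchAll(Page firstPage, Function<String, Page> nextPage) {
    List<String> all = new ArrayList<>(firstPage.rows);
    String cursor = firstPage.cursor;
    while (cursor != null) {
      Page page = nextPage.apply(cursor);
      all.addAll(page.rows);
      cursor = page.cursor;
    }
    return all;
  }
}
```

In a real client, `nextPage` would POST `{"cursor": "..."}` to `_plugins/_sql` and parse the JSON response.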

Issues Resolved

#656

Check List

- New functionality includes testing.
  - All tests pass, including unit test, integration test and doctest
- New functionality has been documented.
  - New functionality has javadoc added
  - New functionality has user manual doc added
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sean Kao <[email protected]>

seankao-az · 2022-07-15T16:53:54Z

core/src/main/java/org/opensearch/sql/planner/physical/PhysicalPlan.java

@@ -32,6 +32,14 @@ public void open() {
    getChild().forEach(PhysicalPlan::open);
  }

+  public String getCursor() {


Right now only the OpenSearchIndexScan, which is the leaf of the AST, has the cursor from the response. This is a workaround to get that cursor when reading the results.

seankao-az · 2022-07-15T16:55:30Z

core/src/main/java/org/opensearch/sql/storage/Table.java

@@ -29,6 +30,8 @@ public interface Table {
   */
  PhysicalPlan implement(LogicalPlan plan);


May want to get rid of this eventually.

Reason? Is there a better way of doing this? Has anything been proposed for this?

Oh, I was referring to the line above, which doesn't use the PlanContext. All implement calls should use PlanContext.

Yeah, the current implement only accepts a logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If so, I'm wondering why we still need the original method.

codecov-commenter · 2022-07-15T17:00:47Z

Codecov Report

Merging #693 (67b3415) into main (8d9b459) will decrease coverage by 31.98%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##               main     #693       +/-   ##
=============================================
- Coverage     94.74%   62.76%   -31.99%     
=============================================
  Files           283       10      -273     
  Lines          7676      658     -7018     
  Branches        561      119      -442     
=============================================
- Hits           7273      413     -6860     
+ Misses          349      192      -157     
+ Partials         54       53        -1

Flag	Coverage Δ
query-workbench	`62.76% <ø> (ø)`
sql-engine	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
.../main/java/org/opensearch/sql/planner/Planner.java
.../opensearch/sql/planner/physical/PhysicalPlan.java
...rc/main/java/org/opensearch/sql/storage/Table.java
...opensearch/executor/OpenSearchExecutionEngine.java
...search/executor/protector/ResourceMonitorPlan.java
...ch/sql/opensearch/response/OpenSearchResponse.java
...search/sql/opensearch/storage/OpenSearchIndex.java
...ch/sql/opensearch/storage/OpenSearchIndexScan.java
...ensearch/storage/system/OpenSearchSystemIndex.java
.../opensearch/sql/protocol/response/QueryResult.java
... and 263 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d9b459...67b3415. Read the comment docs.

seankao-az · 2022-07-15T17:09:56Z

...search/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchScrollQueryRequest.java

+  public SearchRequest searchRequest() {
+    return new SearchRequest()
+        .indices(indexName.getIndexNames())
+        .scroll(DEFAULT_SCROLL_TIMEOUT)


We may want this to be configurable: in an OpenSearch setting, or in the request body alongside fetch_size?
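One way to combine both options is a simple precedence rule: a value in the request body wins, then a cluster setting, then a built-in default. The sketch below only illustrates that rule. `ScrollTimeoutSketch`, `resolve`, and the one-minute fallback are assumptions, since the diff does not show the actual value of DEFAULT_SCROLL_TIMEOUT.

```java
import java.time.Duration;

final class ScrollTimeoutSketch {
  /** Placeholder default; the PoC's real DEFAULT_SCROLL_TIMEOUT value is not shown in the diff. */
  static final Duration FALLBACK_TIMEOUT = Duration.ofMinutes(1);

  private ScrollTimeoutSketch() {}

  /**
   * Picks the scroll timeout. Either argument may be null when that source
   * did not specify a value.
   */
  static Duration resolve(Duration fromRequest, Duration fromClusterSetting) {
    if (fromRequest != null) {
      return fromRequest;
    }
    if (fromClusterSetting != null) {
      return fromClusterSetting;
    }
    return FALLBACK_TIMEOUT;
  }
}
```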

seankao-az · 2022-07-15T17:11:42Z

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndexScan.java

-    this.request = new OpenSearchQueryRequest(indexName,
-        settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory);
+    if (fetchSize > 0) {
+      this.request = new OpenSearchScrollQueryRequest(indexName, fetchSize, exprValueFactory);


This is where the ScrollQuery request kicks in

seankao-az · 2022-07-15T17:13:24Z

protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java

+    String cursor = response.getCursor();
+    if (cursor != null) {
+      json.cursor(cursor);
+    }


The same should be done for all formatters.

Just realized that the cursor should only work for the JSON formatter. For others, such as raw and csv, we may just need to document the limitation.

seankao-az · 2022-07-15T17:14:19Z

protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java

+    if (cursor != null) {
+      json.cursor(cursor);
+    }
+
    // Populate other fields
    json.total(response.size())
        .size(response.size())


TODO: Total should reference the number of results in the table, while size should be the number of results in this response
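The distinction in this TODO can be made concrete with a small value type that keeps the two numbers apart. This is a sketch with hypothetical names (`PageCounts`, `of`). It assumes the overall hit count is available from the search response and is passed in by the caller.

```java
import java.util.List;

/** Hypothetical holder for the two counts the JDBC-style response reports. */
final class PageCounts {
  /** Number of rows matching the query across all pages. */
  final long total;
  /** Number of rows returned in this response only. */
  final int size;

  private PageCounts(long total, int size) {
    this.total = total;
    this.size = size;
  }

  /** totalHits comes from the search response; pageRows are the rows of the current page. */
  static PageCounts of(long totalHits, List<?> pageRows) {
    return new PageCounts(totalHits, pageRows.size());
  }
}
```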

seankao-az · 2022-07-15T17:18:17Z

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java

@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l
              result.add(plan.next());
            }

-            QueryResponse response = new QueryResponse(physicalPlan.schema(), result);
+            QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor());


This feels wrong and out of place.

The original design of PhysicalPlan assumes that the only meaningful result of executing it is the data rows it fetched (and filtered, sorted, projected, etc.), hence the plan.next() syntax above makes sense.
Now there is something else (the cursor) that we also need from the execution result.

Yeah, it is strange to have getCursor added to the base class and force each physical operator to implement it. We need to figure out a better solution.
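One possible direction, offered only as a sketch and not as something proposed in this thread, is to keep the base class unchanged and let only cursor-owning operators implement a small capability interface that the execution engine looks up. All names below (`PlanNode`, `CursorSource`, `ScanNode`, `ProjectNode`, `CursorLookup`) are hypothetical stand-ins, not the project's classes.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a physical operator; the base class knows nothing about cursors. */
abstract class PlanNode {
  abstract List<PlanNode> children();
}

/** Hypothetical capability implemented only by operators that own a cursor. */
interface CursorSource {
  String cursor();
}

final class ScanNode extends PlanNode implements CursorSource {
  private final String cursor;

  ScanNode(String cursor) {
    this.cursor = cursor;
  }

  @Override
  List<PlanNode> children() {
    return Collections.emptyList();
  }

  @Override
  public String cursor() {
    return cursor;
  }
}

final class ProjectNode extends PlanNode {
  private final PlanNode input;

  ProjectNode(PlanNode input) {
    this.input = input;
  }

  @Override
  List<PlanNode> children() {
    return Collections.singletonList(input);
  }
}

final class CursorLookup {
  private CursorLookup() {}

  /** Depth-first search for the first operator that exposes a cursor. */
  static Optional<String> find(PlanNode node) {
    if (node instanceof CursorSource) {
      return Optional.ofNullable(((CursorSource) node).cursor());
    }
    for (PlanNode child : node.children()) {
      Optional<String> found = find(child);
      if (found.isPresent()) {
        return found;
      }
    }
    return Optional.empty();
  }
}
```

With this shape, operators such as filter or project need no cursor-related method at all; the trade-off is that the engine has to walk the tree once after execution.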

seankao-az · 2022-07-15T17:23:58Z

...earch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchScrollCursorRequest.java

+  private String scrollId;
+
+  /** Search request source builder. */
+  private final SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();


The cursor request doesn't need this, but the OpenSearchRequest interface requires it.

seankao-az · 2022-07-15T17:47:14Z

legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java

+      LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery()));
+      PlanContext planContext = new PlanContext();
+      planContext.setFetchSize(request.getFetchSize());
+      plan = sqlService.plan(logicalPlan, planContext);


The same should be done for PPL.

seankao-az · 2022-07-15T17:58:55Z

legacy/src/main/java/org/opensearch/sql/legacy/plugin/RestSQLQueryAction.java

-      plan = sqlService.plan(
-                sqlService.analyze(
-                    sqlService.parse(request.getQuery())));
+      LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery()));


In the future we may want to also carry planContext when generating the logical plan. Example use case: ML command can set the plan context to specify it needs pagination. Then when building the physical plan, the Planner knows to use scroll instead of regular query.
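The idea can be sketched as a context object that either the REST layer (via fetch_size) or the logical planning stage (for example, a command that needs pagination) can mark, and that the planner consults when choosing the request type. `PlanContextSketch`, `requirePagination`, and `useScroll` are hypothetical; only `setFetchSize` mirrors a call visible in the diff.

```java
/** Hypothetical, simplified stand-in for the PoC's PlanContext. */
final class PlanContextSketch {
  /** 0 means the client did not ask for paging. */
  private int fetchSize;
  /** Could be set while building the logical plan, e.g. by a command that needs pagination. */
  private boolean paginationRequired;

  void setFetchSize(int fetchSize) {
    this.fetchSize = fetchSize;
  }

  void requirePagination() {
    this.paginationRequired = true;
  }

  /** The planner would build a scroll request when this returns true, a regular query otherwise. */
  boolean useScroll() {
    return fetchSize > 0 || paginationRequired;
  }
}
```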

seankao-az · 2022-07-15T19:46:42Z

...search/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchScrollQueryRequest.java

+    this.indexName = indexName;
+    this.sourceBuilder = new SearchSourceBuilder();
+    sourceBuilder.from(0);
+    sourceBuilder.size(size);


This conflicts with LIMIT.
We may want to forbid mixed use of LIMIT and scroll.
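Forbidding the combination could be as simple as a validation step before planning. The sketch below assumes the caller already knows whether the parsed query contains a LIMIT clause; `ScrollRequestValidator` and its parameters are hypothetical.

```java
final class ScrollRequestValidator {
  private ScrollRequestValidator() {}

  /**
   * Rejects requests that combine paging with LIMIT.
   * queryHasLimit is assumed to be derived from the parsed query by the caller.
   */
  static void validate(int fetchSize, boolean queryHasLimit) {
    if (fetchSize > 0 && queryHasLimit) {
      throw new IllegalArgumentException("LIMIT cannot be combined with fetch_size");
    }
  }
}
```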

penghuo · 2022-07-19T05:33:32Z

sql/src/main/java/org/opensearch/sql/sql/domain/SQLQueryRequest.java

@@ -116,10 +138,6 @@ private boolean isOnlySupportedFieldInPayload() {
    return SUPPORTED_FIELDS.containsAll(jsonContent.keySet());
  }

-  private boolean isFetchSizeZeroIfPresent() {


Must fetch_size be 0?

No. This check was for the new SQL engine, which previously couldn't handle scroll requests and had to redirect them to the legacy engine.
I've now removed this limitation so that the new engine handles scroll requests.

penghuo · 2022-07-19T05:35:01Z

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java

@@ -32,6 +32,9 @@ public class QueryResult implements Iterable<Object[]> {
   */
  private final Collection<ExprValue> exprValues;

+  @Getter
+  private final String cursor;


We should make sure this is backward compatible with the legacy cursor implementation.

penghuo · 2022-07-19T05:43:56Z

core/src/main/java/org/opensearch/sql/planner/physical/PhysicalPlan.java

@@ -32,6 +32,14 @@ public void open() {
    getChild().forEach(PhysicalPlan::open);
  }

+  public String getCursor() {


It breaks the PhysicalPlan interface. Does this mean our existing PhysicalPlan abstraction doesn't fit cursors?

penghuo · 2022-07-19T05:49:38Z

...search/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchScrollQueryRequest.java

+@EqualsAndHashCode
+@Getter
+@ToString
+public class OpenSearchScrollQueryRequest implements OpenSearchRequest {


Does it cover aggregation pagination also?

No, it doesn't

seankao-az · 2022-07-19T16:20:51Z

...search/src/main/java/org/opensearch/sql/opensearch/storage/system/OpenSearchSystemIndex.java

@@ -48,6 +50,11 @@ public PhysicalPlan implement(LogicalPlan plan) {
    return plan.accept(new OpenSearchSystemIndexDefaultImplementor(), null);
  }

+  @Override
+  public PhysicalPlan implement(LogicalPlan plan, PlanContext planContext) {


For schema describe, there's no use of PlanContext (for now)

dai-chen · 2022-07-19T16:18:53Z

core/src/main/java/org/opensearch/sql/planner/physical/PhysicalPlan.java

+    } catch (IndexOutOfBoundsException e) {
+      return getCursor();


It's probably not a good idea to use try-catch as control flow in Java.
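A version without exceptions would check for the childless case explicitly before delegating. This is a minimal sketch with hypothetical classes (`Operator`, `LeafScan`, `Wrapper`, `ConstantValues`), not the PoC's PhysicalPlan hierarchy.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the physical plan base class. */
abstract class Operator {
  abstract List<Operator> getChild();

  /** Delegates to the first child; returns null when there is no child instead of catching an exception. */
  String getCursor() {
    List<Operator> children = getChild();
    return children.isEmpty() ? null : children.get(0).getCursor();
  }
}

/** Leaf that owns a cursor, standing in for the index scan. */
final class LeafScan extends Operator {
  private final String cursor;

  LeafScan(String cursor) {
    this.cursor = cursor;
  }

  @Override
  List<Operator> getChild() {
    return Collections.emptyList();
  }

  @Override
  String getCursor() {
    return cursor;
  }
}

/** Leaf without a cursor; it inherits the default and therefore reports null. */
final class ConstantValues extends Operator {
  @Override
  List<Operator> getChild() {
    return Collections.emptyList();
  }
}

/** Single-input operator that relies on the inherited delegation. */
final class Wrapper extends Operator {
  private final Operator input;

  Wrapper(Operator input) {
    this.input = input;
  }

  @Override
  List<Operator> getChild() {
    return Collections.singletonList(input);
  }
}
```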

dai-chen · 2022-07-19T16:21:10Z

opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java

@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l
              result.add(plan.next());
            }

-            QueryResponse response = new QueryResponse(physicalPlan.schema(), result);
+            QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor());


Yeah, it is strange to have getCursor added to the base class and force each physical operator to implement it. We need to figure out a better solution.

dai-chen · 2022-07-19T16:27:06Z

core/src/main/java/org/opensearch/sql/storage/Table.java

@@ -29,6 +30,8 @@ public interface Table {
   */
  PhysicalPlan implement(LogicalPlan plan);


Yeah, the current implement only accepts a logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If so, I'm wondering why we still need the original method.

dai-chen · 2022-07-19T16:30:34Z

...earch/src/main/java/org/opensearch/sql/opensearch/request/OpenSearchScrollCursorRequest.java

+@EqualsAndHashCode
+@Getter
+@ToString
+public class OpenSearchScrollCursorRequest implements OpenSearchRequest {


What is the difference between this new class and the existing OpenSearchScrollRequest? Can we reuse and modify the existing one?

Same question for OpenSearchScrollQueryRequest below. I'm wondering why we need two new classes for the cursor.

The existing OpenSearchScrollRequest is stateful and assumes its search will be called multiple times. I used it in another PR.
In this PR, though, the initial scroll query and the subsequent cursor queries happen in different requests. OpenSearchScrollQueryRequest invokes the initial scroll query, which specifies both the fetch_size and the query. OpenSearchScrollCursorRequest uses a cursor to fetch the next page.

It turned out that in order to paginate some PPL queries, we do need to store some state, so using the existing OpenSearchScrollRequest makes more sense than adding these two new classes.
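The stateful approach described here can be sketched as a single request object that remembers the scroll ID and switches from the initial search to scroll calls once it has one. The classes below are simplified, hypothetical stand-ins and do not reproduce the actual OpenSearchScrollRequest API.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical, simplified response: the hits of one page plus the scroll ID for the next. */
final class ScrollResponse {
  final String scrollId;
  final List<String> hits;

  ScrollResponse(String scrollId, List<String> hits) {
    this.scrollId = scrollId;
    this.hits = hits;
  }
}

/** Hypothetical stateful request: one object serves both the first page and later pages. */
final class StatefulScrollRequest {
  /** Null until the first page has been fetched. */
  private String scrollId;

  boolean isScrollStarted() {
    return scrollId != null;
  }

  /**
   * Runs the initial search on the first call and the scroll action afterwards,
   * remembering the scroll ID returned by each response.
   */
  ScrollResponse search(
      Supplier<ScrollResponse> initialSearch, Function<String, ScrollResponse> scrollAction) {
    ScrollResponse response =
        isScrollStarted() ? scrollAction.apply(scrollId) : initialSearch.get();
    scrollId = response.scrollId;
    return response;
  }
}
```

The two-class split in this PR corresponds to the two branches of `search`; keeping them in one object is what lets the state survive across calls.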

dai-chen · 2022-07-19T17:10:53Z

protocol/src/main/java/org/opensearch/sql/protocol/response/format/JdbcResponseFormatter.java

+    String cursor = response.getCursor();
+    if (cursor != null) {
+      json.cursor(cursor);
+    }


Just realized that the cursor should only work for the JSON formatter. For others, such as raw and csv, we may just need to document the limitation.

scroll query request

67b3415

Signed-off-by: Sean Kao <[email protected]>

seankao-az commented Jul 15, 2022

penghuo reviewed Jul 19, 2022

seankao-az commented Jul 19, 2022

dai-chen reviewed Jul 19, 2022

seankao-az mentioned this pull request Jul 25, 2022

[FEATURE] - Support pagination for PPL and SQL query #656

Closed

2 tasks

seankao-az changed the title ~~scroll query request~~ poc: scroll query request Aug 2, 2022

seankao-az closed this Aug 10, 2022
