-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
poc: scroll query request #693
Conversation
Signed-off-by: Sean Kao <[email protected]>
@@ -32,6 +32,14 @@ public void open() { | |||
getChild().forEach(PhysicalPlan::open); | |||
} | |||
|
|||
public String getCursor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now only the OpenSearchIndexScan, which is the leaf of the AST, has the cursor from the response. This is a workaround to get that cursor when reading the results.
@@ -29,6 +30,8 @@ public interface Table { | |||
*/ | |||
PhysicalPlan implement(LogicalPlan plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want to get rid of this eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason?..is there a better way of doing this? Has anything been proposed on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I was referring to the line above that doesn't use the PlanContext. All implement calls should use PlanContext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the current implement
only accepts logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If it seems so, I'm wondering why we still need the original method?
Codecov Report
@@ Coverage Diff @@
## main #693 +/- ##
=============================================
- Coverage 94.74% 62.76% -31.99%
=============================================
Files 283 10 -273
Lines 7676 658 -7018
Branches 561 119 -442
=============================================
- Hits 7273 413 -6860
+ Misses 349 192 -157
+ Partials 54 53 -1
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
public SearchRequest searchRequest() { | ||
return new SearchRequest() | ||
.indices(indexName.getIndexNames()) | ||
.scroll(DEFAULT_SCROLL_TIMEOUT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May want this to be configurable. In OpenSearch Setting? In Request body with fetch_size?
this.request = new OpenSearchQueryRequest(indexName, | ||
settings.getSettingValue(Settings.Key.QUERY_SIZE_LIMIT), exprValueFactory); | ||
if (fetchSize > 0) { | ||
this.request = new OpenSearchScrollQueryRequest(indexName, fetchSize, exprValueFactory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the ScrollQuery request kicks in
String cursor = response.getCursor(); | ||
if (cursor != null) { | ||
json.cursor(cursor); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar should be done for all formatters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that cursor should only work for JSON formatter. For others, such as raw, csv etc, we may just need to document the limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
if (cursor != null) { | ||
json.cursor(cursor); | ||
} | ||
|
||
// Populate other fields | ||
json.total(response.size()) | ||
.size(response.size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Total should reference the number of results in the table, while size should be the number of results in this response
@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l | |||
result.add(plan.next()); | |||
} | |||
|
|||
QueryResponse response = new QueryResponse(physicalPlan.schema(), result); | |||
QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels wrong, feels out of place.
The original design for PhysicalPlan assumes that the only results meaningful when executing it, is the data rows it fetched (and filtered, sorted, projected, etc.), hence the plan.next() syntax above makes sense.
Now there's other thing (the cursor) that we also need from the execution result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is strange to have getCursor
added to base class and force each physical operator to implement it. We need to figure out better solution..
private String scrollId; | ||
|
||
/** Search request source builder. */ | ||
private final SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor request doesn't need this, but OpenSearchRequest requires this
LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery())); | ||
PlanContext planContext = new PlanContext(); | ||
planContext.setFetchSize(request.getFetchSize()); | ||
plan = sqlService.plan(logicalPlan, planContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar should be done for PPL
plan = sqlService.plan( | ||
sqlService.analyze( | ||
sqlService.parse(request.getQuery()))); | ||
LogicalPlan logicalPlan = sqlService.analyze(sqlService.parse(request.getQuery())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we may want to also carry planContext when generating the logical plan. Example use case: ML command can set the plan context to specify it needs pagination. Then when building the physical plan, the Planner knows to use scroll instead of regular query.
this.indexName = indexName; | ||
this.sourceBuilder = new SearchSourceBuilder(); | ||
sourceBuilder.from(0); | ||
sourceBuilder.size(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conflicts with LIMIT
We may want to forbid mix usage of LIMIT and scroll
@@ -116,10 +138,6 @@ private boolean isOnlySupportedFieldInPayload() { | |||
return SUPPORTED_FIELDS.containsAll(jsonContent.keySet()); | |||
} | |||
|
|||
private boolean isFetchSizeZeroIfPresent() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetch_size must be 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This was for the new SQL engine. It used to can't handle scroll request and should redirect it to the legacy engine.
Now I remove this limitation so that the new engine handles scroll requests.
@@ -32,6 +32,9 @@ public class QueryResult implements Iterable<Object[]> { | |||
*/ | |||
private final Collection<ExprValue> exprValues; | |||
|
|||
@Getter | |||
private final String cursor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should make sure backwards compatible with legacy cursor implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
@@ -32,6 +32,14 @@ public void open() { | |||
getChild().forEach(PhysicalPlan::open); | |||
} | |||
|
|||
public String getCursor() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It breaks PhysicalPlan interface. Does it means our existing PhysicalPlan abstraction is not fit into cursor.
@EqualsAndHashCode | ||
@Getter | ||
@ToString | ||
public class OpenSearchScrollQueryRequest implements OpenSearchRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it cover aggregation pagination also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't
@@ -48,6 +50,11 @@ public PhysicalPlan implement(LogicalPlan plan) { | |||
return plan.accept(new OpenSearchSystemIndexDefaultImplementor(), null); | |||
} | |||
|
|||
@Override | |||
public PhysicalPlan implement(LogicalPlan plan, PlanContext planContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For schema describe, there's no use of PlanContext (for now)
} catch (IndexOutOfBoundsException e) { | ||
return getCursor(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not a good idea to use try-catch as control flow in Java
@@ -40,7 +40,7 @@ public void execute(PhysicalPlan physicalPlan, ResponseListener<QueryResponse> l | |||
result.add(plan.next()); | |||
} | |||
|
|||
QueryResponse response = new QueryResponse(physicalPlan.schema(), result); | |||
QueryResponse response = new QueryResponse(physicalPlan.schema(), result, plan.getCursor()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is strange to have getCursor
added to base class and force each physical operator to implement it. We need to figure out better solution..
@@ -29,6 +30,8 @@ public interface Table { | |||
*/ | |||
PhysicalPlan implement(LogicalPlan plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the current implement
only accepts logical plan. Unless we can encode cursor info in the plan, we have to pass in one more argument. If it seems so, I'm wondering why we still need the original method?
@EqualsAndHashCode | ||
@Getter | ||
@ToString | ||
public class OpenSearchScrollCursorRequest implements OpenSearchRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between this new class and existing OpenSearchScrollRequest
? Can we reuse and modify the existing one?
And same question for OpenSearchScrollQueryRequest
below. Wondering why we need two new classes for cursor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing OpenSearchScrollRequest
is stateful, and assumes its search
will be called multiple times. I used it in another PR.
In this PR, though, the initial scroll query and the subsequent cursor queries happen at different requests. The OpenSearchScrollQueryRequest
is for invoking the initial scroll query, which specifies both the fetch_size
and query
. OpenSearchScrollCursorRequest
uses a cursor
to fetch the next page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out that in order to paginate some PPL queries, we indeed need to store some states, so using the existing OpenSearchScrollRequest
makes more sense than adding these two new classes.
String cursor = response.getCursor(); | ||
if (cursor != null) { | ||
json.cursor(cursor); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized that cursor should only work for JSON formatter. For others, such as raw, csv etc, we may just need to document the limitation.
Signed-off-by: Sean Kao [email protected]
Description
PoC for SQL scroll query request in new query engine
Cursor request isn't supported yet. The operator for sending OpenSearch request is done, but I haven't figured out how to invoke it. Example cursor request:
Issues Resolved
#656
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.