Design for `LIMIT` in pagination. #1752
Signed-off-by: Yury-Fridlyand <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## feature/pagination/limit #1752 +/- ##
==============================================================
+ Coverage 97.33% 98.08% +0.74%
+ Complexity 4408 3436 -972
==============================================================
Files 388 292 -96
Lines 10938 8364 -2574
Branches 773 573 -200
==============================================================
- Hits 10647 8204 -2443
+ Misses 284 157 -127
+ Partials 7 3 -4
|
||
## Solution | ||
|
||
Don't do push down for `LIMIT`. `LimitOperator` Physical Plan Tree node will cut off yielding search results with minimal overhead. |
If we utilize the limit operator, how will this work with any other operator that needs to do post-processing? This also raises the question of how we will continue to add operators and how they will work with each other. We may need some way to chain operators if we don't want to be limited to one post-processing operator per query.
Rather than having an operator that limits the output with post-processing, why not include the functionality as part of the base class? We can have an optional limit, used if push down isn't available, that is performed in the `PhysicalPlan` base class prior to the inheriting class's `next()` call. Another option would be to have the limit post-processing performed in the `ResourceMonitorPlan` prior to calling the delegate `next()`.
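To make the delegate idea in this comment concrete, here is a minimal sketch of a wrapper that counts rows and stops calling the delegate's `next()` once the limit is reached. Everything in it is an assumption for illustration: `LimitingPlanSketch`, `LimitingIterator`, and `collect` are hypothetical names, and a plain `java.util.Iterator` stands in for the plugin's `PhysicalPlan` or `ResourceMonitorPlan`, whose real signatures are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: none of these types are the plugin's real classes.
final class LimitingPlanSketch {

  /** Wraps a delegate iterator and stops yielding after `limit` rows. */
  static final class LimitingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final int limit;
    private int yielded = 0;

    LimitingIterator(Iterator<T> delegate, int limit) {
      if (limit < 0) {
        throw new IllegalArgumentException("limit must be >= 0");
      }
      this.delegate = delegate;
      this.limit = limit;
    }

    @Override
    public boolean hasNext() {
      // The delegate is not consulted at all once the limit is reached.
      return yielded < limit && delegate.hasNext();
    }

    @Override
    public T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      yielded++;
      return delegate.next();
    }
  }

  /** Drains an iterator into a list; used only to show the result. */
  static <T> List<T> collect(Iterator<T> it) {
    List<T> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    Iterator<Integer> rows = List.of(1, 2, 3, 4, 5).iterator();
    System.out.println(collect(new LimitingIterator<>(rows, 3))); // prints [1, 2, 3]
  }
}
```

Because the wrapper is itself an iterator, wrappers of this shape can be stacked, which is one possible answer to the chaining question: two nested limits behave like the smaller of the two.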
} | ||
|
||
/** | ||
* Optimize {@link LogicalPlan}. | ||
*/ | ||
public LogicalPlan optimize(LogicalPlan plan) { | ||
LogicalPlan optimized = internalOptimize(plan); | ||
var node = plan; |
maybe just rename the argument?
|
||
## Problem statement | ||
|
||
`LIMIT` clause is being converted to `size` by SQL plugin during push down operation. |
We need to properly define what we want to accomplish, but discuss what is currently the problem with the code. I would change this to talk about LIMIT and size conflict, but also include the use cases:
- When LIMIT > size
- When LIMIT < size
- When size == LIMIT (this is kind of naive case, but maybe just mention it anyways)
Then discuss exactly how we expect the system to behave (when does it return a cursor, when does it NOT return a cursor, where does it break (`max_window_size`), and does this override the default fetch size).
If there is anything to add in terms of how the JDBC driver will behave, then include it here.
|
||
## Solution | ||
|
||
Don't do push down for `LIMIT`. `LimitOperator` Physical Plan Tree node will cut off yielding search results with minimal overhead. |
I don't think this is the solution we talked about in the walk-through... I think you just need to reverse the business logic to execute all the rules on nodes before proceeding to the next node...
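One way to read "execute all the rules on nodes before proceeding to the next node" is a node-major traversal: every rule is tried on the current node, and only then does the optimizer descend into the children. The sketch below records the visiting order to show that shape. It is an assumption-laden illustration, not the plugin's `Optimizer`: `Node`, `Rule`, `noOp`, and the rule names `pushPageSize` and `pushLimit` are all hypothetical, and the rules deliberately do nothing so that only the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Node and Rule are stand-ins, not the plugin's types.
final class RuleOrderSketch {

  static final class Node {
    final String name;
    final List<Node> children;

    Node(String name, List<Node> children) {
      this.name = name;
      this.children = children;
    }
  }

  interface Rule {
    String name();

    /** Returns the node unchanged when the rule does not match. */
    Node apply(Node node);
  }

  /** Applies every rule to `node` first, then recurses into the children. */
  static Node optimize(Node node, List<Rule> rules, List<String> trace) {
    Node current = node;
    for (Rule rule : rules) {
      trace.add(rule.name() + "@" + current.name);
      current = rule.apply(current);
    }
    List<Node> optimizedChildren = new ArrayList<>();
    for (Node child : current.children) {
      optimizedChildren.add(optimize(child, rules, trace));
    }
    return new Node(current.name, optimizedChildren);
  }

  /** A rule that never rewrites anything; it exists only to be traced. */
  static Rule noOp(String name) {
    return new Rule() {
      @Override
      public String name() {
        return name;
      }

      @Override
      public Node apply(Node node) {
        return node;
      }
    };
  }

  static List<String> demo() {
    Node plan = new Node("limit", List.of(new Node("relation", List.of())));
    List<String> trace = new ArrayList<>();
    optimize(plan, List.of(noOp("pushPageSize"), noOp("pushLimit")), trace);
    return trace;
  }

  public static void main(String[] args) {
    // prints [pushPageSize@limit, pushLimit@limit, pushPageSize@relation, pushLimit@relation]
    System.out.println(demo());
  }
}
```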
Seems like this section is similar to `Fix` below.
CreateTableScanBuilder | ||
PushDownPageSize | ||
... | ||
PUSH_DOWN_LIMIT |
why is this ordering important?
... | ||
``` | ||
|
||
This gives us warranty that `pushDownLimit` operation would be rejected if `pushPageSize` called before. Then, not optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node. |
This gives us warranty that `pushDownLimit` operation would be rejected if `pushPageSize` called before. Then, not optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node. | |
This gives us guarantee that `pushDownLimit` operation will be rejected if `pushPageSize` already called. In that case, the not-optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node. | |
3. Make `LimitOperator` properly serialized and deserialized. | ||
4. Make `OpenSearchIndexScanBuilder::pushDownLimit` return `false` if `pushDownPageSize` was called before. | ||
5. (Optional) Groom `Optimizer` to reduce amount of unchecked casts and uses raw classes. | ||
6. (Optional) Rework `Optimizer` to make it a tree visitor. |
Out of scope work. Please raise a follow-up ticket to do these later. You can label this as a `maintenance` ticket.
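Item 4 in the list above (make `pushDownLimit` return `false` if `pushDownPageSize` was called before) could look roughly like the sketch below. It assumes a builder that only remembers whether a page size was pushed; `ScanBuilderSketch`, `ScanBuilder`, and the field and accessor names are hypothetical and are not the signatures of `OpenSearchIndexScanBuilder`.

```java
// Hypothetical sketch: ScanBuilder is a stand-in, not OpenSearchIndexScanBuilder.
final class ScanBuilderSketch {

  static final class ScanBuilder {
    private boolean pageSizePushed = false;
    private Integer pushedLimit = null;

    /** Records that pagination is in effect for this scan. */
    boolean pushDownPageSize(int pageSize) {
      pageSizePushed = true;
      return true;
    }

    /**
     * Rejects the push down when a page size was pushed earlier, so the
     * caller keeps the logical limit node and handles LIMIT after the scan.
     */
    boolean pushDownLimit(int limit) {
      if (pageSizePushed) {
        return false;
      }
      pushedLimit = limit;
      return true;
    }

    Integer pushedLimit() {
      return pushedLimit;
    }
  }

  public static void main(String[] args) {
    ScanBuilder paged = new ScanBuilder();
    paged.pushDownPageSize(100);
    System.out.println(paged.pushDownLimit(10)); // prints false

    ScanBuilder unpaged = new ScanBuilder();
    System.out.println(unpaged.pushDownLimit(10)); // prints true
  }
}
```

The sketch also shows why the rule order matters in this design: the rejection only happens if the page size rule has already run on the builder by the time the limit rule is tried.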
new MergeFilterAndRelation(), | ||
new MergeAggAndIndexScan(), | ||
new MergeAggAndRelation() | ||
)); | ||
).map(r -> (Rule<LogicalPlan>)r).collect(Collectors.toList())); |
This is messy, and we can update the Rules classes to extend `LogicalPlan` instead.
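One possible reading of this suggestion is that each rule is declared against the base plan type, so the rule list type-checks without the `.map(r -> (Rule<LogicalPlan>)r)` cast. The sketch below shows only that typing idea. `Plan`, `Filter`, `Rule`, `RenameFilter`, and `Identity` are hypothetical stand-ins, not the plugin's `LogicalPlan` or its rule classes, and the real `Rule` interface may have a different shape.

```java
import java.util.List;

// Hypothetical sketch: Plan, Filter, and both rule classes are illustrative only.
final class RuleTypingSketch {

  interface Plan {
    String name();
  }

  static final class Filter implements Plan {
    private final String name;

    Filter(String name) {
      this.name = name;
    }

    @Override
    public String name() {
      return name;
    }
  }

  /** A rule typed by the plan type it accepts. */
  interface Rule<T> {
    T apply(T plan);
  }

  /** Declared against the base type; narrows internally with instanceof. */
  static final class RenameFilter implements Rule<Plan> {
    @Override
    public Plan apply(Plan plan) {
      return plan instanceof Filter ? new Filter("merged-" + plan.name()) : plan;
    }
  }

  static final class Identity implements Rule<Plan> {
    @Override
    public Plan apply(Plan plan) {
      return plan;
    }
  }

  /** No cast and no raw type: every element is already a rule over Plan. */
  static List<Rule<Plan>> rules() {
    return List.of(new RenameFilter(), new Identity());
  }

  static Plan applyAll(Plan plan) {
    Plan current = plan;
    for (Rule<Plan> rule : rules()) {
      current = rule.apply(current);
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(applyAll(new Filter("where")).name()); // prints merged-where
  }
}
```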
Design review meeting notes:
|
Is this PR still in progress, or should we close it as stale? It looks like it gets linked a lot from elsewhere but is not actively updated.
Description
This PR is a design review for supporting `LIMIT` in pagination. It includes:
TODOs:
- `page_size > limit`
- `MergeFilterAndFilter` and `PushFilterUnderSort` optimizer rules (probably by combining `WHERE` and `HAVING` clauses or with PPL query)
Issues Resolved
N/A
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.