-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design for LIMIT
in pagination.
#1752
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,62 @@ | ||||||||
## Background | ||||||||
|
||||||||
Scrolled search is performed by request to `http://localhost:9200/<index>/_search?scroll=<timeout>` endpoint and it has a mandatory field `size`, which defines page size: | ||||||||
```json | ||||||||
{ | ||||||||
"size" : 5 | ||||||||
} | ||||||||
``` | ||||||||
Regular (non-scrolled/non-paged) search is performed by request to `http://localhost:9200/<index</_search` endpoint and it has not mandatory fields. Parameter `size` though defines maximum number of docs to be returned by search. | ||||||||
|
||||||||
## Problem statement | ||||||||
|
||||||||
`LIMIT` clause is being converted to `size` by SQL plugin during push down operation. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to properly define what we want to accomplish, but discuss what is currently the problem with the code. I would change this to talk about LIMIT and size conflict, but also include the use cases:
|
||||||||
Hereby comes the conflict in using `LIMIT` with pagination. | ||||||||
|
||||||||
## Solution | ||||||||
|
||||||||
Don't do push down for `LIMIT`. `LimitOperator` Physical Plan Tree node will cut off yielding search results with minimal overhead. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we utilize the limit operator how will this work with any other operator that needs to do post-processing. This also asks the question of how we will continue to add operators and how will they work with each other. We may need some way to chain operators if we don't want to have the limitation of one operator for post-processing per query. Rather than have an operator to limit the output with post processing why not include the functionality as part of the base class. We can have an optional limit if push down isn't available that can be performed in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is the solution we talked about in the walk-through... I think you just need to reverse the business logic to execute all the rules on nodes before proceeding to the next node... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like this section is similar to |
||||||||
It seems simple, but here comes another problem. | ||||||||
|
||||||||
## Impediments | ||||||||
|
||||||||
`Optimizer` which calls push down operations sets page size after limit, which overwrites this settings. `Optimizer` does not apply rules in the order they are given. | ||||||||
|
||||||||
## Fix | ||||||||
|
||||||||
Rework [`Optimizer`](https://github.com/opensearch-project/sql/blob/57ce303740f64fe0279fc34aab0116f33f11fbe6/core/src/main/java/org/opensearch/sql/planner/optimizer/LogicalPlanOptimizer.java). | ||||||||
Current behavior: | ||||||||
```py | ||||||||
def Optimize: | ||||||||
for node in PlanTree: # Traverse the Logical Plan Tree | ||||||||
for rule in rules: # Enumerate rules | ||||||||
tryApplyRule() | ||||||||
``` | ||||||||
|
||||||||
Expected behavior: | ||||||||
```py | ||||||||
def Optimize: | ||||||||
for rule in rules: # Enumerate rules | ||||||||
for node in PlanTree: # Traverse the Logical Plan Tree | ||||||||
tryApplyRule() | ||||||||
``` | ||||||||
Rules list: | ||||||||
``` | ||||||||
... | ||||||||
CreateTableScanBuilder | ||||||||
PushDownPageSize | ||||||||
... | ||||||||
PUSH_DOWN_LIMIT | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this ordering important? |
||||||||
... | ||||||||
``` | ||||||||
|
||||||||
This gives us warranty that `pushDownLimit` operation would be rejected if `pushPageSize` called before. Then, not optimized Logical Plan Tree node `LogicalLimit` will be converted to `LimitOperator` Physical Plan tree node. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||
|
||||||||
## Other changes | ||||||||
|
||||||||
1. Make `Optimizer` rules applied only once. | ||||||||
2. Reorder `Optimizer` rules. | ||||||||
3. Make `LimitOperator` properly serialized and deserialized. | ||||||||
4. Make `OpenSearchIndexScanBuilder::pushDownLimit` return `false` if `pushDownPageSize` was called before. | ||||||||
5. (Optional) Groom `Optimizer` to reduce amount of unchecked casts and uses raw classes. | ||||||||
6. (Optional) Rework `Optimizer` to make it a tree visitor. | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Out of scope work. Please raise a follow-up ticket to do these later. You can label this as |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,8 +7,13 @@ | |
|
||
|
||
import java.util.Arrays; | ||
import java.util.stream.Collectors; | ||
import java.util.stream.Stream; | ||
|
||
import lombok.experimental.UtilityClass; | ||
import org.opensearch.sql.planner.logical.LogicalPlan; | ||
import org.opensearch.sql.planner.optimizer.LogicalPlanOptimizer; | ||
import org.opensearch.sql.planner.optimizer.Rule; | ||
import org.opensearch.sql.prometheus.planner.logical.rules.MergeAggAndIndexScan; | ||
import org.opensearch.sql.prometheus.planner.logical.rules.MergeAggAndRelation; | ||
import org.opensearch.sql.prometheus.planner.logical.rules.MergeFilterAndRelation; | ||
|
@@ -23,10 +28,10 @@ public class PrometheusLogicalPlanOptimizerFactory { | |
* Create Prometheus storage specified logical plan optimizer. | ||
*/ | ||
public static LogicalPlanOptimizer create() { | ||
return new LogicalPlanOptimizer(Arrays.asList( | ||
return new LogicalPlanOptimizer(Stream.of( | ||
new MergeFilterAndRelation(), | ||
new MergeAggAndIndexScan(), | ||
new MergeAggAndRelation() | ||
)); | ||
).map(r -> (Rule<LogicalPlan>)r).collect(Collectors.toList())); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is messy, and we can update the Rules classes to extend |
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just rename the argument?