-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathss-optimizer.tex
37 lines (22 loc) · 4.15 KB
/
ss-optimizer.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
\subsection{Query optimizer}
\label{subsec:optimizer}
The query optimizer is the main component in the framework. Calcite optimizes queries by repeatedly applying planner rules to a relational expression. A cost model guides the process, and the planner engine tries to generate an alternative expression that has the same semantics as the original but a lower cost.
Every component in the optimizer is extensible. One can add his own relational operators, rules, cost model, and statistics.
%%
\myparagraph{Planner rules.} Calcite includes a set of planner rules to transform expression trees. In particular, a rule matches a given pattern in the tree and executes a transformation that preserves semantics of that expression. At the moment of this writing, Calcite rules account to more than 75. However, it is rather common that data processing systems that rely for optimization on Calcite include their own rules, \eg\ to explore rewritings especially beneficial in that system.
\todo{Add figure with example of complex rule}
%%
\myparagraph{Metadata providers.} Calcite provides interfaces that allow data processing systems to plug their metadata information into the framework. Metadata is an important part of Calcite's optimizer, and it meets two main purposes: ($i$)~guiding the planner towards the goal of reducing the overall query plan, and ($ii$)~providing information to the rules while they are being applied. For instance, the metadata providers have functions that return the overall cost of executing a subexpression in the operator tree, the number of rows and the data size of the results of that expression, or the maximum degree of parallelism with which it can be executed. However, they can also provide information about the filter conditions that are present below a certain tree node.
\JCR{Implementation details about efficiency?}
Systems using Calcite can override all the providers methods. However, for many systems it is sufficient to provide statistics about their input data, and Calcite will do the rest of the work by using the default implementation of the providers.
%%Some example?
%%
\myparagraph{Planner engines.} The main goal of a planner engine is to trigger the rules provided to the engine till it reaches a given objective. It is important to note that engines are pluggable in the framework: the set of rules and metadata providers are common to all of them. At this moment, Calcite provides two different engines that try to take all the burden from the user about running the optimization process.
The first one, \textit{a cost-based planner engine}, triggers the input rules with the goal of \textit{reducing the overall expression cost}. The engine uses a dynamic programming algorithm, similar to~\cite{Graefe93thevolcano,DBLP:journals/debu/Graefe95a}, to create and track different alternative plans created by firing the rules given to the engine. Plans The cost model provided through the metadata providers helps the optimizer to ultimately decide which plan to use, as well as pruning the optimization graph to prevent the search space from exploding. Rules can be given in any order to the planner, as the search strategy avoids falling into local minima in the search space that may not be actually optimal.
The second engine is an \textit{exhaustive planner}, which triggers rules exhaustively till it generates an expression that is not modified by any of the rules anymore. This planner is useful to execute rules deterministically in a quick fashion, without taking into account the cost of the expression.
Users may choose to use one of the existing planner engines depending on their concrete needs, and switching from one to another, in case that their system requirements change, is straightforward. Alternatively, users may choose to generate \textit{multi-stage optimization} logic, in which different set of rules are applied in consecutive phases of the optimization process. Importantly, the existence of two planners allows Calcite users to reduce the overall optimization time by directing the search of different operator plans.
%%
\myparagraph{Materialized views}
% What are materialized views? Why are they important?
% What does Calcite include?
Some text.