-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathss-relexprs.tex
20 lines (11 loc) · 3.68 KB
/
ss-relexprs.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
\subsection{Relational expressions}
\label{subsec:relexprs}
Relational algebra~\cite{DBLP:journals/cacm/Codd70} lies at the core of Calcite. In addition to the operators that can be used to express the most common data manipulation operations, such as \textit{filter}, \textit{project}, \textit{join} \etc , Calcite includes additional operators that meet different purposes, \eg being able to concisely represent complex operations, or recognize optimization opportunities more efficiently.
For instance, it has become common for OLAP, decision making, and streaming applications to use window definitions to express complex analytic functions such as moving average of a quantity over a time period or number or rows. Thus, Calcite introduces a \textit{window} operator that encapsulates the window definition, \ie upper and lower bound, partitioning \etc, and the aggregate functions to execute on each window.
Nevertheless, it is recommended that users combine existing operators wherever possible, rather than defining new ones. The core relational algebra is expressive, and adding a new operator requires adding a planner rule for each combination of the new operator with existing operators.
%%
\myparagraph{Traits.} Calcite does not use different entities to represent logical and physical operators. Instead, it describes the physical properties associated with an operator using \textit{traits}. These traits help the optimizer evaluate the cost of different alternative plans. It is important to note that if an operator property is considered a trait, changing its value does not change the logical expression being evaluated, \ie the rows produced by the given operator will still be the same.
During optimization, Calcite will try to enforce certain traits on relational expressions, \eg the sort order of certain columns. Relational operators can implement a \textit{converter} interface that indicates how to convert the physical attribute of the expression from one value to another.
Calcite includes common traits that describe the physical properties of the data produced by a relational expression, such as \textit{ordering}, \textit{grouping}, and \textit{partitioning}. Similar to~\cite{DBLP:conf/icde/ZhouLC10}, the optimizer can reason about these properties and exploit them to find plans that avoid unnecessary operations.
In addition to these properties, one of the main features of Calcite is the \textit{calling convention} trait. Essentially, the trait represents the data processing system on which the expression will be be executed. Including the calling convention as a trait allows Calcite to meet its goal of optimizing transparently queries whose execution might span over different engines \ie the convention will be treated as any other physical property.
\todo{Add figure}For example, consider joining a \textit{Products} table held in MySQL to an \textit{Orders} table held in Splunk. Initially, the scan of \textit{Orders} takes place in \textit{splunk} convention and the scan of \textit{Products} is in \textit{jdbc-mysql} convention (the tables have to be scanned inside their respective engines), and the join is in logical convention (meaning that no implementation has been chosen). One possible implementation is to use Apache Spark as an external engine: the join is converted to \textit{spark} convention, and its inputs are converters from \textit{jdbc-mysql} and \textit{splunk} to \textit{spark} convention. But there is a more efficient implementation: exploiting the fact that Splunk can perform lookups into MySQL via ODBC, a planner rule pushes the join through the \textit{splunk}-to-\textit{spark} converter, and the join is now in \textit{splunk} convention, running inside the Splunk engine.