-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy paths-architecture.tex
50 lines (33 loc) · 2.98 KB
/
s-architecture.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
\section{Architecture}
\label{sec:archi}
Calcite contains many of the pieces that comprise a typical database management system. However, it omits some key components, \eg storage of data, algorithms to process data, and a repository for storing metadata. This decision is intentional: this makes Calcite an excellent choice for mediating between applications having one or more data storage locations and using multiple data processing engines. It is also a perfect foundation for building your own data processing system.
\begin{figure}[t]
\centering
\includegraphics[width=0.94\columnwidth]{figures/architecture.png}
\vspacebfigure\caption{Apache Calcite architecture and interaction.\label{fig:arch}\JCR{Could we include the adapters in the figure?}}\vspaceafigure
\end{figure}
Figure~\ref{fig:arch} outlines the main components of Calcite's architecture. In the Figure, the dashed lines represent possible external interactions with the framework.
First of all, Calcite contains a query parser and validator that can translate a SQL query to a tree of relational operators. As we mentioned earlier, Calcite does not contain a \textit{storage layer}. Hence, it provides a mechanism to define table schemas and views using JSON files, so it can be used as a stand-alone system.
The tree of relational operators is the internal representation on which Calcite's optimizer works. The optimization engine consists mainly of three components which guide the process: rules, metadata providers, and planner engines. We talk about these components in more detail in Section~\ref{subsec:optimizer}.
Once the query has been optimized, Calcite can translate the relational expression back to SQL. Note that this allows Calcite to work as a stand-alone system on top of any data application with a SQL interface.
However, Calcite architecture is not only tailored towards this fashion of interacting with the framework. In fact, it is far more common that data processing systems choose to use their own parser for their own query language. In that case, Calcite also allows to easily build the operator tree directly instantiating the relational operators. Alternately, one can use the built-in \textit{relational expressions builder} interface. For instance, assume that we want to express the following SQL query using the expressions builder:
\begin{lstlisting}[language=SQL]
SELECT deptno, count(*) AS c, sum(sal) AS s
FROM emp
GROUP BY deptno;
\end{lstlisting}
The equivalent expression looks as follows:
\begin{lstlisting}[language=Java]
final RelNode node = builder
.scan("EMP")
.aggregate(builder.groupKey("DEPTNO"),
builder.count(false, "C"),
builder.sum(false, "S", builder.field("SAL")))
.build();
\end{lstlisting}
Observe that the interface exposes the main constructs for building relational expressions. After the optimization phase is finished, the application can retrieve the optimized relational expression.
\input{ss-dm-ts}
\input{ss-relexprs}
\input{ss-optimizer}
\input{ss-adapters}
\input{ss-streaming}