This project sets up a ClickHouse Cluster with one shard and two replicas using ClickHouse Keeper. Each component is containerized using Docker Compose.
- 1 Shard, 2 Replicas: Provides data replication and failover capabilities.
- ClickHouse Keeper: Manages cluster state and leader election.
- Separation of Components: Each ClickHouse and Keeper instance is defined in its own Docker Compose file, allowing for flexible deployment across different servers.
- `clickhouse-01`: First replica of the ClickHouse database.
- `clickhouse-02`: Second replica of the ClickHouse database.
- `clickhouse-keeper-01`, `clickhouse-keeper-02`, `clickhouse-keeper-03`: ClickHouse Keeper instances.
- `ch-proxy`: Proxy to route requests to the ClickHouse cluster.
- Clone the repository:

```bash
git clone https://github.com/saratqpy/clickhouse-cluster.git
```

- Start the containers using Docker Compose:

```bash
docker-compose up -d
```

- Verify the setup by connecting to the cluster:

```bash
docker exec -it clickhouse-01 clickhouse-client
```
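Once connected, a couple of quick checks can confirm the replica is up and can reach ClickHouse Keeper (a minimal sketch; it assumes Keeper is configured and reachable from the node):

```sql
-- Show which node we are connected to and its server version
SELECT hostName(), version();

-- Confirm the node can talk to ClickHouse Keeper by listing the root znodes
SELECT name FROM system.zookeeper WHERE path = '/';
```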
- Open DBeaver or any SQL client and create a new SQL script:

```sql
CREATE DATABASE test ON CLUSTER cluster_1S_2R;
```

The cluster name `cluster_1S_2R` is defined in `config.xml` under the `remote_servers` section.
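Because the statement runs `ON CLUSTER`, the database should be created on both replicas. A quick way to verify this (a sketch using the built-in `clusterAllReplicas` table function) is:

```sql
-- Ask every replica in the cluster whether it has the new database
SELECT hostName() AS replica, name
FROM clusterAllReplicas('cluster_1S_2R', system.databases)
WHERE name = 'test';
```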
- Create a replicated table across the cluster:

```sql
CREATE TABLE test.my_table ON CLUSTER cluster_1S_2R
(
    id UInt32,
    name String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table', '{replica}')
ORDER BY id;
```

This uses the `ReplicatedMergeTree` engine to maintain a copy of the data across replicas. The table is named `my_table`.
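The `{shard}` and `{replica}` placeholders in the ZooKeeper path are filled in from per-node macros (assumed here to be defined in each node's configuration). You can inspect them and check replication health with something like:

```sql
-- Macros substituted into '/clickhouse/tables/{shard}/my_table' and '{replica}'
SELECT * FROM system.macros;

-- Replication status of the new table on this node
SELECT database, table, replica_name, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE database = 'test' AND table = 'my_table';
```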
- Create a distributed table:

```sql
CREATE TABLE test.my_table_distributed ON CLUSTER cluster_1S_2R
AS test.my_table
ENGINE = Distributed(cluster_1S_2R, test, my_table, rand());
```

The Distributed table routes inserts and queries across shards using the sharding key (`rand()` here): if multiple shards exist, the data is split between them, while the replicas within each shard hold identical copies.
- View cluster information:

```sql
SELECT * FROM system.clusters;
```
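For a more focused view of this cluster's topology, you can filter `system.clusters` (a quick sketch; the columns below are standard in that system table):

```sql
-- One row per replica: 1 shard x 2 replicas for cluster_1S_2R
SELECT cluster, shard_num, replica_num, host_name, port, is_local
FROM system.clusters
WHERE cluster = 'cluster_1S_2R';
```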
There are two ways to insert data into the tables:
- Manual SQL Execution:

```sql
INSERT INTO test.my_table_distributed VALUES (1, 'abc');
```

Note: Direct inserts will throw an error if one of the nodes is down.
- Using CH Proxy: CH Proxy automatically detects nodes that are down and avoids sending requests to them. Here's an example of inserting data through the proxy:

```bash
curl -u admin_1S_2R:1 -X POST http://127.0.0.1:80/ -d "INSERT INTO test.my_table VALUES (1, 'abc')"
```

- `admin_1S_2R`: Username set in the `config.yml` file for CH Proxy.
- `1`: Password defined in the same file.
- Port `80`: The configured port for CH Proxy.
To query data and see which node executed the query:

```bash
curl -u admin_1S_2R:1 -X POST http://127.0.0.1:80/ -d "SELECT hostName(), * FROM test.my_table LIMIT 1"
```

The `hostName()` function returns the name of the node that executed the query.
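To confirm that an inserted row was replicated to both nodes, you can count rows per replica directly from the replicated table (a sketch using `clusterAllReplicas`; run it from any node or through the proxy):

```sql
-- Each replica should report the same row count for test.my_table
SELECT hostName() AS replica, count() AS row_count
FROM clusterAllReplicas('cluster_1S_2R', test.my_table)
GROUP BY replica
ORDER BY replica;
```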