diff --git a/docs/developers/configuration.md b/docs/developers/configuration.md
new file mode 100644
index 00000000000..3e5ee65e880
--- /dev/null
+++ b/docs/developers/configuration.md
@@ -0,0 +1,173 @@
+---
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+      https://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Configuration
+Celeborn configuration is divided into static and dynamic configuration. The static configuration, `CelebornConf`, is specified in
+the default configuration file `$CELEBORN_HOME/conf/celeborn-defaults.conf`. The dynamic configuration overrides
+the static configuration at runtime on the `Master` and `Worker`.
+
+## Dynamic Configuration
+Dynamic configuration can be changed at runtime as needed and supports the following
+config levels:
+
+- `SYSTEM`: Configuration that applies to the whole system.
+- `TENANT`: Configuration that applies to a tenant id.
+- `TENANT_USER`: Configuration that applies to a tenant id and username.
+
+When dynamic configuration is applied, the config levels take priority as follows:
+
+- `SYSTEM` level configuration overrides the static configuration of `CelebornConf`.
+When the system level configuration is missing, the static configuration of `CelebornConf` is used as the fallback.
+- `TENANT` level configuration overrides `SYSTEM` level, so a configuration for a given tenant id overrides the configuration at system level.
+When the tenant level configuration is missing, the configuration at system level is used as the fallback.
+- `TENANT_USER` level configuration overrides `TENANT` level, so a configuration for a given tenant id and username overrides the configuration at tenant level.
+When the tenant user level configuration is missing, the configuration at tenant level is used as the fallback.
+
+## Config Service
+The config service provides configuration management with a local cache for the static and dynamic configuration.
+`ConfigService` is a pluggable service interface whose implementations are backed by different store backends.
+The store backend of `ConfigService` is configured with `celeborn.dynamicConfig.store.backend`, which currently supports
+filesystem and database store backends. If no store backend is configured, the config service is disabled.
+
+### FileSystem Config Service
+The filesystem config service reads the configuration specified in the dynamic configuration file. The path of the dynamic configuration
+file is configured with `celeborn.quota.configuration.path` and defaults to `$CELEBORN_HOME/conf/dynamicConfig.yaml`.
+The template of the dynamic configuration is as follows:
+
+```yaml
+- level: SYSTEM
+  config:
+    celeborn.client.push.buffer.initial.size: 100k
+    celeborn.client.push.buffer.max.size: 1000k
+    celeborn.worker.fetch.heartbeat.enabled: true
+    celeborn.client.push.buffer.initial.size.only: 10k
+    celeborn.test.timeoutMs.only: 100s
+    celeborn.test.enabled.only: false
+    celeborn.test.int.only: 10
+
+- tenantId: tenant_id
+  level: TENANT
+  config:
+    celeborn.client.push.buffer.initial.size: 10k
+    celeborn.client.push.buffer.initial.size.only: 100k
+    celeborn.worker.fetch.heartbeat.enabled: false
+    celeborn.test.tenant.timeoutMs.only: 100s
+    celeborn.test.tenant.enabled.only: false
+    celeborn.test.tenant.int.only: 10
+
+- tenantId: tenant_id1
+  level: TENANT
+  config:
+    celeborn.client.push.buffer.initial.size: 10k
+    celeborn.client.push.buffer.initial.size.only: 100k
+    celeborn.worker.fetch.heartbeat.enabled: false
+    celeborn.test.tenant.timeoutMs.only: 100s
+    celeborn.test.tenant.enabled.only: false
+    celeborn.test.tenant.int.only: 10
+    celeborn.client.push.queue.capacity: 1024
+  users:
+    - name: Jerry
+      config:
+        celeborn.client.push.buffer.initial.size: 1k
+        celeborn.client.push.buffer.initial.size.user.only: 512k
+```
+
+### Database Config Service
+The database config service refreshes changed dynamic configuration stored in a database via JDBC. The database
+store backend is configured with `celeborn.dynamicConfig.store.db.*`. The tables that hold the dynamic configuration
+for the different config levels must be created in the backend database. For example, the template of configuration tables for
+MySQL is as follows:
+
+```sql
+CREATE TABLE IF NOT EXISTS celeborn_cluster_info
+(
+    id int NOT NULL AUTO_INCREMENT,
+    name varchar(255) NOT NULL COMMENT 'celeborn cluster name',
+    namespace varchar(255) DEFAULT NULL COMMENT 'celeborn cluster namespace',
+    endpoint varchar(255) DEFAULT NULL COMMENT 'celeborn cluster endpoint',
+    gmt_create timestamp NOT NULL,
+    gmt_modify timestamp NOT NULL,
+    PRIMARY KEY (id),
+    UNIQUE KEY `index_cluster_unique_name` (`name`)
+);
+
+CREATE TABLE IF NOT EXISTS celeborn_cluster_system_config
+(
+    id int NOT NULL AUTO_INCREMENT,
+    cluster_id int NOT NULL,
+    config_key varchar(255) NOT NULL,
+    config_value varchar(255) NOT NULL,
+    type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
+    gmt_create timestamp NOT NULL,
+    gmt_modify timestamp NOT NULL,
+    PRIMARY KEY (id),
+    UNIQUE KEY `index_unique_system_config_key` (`cluster_id`, `config_key`)
+);
+
+CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config
+(
+    id int NOT NULL AUTO_INCREMENT,
+    cluster_id int NOT NULL,
+    tenant_id varchar(255) NOT NULL,
+    level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,TENANT_USER',
+    user varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
+    config_key varchar(255) NOT NULL,
+    config_value varchar(255) NOT NULL,
+    type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
+    gmt_create timestamp NOT NULL,
+    gmt_modify timestamp NOT NULL,
+    PRIMARY KEY (id),
+    UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `user`, `config_key`)
+);
+```
+
+After the configuration tables are created, the dynamic configuration of each config level is specified by inserting
+configuration records into the table of the corresponding config level.
+
+```sql
+INSERT INTO celeborn_cluster_info ( `id`, `name`, `namespace`, `endpoint`, `gmt_create`, `gmt_modify` )
+VALUES
+    ( 1, 'default', 'celeborn-1', 'celeborn-namespace.endpoint.com', '2023-08-26 22:08:30', '2023-08-26 22:08:30' );
+INSERT INTO `celeborn_cluster_system_config` ( `id`, `cluster_id`, `config_key`, `config_value`, `type`, `gmt_create`, `gmt_modify` )
+VALUES
+    ( 1, 1, 'celeborn.client.push.buffer.initial.size', '102400', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 2, 1, 'celeborn.client.push.buffer.max.size', '1024000', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 3, 1, 'celeborn.worker.fetch.heartbeat.enabled', 'true', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 5, 1, 'celeborn.client.push.buffer.initial.size.only', '10240', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 6, 1, 'celeborn.test.timeoutMs.only', '100s', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 7, 1, 'celeborn.test.enabled.only', 'false', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 8, 1, 'celeborn.test.int.only', '10', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' );
+INSERT INTO `celeborn_cluster_tenant_config` ( `id`, `cluster_id`, `tenant_id`, `level`, `user`, `config_key`, `config_value`, `type`, `gmt_create`, `gmt_modify` )
+VALUES
+    ( 1, 1, 'tenant_id', 'TENANT', '', 'celeborn.client.push.buffer.initial.size', '10240', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 2, 1, 'tenant_id', 'TENANT', '', 'celeborn.client.push.buffer.initial.size.only', '102400', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 3, 1, 'tenant_id', 'TENANT', '', 'celeborn.worker.fetch.heartbeat.enabled', 'false', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 4, 1, 'tenant_id', 'TENANT', '', 'celeborn.test.tenant.timeoutMs.only', '100s', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 5, 1, 'tenant_id', 'TENANT', '', 'celeborn.test.tenant.enabled.only', 'false', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 6, 1, 'tenant_id', 'TENANT', '', 'celeborn.test.tenant.int.only', '10', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 7, 1, 'tenant_id', 'TENANT', '', 'celeborn.client.push.queue.capacity', '1024', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 8, 1, 'tenant_id1', 'TENANT', '', 'celeborn.client.push.buffer.initial.size', '10240', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 9, 1, 'tenant_id1', 'TENANT', '', 'celeborn.client.push.buffer.initial.size.only', '102400', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 10, 1, 'tenant_id1', 'TENANT', '', 'celeborn.worker.fetch.heartbeat.enabled', 'false', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 11, 1, 'tenant_id1', 'TENANT', '', 'celeborn.test.tenant.timeoutMs.only', '100s', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 12, 1, 'tenant_id1', 'TENANT', '', 'celeborn.test.tenant.enabled.only', 'false', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 13, 1, 'tenant_id1', 'TENANT', '', 'celeborn.test.tenant.int.only', '10', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 14, 1, 'tenant_id1', 'TENANT', '', 'celeborn.client.push.queue.capacity', '1024', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 15, 1, 'tenant_id1', 'TENANT_USER', 'Jerry', 'celeborn.client.push.buffer.initial.size', '1k', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' ),
+    ( 16, 1, 'tenant_id1', 'TENANT_USER', 'Jerry', 'celeborn.client.push.buffer.initial.size.user.only', '512k', 'QUOTA', '2023-08-26 22:08:30', '2023-08-26 22:08:30' );
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 00f8ed10276..5cb056a246d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -93,6 +93,7 @@ nav:
           - Overview: developers/client.md
           - LifecycleManager: developers/lifecyclemanager.md
           - ShuffleClient: developers/shuffleclient.md
+      - Configuration: developers/configuration.md
       - Fault Tolerant: developers/faulttolerant.md
       - Worker Exclusion: developers/workerexclusion.md
       - Integrating Celeborn: developers/integrate.md
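
Note on the fallback order documented in `configuration.md`: the lookup goes from `TENANT_USER` to `TENANT` to `SYSTEM` and finally to the static `CelebornConf`. The Python sketch below illustrates that order only and is not Celeborn's `ConfigService` implementation. The dictionaries, the `resolve` function, and the `example.static.only.key` entry are hypothetical stand-ins; the other keys and values are taken from the YAML template in the patch.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the static CelebornConf (key and value are assumptions).
STATIC_CONF: Dict[str, str] = {"example.static.only.key": "static-default"}

# Values below mirror the dynamicConfig.yaml template from the patch.
SYSTEM: Dict[str, str] = {
    "celeborn.client.push.buffer.initial.size": "100k",
    "celeborn.client.push.buffer.max.size": "1000k",
}
TENANT: Dict[str, Dict[str, str]] = {
    "tenant_id1": {
        "celeborn.client.push.buffer.initial.size": "10k",
        "celeborn.client.push.queue.capacity": "1024",
    },
}
TENANT_USER: Dict[Tuple[str, str], Dict[str, str]] = {
    ("tenant_id1", "Jerry"): {"celeborn.client.push.buffer.initial.size": "1k"},
}


def resolve(key: str, tenant_id: Optional[str] = None,
            user: Optional[str] = None) -> Optional[str]:
    """Return the value from the most specific level that defines `key`."""
    layers = []
    if tenant_id is not None and user is not None:
        # Most specific level: tenant id plus username.
        layers.append(TENANT_USER.get((tenant_id, user), {}))
    if tenant_id is not None:
        layers.append(TENANT.get(tenant_id, {}))
    layers.append(SYSTEM)
    layers.append(STATIC_CONF)  # final fallback
    for layer in layers:
        if key in layer:
            return layer[key]
    return None


if __name__ == "__main__":
    key = "celeborn.client.push.buffer.initial.size"
    print(resolve(key, "tenant_id1", "Jerry"))  # user level wins
    print(resolve(key, "tenant_id1"))           # tenant level wins
    print(resolve(key))                         # system level wins
```

A key that is absent at the user level, such as `celeborn.client.push.queue.capacity` for `Jerry`, is resolved from the tenant level, and a key absent from every dynamic level falls through to the static configuration.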