Distributed ClickHouse Support
CH-UI provides comprehensive support for distributed ClickHouse deployments, enabling you to manage cluster operations directly from the interface.
Overview
Starting from version 1.5.30, CH-UI includes full support for distributed ClickHouse features:
- ON CLUSTER operations for tables and users
- Cluster-aware table creation
- Distributed table engine support
- Multi-node query execution
Enabling Distributed Mode
Via Settings Page
- Navigate to the Settings page in CH-UI
- Enable "Distributed Mode"
- Specify your cluster name (the query after this list shows how to look it up)
- Save settings
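If you are unsure which name to enter, you can list the clusters the server already knows about; a minimal lookup query (any cluster defined under remote_servers will appear here):
SELECT DISTINCT cluster FROM system.clusters;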
Features Enabled
When Distributed Mode is active:
- Table creation includes "ON CLUSTER" option
- User management operations support cluster-wide changes
- Distributed engine becomes available in table creation
- Cluster-specific metrics are displayed
Working with Clusters
Creating Tables on Cluster
When creating a table with Distributed Mode enabled:
- Check the "ON CLUSTER" option
- Select your cluster from the dropdown
- Choose the appropriate engine (ReplicatedMergeTree, Distributed, etc.)
Example DDL generated:
CREATE TABLE IF NOT EXISTS my_table ON CLUSTER my_cluster
(
id UInt64,
name String,
created DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table', '{replica}')
ORDER BY id;
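To confirm the DDL reached every node, one option is to query system.tables across the cluster with the clusterAllReplicas table function (this assumes the nodes can reach each other with the credentials configured for my_cluster):
SELECT hostName() AS host, name
FROM clusterAllReplicas('my_cluster', system.tables)
WHERE database = 'default' AND name = 'my_table';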
Creating Distributed Tables
For distributed tables across shards:
CREATE TABLE my_table_distributed ON CLUSTER my_cluster
AS my_table
ENGINE = Distributed(my_cluster, default, my_table, rand());
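Once the distributed table exists, reads and writes go through it and are routed to the underlying shards; a brief usage sketch based on the tables defined above:
-- Each insert is routed to a shard by the sharding expression (rand() here)
INSERT INTO my_table_distributed VALUES (1, 'example', now());
-- Reads fan out to every shard and are merged on the initiating node
SELECT count() FROM my_table_distributed;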
User Management on Cluster
Create users across all nodes:
CREATE USER new_user ON CLUSTER my_cluster
IDENTIFIED BY 'password'
DEFAULT DATABASE default
SETTINGS max_memory_usage = 10000000000;
Grant permissions cluster-wide:
GRANT ON CLUSTER my_cluster SELECT ON *.* TO new_user;
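To verify that the user exists on every node, the same clusterAllReplicas technique used for tables works against system.users (a hedged check, assuming the cluster is named my_cluster):
SELECT hostName() AS host, name
FROM clusterAllReplicas('my_cluster', system.users)
WHERE name = 'new_user';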
Cluster Configuration Examples
Basic Two-Shard Setup
<clickhouse>
<remote_servers>
<my_cluster>
<shard>
<replica>
<host>clickhouse-01</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
</my_cluster>
</remote_servers>
</clickhouse>
Replicated Setup
<clickhouse>
<remote_servers>
<my_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-01</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>clickhouse-03</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-04</host>
<port>9000</port>
</replica>
</shard>
</my_cluster>
</remote_servers>
</clickhouse>
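Note that the {shard} and {replica} placeholders used by ReplicatedMergeTree are expanded from each node's <macros> configuration, which must be defined per node alongside remote_servers, and replication itself requires ZooKeeper or ClickHouse Keeper. You can check what a node resolves the macros to with:
SELECT * FROM system.macros;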
Best Practices
Table Design
- Use ReplicatedMergeTree for data redundancy
- Create Distributed tables for query routing
- Partition strategically to optimize query performance
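As an illustration of the partitioning point, a sketch of a replicated table partitioned by month so that date-filtered queries only touch the relevant parts (the events table and its columns are examples, not part of CH-UI):
CREATE TABLE IF NOT EXISTS events ON CLUSTER my_cluster
(
id UInt64,
created DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(created)
ORDER BY (created, id);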
Cluster Operations
- Always use ON CLUSTER for DDL operations
- Monitor replication lag between nodes (see the query after this list)
- Plan maintenance windows for cluster-wide changes
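One way to watch for replication lag is the absolute_delay and queue_size columns of system.replicas (the thresholds below are arbitrary examples):
SELECT database, table, absolute_delay, queue_size
FROM system.replicas
WHERE absolute_delay > 30 OR queue_size > 100;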
Performance Optimization
- Shard by high-cardinality columns (see the sketch after this list)
- Keep frequently joined data on same shard
- Use local tables for dimension data
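For the sharding-key point, a common alternative to rand() is hashing a stable, high-cardinality column so that rows with the same key always land on the same shard; for example, my_table_distributed could be declared with cityHash64(id) as the sharding expression instead:
CREATE TABLE my_table_distributed ON CLUSTER my_cluster
AS my_table
ENGINE = Distributed(my_cluster, default, my_table, cityHash64(id));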
Docker Compose Example
Complete distributed ClickHouse setup with CH-UI:
version: '3.8'
services:
clickhouse-01:
image: clickhouse/clickhouse-server
hostname: clickhouse-01
volumes:
- ./config/clickhouse-01:/etc/clickhouse-server/config.d
- clickhouse-01-data:/var/lib/clickhouse
networks:
- clickhouse-network
clickhouse-02:
image: clickhouse/clickhouse-server
hostname: clickhouse-02
volumes:
- ./config/clickhouse-02:/etc/clickhouse-server/config.d
- clickhouse-02-data:/var/lib/clickhouse
networks:
- clickhouse-network
ch-ui:
image: ghcr.io/caioricciuti/ch-ui:latest
ports:
- "5521:5521"
environment:
VITE_CLICKHOUSE_URL: "http://clickhouse-01:8123"
VITE_CLICKHOUSE_USER: "default"
VITE_CLICKHOUSE_PASS: ""
networks:
- clickhouse-network
depends_on:
- clickhouse-01
- clickhouse-02
networks:
clickhouse-network:
volumes:
clickhouse-01-data:
clickhouse-02-data:
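The remote_servers definition shown earlier is expected to live in the mounted ./config/clickhouse-01 and ./config/clickhouse-02 directories (file names are up to you). Once the stack is up, you can confirm from either node that both shards are visible:
SELECT cluster, shard_num, host_name, port
FROM system.clusters
WHERE cluster = 'my_cluster';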
Monitoring Distributed Systems
Cluster Health Queries
Check cluster status:
SELECT * FROM system.clusters WHERE cluster = 'my_cluster';
Monitor replication:
SELECT
database,
table,
is_leader,
total_replicas,
active_replicas
FROM system.replicas;
Performance Metrics
Track distributed query performance:
SELECT
query,
query_duration_ms,
read_rows,
read_bytes,
result_rows
FROM system.query_log
WHERE type = 'QueryFinish'
AND has(tables, 'default.my_table_distributed')
ORDER BY query_start_time DESC
LIMIT 10;
Troubleshooting
Common Issues
Tables Not Created on All Nodes
Problem: DDL executed without ON CLUSTER
Solution: Always use ON CLUSTER for distributed operations
Replication Lag
Problem: Data inconsistency between replicas
Solution: Check network connectivity and ZooKeeper health
Query Performance
Problem: Slow distributed queries
Solution: Optimize sharding key and check data distribution
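To check how evenly data is spread, you can group by the _shard_num virtual column that Distributed tables expose (using the my_table_distributed example from above):
SELECT _shard_num, count() AS rows
FROM my_table_distributed
GROUP BY _shard_num
ORDER BY _shard_num;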
Debugging Commands
Check connectivity:
SELECT * FROM system.clusters;
Verify replication:
SELECT * FROM system.replication_queue;
Monitor mutations:
SELECT * FROM system.mutations WHERE is_done = 0;
Migration Guide
From Standalone to Distributed
- Back up existing data
- Set up the cluster configuration
- Create replicated tables with ON CLUSTER
- Migrate data using INSERT ... SELECT
- Create distributed tables
- Update CH-UI settings for distributed mode
Example Migration
-- Create replicated table
CREATE TABLE my_table_replicated ON CLUSTER my_cluster
AS my_table_local
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table_replicated', '{replica}')
ORDER BY id;
-- Copy data
INSERT INTO my_table_replicated SELECT * FROM my_table_local;
-- Create distributed table
CREATE TABLE my_table_distributed ON CLUSTER my_cluster
AS my_table_replicated
ENGINE = Distributed(my_cluster, default, my_table_replicated, rand());
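After copying, a quick sanity check is to compare row counts between the old local table and the new distributed table; on a quiet system the two counts should match (concurrent writes will skew them):
SELECT count() FROM my_table_local;
SELECT count() FROM my_table_distributed;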
Related Documentation
- Getting Started - Initial setup
- Environment Variables - Configuration options
- Permissions Guide - User access control
- Troubleshooting - Common issues