Models (SQL Transformations)
dbt-style SQL models with dependency resolution, DAG execution, and scheduling
Models let you define SQL transformations that materialize as ClickHouse views or tables. Dependencies between models are declared with $ref() syntax, resolved into a DAG, and executed in topological order.
Included in the free Community edition.
Overview
Each model has:
- A name (valid ClickHouse identifier, unique per connection)
- A SQL body (with optional
$ref()references to other models) - A materialization (
viewortable) - A target database (where the view/table is created)
Models are scoped to a ClickHouse connection. Execution runs through the tunnel gateway using credentials from an active user session.
Creating a Model
Navigate to Models in the sidebar and click New Model.
| Field | Description | Default |
|---|---|---|
| Name | Valid ClickHouse identifier (^[a-zA-Z_][a-zA-Z0-9_]*$) | Required |
| Description | Free-text description | Empty |
| Target Database | ClickHouse database for the output | default |
| Materialization | view or table | view |
| SQL Body | The SELECT query (may contain $ref() calls) | Required |
| Table Engine | ClickHouse engine (only for table) | MergeTree |
| ORDER BY | ClickHouse ORDER BY clause (only for table) | tuple() |
Materialization Types
| Type | DDL Generated |
|---|---|
view | CREATE OR REPLACE VIEW db.name AS {your SQL} |
table | DROP TABLE IF EXISTS db.name then CREATE TABLE db.name ENGINE = engine ORDER BY order_by AS {your SQL} |
Tables are recreated on every run (drop + create). Views are replaced in-place.
$ref() Syntax
Reference other models by name using $ref(). At execution time, references are resolved to fully-qualified table names.
SELECT
a.user_id,
a.total_orders,
b.lifetime_value
FROM $ref(user_orders) a
JOIN $ref(user_revenue) b ON a.user_id = b.user_idResolves to:
SELECT
a.user_id,
a.total_orders,
b.lifetime_value
FROM `analytics`.`user_orders` a
JOIN `analytics`.`user_revenue` b ON a.user_id = b.user_idResolution Rules
$ref(name)becomes`target_database`.`name`(backtick-escaped)- References inside SQL comments (
--and/* */) are ignored — comments are stripped before extraction - Each unique reference creates a dependency edge in the DAG
- Unresolved references (referencing a model that doesn't exist) cause a validation error
DAG & Dependencies
Models form a Directed Acyclic Graph (DAG). CH-UI uses Kahn's algorithm for topological sorting and cycle detection.
How It Works
- Extract
$ref()calls from each model's SQL body - Build a dependency graph (model A depends on model B if A references B)
- Topologically sort using Kahn's algorithm (process models with zero in-degree first)
- If fewer models are processed than exist in the graph, a cycle is detected
Validation Checks
| Check | Error |
|---|---|
| Unknown reference | references unknown model 'foo' via $ref() |
| Self-reference | cannot reference itself via $ref(model_name) |
| Circular dependency | cycle detected in model dependencies |
Run validation anytime from the UI or via GET /api/models/validate.
Execution
Three execution modes are available. All acquire a per-connection lock to prevent concurrent runs.
Run All
Executes every model in the connection in topological order.
POST /api/models/runRun Single
Executes one model plus all its transitive upstream dependencies (recursive walk).
POST /api/models/{id}/runUse this when working on a specific model — its dependencies are automatically rebuilt first.
Run Pipeline
Executes all models in the same connected component (see Pipelines below).
POST /api/models/pipelines/{anchorId}/runExecution Steps (Per Model)
- Check upstream failures — if any dependency failed, skip this model
- Resolve
$ref()calls — replace with fully-qualified table names - Generate DDL —
CREATE OR REPLACE VIEWorDROP TABLE + CREATE TABLE ... AS - Execute via tunnel — 5-minute timeout per statement
- Record result — status, elapsed time, error (if any)
Cascade Failures
If an upstream model fails, all downstream models are marked as skipped. The run status becomes partial (mix of success and failure) or error (all failed/skipped).
Pipelines
A pipeline is a group of models that are transitively connected — they directly or indirectly depend on each other. Models with no connections to each other form separate pipelines.
How Pipelines Are Detected
CH-UI treats the dependency graph as undirected and finds connected components via BFS. Each component is an independent pipeline.
Anchor Model
Each pipeline is identified by its anchor model — the first model in topological order within the component. The anchor ID is used for scheduling and pipeline execution.
Example
Pipeline 1: raw_events → cleaned_events → daily_aggregates
Pipeline 2: user_signups → user_cohorts (independent, separate schedule)Scheduling
Each pipeline can have its own cron schedule. The scheduler checks for due schedules every 30 seconds.
Creating a Schedule
Navigate to the pipeline view and click Schedule on a pipeline card.
| Field | Description |
|---|---|
| Cron | 5-field cron expression (minute hour dom month dow) |
| Enabled | Toggle to pause/resume |
Cron Syntax
Standard 5-field format:
| Field | Range | Special |
|---|---|---|
| Minute | 0-59 | *, ,, -, / |
| Hour | 0-23 | *, ,, -, / |
| Day of month | 1-31 | *, ,, -, / |
| Month | 1-12 | *, ,, -, / |
| Day of week | 0-6 (Sun=0) | *, ,, -, / |
Examples:
0 9 * * * # Daily at 9:00 AM
0 */6 * * * # Every 6 hours
30 2 * * 0 # Sundays at 2:30 AM
0 0 1 * * # First of month at midnightExecution
When a schedule fires, the scheduler calls RunPipeline for the anchor model's connected component. The triggered_by field is set to "scheduler".
Credentials are borrowed from an active user session (up to 3 recent sessions are tried). If no valid session exists, the run fails.
DAG Visualization
The DAG View renders models as an interactive graph using XyFlow.
- Upstream models appear on the left, downstream on the right
- Nodes show model name, materialization type, status, and target database
- Edges show dependency direction
- Layout:
x = depth * 300px,y = index_at_depth * 120px
Run History
Every execution creates a run record with per-model results.
Run Status
| Status | Meaning |
|---|---|
running | Execution in progress |
success | All models succeeded |
partial | Some models succeeded, some failed |
error | All models failed or were skipped |
Per-Model Result Status
| Status | Meaning |
|---|---|
pending | Not yet started |
running | Currently executing |
success | Completed without errors |
error | Execution failed (error in error field) |
skipped | Skipped due to upstream dependency failure |
Results include the resolved SQL (with $ref() replaced), elapsed time in milliseconds, and any error message.
Viewing History
Navigate to Models > Runs to see recent runs with their status, model counts, trigger source, and timestamps. Click a run to see per-model results.
Concurrency
- Only one run per connection at a time (mutex lock)
- Attempting a second run returns:
a model run is already in progress for this connection - Each DDL statement has a 5-minute execution timeout