Clickhouse Io

Canonical shelf is Build backend because the skill’s core value is OLAP schema design, query optimization, and ingestion—not distribution or ops runbooks alone. Backend subphase matches table engines, partitions, ORDER BY keys, and analytical SQL that power product-facing metrics APIs.

Also useful

Also useful

Where it fits

Example use

Define a partitioned markets_analytics table with sane ORDER BY before exposing metrics endpoints.

Example use

Rewrite a slow dashboard query using projections or pruning guidance instead of scaling Postgres vertically.

Example use

Plan Kafka or batch ingest for product events feeding funnel and retention reports.

How it compares

Use for ClickHouse OLAP design guidance in SKILL.md form, not as a managed warehouse connector or generic SQL linter.

Common Questions / FAQ

Who is clickhouse-io for?

Developers and solo founders shipping SaaS or data products who need ClickHouse table design, ingest, and analytical SQL help inside their coding agent.

When should I use clickhouse-io?

During Build backend work when designing fact tables and ingest, during Ship perf when tuning heavy aggregations, and during Grow analytics when standing up real-time or time-series dashboards.

Is clickhouse-io safe to install?

It is documentation-style patterns without mandatory network calls; review the Security Audits panel on this page and treat any example DDL as something to run only against databases you control.

SKILL.md

READMESKILL.md - Clickhouse Io

# ClickHouse Analytics Patterns

ClickHouse-specific patterns for high-performance analytics and data engineering.

## When to Activate

- Designing ClickHouse table schemas (MergeTree engine selection)
- Writing analytical queries (aggregations, window functions, joins)
- Optimizing query performance (partition pruning, projections, materialized views)
- Ingesting large volumes of data (batch inserts, Kafka integration)
- Migrating from PostgreSQL/MySQL to ClickHouse for analytics
- Implementing real-time dashboards or time-series analytics

## Overview

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). It's optimized for fast analytical queries on large datasets.

**Key Features:**
- Column-oriented storage
- Data compression
- Parallel query execution
- Distributed queries
- Real-time analytics

## Table Design Patterns

### MergeTree Engine (Most Common)

```sql
CREATE TABLE markets_analytics (
    date Date,
    market_id String,
    market_name String,
    volume UInt64,
    trades UInt32,
    unique_traders UInt32,
    avg_trade_size Float64,
    created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
```

### ReplacingMergeTree (Deduplication)

```sql
-- For data that may have duplicates (e.g., from multiple sources)
CREATE TABLE user_events (
    event_id String,
    user_id String,
    event_type String,
    timestamp DateTime,
    properties String
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, event_id, timestamp)
PRIMARY KEY (user_id, event_id);
```

### AggregatingMergeTree (Pre-aggregation)

```sql
-- For maintaining aggregated metrics
CREATE TABLE market_stats_hourly (
    hour DateTime,
    market_id String,
    total_volume AggregateFunction(sum, UInt64),
    total_trades AggregateFunction(count, UInt32),
    unique_users AggregateFunction(uniq, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, market_id);

-- Query aggregated data
SELECT
    hour,
    market_id,
    sumMerge(total_volume) AS volume,
    countMerge(total_trades) AS trades,
    uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)
GROUP BY hour, market_id
ORDER BY hour DESC;
```

## Query Optimization Patterns

### Efficient Filtering

```sql
-- PASS: GOOD: Use indexed columns first
SELECT *
FROM markets_analytics
WHERE date >= '2025-01-01'
  AND market_id = 'market-123'
  AND volume > 1000
ORDER BY date DESC
LIMIT 100;

-- FAIL: BAD: Filter on non-indexed columns first
SELECT *
FROM markets_analytics
WHERE volume > 1000
  AND market_name LIKE '%election%'
  AND date >= '2025-01-01';
```

### Aggregations

```sql
-- PASS: GOOD: Use ClickHouse-specific aggregation functions
SELECT
    toStartOfDay(created_at) AS day,
    market_id,
    sum(volume) AS total_volume,
    count() AS total_trades,
    uniq(trader_id) AS unique_traders,
    avg(trade_size) AS avg_size
FROM trades
WHERE created_at >= today() - INTERVAL 7 DAY
GROUP BY day, market_id
ORDER BY day DESC, total_volume DESC;

-- PASS: Use quantile for percentiles (more efficient than percentile)
SELECT
    quantile(0.50)(trade_size) AS median,
    quantile(0.95)(trade_size) AS p95,
    quantile(0.99)(trade_size) AS p99
FROM trades
WHERE created_at >= now() - INTERVAL 1 HOUR;
```

### Window Functions

```sql
-- Calculate running totals
SELECT
    date,
    market_id,
    volume,
    sum(volume) OVER (
        PARTITION BY market_id
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_volume
FROM markets_analytics
WHERE date >= today() - INTERVAL 30 DAY
ORDER BY market_id, date;
```

## Data Inse

What is this skill?

MergeTree, ReplacingMergeTree, and related engine selection patterns with PARTITION BY and ORDER BY examples

When-to-activate list covers aggregations, window functions, joins, partition pruning, projections, and materialized vie

Migration and workload guidance from PostgreSQL/MySQL toward ClickHouse for analytics

Batch insert and Kafka-oriented ingestion patterns for high-volume event streams

Real-time dashboard and time-series analytics orientation for columnar OLAP workloads

Example settings include index_granularity 8192 in MergeTree DDL

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 5.2k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

Also useful

Where it fits

Example use

Define a partitioned markets_analytics table with sane ORDER BY before exposing metrics endpoints.

Example use

Rewrite a slow dashboard query using projections or pruning guidance instead of scaling Postgres vertically.

Example use