
Clickhouse Io
Design MergeTree schemas, fast aggregations, and ingestion patterns when your solo SaaS outgrows Postgres for dashboards and event analytics.
Overview
clickhouse-io is an agent skill most often used in Build (also Grow analytics) that applies ClickHouse OLAP schema and query optimization patterns for high-performance analytics.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill clickhouse-ioWhat is this skill?
- MergeTree, ReplacingMergeTree, and related engine selection patterns with PARTITION BY and ORDER BY examples
- When-to-activate list covers aggregations, window functions, joins, partition pruning, projections, and materialized vie
- Migration and workload guidance from PostgreSQL/MySQL toward ClickHouse for analytics
- Batch insert and Kafka-oriented ingestion patterns for high-volume event streams
- Real-time dashboard and time-series analytics orientation for columnar OLAP workloads
- Example settings include index_granularity 8192 in MergeTree DDL
Adoption & trust: 5.2k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your analytics queries on large event datasets are too slow in a row-oriented database and you lack a concise playbook for ClickHouse engines, partitions, and ingest.
Who is it for?
Solo builders adding an analytics plane, event warehouse, or internal metrics API who can operate ClickHouse or ClickHouse Cloud.
Skip if: Simple CRUD apps with tiny tables, or teams that need transactional OLTP semantics ClickHouse does not provide.
When should I use this skill?
Designing ClickHouse schemas, optimizing analytical queries, ingesting large volumes, migrating analytics off Postgres/MySQL, or building real-time dashboards.
What do I get? / Deliverables
You get concrete DDL, query, and optimization patterns aligned to ClickHouse so dashboards and batch metrics stay fast as data volume grows.
- MergeTree-oriented DDL snippets
- Query and optimization recommendations
- Ingest and migration approach notes
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build backend because the skill’s core value is OLAP schema design, query optimization, and ingestion—not distribution or ops runbooks alone. Backend subphase matches table engines, partitions, ORDER BY keys, and analytical SQL that power product-facing metrics APIs.
Where it fits
Define a partitioned markets_analytics table with sane ORDER BY before exposing metrics endpoints.
Rewrite a slow dashboard query using projections or pruning guidance instead of scaling Postgres vertically.
Plan Kafka or batch ingest for product events feeding funnel and retention reports.
How it compares
Use for ClickHouse OLAP design guidance in SKILL.md form, not as a managed warehouse connector or generic SQL linter.
Common Questions / FAQ
Who is clickhouse-io for?
Developers and solo founders shipping SaaS or data products who need ClickHouse table design, ingest, and analytical SQL help inside their coding agent.
When should I use clickhouse-io?
During Build backend work when designing fact tables and ingest, during Ship perf when tuning heavy aggregations, and during Grow analytics when standing up real-time or time-series dashboards.
Is clickhouse-io safe to install?
It is documentation-style patterns without mandatory network calls; review the Security Audits panel on this page and treat any example DDL as something to run only against databases you control.
SKILL.md
READMESKILL.md - Clickhouse Io
# ClickHouse Analytics Patterns ClickHouse-specific patterns for high-performance analytics and data engineering. ## When to Activate - Designing ClickHouse table schemas (MergeTree engine selection) - Writing analytical queries (aggregations, window functions, joins) - Optimizing query performance (partition pruning, projections, materialized views) - Ingesting large volumes of data (batch inserts, Kafka integration) - Migrating from PostgreSQL/MySQL to ClickHouse for analytics - Implementing real-time dashboards or time-series analytics ## Overview ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). It's optimized for fast analytical queries on large datasets. **Key Features:** - Column-oriented storage - Data compression - Parallel query execution - Distributed queries - Real-time analytics ## Table Design Patterns ### MergeTree Engine (Most Common) ```sql CREATE TABLE markets_analytics ( date Date, market_id String, market_name String, volume UInt64, trades UInt32, unique_traders UInt32, avg_trade_size Float64, created_at DateTime ) ENGINE = MergeTree() PARTITION BY toYYYYMM(date) ORDER BY (date, market_id) SETTINGS index_granularity = 8192; ``` ### ReplacingMergeTree (Deduplication) ```sql -- For data that may have duplicates (e.g., from multiple sources) CREATE TABLE user_events ( event_id String, user_id String, event_type String, timestamp DateTime, properties String ) ENGINE = ReplacingMergeTree() PARTITION BY toYYYYMM(timestamp) ORDER BY (user_id, event_id, timestamp) PRIMARY KEY (user_id, event_id); ``` ### AggregatingMergeTree (Pre-aggregation) ```sql -- For maintaining aggregated metrics CREATE TABLE market_stats_hourly ( hour DateTime, market_id String, total_volume AggregateFunction(sum, UInt64), total_trades AggregateFunction(count, UInt32), unique_users AggregateFunction(uniq, String) ) ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(hour) ORDER BY (hour, market_id); -- Query aggregated data SELECT hour, market_id, sumMerge(total_volume) AS volume, countMerge(total_trades) AS trades, uniqMerge(unique_users) AS users FROM market_stats_hourly WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR) GROUP BY hour, market_id ORDER BY hour DESC; ``` ## Query Optimization Patterns ### Efficient Filtering ```sql -- PASS: GOOD: Use indexed columns first SELECT * FROM markets_analytics WHERE date >= '2025-01-01' AND market_id = 'market-123' AND volume > 1000 ORDER BY date DESC LIMIT 100; -- FAIL: BAD: Filter on non-indexed columns first SELECT * FROM markets_analytics WHERE volume > 1000 AND market_name LIKE '%election%' AND date >= '2025-01-01'; ``` ### Aggregations ```sql -- PASS: GOOD: Use ClickHouse-specific aggregation functions SELECT toStartOfDay(created_at) AS day, market_id, sum(volume) AS total_volume, count() AS total_trades, uniq(trader_id) AS unique_traders, avg(trade_size) AS avg_size FROM trades WHERE created_at >= today() - INTERVAL 7 DAY GROUP BY day, market_id ORDER BY day DESC, total_volume DESC; -- PASS: Use quantile for percentiles (more efficient than percentile) SELECT quantile(0.50)(trade_size) AS median, quantile(0.95)(trade_size) AS p95, quantile(0.99)(trade_size) AS p99 FROM trades WHERE created_at >= now() - INTERVAL 1 HOUR; ``` ### Window Functions ```sql -- Calculate running totals SELECT date, market_id, volume, sum(volume) OVER ( PARTITION BY market_id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_volume FROM markets_analytics WHERE date >= today() - INTERVAL 30 DAY ORDER BY market_id, date; ``` ## Data Inse