Complexity Detection

Name: Complexity Detection
Author: juliusbrussee

juliusbrussee/cavekit

Score a task on five axes so Cavekit can right-size budgets, model choice, and review depth before sketch, map, or make.

Overview

complexity-detection is a journey-wide agent skill that scores tasks on five 0–4 axes (0–20 total) so Cavekit can right-size depth, models, and review.

Install

npx skills add https://github.com/juliusbrussee/cavekit --skill complexity-detection

What is this skill?

Deterministic 5-axis scoring rubric summed 0–20 (files, type, judgment, cross-component, novelty)
Maps totals to quick, standard, or thorough depth for downstream Cavekit commands
Shared by /ck:sketch (kit default), /ck:map (per-task depth), and /ck:make (per-task budgets)
Dedicated ck:complexity agent path documented with haiku model routing
Trigger phrases: how complex, what depth, pick a depth, classify this task
5 scoring axes from 0–4 each
0–20 total complexity sum
3 depth bands: quick, standard, thorough

Compatible agents: Claude Code, Codex, Cursor, any compatible agent

Adoption & trust: 5 installs on skills.sh; 1k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

Without a shared rubric, it is hard to tell whether a task is a quick chore or a cross-repo architectural change, so budgets and model routing end up misaligned.

Who is it for?

Builders using the Cavekit /ck commands who want deterministic depth before sketching kits, mapping tasks, or executing make.

Skip if: your team is not on Cavekit and needs human story-point estimation rather than the five-axis rubric.

When should I use this skill?

Trigger phrases: "how complex", "what depth", "pick a depth", "classify this task"; used by /ck:sketch, /ck:map, /ck:make, and ck:complexity.

What do I get? / Deliverables

You get a quick, standard, or thorough classification that /ck:sketch, /ck:map, /ck:make, and ck:complexity use for consistent pipeline sizing.

Quick, standard, or thorough depth label
Per-axis 0–4 scores and 0–20 total

Journey fit

Useful at every journey phase - explore requirements and options before committing to a direction.

Where it fits

Build: Project management & tracking
Example use: Default kit complexity in /ck:sketch before generating task maps for a new feature.

Ship: Code review
Example use: Choose thorough review depth when judgment axis scores critical security or production risk.

Operate: Iteration & experiments
Example use: Classify a hotfix as mechanical vs cross-cutting before assigning agent budget for the patch.

Validate: Scope & plan
Example use: Decide whether a prototype spike is quick or research-heavy before committing build time.

How it compares

Use instead of gut-feel 'small/medium/large' labels when routing agent work through Cavekit.

Common Questions / FAQ

Who is complexity-detection for?

Solo builders and agent operators using Julius Brussee's Cavekit pipeline who need shared depth labels before sketch, map, or make.

When should I use complexity-detection?

At the start of a Build task, before Ship review depth is chosen, or during Operate when classifying incident-fix scope—anytime you ask how complex, what depth, pick a depth, or classify this task.

Is complexity-detection safe to install?

It is a scoring rubric with no external calls by itself; review the Security Audits panel on this page for the parent Cavekit repo permissions.

SKILL.md

# Complexity Detection

A deterministic scoring rubric. Five axes, 0–4 each, summed to 0–20.

## Axes

| Axis                  | 0                | 1                 | 2                 | 3                    | 4                   |
|-----------------------|------------------|-------------------|-------------------|----------------------|---------------------|
| **Files touched**     | 0–2              | 3–5               | 6–10              | 11–20                | 21+                 |
| **Type**              | chore / format   | refactor          | feature           | cross-cutting        | architectural       |
| **Judgment required** | mechanical       | low-ambiguity     | medium            | high                 | critical (sec/prod) |
| **Cross-component**   | single module    | two modules       | three modules     | many within one repo | multi-repo          |
| **Novelty**           | known pattern    | rare pattern      | novel             | research needed      | unknown unknowns    |
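
Of the five axes, only "Files touched" is purely numeric, so it is the easiest one to sketch in code. The snippet below is a minimal illustration, not Cavekit's implementation: the function name `files_axis` is hypothetical, and it assumes a change touching exactly 20 files falls in the 11–20 bucket.

```python
def files_axis(file_count: int) -> int:
    """Score the 'Files touched' axis (0-4) from a number of files.

    Hypothetical helper; bucket boundaries are copied from the axes table.
    Assumption: exactly 20 files scores 3, anything above scores 4.
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    # Inclusive upper bounds for scores 0, 1, 2 and 3.
    for score, upper in enumerate((2, 5, 10, 20)):
        if file_count <= upper:
            return score
    return 4  # more than 20 files


print(files_axis(7))
```

The other four axes are categorical, so they need a judgment call (by a person or the scoring agent) rather than a threshold lookup.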

Total score maps to:

| Score  | Depth      |
|--------|------------|
| 0 – 6  | quick      |
| 7 – 13 | standard   |
| 14+    | thorough   |
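
The sum-and-band step can be sketched as follows. This is an illustration under stated assumptions rather than the skill's actual code: the function name `depth_for` is hypothetical, and the axis keys are borrowed from the JSON example the `ck:complexity` agent returns.

```python
AXES = ("files", "type", "judgment", "cross_component", "novelty")


def depth_for(axes: dict) -> tuple:
    """Sum five 0-4 axis scores and map the total to a depth band.

    Hypothetical helper; bands follow the table above
    (0-6 quick, 7-13 standard, 14+ thorough).
    """
    for name in AXES:
        if name not in axes:
            raise ValueError(f"missing axis: {name}")
        if not 0 <= axes[name] <= 4:
            raise ValueError(f"axis {name} must be between 0 and 4")
    total = sum(axes[name] for name in AXES)
    if total <= 6:
        depth = "quick"
    elif total <= 13:
        depth = "standard"
    else:
        depth = "thorough"
    return total, depth


print(depth_for({"files": 2, "type": 2, "judgment": 2,
                 "cross_component": 2, "novelty": 3}))
```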

## Override signals

Upgrade one step regardless of score when any of these are true:

- Security-sensitive: authentication, authorization, crypto, secrets, PII.
- Data migration that is not reversible.
- Public API shape change (breaking).
- Performance-critical hot path with an existing SLA.

Downgrade one step only when **all** of these are true:

- Zero new dependencies.
- Existing tests cover the change.
- No user-visible behaviour change.
- Single file, single function.
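
One way to express the override rules in code is shown below. It is a sketch with hypothetical names (`Signals`, `apply_overrides`), and it makes two assumptions the rubric does not state: an upgrade takes precedence when both rules would fire, and depth is clamped at `quick` and `thorough`.

```python
from dataclasses import dataclass

DEPTHS = ("quick", "standard", "thorough")


@dataclass
class Signals:
    """Hypothetical flags, one per override bullet in the rubric."""
    # Upgrade signals (any one is enough).
    security_sensitive: bool = False
    irreversible_migration: bool = False
    breaking_public_api: bool = False
    sla_hot_path: bool = False
    # Downgrade conditions (all four are required).
    zero_new_dependencies: bool = False
    tests_cover_change: bool = False
    no_user_visible_change: bool = False
    single_file_single_function: bool = False


def apply_overrides(depth: str, s: Signals) -> str:
    """Move depth at most one step up or down according to the signals."""
    index = DEPTHS.index(depth)
    upgrade = any((s.security_sensitive, s.irreversible_migration,
                   s.breaking_public_api, s.sla_hot_path))
    downgrade = all((s.zero_new_dependencies, s.tests_cover_change,
                     s.no_user_visible_change, s.single_file_single_function))
    if upgrade:
        # Assumption: upgrades win over downgrades; clamp at "thorough".
        index = min(index + 1, len(DEPTHS) - 1)
    elif downgrade:
        # Clamp at "quick".
        index = max(index - 1, 0)
    return DEPTHS[index]


print(apply_overrides("quick", Signals(security_sensitive=True)))
```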

## Per-depth defaults

| Depth    | Token budget | Model tier  | Review    | Tests                    |
|----------|--------------|-------------|-----------|--------------------------|
| quick    | 8 000        | haiku       | optional  | smoke                    |
| standard | 20 000       | sonnet      | required  | unit + integration       |
| thorough | 45 000       | sonnet/opus | mandatory | unit + integration + E2E |

These defaults are recorded in `.cavekit/config.json` under `task_budgets`
and consumed by the `cavekit-router.cjs` model router.
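
As an illustration, the defaults table can be mirrored as an in-memory lookup. The actual schema of `task_budgets` in `.cavekit/config.json` is not shown here, so the structure and names below (`DepthDefaults`, `DEFAULTS`, `defaults_for`) are assumptions for the sketch, not the real config format.

```python
from typing import NamedTuple


class DepthDefaults(NamedTuple):
    """Hypothetical record mirroring one row of the per-depth defaults table."""
    token_budget: int
    model_tier: str
    review: str
    tests: tuple


# Values copied from the table; the layout is illustrative only.
DEFAULTS = {
    "quick": DepthDefaults(8_000, "haiku", "optional", ("smoke",)),
    "standard": DepthDefaults(20_000, "sonnet", "required",
                              ("unit", "integration")),
    "thorough": DepthDefaults(45_000, "sonnet/opus", "mandatory",
                              ("unit", "integration", "E2E")),
}


def defaults_for(depth: str) -> DepthDefaults:
    """Return the default budget, model tier, review and test levels."""
    try:
        return DEFAULTS[depth]
    except KeyError:
        raise ValueError(f"unknown depth: {depth!r}") from None


print(defaults_for("standard").token_budget)
```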

## How agents score

The `ck:complexity` subagent (haiku) receives a task description and returns
a JSON blob:

```json
{
  "score": 11,
  "depth": "standard",
  "axes": {
    "files": 2, "type": 2, "judgment": 2, "cross_component": 2, "novelty": 3
  },
  "overrides_applied": []
}
```

`/ck:map` calls this agent per task to set `depth` in the task registry. If
the agent produces a score in the "thorough" band with a novelty of 4 and a
security override, it may return `needs_research: true`, which `/ck:map` must
translate into an upstream `ck:researcher` task that the work itself depends
on.
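
A consumer of this blob could sanity-check it before trusting it. The sketch below uses hypothetical helper names (`band`, `validate_result`, `wants_research`) and assumes that the reported `depth` already reflects any overrides, so the band check only runs when `overrides_applied` is empty.

```python
import json


def band(total: int) -> str:
    """Map a 0-20 total to its depth band (0-6, 7-13, 14+)."""
    if total <= 6:
        return "quick"
    if total <= 13:
        return "standard"
    return "thorough"


def validate_result(raw: str) -> dict:
    """Parse the agent's JSON and check that it is internally consistent."""
    result = json.loads(raw)
    total = sum(result["axes"].values())
    if result["score"] != total:
        raise ValueError(f"score {result['score']} != axis sum {total}")
    # Assumption: with no overrides, depth must equal the raw band.
    if not result.get("overrides_applied") and result["depth"] != band(total):
        raise ValueError("depth does not match the score band")
    return result


def wants_research(result: dict) -> bool:
    """True when the agent asked for an upstream research task."""
    return bool(result.get("needs_research", False))


sample = """{
  "score": 11, "depth": "standard",
  "axes": {"files": 2, "type": 2, "judgment": 2,
           "cross_component": 2, "novelty": 3},
  "overrides_applied": []
}"""
checked = validate_result(sample)
print(checked["depth"], wants_research(checked))
```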

## Integration points

- `/ck:sketch` — runs complexity scoring on the whole domain to set the kit's
  `complexity:` frontmatter.
- `/ck:map` — runs it per task to assign `depth`.
- `/ck:make` — reads `depth` to size the task budget and pick the review
  intensity.
- `ck:complexity` agent — pure-haiku worker; does nothing else.

## Anti-patterns

- Using one depth for every task in a kit "for consistency." Cost up, signal
  down.
- Padding depth to "be safe" — if the budget is oversized, the model wastes
  tokens exploring. Right-size, then raise only when verification fails.
- Ignoring overrides — scoring a login flow as "quick" because it touches one
  file. Security overrides exist for exactly this case.
