
Diagnose
Run a disciplined reproduce-minimise-hypothesise-instrument-fix-regression loop when a solo builder is stuck on a nasty bug or performance regression.
Overview
Diagnose is an agent skill, most often used in the Ship phase (also Build and Operate), that runs a disciplined loop for bugs and performance regressions, from reproduction through regression testing.
Install
npx skills add https://github.com/vinvcn/mattpocock-skills-zh-cn --skill diagnose
What is this skill?
- Six-phase diagnosis loop: reproduce → minimise → hypothesise → instrument → fix → regression-test.
- Human-in-the-loop bash template with step prompts and capture() for user-observed errors.
- Skips phases only when rationale is explicit—no shortcutting reproduction.
- Supports performance regressions as well as functional bugs and thrown errors.
- Outputs captured KEY=VALUE pairs (e.g. ERRORED, ERROR_MSG) for the agent to parse.
- Six named diagnosis phases ending in regression-test.
- HITL template defines step() and capture() helpers for structured user input.
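The KEY=VALUE output mentioned above can be consumed with a few lines of shell. What follows is a minimal sketch, assuming the output shape the template prints (a `--- Captured ---` marker followed by one KEY=VALUE pair per line). `get_capture`, `sample_output`, and the sample values are hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Sketch only: get_capture is a hypothetical helper, not part of the skill.
# It reads HITL output on stdin and prints the value for one KEY.

get_capture() {
  local key="$1" line value="" rc=1
  while IFS= read -r line; do
    case "$line" in
      "$key="*)
        value="${line#"$key="}"   # keep everything after the first '='
        rc=0
        ;;
    esac
  done
  printf '%s' "$value"
  return "$rc"                    # non-zero when the key never appeared
}

# Sample output in the shape the template prints (values are made up).
sample_output() {
  cat <<'EOF'

--- Captured ---
ERRORED=y
ERROR_MSG=TypeError: x is undefined (code=E42)
EOF
}

errored="$(sample_output | get_capture ERRORED)"
error_msg="$(sample_output | get_capture ERROR_MSG)"

if [ "$errored" = "y" ]; then
  printf 'Bug reproduced: %s\n' "$error_msg"
else
  printf 'No error reported by the user.\n'
fi
```

Because the template reads each answer with a single `read -r`, every captured value is one line, so a line-by-line scan like this is enough; a multi-line paste would need a different capture step.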
Adoption & trust: 524 installs on skills.sh; 485 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You see a failure or slowdown but cannot reliably reproduce it, so every fix is a guess and regressions keep coming back.
Who is it for?
Solo developers debugging non-obvious errors, export failures, or perf regressions who can follow short terminal repro scripts.
Skip if: One-line typos with obvious stack traces, pure feature requests with no failure, or teams that forbid interactive repro loops in the agent session.
When should I use this skill?
The user says "diagnose this" or "debug this", reports a bug, says something is broken, throwing, or failing, or describes a performance regression.
What do I get? / Deliverables
You get a minimised repro, a tested hypothesis, instrumented evidence, a justified fix, and a regression check, plus HITL captures the agent can parse.
- Minimised reproduction steps
- Captured repro variables (KEY=VALUE) for the agent
- Fix with documented hypothesis and regression verification
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Ship → testing because the workflow ends with regression tests and repro discipline typical before release or while stabilizing a branch. Testing subphase fits structured reproduction, instrumentation, and regression verification rather than ad-hoc chat guesses.
Where it fits
Reproduce an export button failure and lock a regression test before tagging a release.
Capture user-visible error text via HITL captures when logs alone are insufficient.
Minimise a UI-only bug on localhost before adding broad instrumentation across the stack.
Hypothesise and measure a performance regression instead of shipping speculative optimisations.
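For the performance-regression case, one way to replace a speculative optimisation with a measurement is a small wall-clock budget check that returns pass/fail. This is a rough sketch, assuming bash (it relies on the `time` keyword and the `TIMEFORMAT` variable). `within_budget` and the budget values are hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Sketch only: within_budget is a hypothetical helper, not part of the skill.
# It turns "it feels slower" into a pass/fail signal an agent can run.

within_budget() {
  # Usage: within_budget <max_seconds> <command> [args...]
  # Returns 0 when the command's wall-clock time is at or under the budget.
  local budget="$1" elapsed
  shift
  local TIMEFORMAT=%R   # make the `time` keyword print only real seconds
  # `time` reports on the group's stderr; the command's own output is discarded.
  elapsed="$( { LC_ALL=C; time "$@" >/dev/null 2>&1; } 2>&1 )"
  printf 'elapsed=%ss budget=%ss\n' "$elapsed" "$budget" >&2
  awk -v e="$elapsed" -v b="$budget" 'BEGIN { exit !(e <= b) }'
}

# Example: a trivial command against a generous (made-up) budget.
if within_budget 5 true; then
  echo "PASS"
else
  echo "FAIL"
fi
```

A single run is noisy, so in practice it may be worth repeating the check several times and comparing against a baseline taken from the last known-good commit rather than a guessed budget.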
How it compares
Use instead of unstructured “try this patch” chat when you need enforced reproduce-first debugging with human-verified steps.
Common Questions / FAQ
Who is diagnose for?
Solo and indie builders using Claude Code, Cursor, or Codex who hit stubborn bugs or performance regressions and want a staged diagnosis ritual rather than speculative edits.
When should I use diagnose?
Use it during Build while integrating features, during Ship when stabilizing releases with regression tests, or during Operate when triaging production-like failures: whenever you say "diagnose this" or "debug this", or report something broken or throwing.
Is diagnose safe to install?
The skill includes a bash HITL template that runs locally with user input; review the Security Audits panel on this Prism page before letting an agent execute or modify scripts in your environment.
SKILL.md
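#!/usr/bin/env bash
# Human-in-the-loop reproduction loop.
# Copy this file, edit the steps below, and run it.
# The agent runs the script; the user follows prompts in their terminal.
#
# Usage:
#   bash hitl-loop.template.sh
#
# Two helpers:
#   step "<instruction>"       → show instruction, wait for Enter
#   capture VAR "<question>"   → show question, read response into VAR
#
# At the end, captured values are printed as KEY=VALUE for the agent to parse.

set -euo pipefail

# Show an instruction and block until the user presses Enter.
step() {
  printf '\n>>> %s\n' "$1"
  read -r -p " [Enter when done] " _
}

# Ask a question and store the one-line answer in the variable named by $1.
capture() {
  local var="$1" question="$2" answer
  printf '\n>>> %s\n' "$question"
  read -r -p " > " answer
  printf -v "$var" '%s' "$answer"
}

# --- edit below ---------------------------------------------------------

step "Open the app at http://localhost:3000 and sign in."

capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"

capture ERROR_MSG "Paste the error message (or 'none'):"

# --- edit above ---------------------------------------------------------

printf '\n--- Captured ---\n'
printf 'ERRORED=%s\n' "$ERRORED"
printf 'ERROR_MSG=%s\n' "$ERROR_MSG"

---
name: diagnose
description: A disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
---

# Diagnose

A discipline for hard bugs. Skip phases only when the reason is explicit.

## Phase 1 — Build a feedback loop

**This is the core of the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal that covers the bug, you will find the cause; bisection, hypothesis-testing, and instrumentation all just consume that signal. Without it, no amount of staring at code will save you.

Spend disproportionate effort here. **Be proactive. Be creative. Do not give up easily.**

### Ways to construct a feedback loop, in roughly the order to try them

1. **Failing test** at a seam that reaches the bug: unit, integration, or e2e.
2. **Curl / HTTP script** against a running dev server.
3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
4. **Headless browser script** (Playwright / Puppeteer) that drives the UI and asserts on DOM/console/network.
5. **Replay a captured trace.** Save a real network request, payload, or event log to disk and replay it in isolation through the relevant code path.
6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) and trigger the buggy code path with a single function call.
7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure pattern.
8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so that you can `git bisect run`.
9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff the outputs.
10. **HITL bash script.** Last resort. If a human must click, drive _the human_ with `scripts/hitl-loop.template.sh` so the loop stays structured. The captured output is fed back to you.

Build the right feedback loop and the bug is already 90% fixed.

### Iterate on the loop itself

Treat the feedback loop as a product. Once you have _a_ loop, ask:

- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin the time, fix the RNG seed, isolate the filesystem, freeze the network.)

A 30-second, flaky loop is only slightly better than no loop. A 2-second, deterministic loop is a debugging superpower.

### Non-deterministic bugs

The goal is not a clean reproduction but a **higher reproduction rate**. Loop the trigger 100 times, parallelise, add stress, narrow timing windows, inject sleeps. A bug that flakes 50% of the time is debuggable; one that flakes 1% of the time is not. Keep raising the reproduction rate until it is debuggable.

### When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to an environment that reproduces the bug, (b) a captured artifact (HAR file, log dump, core dump, timestamped screen recording), or (c) permission to add temporary production instrumentation. Do **not** continue to hypothesise without a loop.

Do not move on to Phase 2 until you have a loop you trust.

## Phase 2 — Reproduce

Run the loop. Watch the bug appear.

Confirm:

- [ ] The loop produces the failure mode the **user** described, not a different failure nearby. Wrong bug = wrong fix.
- [ ] The failure reproduces across multiple runs; for non-deterministic bugs, the reproduction rate is high enough to debug against.
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so that later phases can verify the fix actually resolves it.

Do not continue until you have reproduced the bug.

## Phase 3 — Hypothesise

Generate **3-5 ranked hypotheses** before testing any of them. Generating a single hypothesis anchors you on the first plausible idea.

Each hypothesis must be **falsifiable**: state the prediction it makes.

> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."

If you cannot state the prediction, the hypothesis is a vibe. Discard it or sharpen it.

**Show the ranked list to the user before testing.** They often have domain knowledge that lets them re-rank instantly ("we just deployed the change in #3"), or they know which hypotheses have already been ruled out. This checkpoint is cheap and saves time. Do not block on it; if the user is AFK, proceed with your own ranking.

## Phase 4 — Instrument

Every probe must map to a specific prediction from Phase 3. **Change only one variable at a time.**

Tool preference:

1. **Debugger / REPL inspection**, if the environment supports it. One breakpoint beats ten logs.
2. At the boundaries that distinguish the hypotheses, add **tar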