Diagnose

Name: Diagnose
Author: vinvcn

vinvcn/mattpocock-skills-zh-cn

Run a disciplined reproduce-minimise-hypothesise-instrument-fix-regression loop when a solo builder is stuck on a nasty bug or performance regression.

Overview

Diagnose is an agent skill most often used in Ship (also Build, Operate) that runs a disciplined bug and performance-regression loop from reproduction through regression testing.

Install

npx skills add https://github.com/vinvcn/mattpocock-skills-zh-cn --skill diagnose

What is this skill?

Six-phase diagnosis loop: reproduce → minimise → hypothesise → instrument → fix → regression-test.
Human-in-the-loop (HITL) bash template with step() and capture() helpers for step prompts and user-observed errors.
Skips phases only when the rationale is explicit; reproduction is never shortcut.
Supports performance regressions as well as functional bugs and thrown errors.
Outputs captured KEY=VALUE pairs (e.g. ERRORED, ERROR_MSG) for the agent to parse.

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 524 installs on skills.sh; 485 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You see a failure or slowdown but cannot reliably reproduce it, so every fix is a guess and regressions keep coming back.

Who is it for?

Solo developers debugging non-obvious errors, export failures, or perf regressions who can follow short terminal repro scripts.

Skip if: One-line typos with obvious stack traces, pure feature requests with no failure, or teams that forbid interactive repro loops in the agent session.

When should I use this skill?

Use it when the user says "diagnose this" or "debug this", reports a bug, says something is broken, throwing, or failing, or describes a performance regression.

What do I get? / Deliverables

You get a minimised repro, tested hypothesis, instrumented evidence, a justified fix, and a regression check—plus HITL captures the agent can parse.

Minimised reproduction steps
Captured repro variables (KEY=VALUE) for the agent
Fix with documented hypothesis and regression verification

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Canonical shelf is Ship → testing because the workflow ends with regression tests and repro discipline typical before release or while stabilizing a branch. Testing subphase fits structured reproduction, instrumentation, and regression verification rather than ad-hoc chat guesses.

Also useful

Operate: Error tracking
Build: Backend, data & payments

Where it fits

Example use (Ship: Testing & QA)

Reproduce an export button failure and lock a regression test before tagging a release.

Example use (Operate: Error tracking)

Capture user-visible error text via HITL captures when logs alone are insufficient.

Example use (Build: UI/UX & frontend)

Minimise a UI-only bug on localhost before adding broad instrumentation across the stack.

Example use (Ship: Performance)

Hypothesise and measure a performance regression instead of shipping speculative optimisations.

How it compares

Use instead of unstructured “try this patch” chat when you need enforced reproduce-first debugging with human-verified steps.

Common Questions / FAQ

Who is diagnose for?

Solo and indie builders using Claude Code, Cursor, or Codex who hit stubborn bugs or performance regressions and want a staged diagnosis ritual rather than speculative edits.

When should I use diagnose?

Use it during Build while integrating features, during Ship when stabilizing releases with regression tests, or during Operate when triaging production-like failures—whenever you say diagnose this, debug this, or report something broken or throwing.

Is diagnose safe to install?

The skill includes a bash HITL template that runs locally with user input; review the Security Audits panel on this Prism page before letting an agent execute or modify scripts in your environment.

SKILL.md

#!/usr/bin/env bash
# Human-in-the-loop reproduction loop.
# Copy this file, edit the steps below, and run it.
# The agent runs the script; the user follows prompts in their terminal.
#
# Usage:
#   bash hitl-loop.template.sh
#
# Two helpers:
#   step "<instruction>"          → show instruction, wait for Enter
#   capture VAR "<question>"      → show question, read response into VAR
#
# At the end, captured values are printed as KEY=VALUE for the agent to parse.

set -euo pipefail

step() {
  printf '\n>>> %s\n' "$1"
  read -r -p "    [Enter when done] " _
}

capture() {
  local var="$1" question="$2" answer
  printf '\n>>> %s\n' "$question"
  read -r -p "    > " answer
  printf -v "$var" '%s' "$answer"
}

# --- edit below ---------------------------------------------------------

step "Open the app at http://localhost:3000 and sign in."

capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)"

capture ERROR_MSG "Paste the error message (or 'none'):"

# --- edit above ---------------------------------------------------------

printf '\n--- Captured ---\n'
printf 'ERRORED=%s\n' "$ERRORED"
printf 'ERROR_MSG=%s\n' "$ERROR_MSG"
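
The KEY=VALUE lines printed at the end of the template can be read back with a small helper. This is a minimal sketch, assuming the template's output is piped in or saved to a file; `get_captured` is a hypothetical name and is not part of the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from KEY=VALUE lines on stdin.
# Lines that do not start with "KEY=" (such as "--- Captured ---") are ignored.
get_captured() {
  local key="$1" line
  while IFS= read -r line; do
    if [[ "$line" == "$key="* ]]; then
      # Strip only the leading "KEY=" so values containing "=" survive intact.
      printf '%s\n' "${line#"$key"=}"
      return 0
    fi
  done
  return 1  # key not found
}

# Example (the log file name is an assumption):
#   bash hitl-loop.template.sh | tee hitl.log
#   get_captured ERROR_MSG < hitl.log
```

Returning a non-zero status when the key is missing lets the caller distinguish "not captured" from an empty answer.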


---
name: diagnose
description: A disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression.
---

# Diagnose

A discipline for hard bugs. Skip phases only when the rationale is explicit.

## Phase 1 — Build a feedback loop

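**This is the core of the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal that covers the bug, you will find the cause; bisection, hypothesis-testing, and instrumentation all just consume that signal. Without one, no amount of staring at code will save you.

Spend disproportionate effort here. **Be proactive. Be creative. Don't give up easily.**

### Ways to construct a feedback loop, in roughly the order to try them

1. **Failing test** at a seam that reaches the bug: unit, integration, or e2e.
2. **Curl / HTTP script** against a running dev server.
3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
4. **Headless browser script** (Playwright / Puppeteer) that drives the UI and asserts on DOM/console/network.
5. **Replay a captured trace.** Save a real network request, payload, or event log to disk and replay it through the relevant code path in isolation.
6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) and trigger the buggy code path with a single function call.
7. **Property / fuzz loop.** If the bug is "sometimes the output is wrong", run 1000 random inputs and look for the failure pattern.
8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff the outputs.
10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop stays structured. The captured output feeds back to you.

Build the right feedback loop and the bug is 90% fixed.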
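
Option 8 in the list above can be sketched as a check script for `git bisect run`. This is an illustration only: `bisect_check` is a hypothetical helper, and the build and repro commands passed to it are placeholders for whatever your project uses. It relies on the documented `git bisect run` convention that exit code 0 marks a commit good, 125 skips it, and other codes from 1 to 127 mark it bad.

```shell
#!/usr/bin/env bash
# Hypothetical helper mapping a build step and a repro step onto the exit
# codes that `git bisect run` understands.
#   $1: build command (placeholder, e.g. your project's build script)
#   $2: repro command that exits 0 when the bug is ABSENT
bisect_check() {
  local build_cmd="$1" repro_cmd="$2"
  # A commit that cannot be built says nothing about the bug: skip it.
  if ! bash -c "$build_cmd" >/dev/null 2>&1; then
    return 125
  fi
  # Repro passes: the bug is not present in this commit.
  if bash -c "$repro_cmd" >/dev/null 2>&1; then
    return 0
  fi
  # Repro fails: the bug is present.
  return 1
}

# Sketch of use inside a check script (commands are assumptions):
#   bisect_check "make build" "./repro.sh"; exit $?
# and then:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run ./check.sh
```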

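### Iterate on the feedback loop itself

Treat the feedback loop as a product. Once you have _a_ loop, ask:

- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin time, seed the RNG, isolate the filesystem, freeze the network.)

A 30-second flaky loop is only slightly better than no loop. A 2-second deterministic loop is a debugging superpower.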
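
One cheap way to check the "more deterministic" question is to run the loop several times and compare outputs. The following is a rough sketch, not part of the skill: `is_deterministic` is a hypothetical helper, and it only detects differences in captured output, not in timing or side effects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command several times and succeed only if every
# run produced byte-identical combined stdout/stderr.
#   $1: command to run (your feedback loop)
#   $2: number of runs (default 5)
is_deterministic() {
  local cmd="$1" runs="${2:-5}" first current i
  # The loop command may legitimately fail (it reproduces a bug), so ignore
  # its exit status and compare output only.
  first="$(bash -c "$cmd" 2>&1)" || true
  for ((i = 1; i < runs; i++)); do
    current="$(bash -c "$cmd" 2>&1)" || true
    if [[ "$current" != "$first" ]]; then
      return 1  # output changed between runs
    fi
  done
  return 0
}

# Example (the script name is an assumption):
#   is_deterministic "./repro.sh" 10 && echo "stable" || echo "flaky output"
```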

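### Non-deterministic bugs

The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100 times, parallelise, add stress, narrow timing windows, inject sleeps. A bug that flakes 50% of the time is debuggable; one that flakes 1% of the time is not. Keep raising the rate until it is debuggable.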
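
Measuring the reproduction rate makes it possible to tell whether a change to the trigger actually raised it. A minimal sketch, assuming the trigger is a command that exits non-zero when the bug appears; `repro_rate` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a trigger command N times and report how many
# runs failed (non-zero exit status means the bug reproduced).
#   $1: trigger command
#   $2: number of runs (default 100)
repro_rate() {
  local cmd="$1" runs="${2:-100}" failures=0 i
  for ((i = 0; i < runs; i++)); do
    if ! bash -c "$cmd" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
  done
  printf '%d/%d runs failed\n' "$failures" "$runs"
}

# Example (the script name is an assumption):
#   repro_rate "./trigger.sh" 100
```

Re-run it after each change (added stress, narrower timing window, injected sleep) and keep the changes that push the count up.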

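### When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to an environment where the bug reproduces, (b) a captured artifact (HAR file, log dump, core dump, timestamped screen recording), or (c) permission to add temporary production instrumentation. Do **not** keep hypothesising without a loop.

Do not proceed to Phase 2 until you have a loop you trust.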

## Phase 2 — Reproduce

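Run the loop. Watch the bug appear.

Confirm:

- [ ] The loop produces the failure mode the **user** described, not a different failure nearby. Wrong bug = wrong fix.
- [ ] The failure reproduces across multiple runs; for non-deterministic bugs, the reproduction rate is high enough to debug against.
- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so that later phases can verify the fix actually addresses it.

Do not proceed until you have reproduced the bug.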
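
The first and third checks can be partly automated by asserting on the exact symptom text rather than on "the command failed". A small sketch under that assumption; `reproduces_symptom` is a hypothetical helper and the expected string is whatever the user actually reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the loop's combined output contains
# the exact symptom text the user reported.
#   $1: loop command
#   $2: expected symptom substring (e.g. the reported error message)
reproduces_symptom() {
  local cmd="$1" expected="$2" output
  # The loop is expected to fail, so ignore its exit status and inspect output.
  output="$(bash -c "$cmd" 2>&1)" || true
  [[ "$output" == *"$expected"* ]]
}

# Example (command and message are assumptions):
#   reproduces_symptom "./repro.sh" "TypeError: Cannot read properties of undefined" \
#     || echo "loop fails, but not with the reported error"
```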

## Phase 3 — Hypothesise

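Before testing any of them, generate **3-5 ranked hypotheses**. Generating a single hypothesis anchors you on the first plausible idea.

Each hypothesis must be **falsifiable**: state the prediction it makes.

> Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."

If you cannot state the prediction, the hypothesis is a vibe. Discard it or sharpen it.

**Show the ranked list to the user before testing.** They often have domain knowledge that lets them re-rank instantly ("we just deployed a change to #3") or know which hypotheses have already been ruled out. This checkpoint is cheap and saves time. Don't block on it; if the user is AFK, proceed with your own ranking.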

## Phase 4 — Instrument

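Every probe must map to a specific prediction from Phase 3. **Change one variable at a time.**

Tool preference:

1. Prefer **Debugger / REPL inspection** if the environment supports it. One breakpoint beats ten logs.
2. At the boundaries that distinguish the hypotheses, add **tar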


