
Post Mortem
Turn a validated bug fix into a durable RCA writeup so the next engineer—or future-you—can recover context without re-debugging.
Overview
Post-mortem is an agent skill most often used in Operate (also Ship) that documents the engineering truth of a fixed, validated bug—root cause, fix, and escape analysis—for engineers and future you.
Install
npx skills add https://github.com/thananon/9arm-skills --skill post-mortemWhat is this skill?
- Produces a canonical engineering post-mortem after the fix is landed and validated—not for open hypotheses
- Welcomes code identifiers, mechanism detail, and validation evidence for engineer audiences
- Explicit handoff path to management-talk for leadership-safe reframing of the same facts
- Refuses premature writeups when the bug is unfixed or unvalidated
- Triggered by /post-mortem, RCA, or “document this fix” after a debug session
Adoption & trust: 1.1k installs on skills.sh; 2.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You fixed the bug in chat or a branch, but the ticket will close with no durable RCA and everyone will re-learn the same failure months later.
Who is it for?
Closing out validated fixes on small teams that lack a formal incident-management product but still want searchable RCA history.
Skip if: Hypothesis writeups before a fix ships, customer-only release notes without engineering detail, or bugs you have not re-tested after the patch.
When should I use this skill?
/post-mortem, RCA requests, document this fix, or after a validated debug fix when closing the ticket.
What do I get? / Deliverables
You get a canonical post-mortem artifact ready to attach to the issue or wiki, with an optional next step to invoke management-talk for leadership-facing language.
- Engineering post-mortem document (root cause, mechanism, fix, validation, escape analysis)
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Operate because the skill closes the incident loop after production or staging issues are fixed. Errors is the right facet for root-cause records tied to defects, outages, and regressions rather than greenfield build work.
Where it fits
Draft the RCA after a production 500 is patched and replay tests pass.
Attach a structured post-mortem before merging the hotfix PR so reviewers see mechanism and validation.
Archive what you learned from a recurring cron failure before scheduling preventive work.
How it compares
Use instead of pasting debug chat logs—this skill structures RCA fields engineers expect, not ad-hoc Slack threads.
Common Questions / FAQ
Who is post-mortem for?
Solo builders and small engineering teams who want a durable, identifier-friendly RCA after a real fix—not a placeholder while debugging is still open.
When should I use post-mortem?
After a debug session lands a validated fix in Operate/errors, when closing Ship/review tickets, or when you need iterate-phase documentation before archiving an incident.
Is post-mortem safe to install?
Review the Security Audits panel on this Prism page before installing; the skill mainly drafts text from context you provide and does not inherently require network access.
Workflow Chain
Then invoke: management talk
SKILL.md
READMESKILL.md - Post Mortem
# Post-mortem The canonical engineering record of a bug fix. Written **after** debugging lands a real fix, **for** other engineers (and future-you, who will have forgotten everything in 6 months). Code identifiers are welcome here — this is the artifact that lets the next person recover the mental model fast. For the up-the-org version of this same content, hand the finished post-mortem to [`management-talk`](../../productivity/management-talk/SKILL.md). They compose: post-mortem owns the engineering truth, management-talk reframes it for leadership. ## When to invoke - "/post-mortem" - "write the post-mortem / postmortem / RCA / root-cause analysis" - "document this fix" / "write up the root cause" / "close out this bug with a writeup" - After a debug session has clearly landed a fix, proactively offer to draft one. ## When NOT to use - **Bug not fixed yet, or fix not validated.** A post-mortem of a hypothesis is misleading. Refuse and tell the user what's missing. - **Customer-visible outage / incident.** Those need a separate incident report (timeline, blast radius, paging history, comms). This skill is bug-fix scope. Flag and confirm before producing one. - **Trivial fix** (typo, obvious one-liner). The PR description is the record. Don't manufacture ceremony. ## Required inputs — refuse to draft without these Before writing a single line, confirm all four. If any are missing, list what's missing and stop: - [ ] **Reliable repro** exists (not "happens sometimes" — a deterministic or high-rate-flake repro the next person can run). - [ ] **Root cause is known** (the mechanism is identified, not a hypothesis). - [ ] **Fix is identified** (PR / commit / branch pointer). - [ ] **Fix is validated** (the original repro now passes; the customer workload / failing test now succeeds). These map directly to `debug-mantra` steps 1–4. If you came in via `debug-mantra`, the breadcrumb ledger from step 4 is your raw material — pull from it. ## Structure Use these blocks in this order. **Summary, Root cause, Fix, and Validation are mandatory.** The rest are conditional but usually present. ### 1. Summary _(mandatory)_ One paragraph. What broke, in user/workload terms. What fixed it, in one sentence. JIRA key, PR number, owner. A reader who stops here should have the right answer. ### 2. Symptom What was actually observed. Test output, error message, log line, perf number, customer report. Concrete identifiers — don't paraphrase the failure mode. ### 3. Root cause _(mandatory)_ The actual bug mechanism. **Code identifiers welcome and expected** — function names, file paths, struct fields, branch conditions, commit SHAs of the offending change. Walk the cause chain end-to-end. This is the most expensive section and the reason the post-mortem exists at all. Future-you will live or die by how clearly you write this. ### 4. Why it produced the symptom Link the root cause to the symptom. Often non-obvious — the bug is in `tadaLaunchPrepare` but the visible failure is a customer training run hanging hours later. Walk the chain so a reader who only knows the symptom can connect it back to the cause without re-deriving it. ### 5. Fix _(mandatory)_ What changed and **why this change addresses the root cause** rather than hiding the symptom. Link to PR / commit. If a previous fix attempt papered over the symptom, name it and explain what was wrong with it — that history is part of the cause. ### 6. How it w