Incident Response

Name: Incident Response
Author: anthropics

anthropics/knowledge-work-plugins

4.6k installs
23.1k repo stars
Updated July 28, 2026

Workflow that manages incidents from detection through resolution and postmortem analysis, automating severity assessment, status communication, and root cause documentation.

About

This workflow automates incident management across four phases: triage (assess severity, identify affected systems, assign roles), communicate (draft internal and customer updates, set war room cadence), mitigate (document steps, track timeline, confirm resolution), and postmortem (generate blameless analysis with 5-whys, action items, and lessons learned). Developers trigger it when production fails, alerts fire, or post-incident review is needed. The tool classifies incidents as SEV1-4 based on user impact and response urgency, generates structured status updates at regular intervals, and produces postmortem documents that separate systems-level root causes from individual actions. It integrates with monitoring, incident management, and chat platforms when available to pull metrics, page responders, and broadcast updates automatically.

Severity classification (SEV1-4) with response time SLAs from immediate all-hands to next business day
Structured status updates with impact, actions taken, next steps, and timestamped timeline
Blameless postmortem generation including 5-whys root cause analysis and action items with owners
Three modes - new incident, status update mid-incident, postmortem generation - triggered by plain language
Integrations with monitoring (pull alerts/metrics), incident management (create/page), and chat (post updates)

Incident Response by the numbers

4,575 all-time installs (skills.sh)
+241 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #78 of 2,742 Automation & Workflows skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

incident-response capabilities & compatibility

Capabilities: assess incident severity using impact criteria · draft internal status updates with timeline · generate customer-facing communication · create blameless postmortem documents · perform 5-whys root cause analysis · integrate with monitoring systems for alerts and
Works with: slack · datadog · grafana · sentry
Use cases: debugging
Runs: Hosted SaaS
Pricing: Freemium

From the docs

What incident-response says it does

Manage an incident from detection through postmortem.


npx skills add https://github.com/anthropics/knowledge-work-plugins --skill incident-response

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/anthropics/knowledge-work-plugins/incident-response.svg)](https://skillselion.com/skills/anthropics/knowledge-work-plugins/incident-response)

Installs: 4.6k
Repo stars: 23.1k
Security audit: 3 / 3 scanners passed
Last updated: July 28, 2026
Repository: anthropics/knowledge-work-plugins

What it does

Coordinate incident response from detection through postmortem - triage severity, draft communications, document timeline, and conduct root cause analysis.

Who is it for?

DevOps engineers, SREs, and incident commanders coordinating production outages and critical service degradation.

Skip if: Non-technical incident reporting; pre-incident planning or prevention; single-person debugging sessions.

When should I use this skill?

Production is down, a SEV1-2 alert fires, mid-incident status is needed, or postmortem review is starting.

What you get

Reduced mean time to respond and mean time to resolve (MTTR); documented incidents with actionable root cause analysis; consistent blameless postmortems that drive system improvements.

Incident timeline
Severity assessment
Blameless postmortem

By the numbers

Four structured phases: triage, communicate, mitigate, postmortem
Four severity levels (SEV1-4) with defined response times
5-whys root cause analysis framework

Files

SKILL.md · Markdown · GitHub

/incident-response

If you see unfamiliar placeholders or need to check which tools are connected, see CONNECTORS.md.

Manage an incident from detection through postmortem.

Usage

/incident-response $ARGUMENTS

Modes

/incident-response new [description]     # Start a new incident
/incident-response update [status]       # Post a status update
/incident-response postmortem            # Generate postmortem from incident data

If no mode is specified, ask what phase the incident is in.
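
The mode handling described above could be sketched roughly as follows. This is only an illustration, assuming the text after `/incident-response` arrives as one string; `parse_arguments`, `Command`, and `MODES` are hypothetical names and not part of the skill, which is driven by plain-language instructions rather than code.

```python
from typing import NamedTuple, Optional

# The three modes listed above.
MODES = ("new", "update", "postmortem")


class Command(NamedTuple):
    mode: Optional[str]  # None means: ask what phase the incident is in
    detail: str          # free-text description or status, may be empty


def parse_arguments(arguments: str) -> Command:
    """Split the $ARGUMENTS string into a mode and the remaining text."""
    text = arguments.strip()
    parts = text.split(maxsplit=1)
    if not parts or parts[0].lower() not in MODES:
        # No recognised mode: fall back to asking about the incident phase.
        return Command(None, text)
    detail = parts[1] if len(parts) > 1 else ""
    return Command(parts[0].lower(), detail)
```

With this sketch, `parse_arguments("postmortem")` yields the `postmortem` mode with an empty detail, while free text without a mode keyword yields `mode=None`, the signal to ask which phase applies.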

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                        INCIDENT RESPONSE                        │
├─────────────────────────────────────────────────────────────────┤
│  Phase 1: TRIAGE                                                │
│  ✓ Assess severity (SEV1-4)                                     │
│  ✓ Identify affected systems and users                          │
│  ✓ Assign roles (IC, comms, responders)                         │
│                                                                 │
│  Phase 2: COMMUNICATE                                           │
│  ✓ Draft internal status update                                 │
│  ✓ Draft customer communication (if needed)                     │
│  ✓ Set up war room and cadence                                  │
│                                                                 │
│  Phase 3: MITIGATE                                              │
│  ✓ Document mitigation steps taken                              │
│  ✓ Track timeline of events                                     │
│  ✓ Confirm resolution                                           │
│                                                                 │
│  Phase 4: POSTMORTEM                                            │
│  ✓ Blameless postmortem document                                │
│  ✓ Timeline reconstruction                                      │
│  ✓ Root cause analysis (5 whys)                                 │
│  ✓ Action items with owners                                     │
└─────────────────────────────────────────────────────────────────┘

Severity Classification

| Level | Criteria | Response Time |
|-------|----------|---------------|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
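
As a rough illustration of how the table above might be applied, the sketch below maps a few impact inputs to a level. The input buckets (`users_affected`, `feature_impact`) and the rule that ambiguous cases resolve toward the more severe level are assumptions made for this example; the skill itself assesses severity from a plain-language description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Severity:
    level: str
    criteria: str
    response_time: str


# Levels, criteria, and response times copied from the table above.
SEV1 = Severity("SEV1", "Service down, all users affected", "Immediate, all-hands")
SEV2 = Severity("SEV2", "Major feature degraded, many users affected", "Within 15 min")
SEV3 = Severity("SEV3", "Minor feature issue, some users affected", "Within 1 hour")
SEV4 = Severity("SEV4", "Cosmetic or low-impact issue", "Next business day")


def classify(service_down: bool, users_affected: str, feature_impact: str) -> Severity:
    """Pick a level; ambiguous cases resolve toward the more severe one (assumption).

    users_affected: "all", "many", "some", or "few" (hypothetical buckets)
    feature_impact: "major", "minor", or "cosmetic" (hypothetical buckets)
    """
    if service_down and users_affected == "all":
        return SEV1
    if service_down or feature_impact == "major" or users_affected in ("all", "many"):
        return SEV2
    if feature_impact == "minor" or users_affected == "some":
        return SEV3
    return SEV4
```

For example, `classify(False, "many", "major").response_time` evaluates to `"Within 15 min"`.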

Communication Guidance

Provide clear, factual updates at regular cadence. Include: what's happening, who's affected, what we're doing, when the next update is.

Output — Status Update

## Incident Update: [Title]
**Severity:** SEV[1-4] | **Status:** Investigating | Identified | Monitoring | Resolved
**Impact:** [Who/what is affected]
**Last Updated:** [Timestamp]

### Current Status
[What we know now]

### Actions Taken
- [Action 1]
- [Action 2]

### Next Steps
- [What's happening next and ETA]

### Timeline
| Time | Event |
|------|-------|
| [HH:MM] | [Event] |
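
To show how the placeholders in the status update template above fit together, here is a minimal sketch that fills it from structured data. The `StatusUpdate` class and `render` function are hypothetical helpers invented for this example; the skill produces this output from conversation context, not from this code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Status values taken from the template's Status line.
STATUSES = ("Investigating", "Identified", "Monitoring", "Resolved")


@dataclass
class StatusUpdate:
    title: str
    severity: int          # 1 to 4, rendered as SEV1..SEV4
    status: str            # one of STATUSES
    impact: str
    last_updated: str
    current_status: str
    actions_taken: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    timeline: List[Tuple[str, str]] = field(default_factory=list)  # (time, event)


def render(update: StatusUpdate) -> str:
    """Fill the status update template with the fields of `update`."""
    if update.severity not in (1, 2, 3, 4):
        raise ValueError("severity must be between 1 and 4")
    if update.status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}")
    lines = [
        f"## Incident Update: {update.title}",
        f"**Severity:** SEV{update.severity} | **Status:** {update.status}",
        f"**Impact:** {update.impact}",
        f"**Last Updated:** {update.last_updated}",
        "",
        "### Current Status",
        update.current_status,
        "",
        "### Actions Taken",
        *[f"- {action}" for action in update.actions_taken],
        "",
        "### Next Steps",
        *[f"- {step}" for step in update.next_steps],
        "",
        "### Timeline",
        "| Time | Event |",
        "|------|-------|",
        *[f"| {time} | {event} |" for time, event in update.timeline],
    ]
    return "\n".join(lines)
```

Validating the severity and status before rendering keeps an update from going out with a value the template does not define.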

Output — Postmortem

## Postmortem: [Incident Title]
**Date:** [Date] | **Duration:** [X hours] | **Severity:** SEV[X]
**Authors:** [Names] | **Status:** Draft

### Summary
[2-3 sentence plain-language summary]

### Impact
- [Users affected]
- [Duration of impact]
- [Business impact if quantifiable]

### Timeline
| Time (UTC) | Event |
|------------|-------|
| [HH:MM] | [Event] |

### Root Cause
[Detailed explanation of what caused the incident]

### 5 Whys
1. Why did [symptom]? → [Because...]
2. Why did [cause 1]? → [Because...]
3. Why did [cause 2]? → [Because...]
4. Why did [cause 3]? → [Because...]
5. Why did [cause 4]? → [Root cause]

### What Went Well
- [Things that worked]

### What Went Poorly
- [Things that didn't work]

### Action Items
| Action | Owner | Priority | Due Date |
|--------|-------|----------|----------|
| [Action] | [Person] | P0/P1/P2 | [Date] |

### Lessons Learned
[Key takeaways for the team]
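
The 5 Whys section of the template above chains each answer into the next question. A minimal sketch of that chaining follows; `five_whys` is a hypothetical helper, and any incident details passed to it are the caller's own data.

```python
from typing import List


def five_whys(symptom: str, causes: List[str]) -> str:
    """Format a why-chain where each answer becomes the subject of the next question."""
    if len(causes) != 5:
        raise ValueError("expected exactly five causes, ending with the root cause")
    # Question 1 asks about the symptom; questions 2-5 ask about the previous cause.
    subjects = [symptom] + causes[:-1]
    lines = []
    for number, (subject, cause) in enumerate(zip(subjects, causes), start=1):
        suffix = " (root cause)" if number == len(causes) else ""
        lines.append(f"{number}. Why did {subject}? → Because {cause}{suffix}")
    return "\n".join(lines)
```

Keeping the subjects phrased as systems and processes ("the deploy skipped canary checks") rather than people keeps the chain consistent with the blameless framing.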

If Connectors Available

If ~~monitoring is connected:

Pull alert details and metrics
Show graphs of affected metrics

If ~~incident management is connected:

Create or update incident in PagerDuty/Opsgenie
Page on-call responders

If ~~chat is connected:

Post status updates to incident channel
Create war room channel
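
The conditional behaviour above amounts to enabling extra steps per connected tool category. The sketch below illustrates that idea only; the `CONNECTOR_ACTIONS` mapping and `plan_connector_actions` function are hypothetical, and how the skill actually detects connected tools is described in CONNECTORS.md, not here.

```python
from typing import Dict, Iterable, List

# Extra steps per connector category, copied from the lists above.
CONNECTOR_ACTIONS: Dict[str, List[str]] = {
    "monitoring": [
        "Pull alert details and metrics",
        "Show graphs of affected metrics",
    ],
    "incident management": [
        "Create or update incident in PagerDuty/Opsgenie",
        "Page on-call responders",
    ],
    "chat": [
        "Post status updates to incident channel",
        "Create war room channel",
    ],
}


def plan_connector_actions(connected: Iterable[str]) -> List[str]:
    """Return the extra steps enabled by the connected categories, in listed order."""
    connected_set = set(connected)
    actions: List[str] = []
    for category, steps in CONNECTOR_ACTIONS.items():
        if category in connected_set:
            actions.extend(steps)
    return actions
```

With nothing connected the function returns an empty list, which matches the skill still working from manual input alone.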

Tips

1. Start writing immediately: don't wait for complete information. Update as you learn more.
2. Keep updates factual: what we know, what we've done, what's next. No speculation.
3. Postmortems are blameless: focus on systems and processes, not individuals.

Related skills

Agent Browser: Gives a coding agent reliable, high-fidelity control over any website or Electron desktop app. (577k · 39.1k)

Lark Approval: Lets an AI coding agent create, read, update, and approve items in Lark (Feishu) approval workflows without leaving the coding environment. (471k)

Lark Event: Handles Feishu/Lark bot events, webhooks, and subscription callbacks in agent-driven backend code. (471k)

Lark Workflow Meeting Summary: Automatically generates structured meeting summaries and action items from Lark/Feishu calls without manual note-taking. (470k)

Lark Workflow Standup Report: Automatically generates and posts daily standup reports in Lark/Feishu from workflow and activity data. (470k)

Lark Vc Agent: Gives a coding agent the ability to read, create, and update documents, tasks, and wiki pages inside Feishu (Lark). (415k)

How it compares

More structured and automated than running an incident by hand in a Slack channel; less specialized than dedicated incident management platforms such as PagerDuty, but it covers the core workflow automation.

FAQ

What severity levels are used and what response times apply?

SEV1 (service down, all users, immediate all-hands), SEV2 (major feature degraded, 15 min), SEV3 (minor issue, 1 hour), SEV4 (cosmetic, next business day).

What should status updates include?

What's happening now, who/what is affected, actions taken, next steps with ETA, and timestamped timeline of events.

How does the postmortem approach root cause?

Uses blameless 5-whys analysis focusing on systems and processes, not individuals. Includes what went well, what failed, and action items with owners.

Is Incident Response safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Automation & Workflows · support · monitoring