
Error Patterns
Recover gracefully when multiple coding agents crash, overflow context, or conflict on files using a defined escalation ladder.
Overview
error-patterns is an agent skill most often used in Operate (also Ship and Build) that defines multi-agent error categories, first actions, and a three-tier escalation ladder from self-recovery to human handoff.
Install
npx skills add https://github.com/athola/claude-night-market --skill error-patterns
What is this skill?
- Four agent error categories mapped from the service-level taxonomy (crash, context overflow, merge conflict, partial failure)
- Three-tier escalation: self-recovery → lead agent → human with full context
- Merge-conflict playbook: stop parallel work, resolve, then resume
- Heartbeat and reassignment guidance when an agent stops responding
- Truncation handling via summarize state and continuation handoff
- 4 agent-level error categories with mapped service equivalents
- 3-tier escalation ladder with handoff after 2 failed self-recovery attempts
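The three-tier ladder described in the bullets above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the skill's own implementation: `Tier`, `next_tier`, `lead_failed`, and `MAX_SELF_RECOVERY_ATTEMPTS` are hypothetical names. Only the tier order, the two-attempt threshold, and the "cannot resolve or critical severity" condition come from the skill text.

```python
from enum import Enum


class Tier(Enum):
    SELF_RECOVERY = 1  # agent retries, sheds context, or stops on conflict
    LEAD_AGENT = 2     # lead reassigns tasks or spawns a replacement agent
    HUMAN = 3          # human receives full context and a state summary


# Threshold stated in the skill description.
MAX_SELF_RECOVERY_ATTEMPTS = 2


def next_tier(failed_attempts: int, lead_failed: bool, critical: bool) -> Tier:
    """Pick the escalation tier for an incident (hypothetical helper).

    Assumption: critical severity is only checked once self-recovery
    is exhausted; the skill text does not say whether it bypasses tier 1.
    """
    if failed_attempts < MAX_SELF_RECOVERY_ATTEMPTS:
        return Tier.SELF_RECOVERY
    if critical or lead_failed:
        return Tier.HUMAN
    return Tier.LEAD_AGENT
```

Keeping the threshold in one named constant makes it easy to audit where the handoff happens if a team decides to tune it.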
Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
Parallel coding agents stop mid-task, overwrite each other’s files, or dump truncated answers—and you have no consistent escalation path.
Who is it for?
Builders orchestrating multiple agents on one codebase who need merge-conflict stops and crash reassignment without ad-hoc firefighting.
Skip if: Single-threaded one-agent sessions with no parallelism, or pure application runtime errors with no agent coordination angle.
When should I use this skill?
Multi-agent work hits crashes, context limits, file conflicts, truncation, or partial completion and you need categorized first actions and escalation.
What do I get? / Deliverables
You classify the failure, run the tier-1 playbook (retry, context shed, or conflict stop), and, when automated recovery fails twice, escalate to lead reassignment or human review with a state summary.
- Classified error category and recommended first action
- Escalation tier decision with context summary for human handoff
- Conflict-resolution stop/resume checklist for parallel agents
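One way to picture the first two deliverables is a lookup from error category to recovery action, plus a one-line context summary for handoff. The category names, severities, and recovery text below are taken from the SKILL.md table on this page. `AgentError`, `Playbook`, `PLAYBOOKS`, and `handoff_summary` are hypothetical names introduced only for illustration, and the summary format is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class AgentError(Enum):
    AGENT_CRASH = "AGENT_CRASH"
    CONTEXT_OVERFLOW = "CONTEXT_OVERFLOW"
    MERGE_CONFLICT = "MERGE_CONFLICT"
    PARTIAL_FAILURE = "PARTIAL_FAILURE"


@dataclass(frozen=True)
class Playbook:
    severity: str  # severity label from the SKILL.md table
    recovery: str  # recovery action from the SKILL.md table


PLAYBOOKS = {
    AgentError.AGENT_CRASH: Playbook("Critical", "Replace agent, reassign tasks"),
    AgentError.CONTEXT_OVERFLOW: Playbook("Error", "Graceful handoff, context shed"),
    AgentError.MERGE_CONFLICT: Playbook("Warning", "Stop, resolve, resume"),
    AgentError.PARTIAL_FAILURE: Playbook("Varies", "Triage by sub-category"),
}


def handoff_summary(error: AgentError, attempts: int, state: str) -> str:
    """Build a one-line context summary for escalation (assumed format)."""
    book = PLAYBOOKS[error]
    return (
        f"category={error.value} severity={book.severity} "
        f"attempts={attempts} recommended='{book.recovery}' state='{state}'"
    )
```

For example, `handoff_summary(AgentError.AGENT_CRASH, 2, "tests half-migrated")` yields a single string that a lead agent or human can read without replaying the session.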
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Production and day-two agent orchestration failures surface in Operate; error triage and human escalation are the canonical home for damage control. AGENT_CRASH, CONTEXT_OVERFLOW, and PARTIAL_FAILURE are runtime error patterns—aligned with the errors subphase rather than launch or growth.
Where it fits
Two agents implement features on one branch and hit MERGE_CONFLICT—stop, resolve, resume per the skill.
Post-parallel PR generation, triage partial failures and reassign tasks to a replacement agent.
Agent heartbeat missing; reassign work and escalate to human after two failed self-recovery attempts.
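The missing-heartbeat scenario can be sketched as two small helpers: one finds agents whose last heartbeat is too old, the other moves their tasks to a replacement. This is a minimal sketch under stated assumptions. The function names, the data shapes, and the 60-second timeout are all hypothetical; the skill does not specify a heartbeat interval or storage format.

```python
def find_stale_agents(
    last_seen: dict[str, float], now: float, timeout_s: float = 60.0
) -> list[str]:
    """Return agents whose last heartbeat is older than timeout_s seconds.

    The 60-second default is an assumed value, not one from the skill.
    """
    return sorted(agent for agent, ts in last_seen.items() if now - ts > timeout_s)


def reassign(
    tasks: dict[str, str], stale: list[str], replacement: str
) -> dict[str, str]:
    """Return a new task-to-owner map with orphaned tasks moved to replacement."""
    stale_set = set(stale)
    return {
        task: (replacement if owner in stale_set else owner)
        for task, owner in tasks.items()
    }
```

Returning a new mapping instead of mutating the old one keeps the pre-reassignment state available for the post-mortem log.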
How it compares
Use for agent-team incident response—not as a substitute for application error monitoring or Sentry-style stack traces.
Common Questions / FAQ
Who is error-patterns for?
Solo builders and leads running multi-agent Claude or Cursor workflows who need structured recovery when agents conflict, crash, or lose context.
When should I use error-patterns?
During Build when parallel agents edit the same repo; during Ship when merge conflicts appear after automated review branches; during Operate when agents hang, truncate, or partially complete coordinated tasks.
Is error-patterns safe to install?
It is guidance-only; review the Security Audits panel on this page and ensure escalation steps do not auto-run destructive git commands without your approval.
SKILL.md
# Agent Damage Control

Agent-level error recovery patterns for multi-agent coordination. Maps service-level error categories to agent-level equivalents and defines escalation ladders.

## Error Categories

Agent-level error categories extend the service-level taxonomy from error-patterns:

| Agent Category | Service Equivalent | Severity | Recovery |
|----------------|-------------------|----------|----------|
| **AGENT_CRASH** | PERMANENT | Critical | Replace agent, reassign tasks |
| **CONTEXT_OVERFLOW** | RESOURCE | Error | Graceful handoff, context shed |
| **MERGE_CONFLICT** | TRANSIENT | Warning | Stop, resolve, resume |
| **PARTIAL_FAILURE** | Mixed | Varies | Triage by sub-category |

## Escalation Ladder

Three-tier escalation with clear handoff points:

```
Tier 1: Agent Self-Recovery
  Agent detects issue, attempts automated recovery
  (retry, context shed, conflict resolution)
      |
      | Fails after 2 attempts
      v
Tier 2: Lead Agent Intervention
  Lead reassigns tasks, spawns replacement agents,
  coordinates resolution across team
      |
      | Cannot resolve or CRITICAL severity
      v
Tier 3: Human Escalation
  Human notified with full context, recommended
  actions, and current state summary
```

## Quick Reference

| Scenario | First Action |
|----------|--------------|
| Agent stops responding | Check heartbeat, reassign tasks |
| Response truncation | Summarize state, create continuation |
| File conflicts after parallel work | Stop agents, lead resolves |
| Some tasks fail, others succeed | Triage by error category |

## Recovery Patterns

### Agent Crash Recovery

- Detect orphaned tasks via heartbeat monitoring
- Apply "replace don't wait" doctrine: spawn new agent immediately
- Recover state from last checkpoint or committed work
- Reassign orphaned tasks to replacement agent

### Context Overflow Handling

- Detection signals: truncated responses, repeated content, loss of coherence
- Graceful handoff: summarize state, write to file, spawn continuation
- Progressive context shedding: drop least-relevant loaded modules first

### Merge Conflict Resolution

- Stop all agents working on conflicting files
- Lead agent resolves conflicts using diff analysis
- Resume agents with updated base after resolution
- Prevention: assign non-overlapping file scopes

### Partial Failure Handling

- Triage: categorize each sub-task result (success/failure/partial)
- Salvage: commit successful work first
- Retry: attempt failed tasks with fresh context
- Report: document what succeeded and what needs manual attention

## Integration

Reference specific recovery patterns from orchestrator skills:

```markdown
On agent crash: follow leyline:error-patterns/modules/agent-damage-control.md
```

## Exit Criteria

- Failed agent identified and replaced (or escalated)
- Orphaned tasks reassigned to healthy agents
- Successful work preserved and committed
- Recovery actions logged for post-mortem

---
name: classification
description: Error classification taxonomy and detection patterns
estimated_tokens: 350
---

# Error Classification

## HTTP Status Code Mapping

```python
ERROR_CLASSIFICATION = {
    # Authentication
    401: ErrorCategory.CONFIGURATION,
    403: ErrorCategory.CONFIGURATION,
    # Client errors
    400: ErrorCategory.PERMANENT,
    404: ErrorCategory.CONFIGURATION,
    422: ErrorCategory.PERMANENT,
    # Rate limits
    429: ErrorCategory.RESOURCE,
    # Server errors (transient)
    500: ErrorCategory.TRANSIENT,
    502: ErrorCategory.TRANSIENT,
    503: ErrorCategory.TRANSIENT,
    504: ErrorCategory.TRANSIENT,
}
```

## Error Detection Patterns

### By Message Content

```python
def classify_by_message(error_msg: str) -> ErrorCategory:
    error_msg = error_msg.lower()
    if any(k in error_msg for k in ["rate limit", "quota", "exceeded"]):
        return ErrorCategory.RESOURCE
    if any(k in error_msg for k in ["auth", "token", "credential", "permission"]):
        return ErrorCat