Error Patterns

Production and day-two agent orchestration failures surface in Operate; error triage and human escalation are the canonical home for damage control. AGENT_CRASH, CONTEXT_OVERFLOW, and PARTIAL_FAILURE are runtime error patterns—aligned with the errors subphase rather than launch or growth.

Also useful

Also useful

Where it fits

Example use

Two agents implement features on one branch and hit MERGE_CONFLICT—stop, resolve, resume per the skill.

Example use

Post-parallel PR generation, triage partial failures and reassign tasks to a replacement agent.

Example use

Agent heartbeat missing; reassign work and escalate to human after two failed self-recovery attempts.

How it compares

Use for agent-team incident response—not as a substitute for application error monitoring or Sentry-style stack traces.

Common Questions / FAQ

Who is error-patterns for?

Solo builders and leads running multi-agent Claude or Cursor workflows who need structured recovery when agents conflict, crash, or lose context.

When should I use error-patterns?

During Build when parallel agents edit the same repo; during Ship when merge conflicts appear after automated review branches; during Operate when agents hang, truncate, or partially complete coordinated tasks.

Is error-patterns safe to install?

It is guidance-only; review the Security Audits panel on this page and ensure escalation steps do not auto-run destructive git commands without your approval.

SKILL.md

READMESKILL.md - Error Patterns

# Agent Damage Control

Agent-level error recovery patterns for multi-agent coordination. Maps service-level error categories to agent-level equivalents and defines escalation ladders.

## Error Categories

Agent-level error categories extend the service-level taxonomy from error-patterns:

| Agent Category | Service Equivalent | Severity | Recovery |
|----------------|-------------------|----------|----------|
| **AGENT_CRASH** | PERMANENT | Critical | Replace agent, reassign tasks |
| **CONTEXT_OVERFLOW** | RESOURCE | Error | Graceful handoff, context shed |
| **MERGE_CONFLICT** | TRANSIENT | Warning | Stop, resolve, resume |
| **PARTIAL_FAILURE** | Mixed | Varies | Triage by sub-category |

## Escalation Ladder

Three-tier escalation with clear handoff points:

```
Tier 1: Agent Self-Recovery
  Agent detects issue, attempts automated recovery
  (retry, context shed, conflict resolution)
        |
        | Fails after 2 attempts
        v
Tier 2: Lead Agent Intervention
  Lead reassigns tasks, spawns replacement agents,
  coordinates resolution across team
        |
        | Cannot resolve or CRITICAL severity
        v
Tier 3: Human Escalation
  Human notified with full context, recommended actions,
  and current state summary
```

## Quick Reference

| Scenario | First Action |
|----------|--------------|
| Agent stops responding | Check heartbeat, reassign tasks |
| Response truncation | Summarize state, create continuation |
| File conflicts after parallel work | Stop agents, lead resolves |
| Some tasks fail, others succeed | Triage by error category |

## Recovery Patterns

### Agent Crash Recovery
- Detect orphaned tasks via heartbeat monitoring
- Apply "replace don't wait" doctrine: spawn new agent immediately
- Recover state from last checkpoint or committed work
- Reassign orphaned tasks to replacement agent

### Context Overflow Handling
- Detection signals: truncated responses, repeated content, loss of coherence
- Graceful handoff: summarize state, write to file, spawn continuation
- Progressive context shedding: drop least-relevant loaded modules first

### Merge Conflict Resolution
- Stop all agents working on conflicting files
- Lead agent resolves conflicts using diff analysis
- Resume agents with updated base after resolution
- Prevention: assign non-overlapping file scopes

### Partial Failure Handling
- Triage: categorize each sub-task result (success/failure/partial)
- Salvage: commit successful work first
- Retry: attempt failed tasks with fresh context
- Report: document what succeeded and what needs manual attention

## Integration

Reference specific recovery patterns from orchestrator skills:
```markdown
On agent crash: follow leyline:error-patterns/modules/agent-damage-control.md
```

## Exit Criteria

- Failed agent identified and replaced (or escalated)
- Orphaned tasks reassigned to healthy agents
- Successful work preserved and committed
- Recovery actions logged for post-mortem


---
name: classification
description: Error classification taxonomy and detection patterns
estimated_tokens: 350
---

# Error Classification

## HTTP Status Code Mapping

```python
ERROR_CLASSIFICATION = {
    # Authentication
    401: ErrorCategory.CONFIGURATION,
    403: ErrorCategory.CONFIGURATION,

    # Client errors
    400: ErrorCategory.PERMANENT,
    404: ErrorCategory.CONFIGURATION,
    422: ErrorCategory.PERMANENT,

    # Rate limits
    429: ErrorCategory.RESOURCE,

    # Server errors (transient)
    500: ErrorCategory.TRANSIENT,
    502: ErrorCategory.TRANSIENT,
    503: ErrorCategory.TRANSIENT,
    504: ErrorCategory.TRANSIENT,
}
```

## Error Detection Patterns

### By Message Content
```python
def classify_by_message(error_msg: str) -> ErrorCategory:
    error_msg = error_msg.lower()

    if any(k in error_msg for k in ["rate limit", "quota", "exceeded"]):
        return ErrorCategory.RESOURCE

    if any(k in error_msg for k in ["auth", "token", "credential", "permission"]):
        return ErrorCat

What is this skill?

Four agent error categories mapped from service-level taxonomy (crash, context overflow, merge conflict, partial failure

Three-tier escalation: self-recovery → lead agent → human with full context

Merge-conflict playbook: stop parallel work, resolve, then resume

Heartbeat and reassignment guidance when an agent stops responding

Truncation handling via summarize state and continuation handoff

4 agent-level error categories with mapped service equivalents

3-tier escalation ladder with handoff after 2 failed self-recovery attempts

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What do I get? / Deliverables

You classify the failure, run the tier-1 playbook (retry, context shed, or conflict stop), escalate to lead reassignment or human review with a state summary when automated recovery fails twice.

Classified error category and recommended first action

Escalation tier decision with context summary for human handoff

Conflict-resolution stop/resume checklist for parallel agents

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

Also useful

Where it fits

Example use

Two agents implement features on one branch and hit MERGE_CONFLICT—stop, resolve, resume per the skill.

Example use

Post-parallel PR generation, triage partial failures and reassign tasks to a replacement agent.

Example use