
Agent Introspection Debugging
Stop blind agent retries and document why a Claude Code or Cursor run stalled, looped, or burned tokens.
Overview
Agent Introspection Debugging is an agent skill, most often used in Operate (also Build and Ship), that guides agents through failure capture, diagnosis, contained recovery, and a readable introspection report when runs loop or stall.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill agent-introspection-debugging
What is this skill?
- Four-phase loop: failure capture, diagnosis, contained recovery, introspection report
- Activates on max tool-call/loop-limit failures and retry loops with no progress
- Explicit scope boundaries: not verification-loop or framework-specific ECC debug skills
- Produces human-readable debug reports before escalating to a human
- Workflow skill—no hidden runtime enforcement in the harness
Adoption & trust: 3.1k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent keeps retrying the same tools, hits loop limits, or drifts off-task while you have no structured record of what actually failed.
Who is it for?
Solo builders running long autonomous agent sessions where token burn and loop-limit errors need a disciplined self-debug ritual before human escalation.
Skip if: you need post-implementation feature verification after code changes (use verification-loop instead), or a narrower ECC framework debug skill already covers the stack.
When should I use this skill?
Use it when an agent run fails repeatedly, hits loop or tool-call limits, retries without progress, degrades in quality as context grows, or finds environment state that does not match its expectations.
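As a rough illustration of spotting "retries without progress", the sketch below counts identical calls in a recent window of tool history. The function name `find_repeated_calls`, the `window` and `threshold` defaults, and the sample history are all assumptions made for this example; the skill itself does not prescribe any detection code.

```python
from collections import Counter


def find_repeated_calls(tool_calls, window=10, threshold=3):
    """Return calls seen at least `threshold` times in the last `window` calls."""
    recent = tool_calls[-window:]
    counts = Counter(recent)
    return {call: n for call, n in counts.items() if n >= threshold}


# Hypothetical tool-call history as (tool name, argument) pairs.
history = [
    ("read_file", "src/api.py"),
    ("run_tests", "tests/test_api.py"),
    ("read_file", "src/api.py"),
    ("run_tests", "tests/test_api.py"),
    ("read_file", "src/api.py"),
    ("run_tests", "tests/test_api.py"),
]

repeated = find_repeated_calls(history)
if repeated:
    print("Possible retry loop:", repeated)
```

A non-empty result is only a hint that the run may be looping; it is a prompt to capture the failure state, not proof of a loop.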
What do I get? / Deliverables
You get a contained recovery attempt plus a clear debug report so the next turn—or you—can act without another blind retry spiral.
- Structured failure capture (error type, context)
- Human-readable introspection / debug report
- Contained recovery action log
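To make the first deliverable concrete, here is a minimal sketch of a structured failure capture record that renders to markdown. The class name `FailureCapture`, its field names, and the sample values are hypothetical; only the rendered labels follow the minimum capture template in the skill's SKILL.md.

```python
from dataclasses import dataclass, field


@dataclass
class FailureCapture:
    """Hypothetical record mirroring the skill's minimum capture template."""

    session: str
    goal: str
    error: str
    last_successful_step: str
    last_failed_command: str
    repeated_pattern: str = "none observed"
    env_assumptions: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record using the template's labels."""
        assumptions = ", ".join(self.env_assumptions) or "none listed"
        lines = [
            "## Failure Capture",
            f"- Session / task: {self.session}",
            f"- Goal in progress: {self.goal}",
            f"- Error: {self.error}",
            f"- Last successful step: {self.last_successful_step}",
            f"- Last failed tool / command: {self.last_failed_command}",
            f"- Repeated pattern seen: {self.repeated_pattern}",
            f"- Environment assumptions to verify: {assumptions}",
        ]
        return "\n".join(lines)


# Sample values are invented for illustration only.
capture = FailureCapture(
    session="nightly-sync",
    goal="wire pagination for a third-party API",
    error="ECONNREFUSED on port 8080",
    last_successful_step="fetched page 1",
    last_failed_command="fetch page 2",
    repeated_pattern="same request retried 5 times",
    env_assumptions=["service listens on port 8080", "cwd is repo root"],
)
print(capture.to_markdown())
```

Keeping the record as plain data means the same capture can be logged, diffed between retries, or pasted into the final report.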
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Operate because repeated agent failures are production-adjacent incidents you must diagnose and contain before they waste more compute. Errors is the best primary facet for failure capture, loop-limit exits, and structured recovery—not generic feature work.
Where it fits
An agent loops on pagination while wiring a third-party API—capture state before rewriting the connector.
A fix-and-verify agent keeps re-reading the same diff without merging insights—diagnose drift before another review pass.
A scheduled agent maintenance job fails on filesystem expectations: record the mismatch and try a smaller corrective step.
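For the filesystem scenario, a minimal sketch of "record the mismatch" could compare assumed state with what is actually on disk. The function name `check_environment` and its parameters are assumptions for this example, and it covers only the working directory and file existence, not branch or service state.

```python
from pathlib import Path


def check_environment(expected_cwd, expected_files):
    """Return a list of mismatches between assumed and actual filesystem state."""
    mismatches = []
    actual_cwd = Path.cwd().resolve()
    if actual_cwd != Path(expected_cwd).resolve():
        mismatches.append(f"cwd is {actual_cwd}, expected {expected_cwd}")
    for name in expected_files:
        # Paths are checked relative to the actual working directory.
        if not (actual_cwd / name).exists():
            mismatches.append(f"missing file: {name}")
    return mismatches


# Hypothetical expectation: a file that is unlikely to exist.
for problem in check_environment(Path.cwd(), ["no_such_file_zz9.txt"]):
    print(problem)
```

An empty list means the checked assumptions held; anything else belongs in the failure capture before the next corrective step.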
How it compares
Use instead of asking the agent to “try again” without capturing error type, tool history, and environment mismatch first.
Common Questions / FAQ
Who is agent-introspection-debugging for?
Indie and solo builders using Claude Code, Cursor, or similar agents who hit repeated tool failures, context drift, or loop limits during real shipping work.
When should I use agent-introspection-debugging?
During Build when integrations misbehave mid-task, during Ship when review-fix loops stall, and during Operate when production-adjacent agent jobs fail repeatedly—always before another blind retry.
Is agent-introspection-debugging safe to install?
It is procedural guidance only; review the Security Audits panel on this Prism page and treat recovery steps as suggestions you approve in your environment.
SKILL.md
# Agent Introspection Debugging

Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.

This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.

## When to Activate

- Maximum tool call / loop-limit failures
- Repeated retries with no forward progress
- Context growth or prompt drift that starts degrading output quality
- File-system or environment state mismatch between expectation and reality
- Tool failures that are likely recoverable with diagnosis and a smaller corrective action

## Scope Boundaries

Activate this skill for:

- capturing failure state before retrying blindly
- diagnosing common agent-specific failure patterns
- applying contained recovery actions
- producing a structured human-readable debug report

Do not use this skill as the primary source for:

- feature verification after code changes; use `verification-loop`
- framework-specific debugging when a narrower ECC skill already exists
- runtime promises the current harness cannot enforce automatically

## Four-Phase Loop

### Phase 1: Failure Capture

Before trying to recover, record the failure precisely.

Capture:

- error type, message, and stack trace when available
- last meaningful tool call sequence
- what the agent was trying to do
- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
- current environment assumptions: cwd, branch, relevant service state, expected files

Minimum capture template:

```markdown
## Failure Capture
- Session / task:
- Goal in progress:
- Error:
- Last successful step:
- Last failed tool / command:
- Repeated pattern seen:
- Environment assumptions to verify:
```

### Phase 2: Root-Cause Diagnosis

Match the failure to a known pattern before changing anything.
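
As a rough sketch of this matching step, the snippet below maps raw error text to a likely cause. The cause strings are taken from the skill's pattern table; the regular expressions, the `KNOWN_PATTERNS` list, and the `diagnose` function are illustrative assumptions, not part of the skill.

```python
import re

# Causes come from the skill's pattern table; the regexes are illustrative guesses.
KNOWN_PATTERNS = [
    ("service unavailable or wrong port", re.compile(r"ECONNREFUSED|timed? ?out", re.I)),
    ("retry storm or missing backoff", re.compile(r"\b429\b|quota", re.I)),
    ("loop or no-exit observer path", re.compile(r"max(imum)? tool calls|loop limit", re.I)),
    ("race, wrong cwd, or branch drift", re.compile(r"no such file|ENOENT", re.I)),
]


def diagnose(error_text):
    """Return the first likely cause whose pattern matches the error text."""
    for cause, pattern in KNOWN_PATTERNS:
        if pattern.search(error_text):
            return cause
    return "unmatched: capture more evidence before acting"


# Hypothetical error messages.
print(diagnose("connect ECONNREFUSED 127.0.0.1:5432"))
print(diagnose("HTTP 429 Too Many Requests"))
```

A match is a hypothesis to verify with the check the skill lists for that pattern, not a confirmed diagnosis.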

| Pattern | Likely Cause | Check |
| --- | --- | --- |
| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |

Diagnosis questions:

- is this a logic failure, state failure, environment failure, or policy failure?
- did the agent lose the real objective and start optimizing the wrong subtask?
- is the failure deterministic or transient?
- what is the smallest reversible action that would validate the diagnosis?

### Phase 3: Contained Recovery

Recover with the smallest action that changes the diagnosis surface.

Safe recovery actions:

- stop repeated retries and restate the hypothesis
- trim low-signal context and keep only the active goal, blockers, and evidence
- re-check the actual filesystem / branch / process state
- narrow the task to one failing command, one file, or one test
- switch from speculative reasoning to direct observation
- escalate to a human when the failure is high-risk or externally blocked

Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.

Contained recovery checklist:

```markdown
## Recovery Action
- Diagn