Agent Architecture Audit

Name: Agent Architecture Audit
Author: affaan-m

affaan-m/everything-claude-code

Run a structured 12-layer audit before you ship an agent or any LLM feature that uses tools, memory, or multi-step loops.

Overview

Agent Architecture Audit is an agent skill used most often in the Ship phase (and also in Operate). It audits the 12-layer agent stack for wrapper regression, memory pollution, tool failures, and hidden repair loops, and reports severity-ranked findings.

Install

npx skills add https://github.com/affaan-m/everything-claude-code --skill agent-architecture-audit

What is this skill?

12-layer agent stack diagnostic with severity-ranked findings
Detects wrapper regression, stale memory, and hidden repair/retry loops
Code-first fix recommendations for tool calling and multi-step workflows
Mandatory gate before releasing agent or LLM-powered apps to production
Contrasts with general debugging via agent-introspection-debugging when root cause is architectural

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.2k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your agent works in a demo but degrades in production, and you cannot tell whether wrappers, memory, tools, or silent retries are corrupting behavior.

Who is it for?

Indie builders releasing tool-using or memory-backed agents who need a pre-production architectural pass instead of endless prompt tweaking.

Skip if: General application debugging without an agent stack focus (use agent-introspection-debugging) or routine static code review without LLM workflow concerns.

When should I use this skill?

Use it when releasing agent or LLM apps to production; when shipping tool calling, memory, or multi-step workflows; when an agent degrades after wrapper layers are added; or when debugging agent behavior has taken more than about 15 minutes without finding a root cause.

What do I get? / Deliverables

You get a prioritized architectural findings report with concrete code fixes so you can ship or iterate without masking failures behind wrapper layers.

Severity-ranked architectural findings
Code-first remediation guidance

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Releasing agent apps to production is a Ship concern; this skill is shelved under Security because it targets wrapper regression, hidden retries, and failure modes that become production incidents. The Security subphase fits pre-release hardening and architectural failure modes (tool discipline, memory pollution) rather than generic unit testing.

Also useful

Operate: Error tracking

Ship: Code review

Where it fits

Example use (Ship: Security)

Run the 12-layer audit the week before turning on a paid tier for your coding agent.

Example use (Ship: Code review)

Compare findings across two agent variants that share tools but use different memory schemas.

Example use (Operate: Error tracking)

Investigate user reports that tools are flaky after a deploy that only changed prompt wrappers.

Example use (Build: Agent skills & templates)

Validate a new tool-definition layer before merging the branch that adds five new MCP tools.

How it compares

Use instead of ad-hoc prompt tweaks when symptoms point to architecture (memory, tools, wrappers), not a single bad function.

Common Questions / FAQ

Who is agent-architecture-audit for?

Solo and indie developers building agent applications, autonomous loops, or LLM features with tools, memory, or multi-step workflows who are close to production.

When should I use agent-architecture-audit?

Before releasing an agent app, when adding wrapper or memory layers, when playground behavior diverges from your wrapper, or when debugging agent issues stalls without a root cause—in Ship for release gates and in Operate when production behavior regresses.

Is agent-architecture-audit safe to install?

Review the Security Audits panel on this skill’s Prism page and treat Read, Write, Edit, Bash, Grep, and Glob as full-repo access during the audit run.

SKILL.md

# Agent Architecture Audit

A diagnostic workflow for agent systems that hide failures behind wrapper layers, stale memory, retry loops, or transport/rendering mutations.

## When to Activate

**MANDATORY for:**
- Releasing any agent or LLM-powered application to production
- Shipping features with tool calling, memory, or multi-step workflows
- Agent behavior degrades after adding wrapper layers
- User reports "the agent is getting worse" or "tools are flaky"
- Same model works in playground but breaks inside your wrapper
- Debugging agent behavior for more than 15 minutes without finding root cause

**Especially critical when:**
- You've added new prompt layers, tool definitions, or memory systems
- Different agents in your system behave inconsistently
- The model was fine yesterday but is hallucinating today
- You suspect hidden repair/retry loops silently mutating responses

**Do not use for:**
- General code debugging — use `agent-introspection-debugging`
- Code review — use language-specific reviewer agents
- Security scanning — use `security-review` or `security-review/scan`
- Agent performance benchmarking — use `agent-eval`
- Writing new features — use the appropriate workflow skill

## The 12-Layer Stack

Every agent system has these layers. Any of them can corrupt the answer:

| # | Layer | What Goes Wrong |
|---|-------|----------------|
| 1 | System prompt | Conflicting instructions, instruction bloat |
| 2 | Session history | Stale context injection from previous turns |
| 3 | Long-term memory | Pollution across sessions, old topics in new conversations |
| 4 | Distillation | Compressed artifacts re-entering as pseudo-facts |
| 5 | Active recall | Redundant re-summary layers wasting context |
| 6 | Tool selection | Wrong tool routing, model skips required tools |
| 7 | Tool execution | Hallucinated execution — claims to call but doesn't |
| 8 | Tool interpretation | Misread or ignored tool output |
| 9 | Answer shaping | Format corruption in final response |
| 10 | Platform rendering | Transport-layer mutation (UI, API, CLI mutates valid answers) |
| 11 | Hidden repair loops | Silent fallback/retry agents running second LLM pass |
| 12 | Persistence | Expired state or cached artifacts reused as live evidence |
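
One way to make the table actionable is to record each audit finding against a layer number and sort by severity. The following is a minimal sketch, not the skill's own report format: the `Severity` scale, the `Finding` type, and `rank_findings` are hypothetical names introduced here for illustration. Only the layer names and numbers come from the table above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Layer(IntEnum):
    """The 12 layers, numbered as in the table above."""
    SYSTEM_PROMPT = 1
    SESSION_HISTORY = 2
    LONG_TERM_MEMORY = 3
    DISTILLATION = 4
    ACTIVE_RECALL = 5
    TOOL_SELECTION = 6
    TOOL_EXECUTION = 7
    TOOL_INTERPRETATION = 8
    ANSWER_SHAPING = 9
    PLATFORM_RENDERING = 10
    HIDDEN_REPAIR_LOOPS = 11
    PERSISTENCE = 12


class Severity(IntEnum):
    # Hypothetical four-step scale; the skill only states that
    # findings are severity-ranked.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    layer: Layer
    severity: Severity
    summary: str


def rank_findings(findings: Iterable[Finding]) -> List[Finding]:
    """Most severe first; ties are broken by layer number."""
    return sorted(findings, key=lambda f: (-f.severity, f.layer))


findings = [
    Finding(Layer.SESSION_HISTORY, Severity.MEDIUM, "Stale turns injected"),
    Finding(Layer.TOOL_EXECUTION, Severity.CRITICAL, "Tool call claimed, never run"),
    Finding(Layer.SYSTEM_PROMPT, Severity.MEDIUM, "Conflicting instructions"),
]
ranked = rank_findings(findings)
for finding in ranked:
    print(f"[{finding.severity.name}] layer {int(finding.layer)}: {finding.summary}")
```

Keeping the layer number on every finding makes it easy to see which layers were never examined at all.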

## Common Failure Patterns

### 1. Wrapper Regression

The base model produces correct answers, but the wrapper layers make it worse.

**Symptoms:**
- Model works fine in playground or direct API call, breaks in your agent
- Added a new prompt layer, existing behavior degraded
- Agent sounds confident but is confidently wrong
- "It was working before the last update"
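
A wrapper regression can be checked mechanically by sending the same prompts through the direct model call and through the wrapped agent, then flagging prompts that only the wrapped path gets wrong. The sketch below assumes a toy setup: `fake_model`, `fake_wrapper`, `EXPECTED`, and `find_wrapper_regressions` are all hypothetical stand-ins, and the toy model's 40-character "context" exists only to simulate a preamble crowding out the question.

```python
from typing import Callable, Iterable, List

ModelFn = Callable[[str], str]

# Hypothetical test set: prompt -> expected answer.
EXPECTED = {
    "What is 2+2?": "4",
    "What is the capital of France?": "Paris",
}
KNOWN_FACTS = {"2+2": "4", "capital of France": "Paris"}


def fake_model(prompt: str) -> str:
    # Toy stand-in for a direct API call: it only "sees" the first
    # 40 characters, so extra wrapper text can push the question out.
    visible = prompt[:40]
    for question, answer in KNOWN_FACTS.items():
        if question in visible:
            return answer
    return "I am not sure"


def fake_wrapper(prompt: str) -> str:
    # Toy stand-in for an agent wrapper that prepends a prompt layer.
    return fake_model("You are a helpful agent. " + prompt)


def is_correct(prompt: str, answer: str) -> bool:
    return EXPECTED[prompt] == answer


def find_wrapper_regressions(
    prompts: Iterable[str],
    direct: ModelFn,
    wrapped: ModelFn,
    check: Callable[[str, str], bool],
) -> List[str]:
    """Return prompts the direct call answers correctly but the wrapper does not."""
    regressions = []
    for prompt in prompts:
        if check(prompt, direct(prompt)) and not check(prompt, wrapped(prompt)):
            regressions.append(prompt)
    return regressions


regressions = find_wrapper_regressions(EXPECTED, fake_model, fake_wrapper, is_correct)
print(regressions)
```

In a real audit, `direct` would call the provider API with no wrapper and `wrapped` would call the full agent. Prompts that fail in both paths are a model problem rather than a wrapper regression, which is why they are excluded.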

### 2. Memory Contamination

Old topics leak into new conversations through history, memory retrieval, or distillation.

**Symptoms:**
- Agent brings up unrelated past topics
- User corrections don't stick (old memory overwrites new)
- Same-session artifacts re-enter as pseudo-facts
- Memory grows without bound, degrading response quality over time
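
The symptoms above suggest three guards for a memory layer: scope entries to a session, let the newest write win, and bound both size and age. The following is a minimal in-process sketch under those assumptions. `ScopedMemory` and its parameters are hypothetical names, not part of the skill, and a production memory store would usually sit behind a database rather than a dictionary.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class ScopedMemory:
    """Session-scoped, size-bounded memory with expiry (illustrative only)."""

    def __init__(
        self,
        max_items: int = 100,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._items: "OrderedDict[Tuple[str, str], Tuple[str, float]]" = OrderedDict()
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock

    def remember(self, session_id: str, key: str, value: str) -> None:
        scoped_key = (session_id, key)
        # Drop any older value first so a user correction replaces it
        # and the entry moves to the "newest" end.
        self._items.pop(scoped_key, None)
        self._items[scoped_key] = (value, self._clock())
        # Evict the oldest entries so memory cannot grow without bound.
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)

    def recall(self, session_id: str, key: str) -> Optional[str]:
        entry = self._items.get((session_id, key))
        if entry is None:
            return None  # other sessions' entries are never visible
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[(session_id, key)]  # expired: do not reuse
            return None
        return value


now = [0.0]  # injectable fake clock so the example is deterministic
memory = ScopedMemory(max_items=2, ttl_seconds=10, clock=lambda: now[0])
memory.remember("session-1", "city", "Berlin")
memory.remember("session-1", "city", "Munich")  # user correction
print(memory.recall("session-1", "city"))  # the corrected value
print(memory.recall("session-2", "city"))  # nothing leaks across sessions
now[0] = 11.0
print(memory.recall("session-1", "city"))  # expired after the TTL
```

The injected `clock` is a design choice for testability; by default the class uses `time.monotonic`.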

### 3. Tool Discipline Failure

Tools are declared in the prompt but not enforced in code. The model skips them or hallucinates execution.

**Symptoms:**
- "Must use tool X" in prompt, but model answers without calling it
- Tool results look correct but were never actually executed
- Different tools fight over the same responsibility
- Model uses tool when it shouldn't, or skips it when it must
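
Enforcing tool use in code, rather than only in the prompt, can be as simple as routing every tool call through one runner that logs real executions and refusing any answer whose required tools are missing from that log. This is a sketch under that assumption: `ToolRunner`, `ToolDisciplineError`, `accept_answer`, and the `get_weather` tool are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List


class ToolDisciplineError(RuntimeError):
    """Raised when an answer is produced without a required tool run."""


class ToolRunner:
    """The only path through which tools execute, so its log is ground truth."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]) -> None:
        self._tools = tools
        self.executed: List[str] = []

    def run(self, name: str, **kwargs: Any) -> Any:
        result = self._tools[name](**kwargs)
        self.executed.append(name)  # recorded only after a real call returns
        return result


def accept_answer(answer: str, runner: ToolRunner, required: Iterable[str]) -> str:
    """Return the answer only if every required tool was actually executed."""
    missing = sorted(set(required) - set(runner.executed))
    if missing:
        raise ToolDisciplineError(f"answer produced without running: {missing}")
    return answer


runner = ToolRunner({"get_weather": lambda city: f"18C in {city}"})

# The model "claims" a result without any call: the answer is rejected.
try:
    accept_answer("It is 18C in Oslo", runner, required=["get_weather"])
    rejected = False
except ToolDisciplineError:
    rejected = True

# After a real execution the answer is accepted.
observation = runner.run("get_weather", city="Oslo")
accepted = accept_answer(f"It is {observation}", runner, required=["get_weather"])
print(rejected, accepted)
```

The check relies on the execution log, not on what the model says it did, which is what separates a hallucinated tool call from a real one.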

### 4. Rendering/Transport Corruption

The agent's internal answer is correct, but the platform layer mutates it during delivery.
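
One way to detect this kind of mutation is to fingerprint the answer as it leaves the agent and again after the platform layer has handled it. The sketch below assumes the transport can be modeled as a function from text to text; `fingerprint`, `deliver`, and `whitespace_collapsing_renderer` are hypothetical names, and the renderer is a toy example of a layer that "tidies" output.

```python
import hashlib
from typing import Callable, Tuple


def fingerprint(text: str) -> str:
    """Stable digest of the exact text, suitable for logging at each hop."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deliver(answer: str, transport: Callable[[str], str]) -> Tuple[str, bool]:
    """Send the answer through a transport and report whether it arrived intact."""
    delivered = transport(answer)
    return delivered, fingerprint(answer) == fingerprint(delivered)


def whitespace_collapsing_renderer(text: str) -> str:
    # Hypothetical platform layer that collapses all whitespace runs,
    # which destroys line breaks and indentation in the answer.
    return " ".join(text.split())


answer = "Steps:\n  1. stop the service\n  2. rotate the key"
_, intact_passthrough = deliver(answer, lambda text: text)
delivered, intact_rendered = deliver(answer, whitespace_collapsing_renderer)
print(intact_passthrough, intact_rendered)
print(delivered)
```

Comparing digests instead of full strings lets each hop (agent, API, UI) log a short value that can be matched later without storing the answer itself.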
