Adversarial Resilience

Name: Adversarial Resilience
Author: itallstartedwithaidea

itallstartedwithaidea/agent-skills

Harden coding agents that read untrusted data—ads copy, docs, API responses—against prompt injection and capability escalation before production.

Overview

adversarial-resilience is an agent skill most often used in Ship (also Build agent-tooling, Operate monitoring) that hardens AI agents against prompt injection, exfiltration, and capability escalation.

Install

npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill adversarial-resilience

What is this skill?

Layered defense model: input sanitization, instruction boundaries, and capability limits
Covers prompt injection, data exfiltration, sandbox escape, and unauthorized escalation
Patterns from production agent work with untrusted user-provided business data
Applies to any agent processing external documents, APIs, or user content
Frames every agent as handling untrusted input in practice

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 installs on skills.sh; 18 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

Your agent can call tools and touch sensitive systems, but untrusted input in ads data, files, or APIs could redirect or abuse those powers.

Who is it for?

Builders shipping tool-using agents that ingest user or third-party content and need a structured adversarial threat model.

Skip if: Static sites with no agents, or teams that only need conventional OWASP web-app hardening without LLM-specific threats.

When should I use this skill?

Building or shipping an agent that processes untrusted user, document, or API content and exposes tools, filesystem, or execution.

What do I get? / Deliverables

You apply layered defensive patterns so agent instructions and capabilities resist deliberate manipulation before and after you ship.

Layered defense checklist applied to the agent
Input-handling and capability-boundary recommendations
Threat-model notes for injection and exfiltration paths

Recommended Skills

Azure Compliancemicrosoft/azure-skills

Azure Compliance is a Microsoft agent skill for solo builders and small teams who need structured security and complianc…373k installs·1.2k stars

Openclaw Secure Linux Cloudxixu-me/skills

OpenClaw Secure Linux Cloud guides a hardened VPS deployment—rootless Podman, loopback-only gateway, SSH-tunneled Contro…201k installs·61 stars

Entra Agent Idmicrosoft/azure-skills

entra-agent-id is a Microsoft-authored agent skill for provisioning and operating OAuth-capable identities tailored to A…99.1k installs·1.2k stars

Firebase Security Rules Auditorfirebase/agent-skills

Firebase Security Rules Auditor turns your agent into a senior penetration-style reviewer for Firestore rulesets. Solo b…40.3k installs·345 stars

Firestore Security Rules Auditorfirebase/agent-skills

Firestore-security-rules-auditor is a checker skill for solo builders and small teams shipping Firebase-backed apps who …20.3k installs·345 stars

Skill Vetteruseai-pro/openclaw-skills-security

Skill Vetter is a security-first OpenClaw agent skill that makes your coding agent behave like a pre-install auditor for…19.2k installs·62 stars

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Security hardening is canonically shelved under ship when you are validating defenses before release, even though patterns also apply while building agents. The skill centers on adversarial threats and layered defenses (injection, exfiltration, sandbox escape), matching ship → security rather than generic integrations.

Also useful

BuildAgent skills & templates

Also useful

OperateMonitoring & observability

Where it fits

Example use

BuildAgent skills & templates

Design tool permissions and input boundaries before wiring Google Ads or CRM data into the agent.

Example use

ShipSecurity

Run an adversarial review pass before production launch when the agent can write files or execute shell commands.

Example use

OperateMonitoring & observability

Revisit defenses when new untrusted document types or API partners are added to a live agent.

Example use

ValidateScope & plan

Decide which agent capabilities are in scope for an MVP so attack surface stays bounded.

How it compares

Use alongside generic secure-code review; this skill targets LLM instruction manipulation and agent capability abuse, not routine XSS-only checklists.

Common Questions / FAQ

Who is adversarial-resilience for?

Solo builders and small teams creating AI agents with MCP tools, APIs, or code execution who must treat all external text as potentially hostile.

When should I use adversarial-resilience?

In ship when pre-launch threat modeling; in build when adding new agent tools or data sources; in operate when investigating suspicious agent behavior or expanding production data feeds.

Is adversarial-resilience safe to install?

The skill is defensive guidance; it does not grant permissions by itself—review the Security Audits panel on this page and your agent’s actual tool policies before deployment.

SKILL.md

READMESKILL.md - Adversarial Resilience

# Adversarial Resilience

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Adversarial Resilience is the practice of hardening AI agents against deliberate attacks — prompt injection, data exfiltration, sandbox escape, and unauthorized capability escalation. As agents gain more access to tools, APIs, file systems, and code execution environments, the attack surface grows proportionally. An agent that can write files and execute shell commands is a powerful ally but also a potent attack vector if its instructions can be manipulated by adversarial input.

This skill addresses the security challenges encountered in building the Buddy™ agent at [googleadsagent.ai™](https://googleadsagent.ai), where user-provided Google Ads data could theoretically contain injection payloads embedded in campaign names, ad copy, or keyword lists. The defensive patterns here apply to any agent that processes untrusted input — which, in practice, is every agent. Even "internal" tools are vulnerable to indirect injection through documents, code comments, and API responses that contain adversarial content.

The defense model operates in layers: input sanitization strips known attack patterns before they reach the model, instruction anchoring makes the system prompt resistant to override, output filtering prevents sensitive data from leaking through responses, permission boundaries restrict what the agent can do regardless of what it is told to do, and audit logging creates a forensic trail of every action for post-incident analysis.

## Use When

- Agents process any form of user-provided or external input
- The agent has access to sensitive tools (file write, shell execute, API calls)
- Deployed agents are accessible to users outside your trusted organization
- Compliance requirements mandate security controls for AI systems
- You need to protect against both deliberate attacks and accidental injection
- The agent operates in a multi-tenant environment with data isolation requirements

## How It Works

```mermaid
graph TD
    A[Untrusted Input] --> B[Layer 1: Input Sanitizer]
    B --> C[Layer 2: Instruction Anchor]
    C --> D[Agent Core Processing]
    D --> E[Layer 3: Output Filter]
    E --> F[Layer 4: Permission Boundary]
    F --> G{Action Allowed?}
    G -->|Yes| H[Execute Action]
    G -->|No| I[Block + Alert]
    H --> J[Layer 5: Audit Logger]
    I --> J
    J --> K[Response to User]
    
    L[Secret Scanner] --> B
    L --> E
    M[Pattern Database] --> B
```

The five-layer defense model ensures that adversarial content is intercepted at the earliest possible point. Input sanitization detects and neutralizes known injection patterns before they enter the model's context. Instruction anchoring uses structural techniques (XML delimiters, priority declarations, identity reinforcement) to make the system prompt resistant to override. Output filtering scans generated responses for sensitive data leakage. Permission boundaries enforce hard limits on actions regardless of model intent. Audit logging provides complete forensic visibility into every input, decision, and action.

## Implementation

**Input Sanitization Engine:**

```python
import re

class InputSanitizer:
    INJECTION_PATTERNS = [
        r"ignore\s+(all\s+)?(previous|prior|above)\s+(instructions|prompts|rules)",
        r"you\s+are\s+now\s+(?:a|an)\s+",
        r"system\s*:\s*",
        r"<\s*/?system\s*>",
        r"BEGIN\s+SYSTEM\s+PROMPT",
        r"(?:ADMIN|ROOT|SUDO)\s*(?:MODE|ACCESS|OVERRIDE)",
        r"\[INST\]",
        r"<\|(?:im_start|im_end|endoftext)\|>",
    ]

    def __init__(self):
        self.compiled = [re.compile(p, re.IGNORECAS

What is this skill?

Layered defense model: input sanitization, instruction boundaries, and capability limits

Covers prompt injection, data exfiltration, sandbox escape, and unauthorized escalation

Patterns from production agent work with untrusted user-provided business data

Applies to any agent processing external documents, APIs, or user content

Frames every agent as handling untrusted input in practice

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 installs on skills.sh; 18 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

BuildAgent skills & templates

Also useful

OperateMonitoring & observability

Where it fits

Example use

BuildAgent skills & templates

Design tool permissions and input boundaries before wiring Google Ads or CRM data into the agent.

Example use

ShipSecurity

Run an adversarial review pass before production launch when the agent can write files or execute shell commands.

Example use

OperateMonitoring & observability

Revisit defenses when new untrusted document types or API partners are added to a live agent.

Example use

ValidateScope & plan

Decide which agent capabilities are in scope for an MVP so attack surface stays bounded.

SKILL.md

READMESKILL.md - Adversarial Resilience

# Adversarial Resilience

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Adversarial Resilience is the practice of hardening AI agents against deliberate attacks — prompt injection, data exfiltration, sandbox escape, and unauthorized capability escalation. As agents gain more access to tools, APIs, file systems, and code execution environments, the attack surface grows proportionally. An agent that can write files and execute shell commands is a powerful ally but also a potent attack vector if its instructions can be manipulated by adversarial input.

This skill addresses the security challenges encountered in building the Buddy™ agent at [googleadsagent.ai™](https://googleadsagent.ai), where user-provided Google Ads data could theoretically contain injection payloads embedded in campaign names, ad copy, or keyword lists. The defensive patterns here apply to any agent that processes untrusted input — which, in practice, is every agent. Even "internal" tools are vulnerable to indirect injection through documents, code comments, and API responses that contain adversarial content.

The defense model operates in layers: input sanitization strips known attack patterns before they reach the model, instruction anchoring makes the system prompt resistant to override, output filtering prevents sensitive data from leaking through responses, permission boundaries restrict what the agent can do regardless of what it is told to do, and audit logging creates a forensic trail of every action for post-incident analysis.

## Use When

- Agents process any form of user-provided or external input
- The agent has access to sensitive tools (file write, shell execute, API calls)
- Deployed agents are accessible to users outside your trusted organization
- Compliance requirements mandate security controls for AI systems
- You need to protect against both deliberate attacks and accidental injection
- The agent operates in a multi-tenant environment with data isolation requirements

## How It Works

```mermaid
graph TD
    A[Untrusted Input] --> B[Layer 1: Input Sanitizer]
    B --> C[Layer 2: Instruction Anchor]
    C --> D[Agent Core Processing]
    D --> E[Layer 3: Output Filter]
    E --> F[Layer 4: Permission Boundary]
    F --> G{Action Allowed?}
    G -->|Yes| H[Execute Action]
    G -->|No| I[Block + Alert]
    H --> J[Layer 5: Audit Logger]
    I --> J
    J --> K[Response to User]
    
    L[Secret Scanner] --> B
    L --> E
    M[Pattern Database] --> B
```

The five-layer defense model ensures that adversarial content is intercepted at the earliest possible point. Input sanitization detects and neutralizes known injection patterns before they enter the model's context. Instruction anchoring uses structural techniques (XML delimiters, priority declarations, identity reinforcement) to make the system prompt resistant to override. Output filtering scans generated responses for sensitive data leakage. Permission boundaries enforce hard limits on actions regardless of model intent. Audit logging provides complete forensic visibility into every input, decision, and action.

## Implementation

**Input Sanitization Engine:**

```python
import re

class InputSanitizer:
    INJECTION_PATTERNS = [
        r"ignore\s+(all\s+)?(previous|prior|above)\s+(instructions|prompts|rules)",
        r"you\s+are\s+now\s+(?:a|an)\s+",
        r"system\s*:\s*",
        r"<\s*/?system\s*>",
        r"BEGIN\s+SYSTEM\s+PROMPT",
        r"(?:ADMIN|ROOT|SUDO)\s*(?:MODE|ACCESS|OVERRIDE)",
        r"\[INST\]",
        r"<\|(?:im_start|im_end|endoftext)\|>",
    ]

    def __init__(self):
        self.compiled = [re.compile(p, re.IGNORECAS

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is adversarial-resilience for?

When should I use adversarial-resilience?

Is adversarial-resilience safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Where it fits

Who is adversarial-resilience for?

When should I use adversarial-resilience?

Is adversarial-resilience safe to install?

SKILL.md