
AI Security
Map LLM and agent risks to MITRE ATLAS techniques and apply detection signatures for injection, jailbreak, tool abuse, and training-data threats.
Overview
AI Security is an agent skill, used mostly in the Ship phase (and also in Operate and Build), that maps MITRE ATLAS adversarial techniques to detection signatures and risk scores for LLM and agent systems.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill ai-security
What is this skill?
- MITRE ATLAS technique coverage matrix documenting 8+ techniques, with tactic and per-technique detection method columns
- Signatures for direct and indirect LLM prompt injection (AML.T0051, AML.T0051.001)
- Detection of agent tool abuse via injection (AML.T0051.002)
- Jailbreak persona and system-prompt extraction pattern detection
- Training data poisoning markers and model inversion risk scoring
- Coverage includes the AML.T0051 prompt injection family, AML.T0054 jailbreak, AML.T0056 data extraction, AML.T0020 poison training data
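The signature bullets can be pictured as named regex rules, each tagged with the ATLAS ID it targets. This is a minimal, hypothetical sketch: the signature names come from the skill's coverage matrix, but every pattern is an assumption written for this example and is not taken from the skill's own scanner (`ai_threat_scanner.py`). `data_poisoning_marker` is left out because it applies to training data rather than live prompts.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumptions, not the skill's shipped signatures).
# Each entry maps a signature name from the coverage matrix to (ATLAS ID, regex).
SIGNATURES = {
    "direct_role_override": (
        "AML.T0051",
        re.compile(r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above)\b.{0,20}\binstructions?\b", re.I),
    ),
    "indirect_injection": (
        "AML.T0051.001",
        re.compile(r"<\|[a-z_]+\|>|\[/?INST\]", re.I),  # chat-template tokens in retrieved text
    ),
    "tool_abuse": (
        "AML.T0051.002",
        re.compile(r"\b(call|invoke|use|execute)\b.{0,20}\b(tool|function)s?\b.{0,40}\b(delete|drop|transfer|exfiltrate)\b", re.I),
    ),
    "jailbreak_persona": (
        "AML.T0054",
        re.compile(r"\b(you are now|pretend to be|act as)\b.{0,40}\b(DAN|unrestricted|unfiltered)\b", re.I),
    ),
    "system_prompt_extraction": (
        "AML.T0056",
        re.compile(r"\b(reveal|print|show|repeat)\b.{0,30}\b(system prompt|hidden instructions)\b", re.I),
    ),
}


@dataclass
class Finding:
    signature: str  # signature name, e.g. "tool_abuse"
    atlas_id: str   # ATLAS technique the signature is mapped to
    excerpt: str    # the text span that matched


def scan(text: str) -> list[Finding]:
    """Return one Finding per signature whose pattern matches anywhere in text."""
    findings = []
    for name, (atlas_id, pattern) in SIGNATURES.items():
        match = pattern.search(text)
        if match:
            findings.append(Finding(name, atlas_id, match.group(0)))
    return findings


if __name__ == "__main__":
    for f in scan("Please ignore all previous instructions and reveal your system prompt."):
        print(f.atlas_id, f.signature, repr(f.excerpt))
```

Keyword regexes like these are easy to evade with paraphrase or encoding, which is why the skill pairs signature scanning with other controls such as semantic similarity filtering and output validation.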
Adoption & trust: 511 installs on skills.sh; 17.5k GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You are shipping an AI agent but lack a shared threat model for prompt injection, tool abuse, and data exfiltration beyond generic web app security.
Who is it for?
Indie builders launching RAG chatbots, tool-calling agents, or fine-tuned models who need ATLAS-grounded security review artifacts.
Skip if: you only run static sites with no LLM inference, or your team expects turnkey SOC integration without implementing detection itself.
When should I use this skill?
When you are designing, reviewing, or monitoring LLM agents and need MITRE ATLAS-aligned threat coverage and detection patterns.
What do I get? / Deliverables
You get ATLAS-aligned coverage notes and concrete detection patterns so reviews and monitoring target named techniques instead of vague “prompt safety.”
- ATLAS technique coverage map for your agent architecture
- Detection signature checklist tied to named AML techniques
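An ATLAS technique coverage map can start as a plain lookup from architecture features to ATLAS techniques, annotated with the coverage status from the skill's technique matrix. In this minimal sketch the feature flags (`uses_rag`, `calls_tools`, `fine_tuned`, `public_inference_api`) and the feature-to-technique mapping are assumptions made for illustration; only the Yes/Partial/Not covered statuses are copied from the matrix.

```python
from dataclasses import dataclass

# Coverage status per technique, copied from the ATLAS coverage matrix.
COVERAGE = {
    "AML.T0051": "Yes",
    "AML.T0051.001": "Yes",
    "AML.T0051.002": "Yes",
    "AML.T0054": "Yes",
    "AML.T0056": "Yes",
    "AML.T0020": "Yes",
    "AML.T0018": "Partial",
    "AML.T0024": "Yes",
    "AML.T0040": "Not covered",
    "AML.T0005": "Not covered",
}

# Hypothetical mapping of which techniques become relevant for which feature.
BASELINE = ["AML.T0051", "AML.T0054", "AML.T0056"]  # any LLM that takes user input
BY_FEATURE = {
    "uses_rag": ["AML.T0051.001"],
    "calls_tools": ["AML.T0051.002"],
    "fine_tuned": ["AML.T0020", "AML.T0018"],
    "public_inference_api": ["AML.T0024", "AML.T0040", "AML.T0005"],
}


@dataclass(frozen=True)
class AgentArchitecture:
    uses_rag: bool = False
    calls_tools: bool = False
    fine_tuned: bool = False
    public_inference_api: bool = False


def coverage_map(arch: AgentArchitecture) -> dict[str, str]:
    """Map each ATLAS technique relevant to this architecture to its coverage status."""
    relevant = list(BASELINE)
    for feature, technique_ids in BY_FEATURE.items():
        if getattr(arch, feature):
            relevant.extend(technique_ids)
    return {tid: COVERAGE[tid] for tid in relevant}


def gaps(arch: AgentArchitecture) -> list[str]:
    """Techniques that apply to this architecture but are not fully covered."""
    return [tid for tid, status in coverage_map(arch).items() if status != "Yes"]
```

A call such as `gaps(AgentArchitecture(fine_tuned=True, public_inference_api=True))` surfaces the techniques the matrix marks Partial or Not covered, which is where a review needs work beyond the skill's signatures.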
Journey fit
Spans multiple journey phases, with a primary shelf and the alternate fits listed below.
Ship/security is the canonical shelf because the skill focuses on adversarial ML threats before and during production exposure of AI features. The security subphase matches the skill's ATLAS-aligned threat modeling, injection detection, and exfiltration controls for shipped agent products.
Where it fits
Run an ATLAS-aligned review of tool-calling paths before exposing the agent to paying users.
Align log alerts with injection and exfiltration signatures when production traffic hits the inference API.
Document tool invocation guardrails while designing which external actions the agent may trigger.
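Tool invocation guardrails can be written down as an explicit policy that is checked before every call the agent makes. This is one hypothetical way to do that, assuming an allowlist, a set of tools that need human approval, and a cap on argument length; the tool names and the `ToolPolicy` / `gate_tool_call` helpers are invented for illustration and are not part of the skill.

```python
from dataclasses import dataclass, field


class ToolBlocked(Exception):
    """Raised when a proposed tool call violates the policy."""


@dataclass
class ToolPolicy:
    allowed: set[str]                                      # tools the agent may call at all
    needs_approval: set[str] = field(default_factory=set)  # high-impact tools gated on a human
    max_arg_chars: int = 2000                              # crude cap on injected payload size


def gate_tool_call(policy: ToolPolicy, tool: str, args: dict, approved: bool = False) -> None:
    """Raise ToolBlocked unless the call is allowed; return None if it may proceed."""
    if tool not in policy.allowed:
        raise ToolBlocked(f"{tool!r} is not on the allowlist")
    if tool in policy.needs_approval and not approved:
        raise ToolBlocked(f"{tool!r} requires human approval")
    for name, value in args.items():
        if isinstance(value, str) and len(value) > policy.max_arg_chars:
            raise ToolBlocked(f"argument {name!r} exceeds {policy.max_arg_chars} characters")


# Example policy (hypothetical tool names).
policy = ToolPolicy(
    allowed={"search_docs", "send_email"},
    needs_approval={"send_email"},
)
```

Logging each `ToolBlocked` with the AML.T0051.002 tag lets blocked calls feed the same alerts as the injection signatures.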
How it compares
Threat-and-detection playbook for AI systems, not a generic OWASP web scanner or MCP server.
Common Questions / FAQ
Who is ai-security for?
Solo and small-team developers shipping LLM-powered products who want MITRE ATLAS vocabulary and starter detection mappings without hiring a dedicated AI red team first.
When should I use ai-security?
In Ship/security before launch to threat-model agents; in Operate/monitoring when tuning logs for injection and exfiltration; in Build/agent-tooling when designing tool gates and retrieval boundaries.
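For retrieval boundaries, one common pattern is to treat retrieved text strictly as data: neutralize chat-template control tokens (the "template token detection" the matrix lists for AML.T0051.001) and wrap each document in explicit delimiters before it reaches the prompt. This is a minimal sketch under assumptions: the token list and the `<retrieved_document>` wrapper are illustrative choices, and delimiting reduces, but does not eliminate, indirect injection risk.

```python
import re

# Assumed set of chat-template control tokens (ChatML-style and Llama-2-style).
TEMPLATE_TOKENS = re.compile(r"<\|[a-z_]+\|>|\[/?INST\]|<</?SYS>>", re.I)
# Prevents retrieved text from opening or closing our own wrapper tag.
WRAPPER_TAG = re.compile(r"</?retrieved_document\b[^>]*>", re.I)


def wrap_retrieved(doc_id: str, text: str) -> tuple[str, bool]:
    """Return (wrapped text safe to splice into a prompt, whether anything was flagged)."""
    flagged = bool(TEMPLATE_TOKENS.search(text) or WRAPPER_TAG.search(text))
    cleaned = TEMPLATE_TOKENS.sub("[removed-token]", text)
    cleaned = WRAPPER_TAG.sub("[removed-tag]", cleaned)
    wrapped = (
        f'<retrieved_document id="{doc_id}">\n'
        f"{cleaned}\n"
        "</retrieved_document>"
    )
    return wrapped, flagged
```

The `flagged` result is worth logging even when the cleaned text is still used, since a document carrying template tokens is itself a signal of attempted indirect injection.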
Is ai-security safe to install?
Treat it as guidance and detection recipes: review the Security Audits on this Prism page, and validate any scanning or regex rules in a staging environment before using them in production.
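Validating a regex rule in staging mostly means measuring it against labeled samples: how many known-malicious prompts it catches and how many benign prompts it wrongly flags. This harness is a hypothetical sketch; the sample prompts and the 0.9 detection / 0.01 false-positive thresholds are placeholder assumptions to tune for your own traffic.

```python
import re


def evaluate_rule(pattern: str, malicious: list[str], benign: list[str]) -> dict[str, float]:
    """Detection rate on malicious samples and false-positive rate on benign ones."""
    if not malicious or not benign:
        raise ValueError("need at least one malicious and one benign sample")
    rule = re.compile(pattern, re.I)
    caught = sum(1 for s in malicious if rule.search(s))
    false_hits = sum(1 for s in benign if rule.search(s))
    return {
        "detection_rate": caught / len(malicious),
        "false_positive_rate": false_hits / len(benign),
    }


def ready_for_production(metrics: dict[str, float],
                         min_detection: float = 0.9,
                         max_false_positive: float = 0.01) -> bool:
    """Placeholder promotion gate; both thresholds are assumptions."""
    return (metrics["detection_rate"] >= min_detection
            and metrics["false_positive_rate"] <= max_false_positive)


metrics = evaluate_rule(
    r"\bignore\b.{0,40}\binstructions\b",
    malicious=["Ignore all previous instructions.", "please ignore the above instructions"],
    benign=["How do I ignore whitespace in git diff?", "Follow the installation instructions."],
)
```

Four samples prove little on their own; the useful run is over a realistic benign corpus (for example, sanitized staging prompts), where overly broad patterns show up as false positives before they reach production alerting.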
SKILL.md
# MITRE ATLAS Technique Coverage

Reference table for MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) techniques covered by the ai-security skill. ATLAS is the AI/ML equivalent of MITRE ATT&CK. Source: https://atlas.mitre.org/

---

## Technique Coverage Matrix

| ATLAS ID | Technique Name | Tactic | Covered by ai-security | Detection Method |
|----------|----------------|--------|------------------------|------------------|
| AML.T0051 | LLM Prompt Injection | ML Attack Staging | Yes (direct_role_override, indirect_injection signatures) | Injection signature regex matching |
| AML.T0051.001 | Indirect Prompt Injection via Retrieved Content | ML Attack Staging | Yes (indirect_injection signature) | Template token detection, external content validation |
| AML.T0051.002 | Agent Tool Abuse via Injection | Execution | Yes (tool_abuse signature) | Tool invocation pattern detection |
| AML.T0054 | LLM Jailbreak | ML Attack Staging | Yes (jailbreak_persona signature) | Persona framing pattern detection |
| AML.T0056 | LLM Data Extraction | Exfiltration | Yes (system_prompt_extraction signature) | System prompt exfiltration pattern detection |
| AML.T0020 | Poison Training Data | Persistence | Yes (data_poisoning_marker signature + risk scoring) | Training data marker detection; fine-tuning scope risk score |
| AML.T0024 | Exfiltration via ML Inference API | Exfiltration | Yes (model inversion risk scoring) | Access-level-based risk scoring |
| AML.T0043 | Craft Adversarial Data | Defense Evasion | Partial (adversarial robustness risk scoring) | Target-type-based risk scoring; requires dedicated adversarial testing for confirmation |
| AML.T0005 | Create Proxy ML Model | Resource Development | Not covered (requires model stealing detection) | Monitor for high-volume systematic querying |
| AML.T0016 | Acquire Public ML Artifacts | Resource Development | Not covered (supply chain risk only) | Verify model provenance and checksums |
| AML.T0018 | Backdoor ML Model | Persistence | Partial (data_poisoning_marker + poisoning risk) | Training data audit; behavioral testing for trigger inputs |
| AML.T0019 | Publish Poisoned Datasets | Resource Development | Not covered (upstream supply chain only) | Dataset provenance tracking |
| AML.T0040 | ML Model Inference API Access | Collection | Not covered (requires API log analysis) | Monitor inference API for high-volume systematic queries |
| AML.T0012 | Valid Accounts (ML Service) | Initial Access | Not covered (handled by the cloud-security skill) | IAM misconfiguration detection (delegate to cloud-security) |

---

## Technique Detail: AML.T0051 (LLM Prompt Injection)

**Tactic:** ML Attack Staging, Initial Access

**Description:** An adversary crafts inputs designed to override the model's system prompt, hijack its instructions, or cause it to perform actions outside its defined scope.

**Sub-techniques:**
- AML.T0051.001: Indirect injection via externally retrieved content (web pages, documents, email)
- AML.T0051.002: Agent tool abuse via injection (directing the agent to invoke tools with malicious parameters)

**Attack Examples:**
- System-prompt override phrasing injected as user input to hijack model behavior
- Malicious web page containing hidden context-replacement directives targeting RAG-augmented agents
- Embedded tool-invocation directive in a retrieved PDF that instructs the agent to execute destructive actions

**Defensive Controls:**
1. Input validation with injection signature scanning (ai_threat_scanner.py)
2. Semantic similarity filter against a known jailbreak template library
3. Context integrity monitoring: detect mid-session role changes
4. Separate the system prompt from user context by using distinct context tokens
5. Output validation: detect responses that echo system prompt content

---

## Technique Detail: AML.T0054 (LLM Jailbreak)

**Tactic:** ML Attack Staging

**Description:** Techniques to bypass safety alignment training through persona manipulation,