
Obliteratus Abliteration
Run surgical abliteration on Hugging Face transformer weights to analyze and remove refusal directions while keeping core language capability.
Overview
OBLITERATUS Abliteration is an agent skill for the Build phase that guides installation and use of the OBLITERATUS toolkit to identify and surgically remove LLM refusal directions via SVD/PCA weight projection.
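As a rough illustration of the weight-projection step, the sketch below removes one direction from a single weight matrix in pure Python. It makes two assumptions. First, the direction `r` has already been found. Second, `W` writes into the residual stream: it has shape d_model × d_in, so each column is a residual-stream vector. The function returns (I - α r̂ r̂ᵀ) W, where r̂ is `r` normalized and α is the strength. A strength of 1.0 removes the component entirely. The function names are hypothetical. This is not OBLITERATUS's implementation, which works on PyTorch tensors across many layers and adds options such as norm preservation and bias projection.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize(v: List[float]) -> List[float]:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def project_out_direction(W: Matrix, r: List[float], strength: float = 1.0) -> Matrix:
    """Return (I - strength * r_hat r_hat^T) W for W of shape (d_model, d_in).

    Each column of W is a vector in the residual stream; its component along
    the unit direction r_hat is scaled by (1 - strength).
    """
    r_hat = normalize(r)
    d_model, d_in = len(W), len(W[0])
    if len(r_hat) != d_model:
        raise ValueError("len(r) must equal W's output dimension (number of rows)")
    # coeffs[j] = r_hat . W[:, j], the component of column j along r_hat
    coeffs = [sum(r_hat[i] * W[i][j] for i in range(d_model)) for j in range(d_in)]
    # Subtract strength * coeffs[j] * r_hat from every column j
    return [
        [W[i][j] - strength * r_hat[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_model)
    ]


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # d_model = 3, d_in = 2
    r = [0.0, 1.0, 0.0]                        # hypothetical refusal direction
    print(project_out_direction(W, r))          # [[1.0, 2.0], [0.0, 0.0], [5.0, 6.0]]
```

With strength 1.0, every column of the result has zero component along `r`, so this matrix can no longer write along that direction. A full abliteration would repeat the projection over every residual-writing matrix in the model, and the sketch leaves that loop out.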
Install
npx skills add https://github.com/aradotso/trending-skills --skill obliteratus-abliteration
What is this skill?
- Locates refusal directions via SVD/PCA on hidden states and projects them out of weights
- Ships Gradio UI, CLI, Python API, and Colab notebook workflows
- Optional full analysis extras via pip install "obliteratus[full]"
- Targets PyTorch 2.1+ with CUDA (recommended) or CPU; requires a HuggingFace token for gated models
- Open-source OBLITERATUS repo with pip and editable source install paths
- Python 3.10+
- Gradio>=5.29.0
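The first bullet's direction-finding step can be illustrated in the same spirit. The sketch assumes you already have hidden-state vectors from one layer, some for prompts the model refuses and some for prompts it answers. It subtracts the mean answered activation from each refused activation and stacks the results into a matrix D. It then returns the top right singular vector of D, which is the top eigenvector of DᵀD. The vector is found by power iteration, seeded with the plain difference-of-means direction. This is a single-direction, pure-Python stand-in, and every name in it is hypothetical. OBLITERATUS's own SVD/PCA code is not reproduced here; the SKILL.md configures that code with options such as `num_directions`.

```python
import math
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def unit(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def difference_matrix(refused: List[Vector], answered: List[Vector]) -> List[Vector]:
    """One row per refused-prompt activation, minus the mean answered-prompt activation."""
    mu = mean_vector(answered)
    return [[a - b for a, b in zip(v, mu)] for v in refused]


def top_singular_direction(D: List[Vector], iters: int = 200) -> Vector:
    """Top right singular vector of D, found by power iteration on D^T D."""
    # Seed with the mean row, i.e. the plain difference-of-means direction.
    seed = mean_vector(D)
    v = unit(seed) if any(x != 0.0 for x in seed) else unit([1.0] * len(D[0]))
    for _ in range(iters):
        Dv = [sum(a * b for a, b in zip(row, v)) for row in D]   # D v
        v = unit([sum(D[i][j] * Dv[i] for i in range(len(D)))    # D^T (D v)
                  for j in range(len(v))])
    return v


if __name__ == "__main__":
    # Toy 2-D "hidden states"; real ones would come from a chosen transformer layer.
    refused = [[4.0, 1.0], [3.0, 1.0], [5.0, 1.2]]
    answered = [[1.0, 1.0], [1.0, 1.0]]
    r = top_singular_direction(difference_matrix(refused, answered))
    print([round(x, 3) for x in r])
```

The returned unit vector is the kind of direction a projection step would remove. Because the seed is the mean difference, the result stays on the refused side of that mean. Extracting several directions would need deflation or a full SVD, which this sketch omits.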
Adoption & trust: 760 installs on skills.sh; 31 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a Hugging Face transformer and need a repeatable, local workflow to analyze refusal geometry and modify weights without rebuilding the model from scratch.
Who is it for?
Advanced solo builders running local GPU jobs who own legal and ethical responsibility for modified model weights.
Skip if: Typical indie SaaS shipping, compliance-first production chat products, or anyone expecting hosted guardrails without self-hosted ML ops.
When should I use this skill?
User asks to abliterate a model, remove refusal from an LLM, obliterate guardrails, free a language model from restrictions, run abliteration on a HuggingFace model, use OBLITERATUS, or extract refusal directions from a
What do I get? / Deliverables
You get obliteratus installed with the right optional extras, environment variables for gated models, and a clear path to run abliteration through Gradio, CLI, API, or Colab.
- Configured obliteratus install (core, spaces, or full)
- Runnable abliteration path via Gradio, CLI, API, or Colab
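These deliverables depend on a few environment prerequisites. The small stdlib preflight below can catch missing pieces before a long GPU job starts. It assumes the minimums stated on this page are the ones that matter: Python 3.10+, PyTorch 2.1+, `gradio>=5.29.0`, and an `HF_TOKEN` for gated models. `preflight`, `version_at_least`, and `MIN_PACKAGES` are hypothetical names and are not part of obliteratus. The version parser compares only the leading release numbers, a simplification of PEP 440.

```python
import os
import sys
from importlib import metadata
from typing import List, Mapping, Optional, Tuple

# Minimums taken from the requirements listed on this page (hypothetical constants).
MIN_PYTHON = (3, 10)
MIN_PACKAGES = {"torch": "2.1", "gradio": "5.29.0"}  # gradio only matters for the UI


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release parts: '2.1.0+cu121' -> (2, 1, 0), '5.29.0rc1' -> (5, 29, 0)."""
    parts: List[int] = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding the shorter tuple with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def preflight(env: Optional[Mapping[str, str]] = None,
              python_version: Tuple[int, ...] = tuple(sys.version_info[:2]),
              packages: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    packages = MIN_PACKAGES if packages is None else packages
    problems: List[str] = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.10+ required")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set (needed for gated models)")
    for name, minimum in packages.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not version_at_least(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("ready" if not issues else "\n".join(f"- {p}" for p in issues))
```

A real script would more likely compare versions with `packaging.version.Version`. Once PyTorch is importable, a CUDA check (`torch.cuda.is_available()`) could be added. The sketch checks only that `HF_TOKEN` is set, not whether it grants access to a particular gated model.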
How it compares
Specialized mechanistic-interpretability weight surgery—not a prompt-jailbreak cheat sheet or a hosted moderation API.
Common Questions / FAQ
Who is obliteratus-abliteration for?
Solo builders and researchers comfortable with PyTorch, CUDA or CPU inference, and HuggingFace gated models who want to run OBLITERATUS workflows from their agent.
When should I use obliteratus-abliteration?
During the Build phase (agent tooling), when you need to abliterate a model, remove refusal behavior from an LLM, run OBLITERATUS on a HuggingFace checkpoint, or analyze refusal directions before deploying a custom local model.
Is obliteratus-abliteration safe to install?
Treat it as high-risk ML tooling: review the Security Audits panel on this Prism page and verify the upstream repo before pip install or cloning source on machines that hold secrets.
SKILL.md
# OBLITERATUS — LLM Abliteration Toolkit

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

OBLITERATUS is an open-source toolkit for identifying and surgically removing refusal behaviors from large language models using mechanistic interpretability techniques (abliteration). It locates refusal directions in a model's hidden states via SVD/PCA, projects them out of the weights, and preserves core language capabilities. Ships with a Gradio UI, CLI, Python API, and Colab notebook.

---

## Installation

```bash
# Core install
pip install obliteratus

# With Gradio UI support
pip install "obliteratus[spaces]"

# With all optional analysis modules
pip install "obliteratus[full]"

# From source (latest)
git clone https://github.com/elder-plinius/OBLITERATUS
cd OBLITERATUS
pip install -e ".[full]"
```

**Requirements:**

- Python 3.10+
- PyTorch 2.1+ with CUDA (recommended) or CPU
- `transformers`, `accelerate`, `gradio>=5.29.0`
- HuggingFace account + token for gated models

```bash
export HF_TOKEN=your_hf_token_here
huggingface-cli login
```

---

## CLI — Key Commands

```bash
# Basic obliteration (default method)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct

# Advanced method (whitened SVD + bias projection + iterative refinement)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method advanced

# Analysis-informed pipeline (auto-configures from geometry analysis)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct --method informed

# Specify output directory and push to Hub
obliteratus obliterate mistralai/Mistral-7B-Instruct-v0.3 \
  --method advanced \
  --output ./my-liberated-model \
  --push-to-hub your-username/mistral-7b-liberated

# LoRA-based reversible ablation (non-destructive)
obliteratus obliterate meta-llama/Llama-3.1-8B-Instruct \
  --method lora \
  --lora-rank 1

# Strength sweep — find the capability/compliance tradeoff
obliteratus sweep meta-llama/Llama-3.1-8B-Instruct \
  --strengths 0.2,0.4,0.6,0.8,1.0

# Run analysis modules only (no modification)
obliteratus analyze meta-llama/Llama-3.1-8B-Instruct \
  --modules concept_cone,alignment_imprint,universality

# Benchmark: compare methods on a model
obliteratus benchmark meta-llama/Llama-3.1-8B-Instruct \
  --methods basic,advanced,informed

# Launch local Gradio UI
obliteratus ui
obliteratus ui --port 8080 --share
obliteratus ui --no-telemetry
```

---

## Python API

### Basic obliteration

```python
from obliteratus import Obliterator

# Initialize with a HuggingFace model ID or local path
obl = Obliterator("meta-llama/Llama-3.1-8B-Instruct")

# Run the full pipeline: SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH
result = obl.obliterate(method="advanced")

print(result.perplexity_delta)    # capability preservation metric
print(result.refusal_rate_delta)  # refusal reduction
print(result.output_path)         # where the model was saved
```

### Step-by-step pipeline

```python
from obliteratus import Obliterator
from obliteratus.pipeline import PipelineConfig

config = PipelineConfig(
    method="advanced",
    num_directions=32,    # number of refusal directions to extract
    strength=1.0,         # projection strength (0.0–1.0+)
    preserve_norm=True,   # norm-preserving biprojection
    project_biases=True,  # also remove from bias terms
    iterative_passes=3,   # re-probe after each pass
    layers="auto",        # or list of ints, e.g. [10, 11, 12, 13]
    dtype="bfloat16",
    device="cuda",
)

obl = Ob