
Pyvene Interventions
Wrap Hugging Face causal LMs with pyvene IntervenableModel configs so agents can run layer-level interventions and generation without bespoke forward hooks.
Overview
Pyvene Interventions is an agent skill for the Build phase that documents how to configure and run pyvene IntervenableModel interventions on transformer LMs.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill pyvene-interventions
What is this skill?
- Documents IntervenableModel wrapping AutoModelForCausalLM with RepresentationConfig for layer and component targets
- Covers forward passes with base and source inputs, unit_locations for position-specific interventions, and output_original_output
- Includes intervenable.generate for max_new_tokens generation under intervention
- Shows how to save and load local checkpoints, and how to publish intervention bundles to and load them from the Hugging Face Hub
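The position-specific `unit_locations` argument listed above takes nested lists, which are easy to get wrong by one level. Below is a minimal sketch, assuming the nesting order is [intervention][batch example][token position] and that source and base list the same number of positions. `build_unit_locations` is a hypothetical helper written for illustration; it is not part of pyvene.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed nesting: [intervention][batch example][token position]
NestedPositions = List[List[List[int]]]


def build_unit_locations(
    source_positions: Sequence[int],
    base_positions: Sequence[int],
    batch_size: int = 1,
    num_interventions: int = 1,
) -> Dict[str, Tuple[NestedPositions, NestedPositions]]:
    """Hypothetical helper: build a pyvene-style unit_locations mapping.

    The same positions are repeated for every batch example and every
    intervention, which is the simplest case.
    """
    # Assumption: a swap needs as many source positions as base positions.
    if len(source_positions) != len(base_positions):
        raise ValueError("source and base must list the same number of positions")

    def nest(positions: Sequence[int]) -> NestedPositions:
        return [
            [list(positions) for _ in range(batch_size)]
            for _ in range(num_interventions)
        ]

    return {"sources->base": (nest(source_positions), nest(base_positions))}


locations = build_unit_locations([5], [5])
print(locations)
# {'sources->base': ([[[5]]], [[[5]]])}
```

The resulting dictionary has the same shape as the `unit_locations={"sources->base": ([[[5]]], [[[5]]])}` value used in the skill's forward-pass example, so it could be passed straight to an `intervenable(...)` call.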
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to patch internal representations in a Hugging Face model but lack a copy-paste-correct reference for pyvene configs, forwards, generation, and checkpoint I/O.
Who is it for?
Solo ML researchers and indie agent builders doing interpretability or causal intervention experiments on causal LMs with pyvene.
Skip if: you only need chat completions, RAG, or fine-tuning without layer-level intervention code.
When should I use this skill?
Implementing or debugging pyvene IntervenableModel configs, forward passes, generation, or save/load for transformer interventions.
What do I get? / Deliverables
Your agent can scaffold IntervenableModel setups, run base/source interventions (including position-specific and generate paths), and save or load intervention artifacts.
- IntervenableConfig and working intervention forward or generate code
- Saved local or Hub intervention artifacts
Journey fit
Mechanistic interpretability and intervention experiments are built during the product/research implementation phase, when you instrument models. The agent-tooling category fits because this is a library API reference skill: it extends how coding agents work with research stacks alongside model code.
How it compares
This skill is procedural reference knowledge for pyvene, not a training pipeline skill or a hosted inference MCP.
Common Questions / FAQ
Who is pyvene-interventions for?
Developers and researchers who already use PyTorch transformers and want coding agents to follow correct pyvene IntervenableModel patterns.
When should I use pyvene-interventions?
During build when instrumenting models for activation patching, comparing base versus source runs, or generating text under fixed interventions.
Is pyvene-interventions safe to install?
Check the Security Audits panel on this page; the skill is documentation-heavy but your project will still download models and run local GPU code you control.
SKILL.md - Pyvene Interventions
# pyvene API Reference

## IntervenableModel

The core class that wraps PyTorch models for intervention.

### Basic Usage

```python
import pyvene as pv
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

config = pv.IntervenableConfig(
    representations=[
        pv.RepresentationConfig(
            layer=5,
            component="block_output",
            intervention_type=pv.VanillaIntervention,
        )
    ]
)

intervenable = pv.IntervenableModel(config, model)
```

### Forward Pass

```python
# Basic intervention
# (original_output is None unless output_original_output=True)
original_output, intervened_output = intervenable(
    base=base_inputs,
    sources=[source_inputs],
)

# With unit locations (position-specific)
_, outputs = intervenable(
    base=base_inputs,
    sources=[source_inputs],
    unit_locations={"sources->base": ([[[5]]], [[[5]]])},  # Position 5
)

# Return original output too
original, intervened = intervenable(
    base=base_inputs,
    sources=[source_inputs],
    output_original_output=True,
)
```

### Generation

```python
# Generate with interventions
outputs = intervenable.generate(
    base_inputs,
    sources=[source_inputs],
    max_new_tokens=50,
    do_sample=False,
)
```

### Saving and Loading

```python
# Save locally
intervenable.save("./my_intervention")

# Load
intervenable = pv.IntervenableModel.load("./my_intervention", model=model)

# Save to HuggingFace
intervenable.save_intervention("username/my-intervention")

# Load from HuggingFace
intervenable = pv.IntervenableModel.load(
    "username/my-intervention",
    model=model
)
```

### Getting Trainable Parameters

```python
import torch

# For trainable interventions
params = intervenable.get_trainable_parameters()
optimizer = torch.optim.Adam(params, lr=1e-4)
```

---

## IntervenableConfig

Configuration container for interventions.

### Basic Config

```python
config = pv.IntervenableConfig(
    representations=[
        pv.RepresentationConfig(...)
    ]
)
```

### Multiple Interventions

```python
config = pv.IntervenableConfig(
    representations=[
        pv.RepresentationConfig(layer=3, component="block_output", ...),
        pv.RepresentationConfig(layer=5, component="mlp_output", ...),
        pv.RepresentationConfig(layer=7, component="attention_output", ...),
    ]
)
```

---

## RepresentationConfig

Specifies a single intervention target.

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `layer` | int | Layer index |
| `component` | str | Component to intervene on |
| `intervention_type` | type | Intervention class |
| `unit` | str | Intervention unit ("pos", "h", etc.) |
| `max_number_of_units` | int | Max units to intervene |
| `low_rank_dimension` | int | For trainable interventions |
| `subspace_partition` | list | Dimension ranges |

### Components

| Component | Description |
|-----------|-------------|
| `block_input` | Input to transformer block |
| `block_output` | Output of transformer block |
| `mlp_input` | Input to MLP |
| `mlp_output` | Output of MLP |
| `mlp_activation` | MLP hidden activations |
| `attention_input` | Input to attention |
| `attention_output` | Output of attention |
| `attention_value_output` | Attention values |
| `query_output` | Query vectors |
| `key_output` | Key vectors |
| `value_output` | Value vectors |
| `head_attention_value_output` | Per-head values |

### Example Configs

```python
# Position-specific intervention
pv.RepresentationConfig(
    layer=5,
    component="block_output",
    intervention_type=pv.VanillaIntervention,
    unit="pos",
    max_number_of_units=1,
)

# Trainable low-rank intervention
pv.RepresentationConfig(
    layer=5,
    component="block_output",
    intervention_type=pv.LowRankRotatedSpaceIntervention,
    low_rank_dimension=64,
)

# Subspace intervention
pv.RepresentationConfig(
    layer=5,
    component="block_output",
    intervention_type=pv.VanillaIntervention,
    subspace_partition=[[0, 256], [256, 512]],  # First 512 dim