
Nnsight Remote Interpretability
Run nnsight LanguageModel traces and remote interpretability experiments on transformer hidden states without rewriting HuggingFace boilerplate.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill nnsight-remote-interpretability
What is this skill?
- LanguageModel wrapper for GPT-2, Llama 3.1 8B, and custom dtype/device_map loading
- trace() context records deferred ops; .save() materializes hidden states and logits after execution
- Remote execution via remote=True on the same trace API (NDIF-backed pattern in docs)
- Tracing flags: validate tensor shapes, scan for shape info
- Documents model._model, tokenizer, and config access patterns
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Interpretability and intervention APIs are builder tooling that you wire in while implementing or debugging agent/ML features. The agent-tooling category covers library reference skills that teach agents how to instrument models during development.
Common Questions / FAQ
Is Nnsight Remote Interpretability safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# nnsight API Reference

## LanguageModel

Main class for wrapping language models with intervention capabilities.

### Loading Models

```python
import torch
from nnsight import LanguageModel

# Basic loading
model = LanguageModel("openai-community/gpt2", device_map="auto")

# Larger models
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map="auto")

# With a custom dtype
model = LanguageModel(
    "gpt2",
    device_map="auto",
    torch_dtype=torch.float16,
)
```

### Model Attributes

```python
# Access underlying HuggingFace model
model._model

# Access tokenizer
model.tokenizer

# Model config
model._model.config
```

---

## Tracing Context

The `trace()` method creates a context for deferred execution.

### Basic Tracing

```python
with model.trace("Hello world") as tracer:
    # Operations are recorded, not executed immediately
    hidden = model.transformer.h[5].output[0].save()
    logits = model.output.save()

# After the context exits, the recorded operations run and saved values are available
print(hidden.shape)
```

### Tracing Parameters

```python
with model.trace(
    prompt,         # Input text or tokens
    remote=False,   # Set to True to use NDIF remote execution
    validate=True,  # Validate tensor shapes
    scan=True,      # Scan for shape info
) as tracer:
    ...
```

### Remote Execution

```python
# Same code works remotely
with model.trace("Hello", remote=True) as tracer:
    hidden = model.transformer.h[5].output[0].save()
```

---

## Proxy Objects

Inside a tracing context, accessing modules returns Proxy objects.
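
The deferred-execution idea behind Proxy objects can be illustrated with a toy sketch in plain Python. This is not nnsight's implementation: `ToyProxy` and its `run` method are hypothetical names invented for this example, and the real library records operations against PyTorch modules rather than Python lists. The sketch only shows why a value is unavailable until the recorded operations are replayed on real data.

```python
class ToyProxy:
    """Records operations instead of executing them (hypothetical, not nnsight's Proxy)."""

    def __init__(self, ops=None):
        # Each op is a function applied, in order, to the eventual real value.
        self.ops = list(ops or [])

    def _extend(self, fn):
        # Return a new proxy with one more recorded op; the original is unchanged.
        return ToyProxy(self.ops + [fn])

    def __getitem__(self, key):
        return self._extend(lambda v: v[key])

    def __mul__(self, other):
        return self._extend(lambda v: [x * other for x in v])

    def sum(self):
        return self._extend(lambda v: sum(v))

    def run(self, value):
        # Replay the recorded operations on a concrete value.
        for fn in self.ops:
            value = fn(value)
        return value


# Build the recipe first; nothing is computed yet.
proxy = ToyProxy()
recipe = (proxy[1:] * 2).sum()

# Later, supply the real data and execute the recorded operations.
result = recipe.run([1, 2, 3])
print(result)
```

In this toy version, `recipe` holds three recorded steps (slice, scale, sum) and produces a number only when `run` is called, which loosely mirrors how saved values become available after a trace context exits.
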
### Accessing Values

```python
with model.trace("Hello") as tracer:
    # These are Proxy objects
    layer_output = model.transformer.h[5].output[0]
    attention = model.transformer.h[5].attn.output

    # Operations create new Proxies
    mean = layer_output.mean(dim=-1)
    normed = layer_output / layer_output.norm()
```

### Saving Values

```python
with model.trace("Hello") as tracer:
    # Must call .save() to access after context
    hidden = model.transformer.h[5].output[0].save()

# Now hidden contains the actual tensor
print(hidden.shape)
```

### Modifying Values

```python
# some_tensor and steering_vector are placeholders for tensors you define
# beforehand, with shapes compatible with the layer output.
with model.trace("Hello") as tracer:
    # In-place modification
    model.transformer.h[5].output[0][:] = 0

    # Replace with computed value
    model.transformer.h[5].output[0][:] = some_tensor

    # Arithmetic modification
    model.transformer.h[5].output[0][:] *= 0.5
    model.transformer.h[5].output[0][:] += steering_vector
```

### Proxy Operations

```python
with model.trace("Hello") as tracer:
    h = model.transformer.h[5].output[0]

    # Indexing
    first_token = h[:, 0, :]
    last_token = h[:, -1, :]

    # PyTorch operations
    mean = h.mean(dim=-1)
    norm = h.norm()
    transposed = h.transpose(1, 2)

    # Save results
    mean.save()
```

---

## Module Access Patterns

### GPT-2 Structure

```python
with model.trace("Hello") as tracer:
    # Embeddings
    embed = model.transformer.wte.output.save()
    pos_embed = model.transformer.wpe.output.save()

    # Layer outputs
    layer_out = model.transformer.h[5].output[0].save()

    # Attention
    attn_out = model.transformer.h[5].attn.output.save()

    # MLP
    mlp_out = model.transformer.h[5].mlp.output.save()

    # Final output
    logits = model.output.save()
```

### LLaMA Structure

```python
with model.trace("Hello") as tracer:
    # Embeddings
    embed = model.model.embed_tokens.output.save()

    # Layer outputs
    layer_out = model.model.layers[10].output[0].save()

    # Attention
    attn_out = model.model.layers[10].self_attn.output.save()

    # MLP
    mlp_out = model.model.layers[10].mlp.output.save()

    # Final output
    logits = model.output.save()
```

### Finding Module Names

```python
# Print model structure
print(model._model)

# Or iterate
for name, module in model._model.named_modules():
    print(name)
```

---

## Multiple Prompts (invoke)

Process multiple prompts in a single tra