
Implementing Llms Litgpt
Define LitGPT Config dataclasses and extend the GPT Block/attention stack when training or fine-tuning a custom LLM architecture in Python.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill implementing-llms-litgpt
What is this skill?
- Step-by-step workflow: Config dataclass → custom Block/attention → register with LitGPT training
- Maps core classes GPT, Block, CausalSelfAttention, MLP, RMSNorm/LayerNorm in litgpt/model.py
- Model-specific config patterns (LlamaConfig, MistralConfig, PhiConfig) as templates for custom configs
- Supports research architectures, domain adapters, and attention/MLP experiments in one-file style
- Documents extending base GPT versus building entirely new model stacks in LitGPT
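The first step of the workflow above, defining a config dataclass, can be sketched without LitGPT installed. In this minimal sketch, `BaseConfig` is a hypothetical stand-in for `litgpt.config.Config` (not the real class), and `estimate_params` is an illustrative helper that assumes a LLaMA-style block with no biases and an untied output head. The hyperparameter values are the ones used in the SKILL.md example.

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Hypothetical stand-in for litgpt.config.Config (not the real class)."""

    name: str = ""
    block_size: int = 4096
    vocab_size: int = 32000
    n_layer: int = 32
    n_head: int = 32
    n_embd: int = 4096
    intermediate_size: int = 11008

    def __post_init__(self) -> None:
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        # Derived value: width of each attention head.
        self.head_size = self.n_embd // self.n_head


@dataclass
class MyModelConfig(BaseConfig):
    """Custom config: inherits the standard fields and adds new ones."""

    name: str = "my-model-7b"
    custom_param: float = 0.1
    use_custom_attention: bool = True


def estimate_params(cfg: BaseConfig) -> int:
    """Rough weight count (assumption: LLaMA-style block, no biases)."""
    attn = 4 * cfg.n_embd * cfg.n_embd            # Q, K, V, and output projections
    mlp = 3 * cfg.n_embd * cfg.intermediate_size  # gate, up, and down projections
    norms = 2 * cfg.n_embd                        # two norm weight vectors per block
    per_layer = attn + mlp + norms
    embeddings = 2 * cfg.vocab_size * cfg.n_embd  # input embedding + untied output head
    final_norm = cfg.n_embd
    return cfg.n_layer * per_layer + embeddings + final_norm


cfg = MyModelConfig()
small = MyModelConfig(name="my-model-tiny", n_layer=2, n_head=4, n_embd=64, intermediate_size=256)
print(cfg.name, cfg.head_size, estimate_params(cfg))
print(small.name, small.head_size, estimate_params(small))
```

Checking the estimate against the model name is a cheap way to catch a mistyped hyperparameter before any training run: the default values here come out at roughly 6.7 billion weights, consistent with the "7b" suffix.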
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
Journey fit
Custom model classes and LitGPT single-file architectures are authored during Build when you implement the model backend that training and inference depend on. Work lives in litgpt/model.py-style backend code—attention, MLP, norms, and config—not UI, DevOps, or distribution.
Common Questions / FAQ
Is Implementing Llms Litgpt safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md - Implementing Llms Litgpt
# Custom Models

Guide to implementing custom model architectures in LitGPT.

## Overview

LitGPT's clean, single-file implementations make it easy to create custom architectures. You can extend the base `GPT` class or create entirely new models.

**Use cases**:
- Implementing new research architectures
- Adapting models for specific domains
- Experimenting with attention mechanisms
- Adding custom layers or components

## Key Files and Classes

### Core Architecture (`litgpt/model.py`)

**Main classes**:
- `GPT`: Top-level model class
- `Block`: Transformer block (attention + MLP)
- `CausalSelfAttention`: Attention mechanism
- `MLP` classes (e.g. `GptNeoxMLP`, `LLaMAMLP`): Feed-forward networks
- `RMSNorm` / `LayerNorm`: Normalization layers

**Configuration** (`litgpt/config.py`):
- `Config`: Base configuration dataclass
- Model-specific configs: `LlamaConfig`, `MistralConfig`, `PhiConfig`, etc.

## Custom Architecture Workflow

### Step 1: Define Configuration

Create a `Config` dataclass with your model's hyperparameters:

```python
from dataclasses import dataclass

from litgpt.config import Config

@dataclass
class MyModelConfig(Config):
    """Configuration for my custom model."""

    # Standard parameters
    name: str = "my-model-7b"
    block_size: int = 4096
    vocab_size: int = 32000
    n_layer: int = 32
    n_head: int = 32
    n_embd: int = 4096

    # Custom parameters
    custom_param: float = 0.1
    use_custom_attention: bool = True

    # Optional: override defaults
    rope_base: int = 10000
    intermediate_size: int = 11008
```

### Step 2: Implement Custom Components

#### Option A: Custom Attention

```python
import torch
import torch.nn as nn

from litgpt.model import CausalSelfAttention

class CustomAttention(CausalSelfAttention):
    """Custom attention mechanism (simplified signature; RoPE inputs omitted)."""

    def __init__(self, config):
        super().__init__(config)
        # Add custom components
        self.custom_proj = nn.Linear(config.n_embd, config.n_embd)
        self.custom_param = config.custom_param

    def forward(self, x, mask=None, input_pos=None):
        B, T, C = x.size()
        cfg = self.config

        # One fused projection yields Q, K, and V. This assumes the fused
        # layer is named `self.attn` and laid out as [Q | K | V]; the
        # attribute name and layout differ between LitGPT versions.
        q_size = cfg.n_head * cfg.head_size
        kv_size = cfg.n_query_groups * cfg.head_size
        q, k, v = self.attn(x).split((q_size, kv_size, kv_size), dim=-1)

        # Custom modification (requires n_head * head_size == n_embd)
        q = q + self.custom_proj(x) * self.custom_param

        # Reshape to (B, heads, T, head_size). RoPE, the KV cache, and K/V
        # expansion for grouped-query attention are omitted here.
        q = q.view(B, T, cfg.n_head, cfg.head_size).transpose(1, 2)
        k = k.view(B, T, cfg.n_query_groups, cfg.head_size).transpose(1, 2)
        v = v.view(B, T, cfg.n_query_groups, cfg.head_size).transpose(1, 2)

        # Scaled dot-product attention
        y = self.scaled_dot_product_attention(q, k, v, mask=mask)
        y = y.reshape(B, T, C)
        return self.proj(y)
```

#### Option B: Custom MLP

```python
import torch.nn as nn
import torch.nn.functional as F

# Assumption: this sketch extends `GptNeoxMLP`, whose layers are `fc`
# (up-projection) and `proj` (down-projection).
from litgpt.model import GptNeoxMLP

class CustomMLP(GptNeoxMLP):
    """Custom feed-forward network."""

    def __init__(self, config):
        super().__init__(config)
        # Add custom layers
        self.custom_layer = nn.Linear(config.intermediate_size, config.intermediate_size)

    def forward(self, x):
        x = self.fc(x)
        x = F.gelu(x)
        x = self.custom_layer(x)  # Custom modification
        x = self.proj(x)
        return x
```

#### Option C: Custom Block

```python
import torch.nn as nn

from litgpt.model import Block

class CustomBlock(Block):
    """Custom transformer block (uses CustomAttention from Option A)."""

    def __init__(self, config):
        super().__init__(config)
        # Replace attention or MLP
        self.attn = CustomAttention(config)
        # Or: self.mlp = CustomMLP(config)

        # Add custom components
        self.custom_norm = nn.LayerNorm(config.n_embd)

    def forward(self, x, input_pos=None, mask=None):
        # Custom forward pass
        h = self.norm_1(x)
        h = self.attn(h, mask=mask, input_pos=input_pos)
        x = x + h

        # Custom normalization
        x = x + self.custom_norm(x)

        x = x + self.mlp(self.norm_2(x))
        return x
```

### Step 3: Create Custom GPT Model

```python
from litgpt.model import GPT
import torch.nn as nn

class CustomG