
PyTorch Lightning
Structure PyTorch model training with LightningModules, LightningDataModules, and multi-GPU Trainer configs without mixing research and engineering concerns.
Overview
PyTorch Lightning is an agent skill for the Build phase. It encodes how Lightning organizes training code (modules, datamodules, and the Trainer) so solo builders keep research logic separate from distributed-engineering code.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill pytorch-lightning
What is this skill?
- Separates research logic in LightningModule from Trainer-driven engineering (GPU, DDP, epochs).
- Promotes LightningDataModule with prepare_data vs setup for correct multi-process data loading.
- Discourages manual cuda(), backward(), and optimizer steps inside training_step unless using manual optimization.
- Documents Trainer configuration patterns for accelerators, devices, and distributed strategies.
- Best-practices reference drawn from official Lightning organization guidance.
Adoption & trust: 565 installs on skills.sh; 27.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your PyTorch training script works on one GPU but duplicates device management, dataloaders, and the optimization loop in ways that break under DDP or confuse your agent.
Who is it for?
Solo builders shipping fine-tuning jobs, small ML APIs, or agent-side model training who want DDP-ready structure without reading all of Lightning’s docs first.
Skip if: Your team only deploys frozen-weight models for inference and has no custom training loop to maintain.
When should I use this skill?
When implementing or refactoring PyTorch training with Lightning, especially before adding GPUs, DDP, or shared dataloading.
What do I get? / Deliverables
You get a Lightning-shaped project layout your agent can extend with clear hooks, datamodule lifecycle, and Trainer-driven scale-out instead of fragile manual loops.
- LightningModule with training_step focused on loss computation
- LightningDataModule with prepare_data and setup splits
- Trainer configuration for accelerator, devices, and strategy
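One way to keep the Trainer configuration out of the model code is to derive it from the hardware that is available. The sketch below is not part of the skill: `trainer_kwargs` is a hypothetical helper, and the `"auto"` strategy value assumes Lightning 2.x. It only builds the keyword arguments, so it runs without a GPU or Lightning installed.

```python
def trainer_kwargs(num_gpus: int, max_epochs: int = 100) -> dict:
    """Build Trainer keyword arguments for a given GPU count (hypothetical helper)."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        # CPU fallback: a single process, no distributed strategy needed.
        return {
            "max_epochs": max_epochs,
            "accelerator": "cpu",
            "devices": 1,
            "strategy": "auto",
        }
    return {
        "max_epochs": max_epochs,
        "accelerator": "gpu",
        "devices": num_gpus,
        # DDP only makes sense with more than one device.
        "strategy": "ddp" if num_gpus > 1 else "auto",
    }
```

A typical call would be `L.Trainer(**trainer_kwargs(torch.cuda.device_count()))`, which leaves the LightningModule and LightningDataModule unchanged when the job moves from a laptop to a multi-GPU node.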
Journey fit
Model training and experiment orchestration are core build-phase backend work for ML products and research agents. Lightning patterns live in training code and data pipelines—the backend of an ML feature—not in distribution or ops runbooks.
How it compares
Use this skill as a set of procedural Lightning conventions rather than asking the agent to improvise raw PyTorch training loops from scratch.
Common Questions / FAQ
Who is pytorch-lightning for?
Indie developers and agent users writing trainable PyTorch models who need repeatable LightningModule and LightningDataModule patterns.
When should I use pytorch-lightning?
During Build when you scaffold training code, add multi-GPU or DDP, or refactor a monolithic script into module plus datamodule plus Trainer.
Is pytorch-lightning safe to install?
Review the Security Audits panel on this Prism page and inspect the skill source in your repo before granting shell or filesystem access to training data.
SKILL.md
# Best Practices - PyTorch Lightning

## Code Organization

### 1. Separate Research from Engineering

**Good:**

```python
class MyModel(L.LightningModule):
    # Research code (what the model does)
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)
        return loss

# Engineering code (how to train) - in Trainer
trainer = L.Trainer(
    max_epochs=100,
    accelerator="gpu",
    devices=4,
    strategy="ddp"
)
```

**Bad:**

```python
# Mixing research and engineering logic
class MyModel(L.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)

        # Don't do device management manually
        loss = loss.cuda()

        # Don't do optimizer steps manually (unless manual optimization)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        return loss
```

### 2. Use LightningDataModule

**Good:**

```python
class MyDataModule(L.LightningDataModule):
    def __init__(self, data_dir, batch_size):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def prepare_data(self):
        # Download data once
        download_data(self.data_dir)

    def setup(self, stage):
        # Load data per-process
        self.train_dataset = MyDataset(self.data_dir, split='train')
        self.val_dataset = MyDataset(self.data_dir, split='val')

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

# Reusable and shareable
dm = MyDataModule("./data", batch_size=32)
trainer.fit(model, datamodule=dm)
```

**Bad:**

```python
# Scattered data logic
train_dataset = load_data()
val_dataset = load_data()
train_loader = DataLoader(train_dataset, ...)
val_loader = DataLoader(val_dataset, ...)
trainer.fit(model, train_loader, val_loader)
```

### 3. Keep Models Modular

```python
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(...)

    def forward(self, x):
        return self.layers(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(...)

    def forward(self, x):
        return self.layers(x)

class MyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)
```

## Device Agnosticism

### 1. Never Use Explicit CUDA Calls

**Bad:**

```python
x = x.cuda()
model = model.cuda()
torch.cuda.set_device(0)
```

**Good:**

```python
# Inside LightningModule
x = x.to(self.device)

# Or let Lightning handle it automatically
def training_step(self, batch, batch_idx):
    x, y = batch  # Already on correct device
    return loss
```

### 2. Use `self.device` Property

```python
class MyModel(L.LightningModule):
    def training_step(self, batch, batch_idx):
        # Create tensors on correct device
        noise = torch.randn(batch.size(0), 100).to(self.device)

        # Or use type_as
        noise = torch.randn(batch.size(0), 100).type_as(batch)
```

### 3. Register Buffers for Non-Parameters

```python
class MyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        # Register buffers (automatically moved to correct device)
        self.register_buffer("running_mean", torch.zeros(100))

    def forward(self, x):
        # self.running_mean is automatically on correct device
        return x - self.running_mean
```

## Hyperparameter Management

### 1. Always Use `save_hyperparameters()`

**Good:**

```python
class MyModel(L.LightningModule):
    def __init__(self, learning_rate, hidden_dim, dropout):
        super().__init__()
        self.save_hyperparameters()  # Saves all arguments

        # Access via self.hparams
        self.model = nn.Linear(self.hparams.hidden_dim, 10)

# Load from