PyTorch Patterns

The canonical shelf is Build because the skill centers on authoring models, training loops, and data pipelines, which are the core of ML implementation work. The Backend subphase fits because the skill covers training scripts, model code, and data loaders rather than UI or agent packaging.

Where it fits

Example use

Scaffold a new classifier and training loop with `.to(device)` and seed helpers before your first epoch.

Audit a contributor’s PR for hardcoded CUDA calls and missing reproducibility blocks.

Validate: Prototype & spike

Trace OOM or slowdowns by applying shape checks and memory-conscious batching patterns.

Stand up a quick fine-tune prototype with fixed seeds so demo metrics match yesterday’s run.

How it compares

Use it as a set of procedural PyTorch conventions inside your agent instead of copying scattered Stack Overflow snippets into every new script.

Common Questions / FAQ

Who is pytorch-patterns for?

Solo and indie developers shipping PyTorch models or training jobs who want reproducible, GPU-safe patterns without a dedicated ML platform team.

When should I use pytorch-patterns?

Use it in Build when authoring models and dataloaders; in Ship when reviewing deep learning PRs; and in Operate when debugging slow training, OOM errors, or non-reproducible metrics on your machine.

Is pytorch-patterns safe to install?

It is documentation-style procedural guidance with no mandated shell or network calls; review the Security Audits panel on this Prism page before trusting any third-party skill package in your agent.

SKILL.md

# PyTorch Development Patterns

Idiomatic PyTorch patterns and best practices for building robust, efficient, and reproducible deep learning applications.

## When to Activate

- Writing new PyTorch models or training scripts
- Reviewing deep learning code
- Debugging training loops or data pipelines
- Optimizing GPU memory usage or training speed
- Setting up reproducible experiments

## Core Principles

### 1. Device-Agnostic Code

Always write code that works on both CPU and GPU without hardcoding devices.

```python
# Good: Device-agnostic
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
data = data.to(device)

# Bad: Hardcoded device
model = MyModel().cuda()  # Crashes if no GPU
data = data.cuda()
```
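
As a self-contained illustration of the pattern above, the following sketch picks a device once and moves both the model and a batch to it. `TinyNet`, `pick_device`, and the random input are hypothetical stand-ins introduced for this example, not part of the original skill.

```python
import torch
from torch import nn


def pick_device() -> torch.device:
    """Return a CUDA device when one is available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyNet(nn.Module):
    """Hypothetical two-layer model used only for this illustration."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


device = pick_device()
model = TinyNet().to(device)

# Stand-in for a batch from a DataLoader: created on the CPU, then moved.
batch = torch.randn(4, 8).to(device)
logits = model(batch)  # -> (4, 2), on the same device as the model
```

Because the device is resolved in one place, the same script runs unchanged on a laptop without a GPU and on a CUDA machine.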

### 2. Reproducibility First

Set all random seeds for reproducible results.

```python
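# Good: Full reproducibility setup
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable the cuDNN autotuner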

# Bad: No seed control
model = MyModel()  # Different weights every run
```
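
One way to check that seeding has the intended effect is to build the same layer twice after reseeding and compare the weights. The sketch below is only an illustration: `init_weights_with_seed` is a hypothetical helper, and it seeds PyTorch alone for brevity.

```python
import torch
from torch import nn


def init_weights_with_seed(seed: int) -> torch.Tensor:
    """Hypothetical helper: seed PyTorch, build a layer, return a copy of its weights."""
    torch.manual_seed(seed)
    return nn.Linear(4, 3).weight.detach().clone()


same_a = init_weights_with_seed(42)
same_b = init_weights_with_seed(42)
other = init_weights_with_seed(7)

match = torch.equal(same_a, same_b)   # same seed, same initial weights
differs = not torch.equal(same_a, other)  # different seed, different weights
print(match, differs)
```

Seeding narrows the sources of variation, but PyTorch does not guarantee bitwise-identical results across releases or hardware platforms, so comparisons like this are most meaningful on a single machine and environment.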

### 3. Explicit Shape Management

Always document and verify tensor shapes.

```python
# Good: Shape-annotated forward pass
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch_size, channels, height, width)
    x = self.conv1(x)    # -> (batch_size, 32, H, W)
    x = self.pool(x)     # -> (batch_size, 32, H//2, W//2)
    x = x.view(x.size(0), -1)  # -> (batch_size, 32 * (H//2) * (W//2))
    return self.fc(x)    # -> (batch_size, num_classes)

# Bad: No shape tracking
def forward(self, x):
    x = self.conv1(x)
    x = self.pool(x)
    x = x.view(x.size(0), -1)  # What size is this?
    return self.fc(x)           # Will this even work?
```
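
The comments above document shapes but do not check them at runtime. As a sketch of the "verify" half, a small assertion helper can fail fast with a readable message. `assert_shape` is a hypothetical helper written for this example, not a PyTorch API; `None` marks a dimension that may take any size.

```python
from typing import Optional, Tuple

import torch


def assert_shape(
    x: torch.Tensor, expected: Tuple[Optional[int], ...], name: str = "tensor"
) -> None:
    """Raise ValueError if x does not match expected; None matches any size."""
    actual = tuple(x.shape)
    ok = len(actual) == len(expected) and all(
        e is None or a == e for a, e in zip(actual, expected)
    )
    if not ok:
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")


images = torch.zeros(8, 3, 32, 32)
assert_shape(images, (None, 3, 32, 32), name="images")  # batch size is left free

flat = images.view(images.size(0), -1)
assert_shape(flat, (8, 3 * 32 * 32), name="flat")  # 3 * 32 * 32 = 3072 features
```

Calling the helper at the top of `forward` (or in a unit test) turns a silent shape mismatch into an immediate, named error.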

## Model Architecture Patterns

### Clean nn.Module Structure

```python
# Good: Well-organized module
class ImageClassifier(nn.Module):
    def __init__(self, num_classes: int, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            # 64 * 16 * 16 assumes 3x32x32 inputs: the single 2x2 max-pool halves 32 -> 16
            nn.Linear(64 * 16 * 16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

# Bad: Everything in forward
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = F.conv2d(x, weight=self.make_weight())  # Creates weight each call!
        return x
```
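
The hard-coded `64 * 16 * 16` in the "Good" module only matches one input resolution. A dummy forward pass is a cheap way to confirm that. The sketch below repeats the class so it runs on its own; the 32x32 RGB input size is an assumption inferred from the layer sizes, not something the original states.

```python
import torch
from torch import nn


class ImageClassifier(nn.Module):
    """Same layout as the 'Good' module above, repeated so this block is runnable."""

    def __init__(self, num_classes: int, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(64 * 16 * 16, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)


model = ImageClassifier(num_classes=10).eval()
with torch.no_grad():
    out = model(torch.zeros(2, 3, 32, 32))  # assumed 32x32 RGB input
print(out.shape)  # torch.Size([2, 10])

# A different resolution no longer matches the hard-coded Linear layer.
try:
    with torch.no_grad():
        model(torch.zeros(2, 3, 64, 64))
    mismatch = False
except RuntimeError:
    mismatch = True
print(mismatch)
```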

### Proper Weight Initialization

```python
# Good: Explicit initialization (define this as a method on your model class)
def _init_weights(self, module: nn.Module) -> None:
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model = MyModel()
model.apply(model._init_weights)
```
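
To show where such an initializer can live, here is a minimal sketch with the method defined on the model and applied at construction time. `SmallConvNet` is a hypothetical model, and zeroing the convolution bias is an extra choice made for this example that the snippet above does not include.

```python
import torch
from torch import nn


class SmallConvNet(nn.Module):
    """Hypothetical model that applies its own initializer when constructed."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.fc = nn.Linear(8, num_classes)
        # apply() calls the function on every submodule and on the model itself.
        self.apply(self._init_weights)

    def _init_weights(self, module: nn.Module) -> None:
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.BatchNorm2d):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.bn(self.conv(x)))
        x = x.mean(dim=(2, 3))  # global average pool -> (batch_size, 8)
        return self.fc(x)


model = SmallConvNet()
```

Initializing inside `__init__` means callers cannot forget the `apply` step; keeping it as a separate call, as in the original snippet, is equally valid when you want to swap initializers per experiment.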

## Training Loop Patterns

### Standard Training Loop

```python
# Good: Complete training loop with best practices
def train_one_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    scaler: torch.amp.GradScaler | None = None,
):
    ...  # body truncated in the original listing
```
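
The listing above stops at the function signature. As a rough sketch only, and not the original author's implementation, one common shape for such a function is shown here. It keeps the same parameters; the `float` return value, the `set_to_none=True` choice, the use of `torch.autocast`, and the toy data in the usage lines are all assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    # Quoted so the annotation is not evaluated on older PyTorch or Python versions.
    scaler: "torch.amp.GradScaler | None" = None,
) -> float:
    """Run one pass over dataloader; return the loss averaged over samples.

    Assumes criterion uses mean reduction over each batch.
    """
    model.train()
    total_loss, seen = 0.0, 0
    for inputs, targets in dataloader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)

        # Mixed precision is enabled only when a GradScaler is supplied.
        with torch.autocast(device_type=device.type, enabled=scaler is not None):
            loss = criterion(model(inputs), targets)

        if scaler is not None:
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            optimizer.step()

        total_loss += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    return total_loss / max(seen, 1)


# Toy usage on random data (assumed for illustration only).
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=8)
model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
avg_loss = train_one_epoch(model, loader, optimizer, nn.CrossEntropyLoss(), device)
```

Passing `scaler=None` keeps the loop in full precision, so the same function can serve CPU debugging runs and mixed-precision GPU runs.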

What is this skill?

Device-agnostic CPU/GPU patterns without hardcoded `.cuda()`

Full reproducibility setup: seeds, cuDNN deterministic flags

Explicit tensor shape documentation and verification habits

Guidance for reviewing DL code and debugging training loops

Optimization focus for GPU memory and training speed

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 4.1k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Who is it for?

Indie builders adding a small ML feature, fine-tuning a model, or maintaining a single-repo training script before handing it to CI or a hosted trainer.

Skip if: your team only needs high-level AutoML or no-code training with no custom `torch.nn` code, or your workflows (for example, freight or logistics) are unrelated to deep learning.

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.
