Machine Learning

Most ML pipeline work belongs to the build phase, where you implement models and training code, although evaluation habits also matter in the ship phase. Training, evaluation, and pipeline code are backend and data-layer concerns rather than UI or distribution concerns.

Where it fits

Example use

Prototype a churn classifier with held-out test metrics before committing engineering time to a full feature.

Example use

Implement training and inference modules with PyTorch or sklearn tied to your API.

Example use

Run stratified cross-validation and learning curves as evidence before shipping a model-backed endpoint.

How it compares

Use for disciplined training pipelines instead of ad-hoc notebook cells without splits, tracking, or interpretability gates.

Common Questions / FAQ

Who is machine-learning for?

Solo developers and small teams building predictive or classification features who use PyTorch or scikit-learn and need reproducible, explainable training workflows.

When should I use machine-learning?

In validate when scoping feasibility with proper evaluation design; in build when implementing training code; in ship when you need cross-validation and learning-curve evidence before launch.

Is machine-learning safe to install?

Check the Security Audits panel on this Prism page. ML skills often need shell and network access for package installs, and they should not be run on untrusted training data without review.

SKILL.md

# Machine Learning

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Machine Learning provides end-to-end ML pipeline construction with PyTorch and scikit-learn, covering model selection, training, evaluation, interpretability, hyperparameter tuning, and experiment tracking. The agent builds reproducible ML workflows that follow software engineering best practices: version-controlled experiments, deterministic training, and interpretable results.
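As a rough illustration of what "version-controlled experiments" and "deterministic training" can mean in practice, the sketch below logs each run as one JSON line keyed by a hash of its configuration. It uses only the standard library. The helper names (`config_id`, `log_experiment`, `seeded_run`), the log file name, and the config values are hypothetical, and the "training" step is a seeded toy function rather than a real model.

```python
import hashlib
import json
import random
import tempfile
import time
from pathlib import Path


def config_id(config: dict) -> str:
    """Stable short hash of a config; key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def log_experiment(log_path: Path, config: dict, metrics: dict) -> dict:
    """Append one run (config, metrics, timestamp) as a JSON line and return it."""
    record = {
        "run_id": config_id(config),
        "timestamp": time.time(),
        "config": config,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def seeded_run(config: dict) -> dict:
    """Toy stand-in for training: the metric depends only on the seed."""
    rng = random.Random(config["seed"])
    return {"val_score": round(rng.random(), 6)}


# Hypothetical config values, chosen only for illustration.
log_file = Path(tempfile.mkdtemp()) / "experiments.jsonl"
cfg = {"seed": 42, "hidden_dim": 64, "lr": 1e-3}
first = log_experiment(log_file, cfg, seeded_run(cfg))
second = log_experiment(log_file, cfg, seeded_run(cfg))
print(first["run_id"], first["metrics"] == second["metrics"])
```

Because the run ID is derived from the configuration and the toy run is fully seeded, repeating the same config yields the same ID and the same metric, which is the property a real tracker would need to preserve.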

The gap between a working notebook and a production ML pipeline is enormous. This skill bridges that gap by enforcing structured experiment management, proper train/validation/test splits, stratified cross-validation, learning curve analysis, and systematic hyperparameter optimization. The agent tracks every experiment with its configuration, metrics, and artifacts, making it possible to reproduce any result months later.
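Stratified cross-validation and learning-curve analysis can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the skill's own pipeline: the synthetic imbalanced dataset, the logistic-regression model, the ROC AUC metric, and the five-fold setup are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, learning_curve

# Synthetic, imbalanced binary dataset (assumption: stands in for real data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Stratified folds keep the class ratio roughly constant in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# One ROC AUC score per fold.
fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"CV ROC AUC: {fold_auc.mean():.3f} +/- {fold_auc.std():.3f}")

# Learning curve: score as a function of training-set size.
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=cv, scoring="roc_auc",
    train_sizes=np.linspace(0.2, 1.0, 5),
)
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")
```

A large, persistent gap between the train and validation columns suggests overfitting, while two low curves that have converged suggest that more data alone is unlikely to help.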

Model interpretability is treated as a first-class requirement, not an optional post-hoc analysis. Every model comes with SHAP values, feature importance rankings, and partial dependence plots that explain what the model learned and why it makes specific predictions. Black-box predictions without explanations are insufficient for scientific and business-critical applications.
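To make the idea of a feature importance ranking concrete, the sketch below uses scikit-learn's permutation importance. Note that this is a swapped-in technique, not SHAP: it measures how much the held-out score drops when one feature's values are shuffled. The synthetic data, in which only feature 0 determines the label, and the random-forest model are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
# Assumption for the demo: only feature 0 drives the label; 1-3 are noise.
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature on held-out data and record the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

for idx in ranking:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

Computing importance on held-out rather than training data keeps the ranking tied to what the model generalizes from, not to what it memorized.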

## Use When

- Building classification or regression models
- Tuning hyperparameters systematically
- Explaining model predictions with SHAP or feature importance
- Setting up experiment tracking for ML projects
- Evaluating model performance with proper cross-validation
- Training PyTorch models with structured training loops

## How It Works

```mermaid
graph TD
    A[Dataset] --> B[Train/Val/Test Split]
    B --> C[Feature Engineering]
    C --> D[Model Selection]
    D --> E[Hyperparameter Tuning: Optuna]
    E --> F[Cross-Validation]
    F --> G[Best Model Training]
    G --> H[Evaluation on Test Set]
    H --> I[Interpretability: SHAP]
    I --> J[Experiment Logging]
    J --> K[Model Registry]
```

The pipeline enforces a strict separation between tuning (using validation data) and final evaluation (using held-out test data). The test set is touched exactly once, preventing information leakage from repeated evaluation.
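The separation described above might look like the following scikit-learn sketch. The 60/20/20 split ratio, the logistic-regression model, the candidate values for `C`, and the decision to refit on train plus validation before the final test are all assumptions for illustration; the diagram does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# Carve off the test set first, then split the remainder into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=0
)


def val_auc(C: float) -> float:
    """Fit on the training split and score on the validation split only."""
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])


# Tuning loop: the test set is never consulted here.
candidates = [0.01, 0.1, 1.0, 10.0]
val_scores = {C: val_auc(C) for C in candidates}
best_C = max(val_scores, key=val_scores.get)

# Final step: refit with the chosen setting and touch the test set exactly once.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_trainval, y_trainval)
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
print(f"best C={best_C}, test ROC AUC={test_auc:.3f}")
```

If the test score is then used to pick between settings, it stops being an unbiased estimate, which is why the selection happens entirely on `val_scores`.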

## Implementation

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report, roc_auc_score
import optuna
import shap
import numpy as np

class Classifier(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, dropout: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim // 2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train_epoch(model, loader, optimizer, criterion, device):
    model.train()
    total_loss = 0
    for X_batch, y_batch in loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()
        # Squeeze only the last axis so a final batch of size 1 keeps shape (1,).
        pred = model(X_batch).squeeze(-1)
        loss = criterion(pred, y_batch.float())
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * len(X_batch)
    return total_loss / len(loader.dataset)

def hyperparameter_search(X: np.ndarray, y: np.ndarray, n_trials: int = 50) -> dict:
    def objective(trial):
        hidden = trial.suggest_int("hidden_dim", 32, 256)
        # The listing is cut off at this point in the source (mid-statement: `lr = trial.su`).
```

What is this skill?

End-to-end ML pipelines with PyTorch and scikit-learn

Train/validation/test splits, stratified cross-validation, and learning curves

Hyperparameter optimization with experiment configuration tracking

SHAP values, feature importance, and partial dependence for interpretability

Reproducible workflows: versioned experiments, deterministic training, stored artifacts

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What do I get? / Deliverables

You get a documented ML workflow with proper validation, hyperparameter search, tracked metrics and artifacts, and interpretability reports suitable for iteration or release review.

Training and evaluation pipeline with documented splits

Experiment log with configs, metrics, and artifacts

Interpretability outputs (SHAP, importance, partial dependence)
