
ML Pipeline
Set up reproducible ML experiment tracking with MLflow or Weights & Biases so solo builders can compare runs, version models, and ship models they can trust.
Overview
ml-pipeline is an agent skill for the Build phase that implements ML experiment tracking with MLflow, Weights & Biases (W&B) integration, and model registry patterns for reproducible training runs.
Install
npx skills add https://github.com/jeffallan/claude-skills --skill ml-pipeline
What is this skill?
- MLflow tracker wrapper: experiments, runs, hyperparameters, metrics, and artifact paths
- Guidance for Weights & Biases integration and custom tracking when hosted MLflow is not enough
- Model registry and versioning patterns for reproducible comparison and promotion
- Explicit when-not-to-use guardrails for one-off scripts and non-ML work
- Python-first examples using MlflowClient and configurable tracking URIs
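The tracker-wrapper idea in the list above can be exercised before a tracking server exists. The sketch below is a hypothetical in-memory stand-in (`InMemoryTracker` and `RunRecord` are illustrative names, not part of MLflow) whose `start_run` / `log_params` / `log_metrics` / `end_run` methods mirror the skill's `MLflowTracker`, so training code can be unit-tested offline and later pointed at a real tracking URI.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RunRecord:
    """Everything one run captured: params, per-step metric history, tags, status."""
    run_id: str
    run_name: Optional[str]
    params: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)
    tags: Dict[str, str] = field(default_factory=dict)
    status: str = "RUNNING"


class InMemoryTracker:
    """Hypothetical offline stand-in using the same method names as MLflowTracker."""

    def __init__(self, experiment_name: str):
        self.experiment_name = experiment_name
        self.runs: Dict[str, RunRecord] = {}
        self.run: Optional[RunRecord] = None

    def _active(self) -> RunRecord:
        if self.run is None:
            raise RuntimeError("No active run; call start_run() first.")
        return self.run

    def start_run(self, run_name: Optional[str] = None, tags: Optional[dict] = None) -> str:
        if self.run is not None:
            raise RuntimeError("A run is already active; call end_run() first.")
        record = RunRecord(
            run_id=uuid.uuid4().hex,
            run_name=run_name,
            tags={k: str(v) for k, v in (tags or {}).items()},
        )
        self.runs[record.run_id] = record
        self.run = record
        return record.run_id

    def log_params(self, params: dict) -> None:
        # MLflow stores parameter values as strings, so do the same here.
        self._active().params.update({k: str(v) for k, v in params.items()})

    def log_metrics(self, metrics: dict, step: Optional[int] = None) -> None:
        # Keep the full history per metric; MLflow records step 0 when none is given.
        run = self._active()
        for key, value in metrics.items():
            run.metrics.setdefault(key, []).append((0 if step is None else step, float(value)))

    def end_run(self, status: str = "FINISHED") -> None:
        self._active().status = status
        self.run = None


# Usage: the same calls a training loop would make against MLflowTracker.
tracker = InMemoryTracker("my_experiment")
run_id = tracker.start_run(run_name="run_rf", tags={"dataset_version": "v1.0"})
tracker.log_params({"n_estimators": 100, "max_depth": 8})
for epoch, loss in enumerate([0.9, 0.5, 0.3]):
    tracker.log_metrics({"loss": loss}, step=epoch)
tracker.end_run()
print(tracker.runs[run_id].metrics["loss"])  # [(0, 0.9), (1, 0.5), (2, 0.3)]
```

Because the method names line up, a training function written against this interface can receive the real `MLflowTracker` in production without code changes; only where the data lands differs.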
Adoption & trust: 2.2k installs on skills.sh; 9.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are iterating on models but cannot reproduce runs, compare metrics fairly, or promote a specific checkpoint to production.
Who is it for?
Solo builders adding ML to a SaaS or internal tool who need MLflow or W&B-style discipline before scaling hyperparameter search.
Skip if: Quick scripts with no hyperparameters, non-ML apps, or teams that only need a single untracked notebook run.
When should I use this skill?
Setting up MLflow, W&B integration, model registries, comparing experiments, or building custom ML tracking—not for quick one-off experiments without reproducibility needs.
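Comparing experiments is only fair when runs share the same evaluation conditions. A minimal sketch, assuming each run has been exported as a plain dict with `tags` and `metrics` keys; the `dataset_version` tag and `val_accuracy` metric are illustrative names, and `best_run` is a hypothetical helper, not an MLflow function.

```python
from typing import List, Optional


def best_run(
    runs: List[dict],
    metric: str,
    dataset_version: str,
    higher_is_better: bool = True,
) -> Optional[dict]:
    """Pick the best run among those evaluated on the same dataset version.

    Runs on other dataset versions, or missing the metric, are excluded so the
    comparison stays like-for-like.
    """
    candidates = [
        r for r in runs
        if r["tags"].get("dataset_version") == dataset_version and metric in r["metrics"]
    ]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


runs = [
    {"run_id": "a", "tags": {"dataset_version": "v1.0"}, "metrics": {"val_accuracy": 0.81}},
    {"run_id": "b", "tags": {"dataset_version": "v1.0"}, "metrics": {"val_accuracy": 0.86}},
    # Higher score, but measured on a different dataset version: not comparable.
    {"run_id": "c", "tags": {"dataset_version": "v2.0"}, "metrics": {"val_accuracy": 0.93}},
]
print(best_run(runs, "val_accuracy", "v1.0")["run_id"])  # b
```

Against a real MLflow server, `mlflow.search_runs` with a `filter_string` on tags and an `order_by` on metrics can produce a comparable filtered table.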
What do I get? / Deliverables
You get a structured tracking setup—experiments, logged params/metrics/artifacts, and registry/version habits—so the next skill or deploy step can target a named model version.
- MLflowTracker or equivalent tracking module
- Experiment and registry naming conventions
- Logged params, metrics, and artifact layout per run
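One way to pin down the naming conventions and per-run artifact layout listed above. The scheme here (`<project>-<task>` experiment names, `<project>_<task>` registered model names, and fixed `model` / `plots` / `reports` artifact subfolders) is a hypothetical convention that MLflow does not enforce; the `runs:/<run_id>/<path>` and `models:/<name>/<version>` URI forms are MLflow's own.

```python
import re

# Hypothetical convention: lowercase letters and digits, joined by single hyphens.
_SLUG = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")

# Hypothetical fixed artifact subfolders, passed as `artifact_path` when logging.
ARTIFACT_DIRS = ("model", "plots", "reports")


def _check(part: str) -> str:
    """Reject name parts that would break the convention."""
    if not _SLUG.match(part):
        raise ValueError(f"invalid name part: {part!r}")
    return part


def experiment_name(project: str, task: str) -> str:
    return f"{_check(project)}-{_check(task)}"


def registered_model_name(project: str, task: str) -> str:
    # Underscores keep registry names visually distinct from experiment names.
    return f"{_check(project)}-{_check(task)}".replace("-", "_")


def run_artifact_uri(run_id: str, kind: str) -> str:
    # MLflow's run-relative artifact URI form: runs:/<run_id>/<path>
    if kind not in ARTIFACT_DIRS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    return f"runs:/{run_id}/{kind}"


def model_version_uri(name: str, version: int) -> str:
    # MLflow's registry URI form: models:/<name>/<version>
    return f"models:/{name}/{version}"


print(experiment_name("churn", "binary-clf"))        # churn-binary-clf
print(registered_model_name("churn", "binary-clf"))  # churn_binary_clf
print(run_artifact_uri("abc123", "plots"))           # runs:/abc123/plots
print(model_version_uri("churn_binary_clf", 3))      # models:/churn_binary_clf/3
```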
Journey fit
ML pipelines and experiment tracking sit in Build because they implement training infrastructure, registries, and artifact capture, not one-off research notes in Idea. The Backend subphase fits data/ML services, tracking servers, and pipeline code that agents generate alongside training and deployment prep.
How it compares
Use for experiment-tracking architecture and registry workflows, not generic Python debugging or front-end UI work.
Common Questions / FAQ
Who is ml-pipeline for?
Indie developers and small teams using coding agents to add training pipelines, compare experiments, and version models without hiring an MLOps engineer first.
When should I use ml-pipeline?
During Build when wiring MLflow, integrating Weights & Biases, designing a model registry, or documenting how runs and artifacts are stored before Ship and Operate monitoring.
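Designing a model registry mostly means deciding how a version gets promoted. The sketch below is a hypothetical in-memory registry (not MLflow's API) that models the pattern: each registration gets an incrementing version tied to its run, and a movable alias such as `champion` points at the version deploy code should load. It assumes higher metric values are better. In MLflow itself, the corresponding calls are `mlflow.register_model` and, in recent versions, `MlflowClient.set_registered_model_alias`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelVersion:
    version: int
    run_id: str
    metrics: Dict[str, float]


@dataclass
class Registry:
    """Hypothetical in-memory registry: versions per model name plus named aliases."""
    versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)
    aliases: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def register(self, name: str, run_id: str, metrics: Dict[str, float]) -> int:
        # Versions start at 1 and increase per model name.
        history = self.versions.setdefault(name, [])
        mv = ModelVersion(version=len(history) + 1, run_id=run_id, metrics=dict(metrics))
        history.append(mv)
        return mv.version

    def get(self, name: str, version: int) -> ModelVersion:
        return self.versions[name][version - 1]

    def promote_if_better(self, name: str, version: int, alias: str, metric: str) -> bool:
        """Move `alias` to `version` only if it beats the current holder (higher is better)."""
        candidate = self.get(name, version)
        current = self.aliases.get(name, {}).get(alias)
        if current is not None and self.get(name, current).metrics[metric] >= candidate.metrics[metric]:
            return False
        self.aliases.setdefault(name, {})[alias] = version
        return True

    def uri(self, name: str, alias: str) -> str:
        # Same shape as MLflow's alias URI: models:/<name>@<alias>
        return f"models:/{name}@{alias}"


reg = Registry()
v1 = reg.register("churn_binary_clf", run_id="a", metrics={"val_auc": 0.80})
v2 = reg.register("churn_binary_clf", run_id="b", metrics={"val_auc": 0.78})
reg.promote_if_better("churn_binary_clf", v1, "champion", "val_auc")  # no champion yet, so v1 wins
reg.promote_if_better("churn_binary_clf", v2, "champion", "val_auc")  # 0.78 < 0.80, alias stays
print(reg.aliases["churn_binary_clf"]["champion"])  # 1
```

Deploy code that loads by alias rather than by a hard-coded version number only needs the alias moved to roll forward or back.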
Is ml-pipeline safe to install?
Treat it as procedural guidance that may suggest network calls to tracking servers and writing artifacts to the filesystem; review the Security Audits panel on this page and lock down tracking URIs and secrets in your own environment.
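A minimal sketch of locking down the tracking URI, assuming configuration comes from environment variables. `MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME`, and `MLFLOW_TRACKING_PASSWORD` are environment variables MLflow reads; the `resolve_tracking_uri` helper and its policy (no credentials embedded in the URI, no plain HTTP to non-local hosts) are hypothetical choices for illustration. The fallback mirrors the `http://localhost:5000` default of the skill's `MLflowTracker`, not MLflow's own default.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def resolve_tracking_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a vetted tracking URI from the environment (hypothetical policy)."""
    env = os.environ if env is None else env
    # Fallback matches MLflowTracker's default in the skill.
    uri = env.get("MLFLOW_TRACKING_URI", "http://localhost:5000")
    parsed = urlparse(uri)
    if parsed.scheme in ("http", "https"):
        if parsed.username or parsed.password:
            # Credentials inside the URI tend to leak into logs and shell history.
            raise ValueError("Put credentials in MLFLOW_TRACKING_USERNAME/PASSWORD, not the URI.")
        if parsed.scheme == "http" and parsed.hostname not in LOCAL_HOSTS:
            raise ValueError(f"Refusing plain HTTP to remote tracking server {parsed.hostname!r}.")
    # Non-HTTP URIs (local paths, file://) are returned unchanged.
    return uri


print(resolve_tracking_uri({}))  # http://localhost:5000
try:
    resolve_tracking_uri({"MLFLOW_TRACKING_URI": "http://10.0.0.5:5000"})
except ValueError as err:
    print(err)  # Refusing plain HTTP to remote tracking server '10.0.0.5'.
```

The vetted value can then be passed as `tracking_uri` to `MLflowTracker` or to `mlflow.set_tracking_uri`.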
SKILL.md
# Experiment Tracking

---

## Overview

Experiment tracking enables reproducibility, comparison, and collaboration in ML development. It captures hyperparameters, metrics, artifacts, and model versions so that every experiment can be reproduced and compared.

## When to Use This Reference

- Setting up MLflow for experiment tracking
- Implementing Weights & Biases integration
- Creating model registries and versioning
- Comparing experiments and selecting models
- Building custom tracking solutions

## When NOT to Use

- Quick one-off experiments without reproducibility needs
- Simple scripts without hyperparameters
- Non-ML projects

---

## MLflow Integration

### Basic Experiment Tracking

```python
import json
from pathlib import Path
from typing import Optional

import mlflow
import mlflow.sklearn  # explicit import of the scikit-learn flavor used in log_model
from mlflow.tracking import MlflowClient


class MLflowTracker:
    """MLflow experiment tracking wrapper."""

    def __init__(
        self,
        experiment_name: str,
        tracking_uri: str = "http://localhost:5000",
        artifact_location: Optional[str] = None,
    ):
        mlflow.set_tracking_uri(tracking_uri)

        # Create or get experiment
        experiment = mlflow.get_experiment_by_name(experiment_name)
        if experiment is None:
            self.experiment_id = mlflow.create_experiment(
                experiment_name,
                artifact_location=artifact_location,
            )
        else:
            self.experiment_id = experiment.experiment_id

        mlflow.set_experiment(experiment_name)
        self.client = MlflowClient()
        self.run = None

    def start_run(
        self,
        run_name: Optional[str] = None,
        tags: Optional[dict] = None,
        nested: bool = False,
    ) -> str:
        """Start a new MLflow run."""
        self.run = mlflow.start_run(
            run_name=run_name,
            experiment_id=self.experiment_id,
            nested=nested,
        )
        if tags:
            mlflow.set_tags(tags)
        return self.run.info.run_id

    def end_run(self, status: str = "FINISHED") -> None:
        """End the current run."""
        mlflow.end_run(status=status)
        self.run = None

    def log_params(self, params: dict) -> None:
        """Log hyperparameters."""
        mlflow.log_params(params)

    def log_metrics(self, metrics: dict, step: Optional[int] = None) -> None:
        """Log metrics with optional step."""
        for key, value in metrics.items():
            mlflow.log_metric(key, value, step=step)

    def log_artifact(self, local_path: str, artifact_path: Optional[str] = None) -> None:
        """Log file or directory as artifact."""
        mlflow.log_artifact(local_path, artifact_path)

    def log_model(
        self,
        model,
        artifact_path: str,
        registered_model_name: Optional[str] = None,
        signature=None,
        input_example=None,
    ) -> str:
        """Log model with optional registration."""
        from mlflow.models import infer_signature

        # Infer the input/output schema from the example if none was given
        if signature is None and input_example is not None:
            signature = infer_signature(input_example, model.predict(input_example))

        model_info = mlflow.sklearn.log_model(
            model,
            artifact_path=artifact_path,
            registered_model_name=registered_model_name,
            signature=signature,
            input_example=input_example,
        )
        return model_info.model_uri


# Usage example
def train_with_mlflow(
    model,
    X_train,
    y_train,
    X_val,
    y_val,
    params: dict,
):
    """Complete training run with MLflow tracking."""
    tracker = MLflowTracker("my_experiment")
    tracker.start_run(
        run_name=f"run_{params['model_type']}",
        tags={
            "model_type": params["model_type"],
            "dataset_version": "v1.0",
            "author": "ml-team",
        },
    )

    try:
        # Log parameters
        tracker.log_params(params)

        # Train model
        model.fit(X_train, y_train)

        # Evaluate and log metrics
        train_score = model.scor