
Mle Workflow
Turn notebook experiments into production ML with data contracts, gated training, offline evals, deployable artifacts, monitoring, and rollback paths your agent can follow end to end.
Overview
mle-workflow is an agent skill most often used in Build (also Ship, Operate) that structures production ML from data contracts through deploy, monitoring, and rollback.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill mle-workflow
What is this skill?
- Scope calibration across ranking, recommenders, classifiers, forecasting, embeddings, and LLM workflows without forcing one architecture onto all of them
- Lanes for data contracts, reproducible training, measurable quality gates, and artifact promotion
- Deploy paths with canary, shadow traffic, and post-deploy quality checks
- Operational focus on drift, label leakage, stale features, and train/serve skew debugging
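As an illustration of the train/serve skew debugging mentioned above, here is a minimal sketch that runs the same raw records through a training-side and a serving-side transform and reports every feature that disagrees. The function names, the `amount` field, and the truncation bug are hypothetical and not part of the skill itself.

```python
import math


def find_skew(records, train_transform, serve_transform, tol=1e-9):
    """Return (row_index, feature, train_value, serve_value) for each mismatch."""
    mismatches = []
    for i, rec in enumerate(records):
        train_feats = train_transform(rec)
        serve_feats = serve_transform(rec)
        for name in sorted(set(train_feats) | set(serve_feats)):
            tv, sv = train_feats.get(name), serve_feats.get(name)
            both_numeric = isinstance(tv, (int, float)) and isinstance(sv, (int, float))
            same = math.isclose(tv, sv, abs_tol=tol) if both_numeric else tv == sv
            if not same:
                mismatches.append((i, name, tv, sv))
    return mismatches


# Hypothetical transforms: the serving path truncates the amount before the log.
def train_transform(rec):
    return {"log_amount": math.log1p(rec["amount"])}


def serve_transform(rec):
    return {"log_amount": math.log1p(int(rec["amount"]))}


records = [{"amount": 10.0}, {"amount": 10.5}]
mismatches = find_skew(records, train_transform, serve_transform)
for row, feature, tv, sv in mismatches:
    print(f"row {row}: {feature} train={tv:.4f} serve={sv:.4f}")
```

Only the second record is flagged, because truncation changes 10.5 but not 10.0. A check like this can run in CI against a small fixture of raw records.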
Adoption & trust: 1.1k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your model work lives in notebooks with no contracts, eval gates, or operable path when data drifts or serving diverges from training.
Who is it for?
Indie builders shipping ML-powered SaaS features who need a disciplined workflow beyond one-off Jupyter cells.
Skip if: Pure prompt-only chatbots with no measurable ML artifact, or teams wanting a single fixed stack mandate for every model type.
When should I use this skill?
Use when building, reviewing, or hardening ML systems beyond one-off notebooks—including data contracts, eval gates, deployment, and monitoring.
What do I get? / Deliverables
You get a calibrated ML engineering plan with reproducible training, promotion criteria, deployment options, and monitoring hooks aligned to your system’s actual labels and serving mode.
- Data and training contract outline
- Evaluation and promotion criteria
- Deploy and monitoring/rollback plan
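To make "promotion criteria" concrete, the sketch below compares a candidate model's offline metrics against a baseline and blocks promotion when any metric regresses beyond its tolerance. The metric names, values, and tolerances are illustrative assumptions, not figures from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    metric: str
    higher_is_better: bool
    max_regression: float  # absolute tolerance relative to the baseline


def evaluate_gates(baseline, candidate, gates):
    """Return (promote, failures); failures holds human-readable reasons."""
    failures = []
    for gate in gates:
        if gate.metric not in baseline or gate.metric not in candidate:
            failures.append(f"{gate.metric}: missing from baseline or candidate")
            continue
        delta = candidate[gate.metric] - baseline[gate.metric]
        # A positive regression always means "got worse", whatever the direction.
        regression = -delta if gate.higher_is_better else delta
        if regression > gate.max_regression:
            failures.append(
                f"{gate.metric}: regressed by {regression:.4f} "
                f"(limit {gate.max_regression})"
            )
    return (not failures, failures)


# Hypothetical metrics and tolerances for illustration only.
baseline = {"auc": 0.90, "p95_latency_ms": 40.0}
candidate = {"auc": 0.91, "p95_latency_ms": 55.0}
gates = [
    Gate("auc", higher_is_better=True, max_regression=0.005),
    Gate("p95_latency_ms", higher_is_better=False, max_regression=5.0),
]

promote, failures = evaluate_gates(baseline, candidate, gates)
print("promote" if promote else "blocked", failures)
```

Here the candidate improves AUC but is blocked on latency, so quality and serving cost are gated together rather than traded off silently.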
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build is the canonical shelf because the skill’s core job is hardening pipelines, features, and training/serving parity—not first ideation or pure marketing launch. Backend fits training jobs, batch inference, feature pipelines, and serving contracts that live outside the UI layer.
Where it fits
Decide which ML lanes (labels, online serving, feature store) actually apply before committing build scope for a recommender.
Define data contracts and a reproducible training job when moving a classifier out of a notebook.
Set offline metrics and shadow-traffic checks before promoting a refreshed ranking model.
Add drift detection and rollback triggers when batch feature freshness degrades in production.
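One possible shape for the drift check in the last item is a Population Stability Index (PSI) between a reference sample and live feature values, with a rollback trigger on a threshold. PSI is a swapped-in example technique; the skill does not prescribe it. The 0.25 cutoff is a commonly cited rule of thumb rather than a value from the skill, and the sample data is synthetic.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index using equal-width bins from the reference sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


ROLLBACK_THRESHOLD = 0.25  # assumption: rule-of-thumb cutoff, tune per feature

reference = [i / 100 for i in range(100)]        # synthetic training-time sample
shifted = [0.5 + i / 200 for i in range(100)]    # synthetic drifted production sample

stable_score = psi(reference, reference)
drift_score = psi(reference, shifted)
should_rollback = drift_score > ROLLBACK_THRESHOLD
print(f"stable={stable_score:.3f} drift={drift_score:.3f} rollback={should_rollback}")
```

In practice the trigger would more likely raise an alert or pause promotion than roll back automatically. That choice belongs to whoever owns monitoring for the system.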
How it compares
A production ML systems workflow skill—not a notebook tutorial generator or a generic unit-test-only QA checklist.
Common Questions / FAQ
Who is mle-workflow for?
Solo builders and small teams using agentic coding tools to design, review, or harden ML features from data through operations.
When should I use mle-workflow?
In Build when designing pipelines and train/serve logic; in Ship when defining offline/online evals and promotion gates; in Operate when adding monitoring, canaries, or rollback after drift or quality regressions.
Is mle-workflow safe to install?
The skill may steer agents toward shell, data, and deployment actions—review the Security Audits panel on this page and scope permissions to your environment.
SKILL.md
# Machine Learning Engineering Workflow

Use this skill to turn model work into a production ML system with clear data contracts, repeatable training, measurable quality gates, deployable artifacts, and operational monitoring.

## When to Activate

- Planning or reviewing a production ML feature, model refresh, ranking system, recommender, classifier, embedding workflow, or forecasting pipeline
- Converting notebook code into a reusable training, evaluation, batch inference, or online inference pipeline
- Designing model promotion criteria, offline/online evals, experiment tracking, or rollback paths
- Debugging failures caused by data drift, label leakage, stale features, artifact mismatch, or inconsistent training and serving logic
- Adding model monitoring, canary rollout, shadow traffic, or post-deploy quality checks

## Scope Calibration

Use only the lanes that fit the system in front of you. This skill is useful for ranking, search, recommendations, classifiers, forecasting, embeddings, LLM workflows, anomaly detection, and batch analytics, but it should not force one architecture onto all of them.

- Do not assume every model has supervised labels, online serving, a feature store, PyTorch, GPUs, human review, A/B tests, or real-time feedback.
- Do not add heavyweight MLOps machinery when a data contract, baseline, eval script, and rollback note would make the change reviewable.
- Do make assumptions explicit when the project lacks labels, delayed outcomes, slice definitions, production traffic, or monitoring ownership.
- Treat examples as interchangeable scaffolds. Replace metrics, serving mode, data stores, and rollout mechanics with the project-native equivalents.
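The lightweight path named in Scope Calibration (a data contract, baseline, eval script, and rollback note) could start with a contract check as small as the sketch below. The field names, types, and bounds are hypothetical; a real project would replace them with its own schema or an existing validation library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None


def validate_rows(rows, contract):
    """Return a list of human-readable contract violations (empty means valid)."""
    errors = []
    for i, row in enumerate(rows):
        for spec in contract:
            if spec.name not in row:
                errors.append(f"row {i}: missing field '{spec.name}'")
                continue
            value = row[spec.name]
            if value is None:
                if not spec.nullable:
                    errors.append(f"row {i}: '{spec.name}' is null")
                continue
            if not isinstance(value, spec.dtype):
                errors.append(
                    f"row {i}: '{spec.name}' expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
                continue
            if spec.min_value is not None and value < spec.min_value:
                errors.append(f"row {i}: '{spec.name}' below minimum {spec.min_value}")
    return errors


# Hypothetical contract for a training table; labels may arrive late, so nullable.
contract = [
    FieldSpec("user_id", str),
    FieldSpec("amount", float, min_value=0.0),
    FieldSpec("label", int, nullable=True),
]
rows = [
    {"user_id": "u1", "amount": 12.5, "label": 1},
    {"user_id": "u2", "amount": -3.0, "label": None},
    {"user_id": "u3", "amount": "7", "label": 0},
]

errors = validate_rows(rows, contract)
for message in errors:
    print(message)
```

Running the same check at both training time and batch inference time keeps the two paths bound to one contract, without introducing a feature store or schema registry.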
## Related Skills

- `python-patterns` and `python-testing` for Python implementation and pytest coverage
- `pytorch-patterns` for deep learning models, data loaders, device handling, and training loops
- `eval-harness` and `ai-regression-testing` for promotion gates and agent-assisted regression checks
- `database-migrations`, `postgres-patterns`, and `clickhouse-io` for data storage and analytics surfaces
- `deployment-patterns`, `docker-patterns`, and `security-review` for serving, secrets, containers, and production hardening

## Reuse the SWE Surface

Do not treat MLE as separate from software engineering. Most ECC SWE workflows apply directly to ML systems, often with stricter failure modes:

The recommended `minimal --with capability:machine-learning` install keeps the core agent surface available alongside this skill. For skill-only or agent-limited harnesses, pair `skill:mle-workflow` with `agent:mle-reviewer` where the target supports agents.

| SWE surface | MLE use |
|-------------|---------|
| `product-capability` / `architecture-decision-records` | Turn model work into explicit product contracts and record irreversible data, model, and rollout choices |
| `repo-scan` / `codebase-onboarding` / `code-tour` | Find existing training, feature, serving, eval, and monitoring paths before introducing a parallel ML stack |
| `plan` / `feature-dev` | Scope model changes as product capabilities with data, eval, serving, and rollback phases |
| `tdd-workflow` / `python-testing` | Test feature transforms, split logic, metric calculations, artifact loading, and inference schemas before implementation |
| `code-reviewer` / `mle-reviewer` | Review code quality plus ML-specific leakage, reproducibility, promotion, and monitoring risks |
| `build-fix` / `pr-test-analyzer` | Diagnose broken CI, flaky evals, missing fixtures, and environment-specific model or dependency failures |
| `quality-gate` / `test-coverage` | Require automated evidence for transforms, metrics, inference contract