
ML Pipeline Workflow
Design and automate end-to-end MLOps pipelines from data prep through training, validation, deployment, and monitoring.
Install
npx skills add https://github.com/wshobson/agents --skill ml-pipeline-workflow
What is this skill?
- Full lifecycle: ingestion → preparation → training → validation → deployment → monitoring
- DAG orchestration patterns for Airflow, Dagster, and Kubeflow
- Data validation, feature engineering, versioning, and train/val/test splits
- Training orchestration with hyperparameters, experiment tracking, and distributed training hooks
- Error handling, retries, and component dependency design for production ML
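The dependency and retry ideas in the list above can be illustrated with a minimal sketch. It is not the skill's own implementation: the stage names, the `run_with_retries` helper, and the `run_pipeline` function are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real orchestrator such as Airflow, Dagster, or Kubeflow.

```python
import time
from graphlib import TopologicalSorter

# Hypothetical stage graph: each stage maps to the set of stages it depends on.
DEPENDENCIES = {
    "data_ingestion": set(),
    "data_validation": {"data_ingestion"},
    "feature_engineering": {"data_validation"},
    "model_training": {"feature_engineering"},
    "model_validation": {"model_training"},
    "model_deployment": {"model_validation"},
}


def run_with_retries(task, max_attempts=3, backoff_seconds=0.0):
    """Call task() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            time.sleep(backoff_seconds * attempt)  # Linear backoff between attempts.


def run_pipeline(dependencies, tasks, max_attempts=3):
    """Run every stage in dependency order and return the order executed."""
    executed = []
    for stage in TopologicalSorter(dependencies).static_order():
        run_with_retries(tasks[stage], max_attempts=max_attempts)
        executed.append(stage)
    return executed


if __name__ == "__main__":
    # Placeholder tasks; a real pipeline would call ingestion, training, etc.
    noop_tasks = {name: (lambda: None) for name in DEPENDENCIES}
    print(run_pipeline(DEPENDENCIES, noop_tasks))
```

Keeping the dependency graph as plain data separates ordering from execution, so each stage function stays independently testable and a failing stage stops the run before any downstream stage starts.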
Adoption & trust: 7k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Build is the primary shelf because the skill centers on creating DAG-based ML orchestration and training automation before production hardening. The Backend subphase fits because the skill covers pipeline services, training jobs, and integration with orchestrators rather than mobile UI or marketing.
Common Questions / FAQ
Is ML Pipeline Workflow safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# ML Pipeline Workflow

Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.

## Overview

This skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.

## When to Use This Skill

- Building new ML pipelines from scratch
- Designing workflow orchestration for ML systems
- Implementing data → model → deployment automation
- Setting up reproducible training workflows
- Creating DAG-based ML orchestration
- Integrating ML components into production systems

## What This Skill Provides

### Core Capabilities

1. **Pipeline Architecture**
   - End-to-end workflow design
   - DAG orchestration patterns (Airflow, Dagster, Kubeflow)
   - Component dependencies and data flow
   - Error handling and retry strategies

2. **Data Preparation**
   - Data validation and quality checks
   - Feature engineering pipelines
   - Data versioning and lineage
   - Train/validation/test splitting strategies

3. **Model Training**
   - Training job orchestration
   - Hyperparameter management
   - Experiment tracking integration
   - Distributed training patterns

4. **Model Validation**
   - Validation frameworks and metrics
   - A/B testing infrastructure
   - Performance regression detection
   - Model comparison workflows

5. **Deployment Automation**
   - Model serving patterns
   - Canary deployments
   - Blue-green deployment strategies
   - Rollback mechanisms

### Reference Documentation

See the `references/` directory for detailed guides:

- **data-preparation.md** - Data cleaning, validation, and feature engineering
- **model-training.md** - Training workflows and best practices
- **model-validation.md** - Validation strategies and metrics
- **model-deployment.md** - Deployment patterns and serving architectures

### Assets and Templates

The `assets/` directory contains:

- **pipeline-dag.yaml.template** - DAG template for workflow orchestration
- **training-config.yaml** - Training configuration template
- **validation-checklist.md** - Pre-deployment validation checklist

## Usage Patterns

### Basic Pipeline Setup

```python
# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment"
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for full example
```

### Production Workflow

1. **Data Preparation Phase**
   - Ingest raw data from sources
   - Run data quality checks
   - Apply feature transformations
   - Version processed datasets

2. **Training Phase**
   - Load versioned training data
   - Execute training jobs
   - Track experiments and metrics
   - Save trained models

3. **Validation Phase**
   - Run validation test suite
   - Compare against baseline
   - Generate performance reports
   - Approve for deployment

4. **Deployment Phase**
   - Package model artifacts
   - Deploy to serving infrastructure
   - Configure monitoring
   - Validate production traffic

## Best Practices

### Pipeline Design

- **Modularity**: Each stage should be independently testable
- **Idempotency**: Re-running stages should be safe
- **Observability**: Log metrics at every stage
- **Versioning**: Track data, code, and model versions
- **Failure Handling**: Implement retry logic and alerting

### Data Management

- Use data validation libraries (Great Expectations, TFX)
- Version datasets with DVC or similar tools
- Document feature engineering transformations
- Maintain data lineage tracking

### Model Operations

- Separate training and serving infrastructu