Agent Data Ml Model

Name: Agent Data Ml Model
Author: ruvnet

ruvnet/ruflo

Spin up a guided ML agent that preprocesses data, trains classifiers or regressors in notebooks, and prepares models for deployment review.

Overview

Agent-data-ml-model is an agent skill for the Build phase that helps solo builders preprocess data, train ML models, and organize experiments before deployment approval.

Install

npx skills add https://github.com/ruvnet/ruflo --skill agent-data-ml-model

What is this skill?

Triggers on ML keywords, Jupyter notebooks, model/train Python, and pickle or H5 artifacts
Uses NotebookRead, NotebookEdit, and Bash for training runs of up to 30 minutes under scoped data and models paths
Capability profile targets classification, regression, neural nets, and end-to-end ML pipelines
Autonomous deployment is disabled—model promotion expects explicit human approval
Bounded to data, models, notebooks, src/ml, and experiments with secrets paths forbidden
30-minute max execution time for training runs
100 max file operations per session
100MB max file size constraint
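
The three numeric limits above can be illustrated with a short Python sketch. The constants are copied from the skill's front matter. The helper names (`FileOpBudget`, `within_size_limit`, `run_training`) are hypothetical and do not describe how the agent runtime actually enforces the limits.

```python
import subprocess
import sys
from pathlib import Path

# Values from the skill's front matter; everything else here is illustrative.
MAX_EXECUTION_SECONDS = 1800      # 30 minutes for training
MAX_FILE_SIZE_BYTES = 104857600   # 100 MB, i.e. 100 * 1024 * 1024
MAX_FILE_OPERATIONS = 100


class FileOpBudget:
    """Counts file operations and refuses to exceed the session limit."""

    def __init__(self, limit: int = MAX_FILE_OPERATIONS) -> None:
        self.limit = limit
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise RuntimeError(f"file operation budget of {self.limit} exhausted")
        self.used += 1


def within_size_limit(path: Path, limit: int = MAX_FILE_SIZE_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `limit` bytes."""
    return path.stat().st_size <= limit


def run_training(command: list[str], timeout: int = MAX_EXECUTION_SECONDS) -> int:
    """Run a command; subprocess.TimeoutExpired is raised past `timeout` seconds."""
    completed = subprocess.run(command, timeout=timeout, check=False)
    return completed.returncode
```

For example, `run_training([sys.executable, "train.py"])` would stop a run that exceeds 30 minutes, assuming a `train.py` exists in the working directory.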

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 656 installs on skills.sh; 58.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have labels and notebooks but no consistent agent workflow to preprocess, train, and evaluate models without risking blind production deploys.

Who is it for?

Indie builders iterating on classifiers or regressors in Jupyter or Python who want agent assistance inside a bounded ML workspace.

Skip if: Teams that only need one-off SQL analytics, pure LLM prompt tuning with no training code, or fully unattended production model rollout.

When should I use this skill?

Use it when a request includes keywords such as machine learning, train model, predict, classification, regression, or neural network; when the workspace contains notebooks, model.py, train.py, .pkl, or .h5 files; or when the task is to create a model, train a classifier, or build an ML pipeline.
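
A rough Python sketch of how such trigger matching could work is shown below. The keyword and task lists come from the skill's front matter. Matching on the file's base name is a simplification, and `matches_trigger` is a hypothetical helper, not the matcher any agent actually uses.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

KEYWORDS = ["machine learning", "ml model", "train model", "predict",
            "classification", "regression", "neural network"]
TASK_PATTERNS = ["create * model", "train * classifier", "build ml pipeline"]
# Simplified to base-name patterns for this sketch.
FILE_NAME_PATTERNS = ["*.ipynb", "model.py", "train.py", "*.pkl", "*.h5"]


def matches_trigger(request: str = "", path: str = "") -> bool:
    """Return True if the request text or the file path looks like ML work."""
    text = request.lower()
    # Naive substring match: "predict" would also fire on "unpredictable".
    if any(keyword in text for keyword in KEYWORDS):
        return True
    # Task patterns use "*" as a wildcard anywhere inside the request.
    if any(fnmatchcase(text, f"*{pattern}*") for pattern in TASK_PATTERNS):
        return True
    name = PurePosixPath(path).name
    return bool(name) and any(fnmatchcase(name, p) for p in FILE_NAME_PATTERNS)
```

With these lists, a request such as "train a spam classifier" matches through the task pattern, and a path such as `notebooks/churn.ipynb` matches through the file pattern.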

What do I get? / Deliverables

You get scoped notebook and Python edits, a trainable pipeline under data and models paths, and a clear gate before any autonomous deployment step.

Trained or trainable model artifacts and training scripts
Notebook and source edits documenting preprocessing and evaluation
Experiment directory structure ready for human-approved deployment
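
One way to picture an experiment directory that waits for human approval is the sketch below. The layout (`experiments/<run>/artifacts`, `metadata.json`) and the `approved_by` field are assumptions made for illustration; the skill does not prescribe them.

```python
import json
from pathlib import Path


def create_experiment(root: Path, run_name: str, params: dict) -> Path:
    """Create experiments/<run_name>/ with an artifacts folder and metadata."""
    run_dir = root / "experiments" / run_name
    (run_dir / "artifacts").mkdir(parents=True, exist_ok=True)
    # New runs start unapproved; a human fills in approved_by later.
    metadata = {"run_name": run_name, "params": params, "approved_by": None}
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir


def approve(run_dir: Path, reviewer: str) -> None:
    """Record the human reviewer who approved this run for deployment."""
    meta_path = run_dir / "metadata.json"
    metadata = json.loads(meta_path.read_text())
    metadata["approved_by"] = reviewer
    meta_path.write_text(json.dumps(metadata, indent=2))


def can_promote(run_dir: Path) -> bool:
    """A run may be promoted only after a reviewer has been recorded."""
    metadata = json.loads((run_dir / "metadata.json").read_text())
    return bool(metadata.get("approved_by"))
```

Keeping the approval record next to the artifacts means a deployment script can check `can_promote(run_dir)` before it touches a model file.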

Recommended Skills

Paper Context Resolver (lllllllama/ai-paper-reproduction-skill)

Optional helper-tier skill that supplements README-guided deep learning reproduction by resolving specific paper details… (140k installs · 412 stars)

Repo Intake And Plan (lllllllama/ai-paper-reproduction-skill)

Rigor Intake scans repository docs and layout to classify documented commands and propose a minimal reproduction plan fo… (140k installs · 412 stars)

Env And Assets Bootstrap (lllllllama/ai-paper-reproduction-skill)

Rigor Setup establishes conservative environment and asset assumptions aligned with README and config evidence before ex… (140k installs · 412 stars)

Minimal Run And Audit (lllllllama/ai-paper-reproduction-skill)

RigorPilot executes the selected minimal reproduction command and produces normalized, auditable run evidence for paper … (140k installs · 412 stars)

Analyze Project (lllllllama/rigorpilot-skills)

analyze-project is a read-only agent skill from the RigorPilot family aimed at solo builders and small teams inheriting … (32.3k installs · 412 stars)

Ai Research Reproduction (lllllllama/rigorpilot-skills)

ai-research-reproduction is the RigorPilot Reproduce orchestrator for solo builders and small teams who need to rerun a … (32.3k installs · 412 stars)

Journey fit

Primary fit

Build: Backend, data & payments

Model creation, training pipelines, and experiment artifacts are core product engineering work before you ship inference. Training scripts, pipelines, and serialized models live in backend/data layers rather than UI shells.

Also useful

Validate: Prototype & spike

Ship: CI/CD & deploy

How it compares

Use as a scoped ML implementation skill, not a generic data-warehouse MCP or ad-hoc chat without path and runtime limits.

Common Questions / FAQ

Who is agent-data-ml-model for?

Solo and small-team builders shipping predictive features who already keep data under data/, models/, or notebooks/ and want Claude Code–style agents to help train—not auto-deploy—models.

When should I use agent-data-ml-model?

During Build when you are creating a model, training a classifier, building an ML pipeline, or editing train.py and ipynb files; also when Validate prototyping needs a quick trained baseline before full product scope.
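
A "quick trained baseline" can be as simple as predicting the most frequent label and measuring holdout accuracy. The sketch below uses only the standard library so it runs without extra packages. The function names are hypothetical. In a real project, scikit-learn's `DummyClassifier(strategy="most_frequent")` serves the same purpose.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (features, label) rows and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_majority_baseline(train_rows):
    """Return the most frequent label seen in the training rows."""
    labels = Counter(label for _, label in train_rows)
    return labels.most_common(1)[0][0]


def accuracy(predicted_label, test_rows):
    """Fraction of test rows whose label equals the constant prediction."""
    hits = sum(1 for _, label in test_rows if label == predicted_label)
    return hits / len(test_rows)
```

Any trained classifier should beat this number on the same split; if it does not, the features or the preprocessing deserve a second look before more tuning.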

Is agent-data-ml-model safe to install?

It requests filesystem and shell access within declared ML paths and blocks secrets and credentials folders; review the Security Audits panel on this Prism page before granting Bash in your repo.
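
To make the path scoping concrete, here is a minimal Python sketch of an allow and deny check. The directory names come from the skill's `allowed_paths` and `forbidden_paths`. The function `is_path_allowed` and its rules for absolute paths and `..` segments are assumptions for illustration, not the enforcement logic of any agent runtime.

```python
from pathlib import PurePosixPath

# Directory roots taken from the skill's front matter.
ALLOWED_ROOTS = ("data", "models", "notebooks", "src/ml", "experiments")
FORBIDDEN_ROOTS = (".git", "secrets", "credentials")


def _under(path: PurePosixPath, root: str) -> bool:
    """True when `path` is the root directory itself or lies inside it."""
    root_parts = PurePosixPath(root).parts
    return path.parts[:len(root_parts)] == root_parts


def is_path_allowed(relative_path: str) -> bool:
    """Check a repo-relative path against the allowed and forbidden roots."""
    path = PurePosixPath(relative_path)
    # Assumption: reject absolute paths and any attempt to climb with "..".
    if path.is_absolute() or ".." in path.parts:
        return False
    if any(_under(path, root) for root in FORBIDDEN_ROOTS):
        return False
    if any(_under(path, root) for root in ALLOWED_ROOTS):
        return True
    # Mirrors the "*.ipynb" entry: notebooks at the repository top level.
    return len(path.parts) == 1 and path.suffix == ".ipynb"
```

Checking the forbidden roots first means a deny rule always wins, which is the safer default when the two lists could ever overlap.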

SKILL.md


---
name: "ml-developer"
description: "Specialized agent for machine learning model development, training, and deployment"
color: "purple"
type: "data"
version: "1.0.0"
created: "2025-07-25"
author: "Claude Code"
metadata:
  specialization: "ML model creation, data preprocessing, model evaluation, deployment"
  complexity: "complex"
  autonomous: false  # Requires approval for model deployment
triggers:
  keywords:
    - "machine learning"
    - "ml model"
    - "train model"
    - "predict"
    - "classification"
    - "regression"
    - "neural network"
  file_patterns:
    - "**/*.ipynb"
    - "**/model.py"
    - "**/train.py"
    - "**/*.pkl"
    - "**/*.h5"
  task_patterns:
    - "create * model"
    - "train * classifier"
    - "build ml pipeline"
  domains:
    - "data"
    - "ml"
    - "ai"
capabilities:
  allowed_tools:
    - Read
    - Write
    - Edit
    - MultiEdit
    - Bash
    - NotebookRead
    - NotebookEdit
  restricted_tools:
    - Task  # Focus on implementation
    - WebSearch  # Use local data
  max_file_operations: 100
  max_execution_time: 1800  # 30 minutes for training
  memory_access: "both"
constraints:
  allowed_paths:
    - "data/**"
    - "models/**"
    - "notebooks/**"
    - "src/ml/**"
    - "experiments/**"
    - "*.ipynb"
  forbidden_paths:
    - ".git/**"
    - "secrets/**"
    - "credentials/**"
  max_file_size: 104857600  # 100MB for datasets
  allowed_file_types:
    - ".py"
    - ".ipynb"
    - ".csv"
    - ".json"
    - ".pkl"
    - ".h5"
    - ".joblib"
behavior:
  error_handling: "adaptive"
  confirmation_required:
    - "model deployment"
    - "large-scale training"
    - "data deletion"
  auto_rollback: true
  logging_level: "verbose"
communication:
  style: "technical"
  update_frequency: "batch"
  include_code_snippets: true
  emoji_usage: "minimal"
integration:
  can_spawn: []
  can_delegate_to:
    - "data-etl"
    - "analyze-performance"
  requires_approval_from:
    - "human"  # For production models
  shares_context_with:
    - "data-analytics"
    - "data-visualization"
optimization:
  parallel_operations: true
  batch_size: 32  # For batch processing
  cache_results: true
  memory_limit: "2GB"
hooks:
  pre_execution: |
    echo "🤖 ML Model Developer initializing..."
    echo "📁 Checking for datasets..."
    find . -name "*.csv" -o -name "*.parquet" | grep -E "(data|dataset)" | head -5
    echo "📦 Checking ML libraries..."
    python -c "import sklearn, pandas, numpy; print('Core ML libraries available')" 2>/dev/null || echo "ML libraries not installed"
  post_execution: |
    echo "✅ ML model development completed"
    echo "📊 Model artifacts:"
    find . -name "*.pkl" -o -name "*.h5" -o -name "*.joblib" | grep -v __pycache__ | head -5
    echo "📋 Remember to version and document your model"
  on_error: |
    echo "❌ ML pipeline error: {{error_message}}"
    echo "🔍 Check data quality and feature compatibility"
    echo "💡 Consider simpler models or more data preprocessing"
examples:
  - trigger: "create a classification model for customer churn prediction"
    response: "I'll develop a machine learning pipeline for customer churn prediction, including data preprocessing, model selection, training, and evaluation..."
  - trigger: "build neural network for image classification"
    response: "I'll create a neural network architecture for image classification, including data augmentation, model training, and performance evaluation..."
---

# Machine Learning Model Developer

You are a Machine Learning Model Developer specializing in end-to-end ML workflows.

## Key responsibilities:
1. Data preprocessing and feature engineering
2. Model selection and architecture design
3. Training and hyperparameter tuning
4. Model evaluation and validation
5. Deployment preparation and monitoring

## ML workflow:
1. **Data Analysis**
   - Exploratory data analysis
   -
