Senior Ml Engineer

Name: Senior Ml Engineer
Author: davila7

davila7/claude-code-templates

799 installs
29.9k repo stars
Updated July 27, 2026
davila7/claude-code-templates

senior-ml-engineer is a Claude agent skill that provides production-first LLM integration patterns, MLOps guidance, and senior ML engineer standards for developers building scalable, observable AI features and agents.

About

senior-ml-engineer is an LLM integration guide skill from davila7/claude-code-templates that encodes senior ML/AI engineer practices for production systems. The skill emphasizes designing for 10x scalability headroom, 99.9% uptime reliability, full observability, input validation, encryption, access control, and audit logging from day one. Advanced patterns cover distributed processing, strategic caching, batch processing, and enterprise-scale fault tolerance for AI backends. Developers reach for senior-ml-engineer when building LLM-powered APIs or agents and need architecture decisions that survive production load rather than prototype-quality integration code.

Production-First Design principles covering scalability to 10x load, 99.9% uptime, maintainability and full observabilit
Performance by Design with efficient algorithms, strategic caching, batch processing and resource awareness
Security & Privacy built-in including input validation, data encryption, access control and audit logging
Advanced patterns for distributed processing, real-time low-latency systems and ML at scale with monitoring
Reliability practices including retries, circuit breakers, design for failure and continuous health monitoring

Senior Ml Engineer by the numbers

799 all-time installs (skills.sh)
+19 installs in the week ending Jul 27, 2026 (Skillselion tracking)
Ranked #1,310 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/davila7/claude-code-templates --skill senior-ml-engineer

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/davila7/claude-code-templates/senior-ml-engineer.svg)](https://skillselion.com/skills/davila7/claude-code-templates/senior-ml-engineer)

Installs	799
repo stars	★ 29.9k
Security audit	3 / 3 scanners passed
Last updated	July 27, 2026
Repository	davila7/claude-code-templates ↗

How do you build production-grade LLM integrations?

Get world-class LLM integration patterns, production-first MLOps guidance, and senior ML engineer standards when building AI features or agents.

Who is it for?

Backend and ML engineers shipping LLM-powered APIs or agents who need senior-level production, security, and scalability standards beyond prototype integrations.

Skip if: Beginners learning basic API calls or teams running throwaway prototypes with no production reliability, security, or observability requirements.

When should I use this skill?

User builds LLM features, asks about production MLOps, needs AI backend architecture, or wants senior ML engineer integration standards.

What you get

Production-ready LLM architecture decisions, security controls, observability hooks, caching strategies, and distributed processing patterns for AI backends.

LLM integration architecture
production security checklist

By the numbers

Targets 10x scalability headroom and 99.9% uptime reliability for production AI systems

Files

SKILL.mdMarkdownGitHub ↗

Senior ML/AI Engineer

World-class senior ml/ai engineer skill for production-grade AI/ML/Data systems.

Quick Start

Main Capabilities

# Core Tool 1
python scripts/model_deployment_pipeline.py --input data/ --output results/

# Core Tool 2  
python scripts/rag_system_builder.py --target project/ --analyze

# Core Tool 3
python scripts/ml_monitoring_suite.py --config config.yaml --deploy

Core Expertise

This skill covers world-class capabilities in:

Advanced production patterns and architectures
Scalable system design and implementation
Performance optimization at scale
MLOps and DataOps best practices
Real-time processing and inference
Distributed computing frameworks
Model deployment and monitoring
Security and compliance
Cost optimization
Team leadership and mentoring

Tech Stack

Languages: Python, SQL, R, Scala, Go ML Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost Data Tools: Spark, Airflow, dbt, Kafka, Databricks LLM Frameworks: LangChain, LlamaIndex, DSPy Deployment: Docker, Kubernetes, AWS/GCP/Azure Monitoring: MLflow, Weights & Biases, Prometheus Databases: PostgreSQL, BigQuery, Snowflake, Pinecone

Reference Documentation

1. Mlops Production Patterns

Comprehensive guide available in references/mlops_production_patterns.md covering:

Advanced patterns and best practices
Production implementation strategies
Performance optimization techniques
Scalability considerations
Security and compliance
Real-world case studies

2. Llm Integration Guide

Complete workflow documentation in references/llm_integration_guide.md including:

Step-by-step processes
Architecture design patterns
Tool integration guides
Performance tuning strategies
Troubleshooting procedures

3. Rag System Architecture

Technical reference guide in references/rag_system_architecture.md with:

System design principles
Implementation examples
Configuration best practices
Deployment strategies
Monitoring and observability

Production Patterns

Pattern 1: Scalable Data Processing

Enterprise-scale data processing with distributed computing:

Horizontal scaling architecture
Fault-tolerant design
Real-time and batch processing
Data quality validation
Performance monitoring

Pattern 2: ML Model Deployment

Production ML system with high availability:

Model serving with low latency
A/B testing infrastructure
Feature store integration
Model monitoring and drift detection
Automated retraining pipelines

Pattern 3: Real-Time Inference

High-throughput inference system:

Batching and caching strategies
Load balancing
Auto-scaling
Latency optimization
Cost optimization

Best Practices

Development

Test-driven development
Code reviews and pair programming
Documentation as code
Version control everything
Continuous integration

Production

Monitor everything critical
Automate deployments
Feature flags for releases
Canary deployments
Comprehensive logging

Team Leadership

Mentor junior engineers
Drive technical decisions
Establish coding standards
Foster learning culture
Cross-functional collaboration

Performance Targets

Latency:

P50: < 50ms
P95: < 100ms
P99: < 200ms

Throughput:

Requests/second: > 1000
Concurrent users: > 10,000

Availability:

Uptime: 99.9%
Error rate: < 0.1%

Security & Compliance

Authentication & authorization
Data encryption (at rest & in transit)
PII handling and anonymization
GDPR/CCPA compliance
Regular security audits
Vulnerability management

Common Commands

# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py

Resources

Advanced Patterns: references/mlops_production_patterns.md
Implementation Guide: references/llm_integration_guide.md
Technical Reference: references/rag_system_architecture.md
Automation Scripts: scripts/ directory

Senior-Level Responsibilities

As a world-class senior professional:

1. Technical Leadership

Drive architectural decisions
Mentor team members
Establish best practices
Ensure code quality

2. Strategic Thinking

Align with business goals
Evaluate trade-offs
Plan for scale
Manage technical debt

3. Collaboration

Work across teams
Communicate effectively
Build consensus
Share knowledge

4. Innovation

Stay current with research
Experiment with new approaches
Contribute to community
Drive continuous improvement

5. Production Excellence

Ensure high availability
Monitor proactively
Optimize performance
Respond to incidents

#!/usr/bin/env python3
"""
Ml Monitoring Suite
Production-grade tool for senior ml/ai engineer
"""

import os
import sys
import json
import logging
import argparse
from pathlib import Path
from typing import Dict, List, Optional
from datetime import datetime

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class MlMonitoringSuite:
    """Production-grade ml monitoring suite"""
    
    def __init__(self, config: Dict):
        self.config = config
        self.results = {
            'status': 'initialized',
            'start_time': datetime.now().isoformat(),
            'processed_items': 0
        }
        logger.info(f"Initialized {self.__class__.__name__}")
    
    def validate_config(self) -> bool:
        """Validate configuration"""
        logger.info("Validating configuration...")
        # Add validation logic
        logger.info("Configuration validated")
        return True
    
    def process(self) -> Dict:
        """Main processing logic"""
        logger.info("Starting processing...")
        
        try:
            self.validate_config()
            
            # Main processing
            result = self._execute()
            
            self.results['status'] = 'completed'
            self.results['end_time'] = datetime.now().isoformat()
            
            logger.info("Processing completed successfully")
            return self.results
            
        except Exception as e:
            self.results['status'] = 'failed'
            self.results['error'] = str(e)
            logger.error(f"Processing failed: {e}")
            raise
    
    def _execute(self) -> Dict:
        """Execute main logic"""
        # Implementation here
        return {'success': True}

def main():
    """Main entry point"""
    parser = argparse.ArgumentParser(
        description="Ml Monitoring Suite"
    )
    parser.add_argument('--input', '-i', required=True, help='Input path')
    parser.add_argument('--output', '-o', required=True, help='Output path')
    parser.add_argument('--config', '-c', help='Configuration file')
    parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output')
    
    args = parser.parse_args()
    
    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    
    try:
        config = {
            'input': args.input,
            'output': args.output
        }
        
        processor = MlMonitoringSuite(config)
        results = processor.process()
        
        print(json.dumps(results, indent=2))
        sys.exit(0)
        
    except Exception as e:
        logger.error(f"Fatal error: {e}")
        sys.exit(1)

if __name__ == '__main__':
    main()

#!/usr/bin/env python3
"""
Model Deployment Pipeline
Production-grade tool for senior ml/ai engineer
"""

import os
import sys
import json
import logging
import argparse
from pathlib import Path
from typing import Dict, List, Optional
from datetime import datetime

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class ModelDeploymentPipeline:
    """Production-grade model deployment pipeline"""
    
    def __init__(self, config: Dict):
        self.config = config
        self.results = {
            'status': 'initialized',
            'start_time': datetime.now().isoformat(),
            'processed_items': 0
        }
        logger.info(f"Initialized {self.__class__.__name__}")
    
    def validate_config(self) -> bool:
        """Validate configuration"""
        logger.info("Validating configuration...")
        # Add validation logic
        logger.info("Configuration validated")
        return True
    
    def process(self) -> Dict:
        """Main processing logic"""
        logger.info("Starting processing...")
        
        try:
            self.validate_config()
            
            # Main processing
            result = self._execute()
            
            self.results['status'] = 'completed'
            self.results['end_time'] = datetime.now().isoformat()
            
            logger.info("Processing completed successfully")
            return self.results
            
        except Exception as e:
            self.results['status'] = 'failed'
            self.results['error'] = str(e)
            logger.error(f"Processing failed: {e}")
            raise
    
    def _execute(self) -> Dict:
        """Execute main logic"""
        # Implementation here
        return {'success': True}

def main():
    """Main entry point"""
    parser = argparse.ArgumentParser(
        description="Model Deployment Pipeline"
    )
    parser.add_argument('--input', '-i', required=True, help='Input path')
    parser.add_argument('--output', '-o', required=True, help='Output path')
    parser.add_argument('--config', '-c', help='Configuration file')
    parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output')
    
    args = parser.parse_args()
    
    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    
    try:
        config = {
            'input': args.input,
            'output': args.output
        }
        
        processor = ModelDeploymentPipeline(config)
        results = processor.process()
        
        print(json.dumps(results, indent=2))
        sys.exit(0)
        
    except Exception as e:
        logger.error(f"Fatal error: {e}")
        sys.exit(1)

if __name__ == '__main__':
    main()

#!/usr/bin/env python3
"""
Rag System Builder
Production-grade tool for senior ml/ai engineer
"""

import os
import sys
import json
import logging
import argparse
from pathlib import Path
from typing import Dict, List, Optional
from datetime import datetime

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

class RagSystemBuilder:
    """Production-grade rag system builder"""
    
    def __init__(self, config: Dict):
        self.config = config
        self.results = {
            'status': 'initialized',
            'start_time': datetime.now().isoformat(),
            'processed_items': 0
        }
        logger.info(f"Initialized {self.__class__.__name__}")
    
    def validate_config(self) -> bool:
        """Validate configuration"""
        logger.info("Validating configuration...")
        # Add validation logic
        logger.info("Configuration validated")
        return True
    
    def process(self) -> Dict:
        """Main processing logic"""
        logger.info("Starting processing...")
        
        try:
            self.validate_config()
            
            # Main processing
            result = self._execute()
            
            self.results['status'] = 'completed'
            self.results['end_time'] = datetime.now().isoformat()
            
            logger.info("Processing completed successfully")
            return self.results
            
        except Exception as e:
            self.results['status'] = 'failed'
            self.results['error'] = str(e)
            logger.error(f"Processing failed: {e}")
            raise
    
    def _execute(self) -> Dict:
        """Execute main logic"""
        # Implementation here
        return {'success': True}

def main():
    """Main entry point"""
    parser = argparse.ArgumentParser(
        description="Rag System Builder"
    )
    parser.add_argument('--input', '-i', required=True, help='Input path')
    parser.add_argument('--output', '-o', required=True, help='Output path')
    parser.add_argument('--config', '-c', help='Configuration file')
    parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output')
    
    args = parser.parse_args()
    
    if args.verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    
    try:
        config = {
            'input': args.input,
            'output': args.output
        }
        
        processor = RagSystemBuilder(config)
        results = processor.process()
        
        print(json.dumps(results, indent=2))
        sys.exit(0)
        
    except Exception as e:
        logger.error(f"Fatal error: {e}")
        sys.exit(1)

if __name__ == '__main__':
    main()

Related skills

Setup Matt Pocock SkillsScaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la462k185k

Lark Skill MakerQuickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md.379k15.8k

CavemanSlash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents.378k92.5k

Lark AppsConnect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access.375k

Running Claude Code Via Litellm CopilotRun Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API.270k72

Codex PetGenerate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro.246k8

How it compares

Use senior-ml-engineer for production architecture and MLOps standards; use a narrower SDK skill when you only need API syntax for a specific LLM provider.

FAQ

What production targets does senior-ml-engineer recommend?

senior-ml-engineer recommends designing AI systems for 10x current load scalability, 99.9% uptime reliability, full observability monitoring, and maintainable documented code from the initial LLM integration phase.

What security practices does senior-ml-engineer cover?

senior-ml-engineer covers input validation, data encryption, access control, and audit logging as baseline security for LLM-powered backends and agent systems entering production environments.

Is Senior Ml Engineer safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Buildingagentsllmautomation

About

Senior Ml Engineer by the numbers

Add your badge

How do you build production-grade LLM integrations?

Who is it for?

When should I use this skill?

What you get

By the numbers

Files

Senior ML/AI Engineer

Quick Start

Main Capabilities

Core Expertise

Tech Stack

Reference Documentation

1. Mlops Production Patterns

2. Llm Integration Guide

3. Rag System Architecture

Production Patterns

Pattern 1: Scalable Data Processing

Pattern 2: ML Model Deployment

Pattern 3: Real-Time Inference

Best Practices

Development

Production

Team Leadership

Performance Targets

Security & Compliance

Common Commands

Resources

Senior-Level Responsibilities

Llm Integration Guide

Overview

Core Principles

Production-First Design

Performance by Design

Security & Privacy

Advanced Patterns

Pattern 1: Distributed Processing

Pattern 2: Real-Time Systems

Pattern 3: ML at Scale

Best Practices

Code Quality

Performance

Reliability

Tools & Technologies

Further Reading

Mlops Production Patterns

Overview

Core Principles

Production-First Design

Performance by Design

Security & Privacy

Advanced Patterns

Pattern 1: Distributed Processing

Pattern 2: Real-Time Systems

Pattern 3: ML at Scale

Best Practices

Code Quality

Performance

Reliability

Tools & Technologies

Further Reading

Rag System Architecture

Overview

Core Principles

Production-First Design

Performance by Design

Security & Privacy

Advanced Patterns

Pattern 1: Distributed Processing

Pattern 2: Real-Time Systems

Pattern 3: ML at Scale

Best Practices

Code Quality

Performance

Reliability

Tools & Technologies

Further Reading

Related skills