
Senior Data Scientist
Apply senior-level experiment design, feature engineering, and production ML patterns when building or hardening data products with an agent.
Overview
Senior Data Scientist is an agent skill that guides experiment design, feature engineering, and production ML architecture for scalable data systems. It is most often used in Build, and also fits Validate/scope and Operate/monitoring.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill senior-data-scientist
What is this skill?
- Production-first design: scalability to 10x load, 99.9% uptime targets, observability, and maintainability
- Experiment design frameworks aligned with rigorous A/B and measurement practice
- Feature engineering pattern catalog for reusable transformation strategies
- Advanced patterns: distributed processing, real-time low-latency systems, ML at scale with monitoring
- Reliability practices: retries, circuit breakers, health monitoring, and security-by-design controls
- 99.9% uptime target cited in production-first design
- 10x load scalability framing in core principles
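The reliability practices listed above (retries and circuit breakers) can be illustrated with a minimal Python sketch. This is not taken from the skill itself: the names `CircuitBreaker`, `CircuitOpenError`, and `call_with_retries`, along with the threshold and timeout defaults, are hypothetical and chosen only for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures (assumed policy)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"       # allow one trial call through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if a half-open trial failed.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def call_with_retries(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry func through the breaker with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise                    # do not retry while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In a real service the wrapped `func` would typically be a call to a model endpoint or a downstream data store, and the breaker state would be exported as a health metric.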
Adoption & trust: 774 installs on skills.sh; 17.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are building ML or analytics features but lack a senior playbook for experiments, features, and production reliability.
Who is it for?
Indie builders and small teams adding ML, experimentation, or data pipelines who want enterprise-grade design language in agent sessions.
Skip if: you are only making dashboard tweaks with no modeling, or you are a beginner who needs a single sklearn tutorial without production scope.
When should I use this skill?
Planning or implementing experiments, feature engineering, or production ML systems that need senior-level architecture discipline.
What do I get? / Deliverables
You get structured frameworks for designing experiments, engineering features, and shipping observable, secure ML systems sized for real traffic.
- Experiment design outline
- Feature engineering approach
- Production ML architecture notes with reliability and security controls
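As one illustration of what a reusable feature transformation might look like, the sketch below fits scaling parameters on training data and reuses the same parameters at serving time, which helps avoid train/serve skew. It is an assumption about one common pattern, not the skill's own catalog; `StandardScalerParams`, `fit_scaler`, and `transform` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class StandardScalerParams:
    """Parameters learned once on training data and reused everywhere else."""
    mean: float
    std: float


def fit_scaler(values):
    """Learn mean and population standard deviation from a non-empty list."""
    if not values:
        raise ValueError("cannot fit a scaler on empty data")
    std = pstdev(values)
    # Guard against constant columns so transform never divides by zero.
    return StandardScalerParams(mean=fmean(values), std=std if std > 0 else 1.0)


def transform(values, params):
    """Apply previously fitted parameters; never refit on serving data."""
    return [(v - params.mean) / params.std for v in values]


train = [1.0, 2.0, 3.0]
params = fit_scaler(train)
serving_features = transform([2.0, 4.0], params)
```

Keeping the fitted parameters in an immutable object makes it straightforward to version and ship them alongside the model.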
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build/backend is the primary shelf because the skill emphasizes implementing scalable pipelines, features, and production ML rather than ideation slides. The Backend subphase covers model serving, distributed processing, and the data paths that solo builders implement after validation.
Where it fits
Define an A/B experiment with power assumptions before committing engineering weeks.
Design a feature store and batch pipeline with caching and batch operations called out in the skill.
Add health checks, circuit breakers, and continuous performance monitoring to a live model endpoint.
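The first scenario above, sizing an A/B experiment before committing engineering time, can be sketched with a standard two-proportion sample-size approximation. The function name `sample_size_per_arm`, its parameters, and the defaults (two-sided alpha of 0.05, power of 0.8) are assumptions for illustration; the skill does not prescribe a specific formula.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    min_detectable_lift is an absolute difference in conversion rate
    (e.g. 0.02 means 10% -> 12%), which is an assumption of this sketch.
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for significance
    z_beta = NormalDist().inv_cdf(power)            # critical value for power
    p_bar = (p1 + p2) / 2                           # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Example: 10% baseline conversion, detect an absolute lift of 2 points.
n_per_arm = sample_size_per_arm(0.10, 0.02)
```

Dividing the result by expected daily traffic per arm gives a rough experiment duration, which is the number worth checking before any build work starts.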
How it compares
A methodology and pattern library for data science delivery, not a one-click model trainer or a database MCP connector.
Common Questions / FAQ
Who is senior-data-scientist for?
Solo founders and small teams treating data science as a product surface—experiments, features, and deployed models—not a one-off notebook.
When should I use senior-data-scientist?
In Validate/scope when framing experiments; in Build/backend when designing features and ML services; in Operate/monitoring when hardening observability and failure handling.
Is senior-data-scientist safe to install?
It is documentation-style guidance; review the Security Audits panel on this page before pointing agents at production data or secrets.
SKILL.md - Senior Data Scientist
# Experiment Design Frameworks

## Overview

World-class experiment design frameworks for senior data scientist.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Feature Engineering Patterns

## Overview

World-class feature engineering patterns for senior data scientist.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Statistical Methods Advanced

## Overview

World-class statistical methods advanced for senior data scientist.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure