
Senior Data Scientist
Apply production-minded experiment design, feature engineering, and ML-at-scale patterns when planning or building data products solo.
Overview
Senior Data Scientist is an agent skill most often used in Validate (also Build, Operate) that applies production-first experiment design and feature engineering frameworks for solo data-product work.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill senior-data-scientist
What is this skill?
- Production-first experiment and feature-engineering frameworks with scalability, reliability, and observability baked in
- Three advanced patterns: distributed processing, real-time systems, and production ML with monitoring
- Security and privacy checklist: validation, encryption, access control, audit logging
- Performance-by-design guidance: profiling, caching, batching, and continuous monitoring
- Code-quality bar: comprehensive testing, documentation, reviews, and type hints
- Design targets built into the framework: 99.9% uptime and capacity for 10x current load
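To make the 99.9% uptime target concrete, the following sketch converts an availability figure into an allowed-downtime budget. The function name `downtime_budget_minutes` is hypothetical and not part of the skill; it is only the arithmetic behind the target.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of downtime permitted by an availability target over a period.

    availability: fraction between 0 and 1 (0.999 means 99.9%).
    period_days: length of the measurement window in days.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    if period_days < 0:
        raise ValueError("period_days must be non-negative")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


if __name__ == "__main__":
    # 99.9% over a 30-day month leaves roughly 43.2 minutes of downtime;
    # over a 365-day year it leaves roughly 8.76 hours.
    print(round(downtime_budget_minutes(0.999, 30), 1))
    print(round(downtime_budget_minutes(0.999, 365) / 60, 2))
```

A budget of about 43 minutes a month is small enough that a single slow manual rollback can consume it, which is one reason the framework pairs the uptime target with monitoring and failure-aware design.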
Adoption & trust: 2.7k installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are building ML or analytics features without a senior-style plan for experiments, features, scale, or production monitoring.
Who is it for?
Solo builders scoping A/B tests, feature stores, or ML pipelines who want enterprise-grade design habits without hiring a full data team.
Skip if: Quick one-off plots, pure research with no ship path, or teams that already have locked experiment playbooks and MLOps platforms.
When should I use this skill?
Use it when you are planning experiments, feature engineering, or production ML architecture and need senior-data-scientist framing.
What do I get? / Deliverables
You leave with structured experiment and feature-engineering guidance aligned to scalable, observable, secure production ML instead of ad-hoc notebook work.
- Experiment and feature-engineering design aligned to production constraints
- Documented patterns for scale, security, and reliability
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Experiment design and hypothesis framing belong on the validate shelf because builders prove what to measure before committing engineering to pipelines and models. Metrics, statistical power, and rollout constraints should be defined while scoping, not after code is already in production.
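As an illustration of fixing power before any code ships, here is a minimal sketch of a per-arm sample-size estimate for a conversion A/B test. It assumes a two-sided two-proportion z-test with the usual normal approximation. The skill itself does not prescribe this formula, and the function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm for a two-proportion z-test.

    baseline: control conversion rate, e.g. 0.10
    mde: absolute minimum detectable effect, e.g. 0.02 for 10% -> 12%
    alpha: two-sided significance level
    power: desired probability of detecting a true effect of size mde
    """
    p1, p2 = baseline, baseline + mde
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if mde == 0:
        raise ValueError("mde must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance term
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


if __name__ == "__main__":
    # Users per arm to detect a lift from 10% to 12% at alpha=0.05, power=0.80.
    print(sample_size_per_arm(0.10, 0.02))
```

Halving the detectable effect roughly quadruples the required traffic, so this number is worth knowing before committing to a rollout schedule.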
Where it fits
Define success metrics and experiment guardrails before coding a recommendation feature.
Choose batch vs real-time feature paths and caching strategy for an API.
Profile and batch operations before launch based on performance-by-design rules.
Align health checks, retries, and circuit breakers with production ML monitoring habits.
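The retry and circuit-breaker habits mentioned above can be sketched in a few lines. This is an illustrative, single-threaded example under stated assumptions, not code shipped with the skill: the names `CircuitBreaker`, `CircuitOpenError`, and `with_retries`, and all thresholds, are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Opens after max_failures consecutive errors; retries after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def with_retries(fn: Callable[[], T], attempts: int = 3,
                 base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not hammer a dependency the breaker has isolated
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A typical combination wraps the guarded call in the retry helper, for example `with_retries(lambda: breaker.call(fetch_features))`, where `fetch_features` stands in for whatever model or feature-store call is being protected. The injectable `clock` and `sleep` parameters exist so the behavior can be tested without real waiting.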
How it compares
Use as a senior DS playbook in chat—not a hosted experiment platform or AutoML product.
Common Questions / FAQ
Who is senior-data-scientist for?
Indie and solo builders shipping SaaS, APIs, or agent features that depend on experiments, features, or production ML who need structured senior-level framing.
When should I use senior-data-scientist?
During validate when scoping metrics and experiments; during build when designing features and ML architecture; during operate when emphasizing monitoring, retries, and failure-aware design.
Is senior-data-scientist safe to install?
Treat it as procedural guidance in your repo—review the Security Audits panel on this Prism page before trusting any third-party skill package.
SKILL.md - Senior Data Scientist
# Experiment Design Frameworks

## Overview

World-class experiment design frameworks for senior data scientists.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Feature Engineering Patterns

## Overview

World-class feature engineering patterns for senior data scientists.
## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Statistical Methods Advanced

## Overview

World-class advanced statistical methods for senior data scientists.
## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure