
Senior Data Engineer
Apply production-grade data modeling, pipeline architecture, and scale patterns when a solo builder designs warehouses, ETL, or real-time analytics beyond a single-database app.
Overview
senior-data-engineer is an agent skill, most often used in the Build phase (also Operate and Grow), that encodes senior-level data modeling, pipeline architecture, and scale patterns for production analytics and ML systems.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill senior-data-engineer
What is this skill?
- Production-first design: scalability to 10x load, 99.9% uptime target, observability, and documented maintainability
- Advanced patterns for distributed processing, real-time low-latency systems, and ML at scale with monitoring
- Security-by-design: validation, encryption, access control, and audit logging
- Reliability toolkit: retries, circuit breakers, health monitoring, strategic caching, and batch operations
- Code-quality bar: comprehensive testing, type hints, reviews, and profile-before-optimize performance discipline
- Production-first design cites a 99.9% uptime target and planning for 10x current load
- Three advanced pattern areas: distributed processing, real-time systems, and ML at scale
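As a rough illustration of the reliability toolkit listed above, the sketch below combines retries with exponential backoff and a small circuit breaker. It is not taken from the skill itself: `CircuitBreaker`, `CircuitOpenError`, `retry`, and all default values are hypothetical names and numbers chosen for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the consecutive-failure count
        return result


def retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep hammering a dependency the breaker has given up on
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A nightly job might wrap a load step as `retry(lambda: breaker.call(load_step))`, where `load_step` stands in for whatever function talks to the flaky dependency. Passing `sleep` as a parameter is a design choice that keeps the backoff testable without real waiting.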
Adoption & trust: 1.3k installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need warehouse or streaming pipelines that will not collapse under growth, but your agent defaults to toy schemas without observability, security, or failure handling.
Who is it for?
Indie SaaS or API products planning analytics, ETL, or ML features that must survive real load and compliance expectations.
Skip if: you are building a static marketing site or a single-table prototype, or you only need copy-paste SQL without architectural tradeoffs.
When should I use this skill?
Designing or reviewing data models, ETL/streaming pipelines, distributed jobs, or production ML data paths that must meet scale and reliability bars.
What do I get? / Deliverables
Your agent proposes scalable pipeline and modeling patterns with testing, monitoring, and resilience baked in so data paths are ready for production iteration.
- Documented data model and pipeline architecture
- Operational checklist for testing, monitoring, and failure handling
- Security and performance guardrails for batch and real-time paths
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build/backend because schemas and pipelines are created before the product ingests live traffic. Data models, distributed processing, and batch/real-time ingestion are backend and platform engineering work.
Where it fits
Define fact/dimension layers and batch ingestion before the app's first analytics dashboard ships.
Add health checks, retries, and circuit breakers after nightly ETL jobs start failing silently.
Scale a real-time metrics path when marketing needs sub-minute funnel visibility.
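For the first scenario above (fact/dimension layers plus batch ingestion), a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table names (`dim_customer`, `fact_order`), columns, and the `load_batch` helper are assumptions made up for this example and do not come from the skill; a real warehouse would use its own engine and loader.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical star schema: one dimension table and one fact table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT NOT NULL UNIQUE,
    plan         TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS fact_order (
    order_id     TEXT PRIMARY KEY,
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    order_date   TEXT NOT NULL,
    amount_cents INTEGER NOT NULL
);
"""

# (order_id, customer_id, plan, order_date, amount_cents)
Order = Tuple[str, str, str, str, int]


def load_batch(conn: sqlite3.Connection, rows: Iterable[Order]) -> None:
    """Load one batch in a single transaction; re-running the same batch adds nothing."""
    rows = list(rows)
    with conn:  # commits on success, rolls back if any statement raises
        # Upsert-style dimension load: existing customers are left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO dim_customer (customer_id, plan) VALUES (?, ?)",
            [(r[1], r[2]) for r in rows],
        )
        # Resolve the surrogate key from the dimension while inserting facts.
        conn.executemany(
            """
            INSERT OR IGNORE INTO fact_order
                (order_id, customer_key, order_date, amount_cents)
            SELECT ?, customer_key, ?, ?
            FROM dim_customer
            WHERE customer_id = ?
            """,
            [(r[0], r[3], r[4], r[1]) for r in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_batch(conn, [("o1", "c1", "pro", "2024-01-01", 1000)])
    print(conn.execute("SELECT COUNT(*) FROM fact_order").fetchone()[0])
```

Keying facts on a natural `order_id` is what makes the load idempotent, so a retried nightly job does not double-count. Note that `INSERT OR IGNORE` also skips rows that violate other constraints such as `NOT NULL`, so validating the batch before loading is still advisable.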
How it compares
An architecture and patterns playbook, not a turnkey dbt/Airflow MCP integration or one-click deploy skill.
Common Questions / FAQ
Who is senior-data-engineer for?
Solo builders wearing the data-engineer hat who want agent answers aligned with senior pipeline, modeling, and ML-ops practice rather than tutorial-level snippets.
When should I use senior-data-engineer?
In Build (backend) while designing schemas and pipelines; in Operate (infra/monitoring) when hardening retries and health checks; and in Grow (analytics) when scaling real-time or batch reporting paths.
Is senior-data-engineer safe to install?
Treat it as high-level guidance that may suggest shell, cloud, and secrets usage in your stack. Review the Security Audits panel on this page and validate any generated infra against your policies.
SKILL.md
SKILL.md - Senior Data Engineer
# Data Modeling Patterns

## Overview

World-class data modeling patterns for senior data engineer.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Data Pipeline Architecture

## Overview

World-class data pipeline architecture for senior data engineer.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circuit breakers
- Monitor health

## Tools & Technologies

Essential tools for this domain:

- Development frameworks
- Testing libraries
- Deployment platforms
- Monitoring solutions

## Further Reading

- Research papers
- Industry blogs
- Conference talks
- Open source projects

# Dataops Best Practices

## Overview

World-class dataops best practices for senior data engineer.

## Core Principles

### Production-First Design

Always design with production in mind:

- Scalability: Handle 10x current load
- Reliability: 99.9% uptime target
- Maintainability: Clear, documented code
- Observability: Monitor everything

### Performance by Design

Optimize from the start:

- Efficient algorithms
- Resource awareness
- Strategic caching
- Batch processing

### Security & Privacy

Build security in:

- Input validation
- Data encryption
- Access control
- Audit logging

## Advanced Patterns

### Pattern 1: Distributed Processing

Enterprise-scale data processing with fault tolerance.

### Pattern 2: Real-Time Systems

Low-latency, high-throughput systems.

### Pattern 3: ML at Scale

Production ML with monitoring and automation.

## Best Practices

### Code Quality

- Comprehensive testing
- Clear documentation
- Code reviews
- Type hints

### Performance

- Profile before optimizing
- Monitor continuously
- Cache strategically
- Batch operations

### Reliability

- Design for failure
- Implement retries
- Use circ