
Agentic Engineering
Run eval-first agent workflows with 15-minute task units, tiered model routing, and focused review of AI-generated code.
Overview
agentic-engineering is a journey-wide agent skill that structures eval-first execution, decomposition, and cost-aware model routing—usable whenever a solo builder needs to govern agent implementation before committing to
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill agentic-engineering
What is this skill?
- Four operating principles: completion criteria, decomposition, model routing, eval measurement
- Eval-first loop: capability eval, regression eval, baseline, implementation, delta comparison
- 15-minute unit rule with verifiable done conditions and single dominant risk per unit
- Model routing map: Haiku, Sonnet, Opus by task complexity
- Review focus on invariants, error boundaries, auth, and rollout risk—not style nitpicks
- 15-minute unit rule for agent-sized work decomposition
- 4-step eval-first loop from baseline through delta comparison
- 4 operating principles before execution
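The eval-first loop listed above can be sketched in a few lines. This is a minimal illustration, not the skill's own tooling: `run_evals`, `compare_deltas`, and `eval_first_loop` are hypothetical names, and each eval is assumed to be a callable returning a pass rate between 0 and 1. The sketch records scores only; capturing failure signatures is left out for brevity.

```python
from typing import Callable, Dict

# Assumption: an eval is any zero-argument callable returning a pass rate in [0, 1].
Eval = Callable[[], float]


def run_evals(evals: Dict[str, Eval]) -> Dict[str, float]:
    """Run every eval and return its score keyed by eval name."""
    return {name: fn() for name, fn in evals.items()}


def compare_deltas(baseline: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Score change per eval; a negative value signals a regression."""
    return {name: round(after[name] - baseline[name], 4) for name in baseline}


def eval_first_loop(evals: Dict[str, Eval], implement: Callable[[], None]) -> Dict[str, float]:
    baseline = run_evals(evals)  # evals are defined up front; baseline is captured first
    implement()                  # the agent does the implementation work
    after = run_evals(evals)     # re-run the same evals
    return compare_deltas(baseline, after)


# Toy usage: the "feature" is a flag that the implementation step flips.
state = {"feature_done": False}
evals = {
    "capability": lambda: 1.0 if state["feature_done"] else 0.0,
    "regression": lambda: 1.0,  # existing behaviour keeps passing
}


def implement() -> None:
    state["feature_done"] = True


deltas = eval_first_loop(evals, implement)
print(deltas)  # {'capability': 1.0, 'regression': 0.0}
```

A positive capability delta with a zero regression delta is the outcome the loop is meant to confirm before a change is accepted.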
Adoption & trust: 4.8k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Agents ship code faster than you can verify, so work balloons without clear done conditions, eval baselines, or review priorities.
Who is it for?
Solo builders running Claude Code or similar agents on multi-step features who want measurable quality gates and token discipline.
Skip if: One-off copy edits or tasks already covered by an approved spec with no agent delegation.
When should I use this skill?
Engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.
What do I get? / Deliverables
You run baseline and post-change evals, execute in verifiable units with routed models, and review AI code for risk-bearing defects instead of style noise.
- Completion criteria and eval baselines
- Decomposed agent work units with done conditions
- Risk-prioritized review notes for agent-generated changes
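One way to make a decomposed work unit checkable is to record it as data and validate it before handing it to an agent. The sketch below is illustrative only: `WorkUnit`, `unit_problems`, and the field names are hypothetical, and reading the 15-minute rule as an upper bound on the estimate is an assumption.

```python
from dataclasses import dataclass
from typing import List

MAX_UNIT_MINUTES = 15  # assumption: the 15-minute rule is treated as an upper bound


@dataclass
class WorkUnit:
    name: str
    estimate_minutes: int
    dominant_risks: List[str]  # the rule asks for exactly one
    done_condition: str        # e.g. a test command or an observable check


def unit_problems(unit: WorkUnit) -> List[str]:
    """Return the reasons a unit is not ready to delegate (empty list means ready)."""
    problems = []
    if unit.estimate_minutes > MAX_UNIT_MINUTES:
        problems.append("exceeds 15-minute budget; split it")
    if len(unit.dominant_risks) != 1:
        problems.append("needs exactly one dominant risk")
    if not unit.done_condition.strip():
        problems.append("missing done condition")
    return problems


good = WorkUnit("add login form validation", 10, ["auth bypass"], "validation tests pass")
vague = WorkUnit("build checkout", 90, ["payments", "tax rules"], "")

print(unit_problems(good))   # []
print(unit_problems(vague))  # three problems reported
```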
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Define capability eval and done criteria before an agent builds a landing-page prototype.
Split a multi-file feature into fifteen-minute units each with one dominant risk.
Prioritize auth boundaries and rollout risk in an agent-generated PR review pass.
Start a fresh session for root-cause analysis and route the investigation to a stronger model tier.
How it compares
Process methodology for agent-led engineering—not a single-purpose generator or Laravel security checklist.
Common Questions / FAQ
Who is agentic-engineering for?
Indie developers and small teams treating AI agents as primary implementers while they own evals, routing, and risk review.
When should I use agentic-engineering?
Use in Validate when scoping agent-sized prototypes, in Build and Ship when decomposing features and regression-testing agent diffs, in Grow when automating workflows, and in Operate when debugging with fresh sessions and root-cause routing to stronger models.
Is agentic-engineering safe to install?
It guides process and review focus only; check the Security Audits panel on this Prism page for the underlying skill package before enabling it in production repos.
SKILL.md
# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.

## Task Decomposition

Apply the 15-minute unit rule:

- each unit should be independently verifiable
- each unit should have a single dominant risk
- each unit should expose a clear done condition

## Model Routing

- Haiku: classification, boilerplate transforms, narrow edits
- Sonnet: implementation and refactors
- Opus: architecture, root-cause analysis, multi-file invariants

## Session Strategy

- Continue session for closely-coupled units.
- Start fresh session after major phase transitions.
- Compact after milestone completion, not during active debugging.

## Review Focus for AI-Generated Code

Prioritize:

- invariants and edge cases
- error boundaries
- security and auth assumptions
- hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.

## Cost Discipline

Track per task:

- model
- token estimate
- retries
- wall-clock time
- success/failure

Escalate model tier only when lower tier fails with a clear reasoning gap.
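
The routing map and the per-task cost record could be kept together in a small helper like the one below. This is a sketch under stated assumptions: the task-kind keys, the `sonnet` default for unknown kinds, and the names `route`, `TaskRecord`, and `next_tier` are all hypothetical, and the tier strings are labels rather than real API model identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Tier labels only (weakest to strongest); these are not API model IDs.
TIERS = ["haiku", "sonnet", "opus"]

# Hypothetical task-kind keys mirroring the routing list above.
ROUTES = {
    "classification": "haiku",
    "boilerplate_transform": "haiku",
    "narrow_edit": "haiku",
    "implementation": "sonnet",
    "refactor": "sonnet",
    "architecture": "opus",
    "root_cause_analysis": "opus",
    "multi_file_invariant": "opus",
}


@dataclass
class TaskRecord:
    """The five fields tracked per task."""
    model: str
    token_estimate: int
    retries: int
    wall_clock_seconds: float
    success: bool


def route(task_kind: str) -> str:
    """Pick a tier for a task kind; the 'sonnet' fallback is an assumption."""
    return ROUTES.get(task_kind, "sonnet")


def next_tier(record: TaskRecord, reasoning_gap: bool) -> Optional[str]:
    """Return a stronger tier only after a failure with a clear reasoning gap."""
    if record.success or not reasoning_gap:
        return None  # stay on the current tier (retry or fix the prompt instead)
    idx = TIERS.index(record.model)
    return TIERS[idx + 1] if idx + 1 < len(TIERS) else None


attempt = TaskRecord(model=route("narrow_edit"), token_estimate=1200,
                     retries=2, wall_clock_seconds=40.0, success=False)
print(next_tier(attempt, reasoning_gap=True))   # sonnet
print(next_tier(attempt, reasoning_gap=False))  # None
```

Keeping the escalation decision as a function of the recorded attempt makes it easy to audit later why a more expensive tier was used.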