Agentic Engineering

Name: Agentic Engineering
Author: affaan-m

affaan-m/everything-claude-code

Run eval-first agent workflows with 15-minute task units, tiered model routing, and focused review of AI-generated code.

Overview

agentic-engineering is a journey-wide agent skill that structures eval-first execution, decomposition, and cost-aware model routing—usable whenever a solo builder needs to govern agent implementation before committing to

Install

npx skills add https://github.com/affaan-m/everything-claude-code --skill agentic-engineering

What is this skill?

Four operating principles: completion criteria, decomposition, model routing, eval measurement
Eval-first loop: capability eval, regression eval, baseline, implementation, delta comparison
15-minute unit rule with verifiable done conditions and single dominant risk per unit
Model routing map: Haiku, Sonnet, Opus by task complexity
Review focus on invariants, error boundaries, auth, and rollout risk—not style nitpicks
15-minute unit rule for agent-sized work decomposition
4-step eval-first loop from baseline through delta comparison
4 operating principles before execution

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 4.8k installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

Agents ship code faster than you can verify it, so work balloons without clear done conditions, eval baselines, or review priorities.

Who is it for?

Solo builders running Claude Code or similar agents on multi-step features who want measurable quality gates and token discipline.

Skip if: One-off copy edits or tasks already covered by an approved spec with no agent delegation.

When should I use this skill?

Engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

What do I get? / Deliverables

You run baseline and post-change evals, execute in verifiable units with routed models, and review AI code for risk-bearing defects instead of style noise.

Completion criteria and eval baselines
Decomposed agent work units with done conditions
Risk-prioritized review notes for agent-generated changes

Journey fit

Useful at every journey phase - explore requirements and options before committing to a direction.

Where it fits

Validate (Scope & plan): Define capability eval and done criteria before an agent builds a landing-page prototype.

Build (Project management & tracking): Split a multi-file feature into fifteen-minute units, each with one dominant risk.

Ship (Code review): Prioritize auth boundaries and rollout risk in an agent-generated PR review pass.

Operate (Error tracking): Start a fresh session for root-cause analysis and route the investigation to a stronger model tier.

How it compares

Process methodology for agent-led engineering—not a single-purpose generator or Laravel security checklist.

Common Questions / FAQ

Who is agentic-engineering for?

Indie developers and small teams treating AI agents as primary implementers while they own evals, routing, and risk review.

When should I use agentic-engineering?

Use in Validate when scoping agent-sized prototypes, in Build and Ship when decomposing features and regression-testing agent diffs, in Grow when automating workflows, and in Operate when debugging with fresh sessions and root-cause routing to stronger models.

Is agentic-engineering safe to install?

It guides process and review focus only; check the Security Audits panel on this Prism page for the underlying skill package before enabling it in production repos.

SKILL.md

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.
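
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's own tooling: `run_evals`, `compare`, the `state` dictionary, and the eval names are all hypothetical, and each eval is assumed to be a plain callable that returns `True` on pass.

```python
from typing import Callable, Dict, List

# Hypothetical eval suite: eval name -> check that returns True on pass.
EvalSuite = Dict[str, Callable[[], bool]]


def run_evals(suite: EvalSuite) -> Dict[str, bool]:
    """Run every eval and record pass/fail; an exception counts as a failure."""
    results = {}
    for name, check in suite.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def compare(baseline: Dict[str, bool], after: Dict[str, bool]) -> Dict[str, List[str]]:
    """Classify each eval by how its result changed between the two runs."""
    delta: Dict[str, List[str]] = {"fixed": [], "regressed": [], "still_failing": []}
    for name, passed_before in baseline.items():
        passed_after = after.get(name, False)
        if not passed_before and passed_after:
            delta["fixed"].append(name)
        elif passed_before and not passed_after:
            delta["regressed"].append(name)
        elif not passed_before and not passed_after:
            delta["still_failing"].append(name)
    return delta


# Toy stand-in for the code an agent is about to change.
state = {"slugify": lambda s: s.lower()}

# Step 1: define one capability eval and one regression eval.
suite: EvalSuite = {
    "capability: spaces become dashes": lambda: state["slugify"]("A B") == "a-b",
    "regression: lowercases input": lambda: state["slugify"]("AB") == "ab",
}

baseline = run_evals(suite)                               # step 2: baseline run
state["slugify"] = lambda s: s.lower().replace(" ", "-")  # step 3: implementation
delta = compare(baseline, run_evals(suite))               # step 4: compare deltas
print(delta)
```

The failing names in `baseline` play the role of failure signatures. An entry under `regressed` signals that the unit is not done, even if the capability eval now passes.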

## Task Decomposition

Apply the 15-minute unit rule:
- each unit should be independently verifiable
- each unit should have a single dominant risk
- each unit should expose a clear done condition
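
One way to make the rule checkable is to record each unit as data and reject units that miss a criterion. The sketch below is illustrative only: the `WorkUnit` class, its field names, and the example units are assumptions, not part of the skill.

```python
from dataclasses import dataclass
from typing import List

MAX_UNIT_MINUTES = 15  # the 15-minute unit rule


@dataclass
class WorkUnit:
    title: str
    estimate_minutes: int
    dominant_risk: str   # the one risk this unit is mostly exposed to
    done_condition: str  # observable condition that marks the unit done
    verify_command: str  # hypothetical: command that checks this unit on its own

    def problems(self) -> List[str]:
        """Return the reasons this unit breaks the rule (empty if it passes)."""
        issues = []
        if self.estimate_minutes > MAX_UNIT_MINUTES:
            issues.append("too large: split it")
        if not self.verify_command.strip():
            issues.append("not independently verifiable")
        if not self.dominant_risk.strip():
            issues.append("missing dominant risk")
        if not self.done_condition.strip():
            issues.append("missing done condition")
        return issues


ok_unit = WorkUnit(
    title="Add rate-limit middleware",
    estimate_minutes=15,
    dominant_risk="breaks the existing auth flow",
    done_condition="rate-limit test returns HTTP 429 past the threshold",
    verify_command="pytest tests/test_rate_limit.py",
)
vague_unit = WorkUnit("Rewrite billing", 120, "data loss", "", "")

print(ok_unit.problems())
print(vague_unit.problems())
```

A unit that reports any problem is a candidate for splitting or rewording before it is handed to an agent.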

## Model Routing

- Haiku: classification, boilerplate transforms, narrow edits
- Sonnet: implementation and refactors
- Opus: architecture, root-cause analysis, multi-file invariants
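
The routing list above maps naturally onto a lookup table. In this sketch the task-kind strings are illustrative labels for the categories in the list, and defaulting unknown kinds to the middle tier is an assumption rather than something the skill specifies.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Categories from the routing list; the key strings are hypothetical labels.
ROUTING = {
    "classification": Tier.HAIKU,
    "boilerplate_transform": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause_analysis": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


def route(task_kind: str) -> Tier:
    """Pick a tier for a task kind; unknown kinds fall back to Sonnet (an assumption)."""
    return ROUTING.get(task_kind, Tier.SONNET)


print(route("narrow_edit").name)
print(route("root_cause_analysis").name)
```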

## Session Strategy

- Continue the session for closely coupled units.
- Start a fresh session after major phase transitions.
- Compact after milestone completion, not during active debugging.
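
The three rules can be read as a small decision function. The skill does not say which rule wins when several apply, so the precedence below (phase transition first, then compaction, then continue) is an assumption, as are the function and parameter names.

```python
def session_action(phase_transition: bool, milestone_done: bool, debugging: bool) -> str:
    """Suggest what to do with the current agent session before the next unit."""
    if phase_transition:
        return "fresh"      # major phase transition: start a new session
    if milestone_done and not debugging:
        return "compact"    # milestone reached and no active debugging
    return "continue"       # closely coupled work: keep the context


print(session_action(phase_transition=False, milestone_done=True, debugging=True))
```

Here a finished milestone during active debugging still yields `continue`, matching the rule that compaction should wait until debugging is over.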

## Review Focus for AI-Generated Code

Prioritize:
- invariants and edge cases
- error boundaries
- security and auth assumptions
- hidden coupling and rollout risk

Do not spend review cycles on style-only disagreements when automated formatting and linting already enforce style.

## Cost Discipline

Track per task:
- model
- token estimate
- retries
- wall-clock time
- success/failure

Escalate the model tier only when a lower tier fails with a clear reasoning gap.
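
The tracked fields and the escalation rule fit in one small record type. This is a sketch under assumptions: `TaskRecord`, `next_model`, and the `reasoning_gap` flag (set by whoever inspects the failure) are hypothetical names, and the numbers in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = ["haiku", "sonnet", "opus"]  # lowest to highest tier


@dataclass
class TaskRecord:
    task: str
    model: str
    token_estimate: int
    retries: int
    wall_clock_seconds: float
    success: bool
    reasoning_gap: bool = False  # hypothetical: failure traced to model reasoning


def next_model(record: TaskRecord) -> Optional[str]:
    """Return the model for the next attempt, or None if the task succeeded."""
    if record.success:
        return None
    idx = TIERS.index(record.model)
    if record.reasoning_gap and idx < len(TIERS) - 1:
        return TIERS[idx + 1]  # escalate exactly one tier
    return record.model        # other failures: retry on the same tier


flaky = TaskRecord("rename config keys", "haiku", 1200, 1, 40.0, success=False)
stuck = TaskRecord("fix cache invalidation", "sonnet", 9000, 2, 310.0,
                   success=False, reasoning_gap=True)

print(next_model(flaky))
print(next_model(stuck))
```

Keeping the escalation decision next to the cost fields makes it easy to see afterwards whether a move to a higher tier was justified by the recorded failure.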
