
Agent Evaluation
Benchmark agent tool use, reasoning, and task completion before release; compare variants and prompt changes, and catch regressions, with repeatable eval suites.
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill agent-evaluation

| Installs | 122 |
|---|---|
| Repository | guia-matthieu/clawfu-skills |