
Evals
Define datasets, graders, and regression suites that measure an agent's prompt, tool, and end-to-end behavior before promoting changes.
npx skills add https://github.com/danielmiessler/personal_ai_infrastructure --skill evals

| Installs | 113 |
|---|---|
| Repository | danielmiessler/personal_ai_infrastructure |
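
To make the idea of datasets, graders, and regression suites concrete, here is a minimal sketch in Python. It is not the skill's actual interface: every name in it (`Case`, `exact_match`, `run_suite`, `should_promote`, and the two toy agents) is hypothetical and chosen only for illustration. It assumes an agent can be modeled as a function from a prompt string to an output string, and that a change is promoted only when its score does not fall below the baseline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One dataset row: an input prompt and the answer we expect back."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Grader: 1.0 when output equals expected (ignoring case and outer whitespace), else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_suite(
    agent: Callable[[str], str],
    dataset: Sequence[Case],
    grader: Callable[[str, str], float],
) -> float:
    """Run every case through the agent and return the mean grader score."""
    if not dataset:
        raise ValueError("dataset must contain at least one case")
    scores = [grader(agent(case.prompt), case.expected) for case in dataset]
    return sum(scores) / len(scores)


def should_promote(candidate: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Regression gate: allow promotion only if the candidate has not regressed."""
    return candidate >= baseline - tolerance


# Hypothetical dataset and agents, standing in for real prompts and models.
DATASET = [Case("2 + 2", "4"), Case("capital of France", "Paris")]


def baseline_agent(prompt: str) -> str:
    # Toy stand-in for the currently deployed agent.
    return {"2 + 2": "4"}.get(prompt, "unknown")


def candidate_agent(prompt: str) -> str:
    # Toy stand-in for the changed agent under evaluation.
    return {"2 + 2": "4", "capital of France": "paris"}.get(prompt, "unknown")


if __name__ == "__main__":
    baseline_score = run_suite(baseline_agent, DATASET, exact_match)
    candidate_score = run_suite(candidate_agent, DATASET, exact_match)
    print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
    print("promote" if should_promote(candidate_score, baseline_score) else "hold")
```

Keeping the grader as a plain function makes it easy to swap exact matching for a rubric or model-based grader without touching the suite runner, and the `tolerance` parameter leaves room for noisy scores.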