
Evaluation
Define eval datasets, scoring rubrics, regression suites, and human-in-the-loop checks to measure agent and LLM output quality before release.
npx skills add https://github.com/shipshitdev/library --skill evaluation

| Installs | 104 |
|---|---|
| Repository | shipshitdev/library |
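
To make the description concrete, here is a minimal sketch of how an eval dataset, a scoring rubric, a regression check, and a human-review queue might fit together. It is not taken from the skill itself: `EvalCase`, `score`, `run_suite`, the phrase-matching rubric, and the `pass_at` / `review_at` thresholds are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One row of a (hypothetical) eval dataset."""
    prompt: str
    must_include: list[str]  # rubric: phrases a good answer should contain


def score(output: str, case: EvalCase) -> float:
    """Return the fraction of rubric phrases found in the output (case-insensitive)."""
    if not case.must_include:
        return 1.0
    text = output.lower()
    hits = sum(1 for phrase in case.must_include if phrase.lower() in text)
    return hits / len(case.must_include)


def run_suite(
    agent: Callable[[str], str],
    cases: list[EvalCase],
    baseline_mean: float,
    pass_at: float = 0.8,    # assumed threshold for an automatic pass
    review_at: float = 0.5,  # assumed threshold below which a case fails outright
) -> dict:
    """Score every case, flag borderline ones for human review, and
    compare the mean score against a previously recorded baseline."""
    results = []
    for case in cases:
        s = score(agent(case.prompt), case)
        if s >= pass_at:
            status = "pass"
        elif s >= review_at:
            status = "needs_human_review"
        else:
            status = "fail"
        results.append({"prompt": case.prompt, "score": s, "status": status})

    mean = sum(r["score"] for r in results) / len(results) if results else 0.0
    return {
        "results": results,
        "mean_score": mean,
        "regressed": mean < baseline_mean,
        "review_queue": [
            r["prompt"] for r in results if r["status"] == "needs_human_review"
        ],
    }
```

A release gate built on this sketch could block when `regressed` is true or any case has status `fail`, and hand the prompts in `review_queue` to a human reviewer before shipping. Substring matching is only a stand-in rubric; a real suite would likely use a richer scorer.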