
Model Evaluation Benchmark
Benchmark model and prompt variants against task suites, measuring latency and cost and checking regression thresholds before promoting releases.
npx skills add https://github.com/rysweet/amplihack --skill model-evaluation-benchmark

| Installs | 142 |
|---|---|
| Repository | rysweet/amplihack |
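
The description above mentions gating a release on regression thresholds for quality, latency, and cost. A minimal sketch of what such a gate might look like follows. It is not taken from the amplihack skill itself: the `Metrics` and `Thresholds` types, the `regressions` and `can_promote` functions, and the default limits are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Aggregate results for one model/prompt variant (hypothetical shape)."""
    accuracy: float       # fraction of task-suite cases passed, in [0, 1]
    p95_latency_s: float  # 95th-percentile latency in seconds
    cost_usd: float       # total cost of running the suite


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits; real values would depend on the project."""
    max_accuracy_drop: float = 0.01     # absolute drop allowed
    max_latency_increase: float = 0.10  # relative increase allowed
    max_cost_increase: float = 0.10     # relative increase allowed


def regressions(baseline: Metrics, candidate: Metrics,
                limits: Thresholds = Thresholds()) -> list[str]:
    """Return the names of the metrics on which the candidate regresses."""
    failed = []
    if baseline.accuracy - candidate.accuracy > limits.max_accuracy_drop:
        failed.append("accuracy")
    if candidate.p95_latency_s > baseline.p95_latency_s * (1 + limits.max_latency_increase):
        failed.append("latency")
    if candidate.cost_usd > baseline.cost_usd * (1 + limits.max_cost_increase):
        failed.append("cost")
    return failed


def can_promote(baseline: Metrics, candidate: Metrics,
                limits: Thresholds = Thresholds()) -> bool:
    """A candidate is promotable only if no metric regresses past its limit."""
    return not regressions(baseline, candidate, limits)
```

Under these assumptions, a candidate is compared against the currently released baseline, and promotion is blocked as soon as any single metric exceeds its limit. Returning the list of failed metric names, rather than only a boolean, keeps the reason for a blocked release visible.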