Hamel Husain

Hamelsmu Evals Skills

Design and run LLM eval suites—datasets, rubrics, regression baselines, and failure triage—before shipping agent or prompt changes to production.

/plugin marketplace add hamelsmu/evals-skills
GitHub stars: 1.4k
Repository: hamelsmu/evals-skills

Testing, llm, research
