
Agent Evaluation
Evaluate LLM agent systems with multi-dimensional rubrics, LLM-as-judge with bias mitigation, pairwise comparison, direct scoring, and continuous monitoring.
Install
npx skills add https://github.com/viktorbezdek/skillstack --skill agent-evaluation
What is this skill?
- Multi-dimensional rubrics
- LLM-as-judge with bias mitigation
- Pairwise and direct scoring
- Confidence calibration
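
To make "pairwise comparison with bias mitigation" concrete, here is a minimal sketch of one common mitigation: judging each pair in both presentation orders and accepting a winner only when the two verdicts agree. This is an illustration, not the skill's own implementation. The `Judge` signature, `pairwise_compare`, and the two stub judges are hypothetical names introduced for this example; in practice the judge would wrap an LLM call.

```python
from typing import Callable

# Hypothetical judge signature (an assumption, not the skill's API):
# (task, first_response, second_response) -> "first" | "second" | "tie"
Judge = Callable[[str, str, str], str]


def pairwise_compare(judge: Judge, task: str, a: str, b: str) -> str:
    """Return "A", "B", or "tie" after judging both presentation orders."""
    forward = judge(task, a, b)   # response A is shown first
    backward = judge(task, b, a)  # response B is shown first

    # Map positional verdicts back to the response labels.
    verdict_forward = {"first": "A", "second": "B"}.get(forward, "tie")
    verdict_backward = {"first": "B", "second": "A"}.get(backward, "tie")

    # Accept a winner only if both orders agree; otherwise call it a tie.
    return verdict_forward if verdict_forward == verdict_backward else "tie"


# Stub judges standing in for an LLM call (illustrative only).
def always_first(task: str, x: str, y: str) -> str:
    """A fully position-biased judge: always prefers whatever is shown first."""
    return "first"


def longer_wins(task: str, x: str, y: str) -> str:
    """A position-independent judge: prefers the longer response."""
    if len(x) == len(y):
        return "tie"
    return "first" if len(x) > len(y) else "second"


biased_result = pairwise_compare(always_first, "task", "x", "y")
consistent_result = pairwise_compare(longer_wins, "task", "short", "a much longer answer")
```

With the position-biased stub the two orders disagree, so the result collapses to a tie. With the position-independent stub both orders pick the same response, so it is reported as the winner. Doubling the judge calls is the cost of this check.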
Adoption & trust: 9 GitHub stars.