Agent Evaluation

viktorbezdek/skillstack

Evaluate LLM agent systems with multi-dimensional rubrics, LLM-as-judge with bias mitigation, pairwise comparison, direct scoring, and continuous monitoring.

Install

npx skills add https://github.com/viktorbezdek/skillstack --skill agent-evaluation

What is this skill?

Multi-dimensional rubrics
LLM-as-judge with bias mitigation
Pairwise and direct scoring
Confidence calibration
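
One common way to mitigate position bias in pairwise LLM-as-judge comparison is to ask the judge twice with the response order swapped and keep only verdicts that agree. The sketch below illustrates that idea; it is not taken from the skill itself. The `Judge` signature, `pairwise_with_swap`, and the two stub judges are hypothetical names introduced for illustration, and a real judge would call an LLM rather than apply a fixed rule.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]
# Hypothetical judge signature (an assumption, not the skill's API):
# (prompt, first_response, second_response) -> "A" if the first slot wins,
# "B" if the second slot wins, or "tie".
Judge = Callable[[str, str, str], Verdict]


def pairwise_with_swap(judge: Judge, prompt: str, resp_a: str, resp_b: str) -> Verdict:
    """Query the judge in both orders and accept only a consistent verdict."""
    first = judge(prompt, resp_a, resp_b)    # resp_a occupies the first slot
    second = judge(prompt, resp_b, resp_a)   # resp_b occupies the first slot
    # Translate the second verdict back to the original A/B labels.
    second_mapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    # Disagreement between the two orderings is treated as a tie.
    return first if first == second_mapped else "tie"


def position_biased_judge(prompt: str, first: str, second: str) -> Verdict:
    """Stub judge that always prefers whatever is shown first."""
    return "A"


def length_judge(prompt: str, first: str, second: str) -> Verdict:
    """Stub judge that prefers the longer response regardless of position."""
    if len(first) == len(second):
        return "tie"
    return "A" if len(first) > len(second) else "B"
```

With these stubs, a purely position-biased judge collapses to a tie, while a judge whose preference does not depend on order keeps its verdict. Treating disagreement as a tie is one design choice among several; averaging scores across both orderings is another.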

Adoption & trust: 9 GitHub stars.

Journey fit

Primary fit

This skill belongs to the testing step of shipping an agent: evaluating agent quality before and after release. Rubrics, LLM-as-judge, and direct or pairwise scoring are the testing techniques used to validate agent behavior.
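
As a rough illustration of how a multi-dimensional rubric can act as a release gate, the sketch below combines per-dimension scores into one weighted score and compares it with a threshold. The dimension names, weights, the 0.7 threshold, and the functions `weighted_score` and `passes_gate` are all assumptions made for this example, not values or APIs defined by the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float


# Hypothetical rubric: the dimensions and weights are illustrative only.
RUBRIC = [
    Dimension("correctness", 0.5),
    Dimension("tool_use", 0.3),
    Dimension("clarity", 0.2),
]


def weighted_score(scores: dict[str, float], rubric: list[Dimension] = RUBRIC) -> float:
    """Combine per-dimension scores (each in [0, 1]) into a weighted average."""
    missing = [d.name for d in rubric if d.name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    total_weight = sum(d.weight for d in rubric)
    return sum(d.weight * scores[d.name] for d in rubric) / total_weight


def passes_gate(scores: dict[str, float], threshold: float = 0.7) -> bool:
    """Return True when the weighted score meets the (assumed) release threshold."""
    return weighted_score(scores) >= threshold
```

Running the same gate on a fixed set of test cases before and after a release gives a simple regression signal. The per-dimension scores themselves could come from direct scoring by an LLM judge or from human review.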
