
MCP Server
Score agent and MCP tool outputs for quality, safety, and cost before you ship prompts or workflows to production.
Overview
io.github.iris-eval/mcp-server is an MCP server for the Ship phase that scores agent outputs for quality, safety, and cost using a consistent eval standard.
What is this MCP server?
- Positions itself as an agent eval standard wired for MCP workflows
- Scores every agent output on quality, safety, and cost dimensions
- Ships as a stdio npm package, @iris-eval/mcp-server v0.4.2
- Reads an optional secret, IRIS_API_KEY, from the environment
- Persists runs to a SQLite database at the configurable IRIS_DB_PATH; IRIS_LOG_LEVEL sets the log level
- Fits CI or pre-ship checks on agent replies and tool results
- Repository: github.com/iris-eval/mcp-server
Community signal: 7 GitHub stars.
What problem does it solve?
Without a repeatable scoring layer on every output, you cannot tell whether a change to your agent improved replies, kept them safe, or blew the budget.
Who is it for?
Indie builders shipping agent features or MCP-heavy workflows who want SQLite-backed eval history and optional API-key auth before promoting prompts to users.
Skip if: your team only needs manual code review with no LLM output scoring, or you are still in pure ideation with no agent pipeline to measure.
What do I get? / Deliverables
After you register the server, your agent can run structured eval passes and store scores locally, so shipping decisions rest on measured quality, safety, and cost rather than vibes.
- Per-output quality, safety, and cost scores via MCP tools
- Local eval history in SQLite when IRIS_DB_PATH is configured
- Log level for runs controlled via IRIS_LOG_LEVEL
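The idea of gating a release on measured scores can be sketched as a small threshold check. This is a minimal sketch under stated assumptions, not the server's documented behavior: the score shape (quality and safety as 0 to 1 values, cost in US dollars), the `EvalScores` and `Thresholds` types, and the default limits are all hypothetical, since the listing does not describe the tool output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical score shape; the real tool output may differ."""
    quality: float   # assumed range 0..1, higher is better
    safety: float    # assumed range 0..1, higher is better
    cost_usd: float  # assumed cost of producing the output, in USD


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits; tune these to your own pipeline."""
    min_quality: float = 0.8
    min_safety: float = 0.9
    max_cost_usd: float = 0.05


def failing_dimensions(scores: EvalScores, limits: Thresholds = Thresholds()) -> list[str]:
    """Return the names of the dimensions that miss their threshold."""
    failures = []
    if scores.quality < limits.min_quality:
        failures.append("quality")
    if scores.safety < limits.min_safety:
        failures.append("safety")
    if scores.cost_usd > limits.max_cost_usd:
        failures.append("cost")
    return failures


def passes_gate(scores: EvalScores, limits: Thresholds = Thresholds()) -> bool:
    """A run passes only when no dimension fails."""
    return not failing_dimensions(scores, limits)
```

In a CI job, a check like this would typically exit non-zero when `passes_gate` returns False and print the failing dimensions, so a regression in any one dimension blocks the release.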
Journey fit
Output scoring and eval gates belong in the Ship phase because they validate behavior right before or after release, not while you are still drafting product ideas. Within Ship, Testing is the natural subphase: it covers systematic pass/fail scoring of model outputs, as opposed to the one-off debugging done in Operate.
How it compares
This is an MCP eval scoring server, not a prompt-writing or brainstorming skill.
Common Questions / FAQ
Who is io.github.iris-eval/mcp-server for?
Solo and small-team builders who ship AI agents and want MCP-native tools to grade output quality, safety, and cost before release.
When should I use io.github.iris-eval/mcp-server?
Use it during ship and testing when you compare prompt versions, regression fixes, or new tools and need comparable scores on every run.
How do I add io.github.iris-eval/mcp-server to my agent?
Add a stdio MCP server entry for the npm package @iris-eval/mcp-server v0.4.2 to your Claude Code or Cursor configuration, set IRIS_API_KEY if required, and configure IRIS_DB_PATH and IRIS_LOG_LEVEL as needed.