
AI Eval
Run AI evaluation workflows from your coding agent via a hosted Cloudflare Workers MCP endpoint without wiring custom eval scripts locally.
Overview
io.github.lazymac2x/ai-eval is an MCP server for the Ship phase that exposes Cloudflare Workers-hosted AI evaluation tools to your agent over streamable HTTP.
What is this MCP server?
- Remote streamable-http MCP at https://api.lazy-mac.com/ai-eval/mcp (no local worker deploy required to connect)
- Server schema version 1.0.0 with GitHub source at lazymac2x/ai-eval-api
- Cloudflare Workers-hosted MCP bridge so Claude Code, Cursor, and Codex can call eval tooling as tools
- Fits solo builders who need repeatable AI output checks without building a separate eval dashboard first
- Pairs with gateway, guardrails, and model-router in the same lazy-mac MCP suite for end-to-end agent stacks
- MCP server version 1.0.0
- 1 remote endpoint (streamable-http)
- Hosted on Cloudflare Workers via api.lazy-mac.com
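As a rough illustration of what "remote streamable-http MCP" means on the wire, the sketch below builds (but does not send) the JSON-RPC 2.0 messages an MCP client would typically POST to the endpoint. Only the URL comes from this listing. The protocol version string, the client name, and the helper names `jsonrpc` and `build_post` are assumptions made for illustration, and the tools this server actually exposes are not listed here, so none are named.

```python
import json
import urllib.request

MCP_URL = "https://api.lazy-mac.com/ai-eval/mcp"  # endpoint taken from the listing


def jsonrpc(method, params=None, msg_id=1):
    """Build a JSON-RPC 2.0 request body, the message format MCP uses."""
    body = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


def build_post(body, session_id=None):
    """Wrap a JSON-RPC body in an HTTP POST request object; nothing is sent."""
    headers = {
        "Content-Type": "application/json",
        # A streamable HTTP server may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Only needed if the server issued a session id during initialize.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Handshake message; the version string and client info are assumed values.
initialize = jsonrpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.0.1"},
})

# After initialization, a client can ask which tools the server offers.
list_tools = jsonrpc("tools/list", msg_id=2)

req = build_post(initialize)
print(req.get_method(), req.full_url)
print(json.dumps(list_tools))
```

In practice your agent (Claude Code, Cursor, Codex) performs this exchange for you; the sketch is only meant to show why no local worker deploy is needed to connect.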
What problem does it solve?
You cannot trust an agent workflow you only eyeballed once, and you do not want to build and host a custom eval API just to score prompts and outputs from Claude Code or Cursor.
Who is it for?
Indie builders with MCP-capable agents who want a remote eval endpoint tied into ship-time testing without operating their own Workers project first.
Skip if: your team needs private on-prem eval pipelines, large labeled datasets, or detailed offline reporting with no external HTTP dependency.
What do I get? / Deliverables
After you register the remote MCP URL, your agent can invoke ai-eval tools on demand so testing and iteration cycles include structured evaluation instead of ad-hoc guesses.
- Registered remote ai-eval MCP tools visible in your agent
- Agent-invokable evaluation calls against the lazy-mac Workers API
- Repeatable eval step you can run before ship and after prompt changes
Journey fit
The canonical shelf is Ship, because measuring model and prompt quality belongs with testing and release confidence, even if you also invoke eval during build and post-launch iteration. Testing is the primary fit: ai-eval is for benchmarking outputs, regression checks, and pass-rate style validation before you treat an agent workflow as production-ready.
How it compares
ai-eval is a remote MCP integration for AI evaluation, not an in-repo agent skill or a full local benchmark framework.
Common Questions / FAQ
Who is io.github.lazymac2x/ai-eval for?
Solo and indie builders using Claude Code, Cursor, Codex, or similar agents who want hosted AI eval tools reachable through standard MCP configuration.
When should I use io.github.lazymac2x/ai-eval?
Use it during Ship testing and whenever you change prompts, models, or tool chains and need your agent to run evaluation calls before you ship or iterate in production.
How do I add io.github.lazymac2x/ai-eval to my agent?
Add a remote MCP server entry pointing at https://api.lazy-mac.com/ai-eval/mcp with type streamable-http in your client’s MCP config, then restart or reload the agent so tools from ai-eval appear.
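The registration step can be sketched as a small script that adds the entry to an existing client config. The URL and the `streamable-http` type come from the answer above. The top-level `mcpServers` key, the `type` and `url` field names, the server name `ai-eval`, and the `other-server` entry are assumptions for illustration, since each client defines its own config schema and file location; check your client's documentation before copying the output.

```python
import json

AI_EVAL_URL = "https://api.lazy-mac.com/ai-eval/mcp"  # from the listing


def add_ai_eval(config: dict, name: str = "ai-eval") -> dict:
    """Return a copy of an MCP client config with the ai-eval entry added."""
    # "mcpServers" is an assumed key; some clients use a different one.
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"type": "streamable-http", "url": AI_EVAL_URL}
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other-server": {"type": "stdio", "command": "example"}}}

updated = add_ai_eval(existing)
print(json.dumps(updated, indent=2))
```

The function copies the server map rather than mutating it, so the original config stays untouched until you decide to write the result back to disk.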