Evalscopebench Mcp

Name: Evalscopebench Mcp
Author: clauxel

clauxel/evalscope-benchmark-mcp-mcp

Pull AI SDK benchmark dashboard results into your agent via MCP so you can compare runs, verdicts, and usage logs before you ship model or SDK changes.

Overview

EvalScope Benchmark MCP is an MCP server for the ship phase. It exposes the EvalScope AI SDK benchmark dashboard to agents and returns verdicts, receipts, usage logs, and audit-ready JSON.

What is this MCP server?

Paid remote MCP for the EvalScope AI SDK benchmark dashboard
Remote URL: evalscopebench.clauxel.com/mcp (streamable-http transport, Bearer auth)
Returns verdicts, receipts, usage logs, and audit-ready JSON from benchmark runs
Registry title: EvalScope Benchmark MCP, with a server card URL on Clauxel
GitHub reference repo: clauxel/evalscope-benchmark-mcp-mcp
Version: 1.0.0 in the MCP registry schema
Publisher tags: evalscopebench, ai-sdk-benchmark-dashboard, paid-mcp

Compatible agents: Claude Code, Cursor, Codex, or any other MCP-compatible agent

What problem does it solve?

Sanity-checking AI SDK benchmark regressions from inside your coding agent otherwise requires tab-switching or bespoke API glue.

Who is it for?

Solo builders running AI SDK benchmarks who want dashboard verdicts on demand inside Claude Code or Cursor during release testing.

Skip if: you do not have EvalScope access, you run offline-only eval workflows, or you only need unit tests and have no SDK benchmark dashboard.

What do I get? / Deliverables

Your agent can call the EvalScope benchmark remote MCP and return measurable verdicts and logged usage you can attach to ship checklists.

Agent-callable access to EvalScope benchmark dashboard outputs
Verdict and receipt JSON suitable for release notes or test folders
Usage logs tied to MCP tool invocations
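
One way to attach such output to a ship checklist is to write the returned JSON into a test folder and gate on its verdict. The sketch below is illustrative only: the listing does not document the response schema, so the "verdict" key, its "pass" value, the "receipt" key, and the helper name save_and_gate are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_and_gate(result: dict, out_dir: Path, run_id: str) -> bool:
    """Write the raw result JSON under out_dir and report whether it passed.

    HYPOTHETICAL schema: assumes a top-level "verdict" key whose passing
    value is "pass". Adjust to whatever the server actually returns.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{run_id}.json"
    # sort_keys keeps the stored file stable across runs for easier diffing
    path.write_text(json.dumps(result, indent=2, sort_keys=True), encoding="utf-8")
    return result.get("verdict") == "pass"


if __name__ == "__main__":
    # Made-up sample payload, not real EvalScope output
    sample = {"verdict": "pass", "receipt": "example-receipt-id"}
    with tempfile.TemporaryDirectory() as tmp:
        ok = save_and_gate(sample, Path(tmp) / "benchmarks", "run-001")
        print("ship" if ok else "hold")
```

Keeping the raw JSON on disk, rather than only the boolean, means the stored file can double as the audit record for the release.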

Recommended MCP Servers

1ClickReport (1clickreport/mcp)

1ClickReport is a hosted Model Context Protocol server that packs forty tools for marketing analytics, campaign manageme…

222wcnm Bilistalkermcp (222wcnm/BiliStalkerMCP)

BiliStalker MCP is a Smithery-distributed Model Context Protocol server that lets coding agents stalk, not spam, Bilibili:…
5 stars

ABMeter

ABMeter MCP (ai.abmeter/abmeter) connects Model Context Protocol clients to ABMeter’s feature-flag and A/B testing platf…

Abs Mcp

io.ausdata/abs-mcp is a stdio Model Context Protocol server that wraps Australian Bureau of Statistics data so Claude Co…

ACM 68000 Product Eligibility For AI Agents (allooloo/acm-68000)

io.github.allooloo/acm-68000-mcp publishes the ACM-68000 DPU resolver as a Model Context Protocol service so AI agents c…
1 star

AdAdvisor MCP Server

AdAdvisor MCP Server (ai.adadvisor/mcp-server) exposes Meta (Facebook) Ads performance data to MCP-capable coding agents…

Journey fit

Primary fit

Benchmark and eval dashboards matter most in the ship phase when you are proving SDK or model choices hold up under measurement, not when you are still sketching product scope. Testing is the canonical shelf because EvalScope Benchmark MCP surfaces benchmark verdicts and eval-oriented JSON meant to gate releases and regressions.

How it compares

Remote benchmark-dashboard MCP, not a local eval skill or generic LLM chat plugin.

Common Questions / FAQ

Who is EvalScope Benchmark MCP for?

Developers shipping AI SDK integrations who already use EvalScope’s benchmark dashboard and want MCP access from their coding agent.

When should I use EvalScope Benchmark MCP?

Use it in ship/testing when you are validating SDK or model choices and need benchmark verdicts and receipts without leaving the IDE.

How do I add EvalScope Benchmark MCP to my agent?

Configure https://evalscopebench.clauxel.com/mcp as a remote MCP server and supply a Bearer token from the EvalScope product site in the Authorization header.
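
For a rough idea of what that setup amounts to on the wire, the sketch below builds (but does not send) an MCP initialize request for the endpoint. The EVALSCOPE_TOKEN environment variable, the client name, and the protocol version string are assumptions for illustration. The JSON-RPC shape and headers follow the MCP Streamable HTTP transport in general, not anything this listing documents about EvalScope specifically. In practice your agent's MCP client does this for you once the URL and token are configured.

```python
import json
import os
import urllib.request

MCP_URL = "https://evalscopebench.clauxel.com/mcp"  # remote URL from this listing


def build_initialize_request(token: str) -> urllib.request.Request:
    """Build (without sending) an MCP initialize request with Bearer auth."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: use whichever protocol revision your client supports
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP servers may answer with plain JSON or an SSE stream
        "Accept": "application/json, text/event-stream",
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # EVALSCOPE_TOKEN is an assumed variable name; keep real tokens out of source
    token = os.environ.get("EVALSCOPE_TOKEN", "<your-token>")
    req = build_initialize_request(token)
    print(req.get_method(), req.full_url)
```

Reading the token from the environment keeps it out of committed config files; sending the request would additionally require a valid paid token.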
