ForgeJudge

Name: ForgeJudge
Author: ahmedEid1

ahmedEid1/forgejudge

Hook autonomous coding agents into an open eval leaderboard and CI gate that solves tasks, scores runs, and traces execution.

Overview

ForgeJudge is an MCP server for the ship phase that connects agents to an open eval leaderboard and CI gate for solving, scoring, and tracing autonomous coding runs.

What is this MCP server?

Open eval leaderboard for comparing autonomous coding agent runs
CI gate pattern: block merges or releases when agent evals fail thresholds
MCP server over stdio, launched with uvx (the registry runtimeHint) from the PyPI package forgejudge[mcp]
Traces agent solve paths for post-hoc debugging of failures
Public site with leaderboard context: forgejudge.ahmedhobeishy.tech
Registry server version 0.1.1; PyPI package version 0.1.0

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

What problem does it solve?

You cannot confidently ship an agent workflow without standardized solve-score-trace benchmarks and an automated gate that blocks the release when regressions appear.

Who is it for?

Solo builders iterating on coding agents who want a PyPI-installable stdio MCP server plus a public eval record before tagging releases.

Skip if: Pure application teams with no autonomous agent component, or anyone who only needs linting without agent-task benchmarks.

What do I get? / Deliverables

After install, your agent can drive ForgeJudge evals from MCP and you can enforce CI thresholds using comparable leaderboard scores and traces.

MCP-accessible solve, score, and trace workflows for coding agents
Comparable runs suitable for an open eval leaderboard
CI-oriented gating signals for agent regression control
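
The gating signal in the last item can be pictured as a threshold check that runs in CI. The sketch below is a minimal illustration, not ForgeJudge's actual API: the score format (a mapping of task name to a score between 0 and 1), the `gate` and `exit_code` helpers, the task names, and the 0.8 threshold are all assumptions made for this example.

```python
# Hypothetical input shape: task name -> score in [0, 1].
# ForgeJudge's real output format is not described in this listing.
def gate(scores, threshold=0.8):
    """Return (passed, mean_score); passed is False when the mean is below threshold."""
    if not scores:
        # An empty run cannot demonstrate quality, so treat it as a failure.
        return False, 0.0
    mean_score = sum(scores.values()) / len(scores)
    return mean_score >= threshold, mean_score


def exit_code(scores, threshold=0.8):
    """Map the gate result to a process exit code (0 = pass, 1 = fail)."""
    passed, _ = gate(scores, threshold)
    return 0 if passed else 1


# Illustrative scores only, not real leaderboard data.
example_scores = {"fix-off-by-one": 1.0, "add-retry-logic": 0.5, "rename-module": 1.0}
passed, mean_score = gate(example_scores)
print(f"mean score {mean_score:.2f}, gate {'passed' if passed else 'failed'}")
```

In a CI job, passing the result of `exit_code(...)` to `sys.exit` would make the step fail, which is what blocks the merge or release.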

Journey fit

Primary fit

Benchmarking and gating agent behavior belongs in the ship phase, where you prove that the product, or the agent stack, meets quality bars before release. Solve-score-trace evals and CI gates are testing and verification mechanics, not initial build or distribution work.

How it compares

ForgeJudge is an agent-eval and CI-gate MCP integration, not a generic unit-test runner skill or a hosting marketplace.

Common Questions / FAQ

Who is ForgeJudge for?

Builders of autonomous coding agents who need scored benchmarks, traces, and CI-friendly gates exposed through MCP.

When should I use ForgeJudge?

Use it during the ship and testing phases, when you compare agent versions or block deploys until eval suites pass.
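
Comparing agent versions comes down to a per-task diff between two scored runs. The following is a rough sketch under stated assumptions: the `find_regressions` helper, the task names, and the dictionary-of-scores format are hypothetical and are not taken from ForgeJudge itself.

```python
def find_regressions(baseline, candidate, tolerance=0.0):
    """List tasks where the candidate scores lower than the baseline by more than tolerance."""
    regressions = []
    for task, base_score in sorted(baseline.items()):
        # A task missing from the candidate run is counted as a score of 0.0.
        new_score = candidate.get(task, 0.0)
        if base_score - new_score > tolerance:
            regressions.append((task, base_score, new_score))
    return regressions


# Illustrative data only: scores for two hypothetical agent versions.
baseline_run = {"fix-off-by-one": 1.0, "add-retry-logic": 0.5}
candidate_run = {"fix-off-by-one": 0.5, "add-retry-logic": 0.5}

for task, old, new in find_regressions(baseline_run, candidate_run):
    print(f"{task}: {old:.2f} -> {new:.2f}")
```

A non-empty result is a natural reason to hold the deploy and open the traces for the listed tasks.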

How do I add ForgeJudge to my agent?

Configure a stdio MCP server launched with uvx, following the registry runtimeArguments (package forgejudge[mcp], module forgejudge.mcp.server), in Claude Code, Cursor, or another MCP client.
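
As a sketch of what such a client entry might look like, the snippet below generates a JSON configuration in the `mcpServers` layout that many MCP clients read. The exact argument list is an assumption inferred from the package and module names above, and `build_mcp_config` is a hypothetical helper; the registry's runtimeArguments remain the authoritative source.

```python
import json


def build_mcp_config(server_name="forgejudge"):
    """Build a client config entry that starts the server over stdio via uvx."""
    return {
        "mcpServers": {
            server_name: {
                "command": "uvx",
                # Assumed invocation: install forgejudge[mcp] into a temporary
                # environment and run the server module with Python.
                "args": [
                    "--from",
                    "forgejudge[mcp]",
                    "python",
                    "-m",
                    "forgejudge.mcp.server",
                ],
            }
        }
    }


# Print the JSON so it can be pasted into the client's MCP config file.
print(json.dumps(build_mcp_config(), indent=2))
```

Check your client's documentation for the location and exact schema of its MCP config file before pasting the output.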
