
SelfHeal MCP
Wrap flaky downstream MCP servers with retries, circuit breaking, and observability so agent workflows stay up in production.
Overview
SelfHeal MCP is a MCP server for the Operate phase that proxies other MCP servers with automatic retry, circuit breaking, and observability.
What is this MCP server?
- Self-healing proxy layer in front of other MCP servers
- Configurable retries via SELFHEAL_MAX_RETRIES (default 3)
- Circuit breaker with SELFHEAL_CIRCUIT_THRESHOLD (default 5 failures)
- Observability hooks for diagnosing unstable tool chains
- npm package selfheal-mcp v0.1.1 with optional selfheal.config.json path
- Server version 0.1.1; npm identifier selfheal-mcp
- Default SELFHEAL_MAX_RETRIES 3 and SELFHEAL_CIRCUIT_THRESHOLD 5
- 3 documented environment variables including optional config path
What problem does it solve?
One unreliable MCP tool can break every agent action chain with opaque timeouts and no backoff, wasting tokens and blocking shipping work.
Who is it for?
Indie builders running several MCP integrations in Claude Code or Cursor who need production-grade guardrails without building their own proxy.
Skip if: Greenfield projects with only one stable MCP server and no observed flakiness yet.
What do I get? / Deliverables
After wiring SelfHeal in front of fragile MCP servers, transient failures retry within limits and repeated outages trip a circuit so your agent gets clear, fast feedback.
- Hardened MCP call path with retries
- Circuit-open protection after repeated failures
- Operational visibility into MCP failure patterns
Recommended MCP Servers
Journey fit
Operate is the right shelf because this proxy solves reliability once MCP integrations are already wired and failing in real use. Infra subphase matches proxy configuration, failure thresholds, and resilience patterns around MCP server fleets.
How it compares
MCP reliability proxy, not a broken-link scanner or generic HTTP fetch layer.
Common Questions / FAQ
Who is selfheal-mcp for?
It is for developers operating agent setups that depend on multiple MCP servers and need retries and circuit breakers without custom middleware.
When should I use selfheal-mcp?
Use it when MCP calls intermittently fail, hang, or cascade errors after you have already validated the underlying tools work in isolation.
How do I add selfheal-mcp to my agent?
Install the npm package selfheal-mcp, point SELFHEAL_CONFIG at selfheal.config.json if needed, set retry and circuit env vars, and register the stdio server ahead of downstream MCP endpoints in your client config.