
MetriLLM
Compare local Ollama or LM Studio models for speed, answer quality, and whether your GPU or CPU can run them before you wire one into Claude Code or Cursor.
Overview
MetriLLM is an MCP server for the Validate phase. It benchmarks local LLMs and reports speed, quality, and hardware fitness directly to your agent.
What is this MCP server?
- Benchmark local LLMs for latency and throughput from any MCP-capable agent
- Quality-oriented checks alongside raw speed for practical coding assistant use
- Hardware fitness verdict so you do not ship a product pinned to a model your machine cannot run
- stdio npm package metrillm-mcp v0.2.6 for Claude Code, Cursor, and other MCP clients
- No cloud API required—runs against models you already host locally
- Repository: github.com/MetriLLM/metrillm (mcp subfolder)
Community signal: 5 GitHub stars.
What problem does it solve?
You cannot tell which local model is fast enough and good enough on your laptop or desktop without repetitive manual timing and subjective chat tests.
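To make the manual-timing problem concrete, here is a minimal sketch of the arithmetic behind a throughput figure. It is not MetriLLM's code or scoring method. It assumes an Ollama-style response, where `eval_count` is the number of generated tokens and `eval_duration` is reported in nanoseconds; the function names and sample values are hypothetical.

```python
# Illustrative only: not MetriLLM's implementation.
# Assumes Ollama-style response fields:
#   eval_count    -> number of generated tokens
#   eval_duration -> generation time in nanoseconds


def tokens_per_second(response: dict) -> float:
    """Generation throughput for a single response."""
    eval_count = response["eval_count"]
    eval_ns = response["eval_duration"]
    if eval_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_ns / 1e9)


def mean_throughput(responses: list[dict]) -> float:
    """Average tokens/sec over repeated runs to smooth out noise."""
    rates = [tokens_per_second(r) for r in responses]
    return sum(rates) / len(rates)


# Hypothetical sample values, not real measurements.
runs = [
    {"eval_count": 120, "eval_duration": 4_000_000_000},
    {"eval_count": 100, "eval_duration": 5_000_000_000},
]
print(mean_throughput(runs))
```

Doing this by hand for every model and quantization is the repetitive step that an agent-driven benchmark is meant to replace.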
Who is it for?
Indie builders running Ollama, llama.cpp, or similar who want agent-driven benchmarks before locking model and quantization choices.
Skip if: you only use cloud APIs with no local inference, or you need production-grade load testing rather than developer-machine fitness checks.
What do I get? / Deliverables
After registering metrillm-mcp, your agent can run comparable benchmarks and surface a clear fitness verdict before you standardize on one local model.
- Structured benchmark results for speed and quality on your machine
- Hardware fitness verdict to guide default model choice
- Repeatable comparisons your agent can run without manual stopwatch tests
Journey fit
Model choice is a validation decision solo builders make before committing stack and prompts to a specific local weights file. Prototyping with real benchmarks turns vague “this model feels slow” into a go/no-go signal on your actual hardware.
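One hypothetical way to turn measurements into a go/no-go signal is to compare them against minimums you choose yourself. The sketch below is an assumption for illustration: the `Measurement` fields, the thresholds, and the model names are invented here and do not describe how MetriLLM computes its fitness verdict.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    model: str                  # hypothetical model label
    tokens_per_second: float    # generation throughput
    first_token_seconds: float  # time until the first token arrives


# Hypothetical thresholds; choose values that match your own workflow.
MIN_TOKENS_PER_SECOND = 15.0
MAX_FIRST_TOKEN_SECONDS = 2.0


def verdict(m: Measurement) -> str:
    """Return "go" only if the model is both fast and responsive enough."""
    fast_enough = m.tokens_per_second >= MIN_TOKENS_PER_SECOND
    responsive = m.first_token_seconds <= MAX_FIRST_TOKEN_SECONDS
    return "go" if fast_enough and responsive else "no-go"


# Made-up numbers for two candidate quantizations.
candidates = [
    Measurement("model-a-q4", 28.0, 0.9),
    Measurement("model-b-q8", 9.5, 3.1),
]
for m in candidates:
    print(m.model, verdict(m))
```

Writing the thresholds down before benchmarking keeps the decision from drifting toward whichever model you tried last.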
How it compares
MetriLLM is an MCP benchmarking integration, not a model hosting service or a prompt-tuning skill.
Common Questions / FAQ
Who is MetriLLM for?
Solo and indie developers who run local LLMs and use Claude Code, Cursor, or another MCP client to pick and validate models on their own hardware.
When should I use MetriLLM?
Use it when you are prototyping agent workflows, comparing quantizations, or re-validating performance after hardware or driver changes—before you commit your repo to one default model.
How do I add MetriLLM to my agent?
Install the npm package metrillm-mcp, add a stdio MCP server entry pointing at that binary in your Claude Code or Cursor MCP config, then invoke benchmark tools from the agent session.
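The registration step might look like the sketch below, which prints a stdio server entry you could paste into an MCP config file. Several details are assumptions: that the package can be launched with `npx -y metrillm-mcp`, that the server label `metrillm` suits you, and that your client reads the common `mcpServers` layout (for example a project `.mcp.json` for Claude Code or `.cursor/mcp.json` for Cursor). Check the package README and your client's documentation for the exact command and file location.

```python
import json

# Assumption: the npm package exposes a runnable entry point that
# `npx -y metrillm-mcp` can start. Verify against the package README.
config = {
    "mcpServers": {
        # "metrillm" is an arbitrary label chosen for this example.
        "metrillm": {
            "command": "npx",
            "args": ["-y", "metrillm-mcp"],
        }
    }
}

# Print JSON suitable for pasting into the client's MCP config file.
print(json.dumps(config, indent=2))
```

If you already have other servers configured, merge the `metrillm` entry into the existing `mcpServers` object rather than overwriting the file.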