
MCP Florence2
Expose Florence-2 vision capabilities to your agent as MCP tools for captioning, detection, and other image understanding tasks.
Overview
io.github.jkawamoto/mcp-florence2 is an MCP server for the Build phase that processes images with Florence-2 and exposes the results to agents as vision tools.
What is this MCP server?
- MCP server v0.3.9 wrapping Microsoft's Florence-2 vision model for image processing
- Distributed as an mcpb stdio bundle through the jkawamoto/mcp-florence2 GitHub releases
- Lets coding agents run vision tasks without first embedding a full model pipeline in app code
- Suited to prototypes that need captions, OCR-style understanding, or visual Q&A hooks
- Server version: 0.3.9
- Transport type: stdio (mcpb package)
- Repository: github.com/jkawamoto/mcp-florence2
Community signal: 7 GitHub stars.
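Florence-2 is steered by a task token at the start of its prompt, which is how a single model can caption an image, read its text, or detect objects. The sketch below maps friendly names to task tokens as listed on the Florence-2 model card on Hugging Face. The friendly names and the `build_prompt` helper are hypothetical, and it is an assumption that mcp-florence2 exposes any particular subset of these tasks as tools.

```python
"""Map friendly task names to Florence-2 task-prompt tokens.

Assumption: the tokens follow the Florence-2 model card on Hugging Face.
The friendly names (e.g. "caption", "ocr") are hypothetical labels chosen
here, not necessarily the tool names that mcp-florence2 exposes.
"""

from typing import Optional

# Florence-2 selects its task via a special token at the start of the prompt.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks that need extra text after the token (e.g. the phrase to ground).
NEEDS_TEXT = {"phrase_grounding"}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return the Florence-2 prompt string for a task."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in NEEDS_TEXT:
        if not text_input:
            raise ValueError(f"task {task!r} requires text_input")
        # Model-card convention: the text input is appended directly to the token.
        return token + text_input
    if text_input:
        raise ValueError(f"task {task!r} does not take text_input")
    return token


if __name__ == "__main__":
    print(build_prompt("caption"))  # prints <CAPTION>
    print(build_prompt("phrase_grounding", "a red car"))
```

Keeping the token table in one place makes it easy to see which Florence-2 tasks a prototype relies on before checking which of them the server actually offers.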
What problem does it solve?
Adding vision to an agent-built product usually means wrestling with model weights and inference code before you can test a single user flow.
Who is it for?
Indie builders prototyping image captioning, UI understanding, or visual search with Florence-2 behind Claude Code or Cursor.
Skip if: you are building text-only CRUD apps, you need a managed cloud vision API with SLAs and no local model setup, or you plan to run at production scale without first reviewing the repo's runtime requirements.
What do I get? / Deliverables
Registering mcp-florence2 exposes Florence-2 image processing to your agent so you can iterate on multimodal features through MCP tool calls.
- Florence-2-backed image processing tools available to your agent
- Faster multimodal feature spikes without bespoke inference wrappers
- Documented mcpb-based install pinned to release v0.3.9
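Each MCP tool call travels as a JSON-RPC 2.0 message, and over the stdio transport those messages are newline-delimited JSON written to the server's stdin. The sketch below builds the client side of a minimal session: `initialize`, the `notifications/initialized` notification, `tools/list`, then `tools/call`. The tool name `caption`, its `src` argument, the protocol version string, and the helper names are assumptions for illustration; read the real tool names and input schemas from the server's `tools/list` response.

```python
"""Build the newline-delimited JSON-RPC 2.0 messages an MCP client sends
over stdio to list tools and call one.

Assumptions: the tool name "caption" and argument key "src" are hypothetical
placeholders, and the protocol version string is an example; client and
server negotiate the real one during initialize.
"""

import json
from typing import Any, Dict, List


def frame(message: Dict[str, Any]) -> str:
    """Serialize one JSON-RPC message as a single line (stdio framing)."""
    # Stdio messages are separated by newlines, so the JSON must not contain
    # raw newlines; json.dumps without indent escapes them inside strings.
    return json.dumps(message, separators=(",", ":")) + "\n"


def session_messages(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Return the client-side messages for a minimal session, in order."""
    return [
        frame({
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2025-06-18",  # assumption; negotiated
                "capabilities": {},
                "clientInfo": {"name": "sketch-client", "version": "0.0.1"},
            },
        }),
        # A notification has no "id"; it is sent after the initialize reply.
        frame({"jsonrpc": "2.0", "method": "notifications/initialized"}),
        frame({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}),
        frame({
            "jsonrpc": "2.0",
            "id": 3,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }),
    ]


if __name__ == "__main__":
    for line in session_messages("caption", {"src": "/tmp/example.png"}):
        print(line, end="")
```

A real client would wait for the server's reply to each request before sending the next; this sketch only shows the message shapes and framing.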
Journey fit
Vision MCP servers are added while building multimodal features and agent tooling, not during initial idea research alone. Florence-2 processing sits at an ML integration boundary: agents call it as a tool during backend and agent-feature implementation.
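At that boundary, backend code usually needs plain text out of a tool result. In MCP, a `tools/call` result carries a `content` list of typed items plus an `isError` flag, while protocol-level failures come back as a JSON-RPC `error` object instead of a `result`. The sketch below uses a hypothetical `extract_text` helper and `ToolCallError` exception, and keeps only `text` items; that filtering is an assumption, since a vision server could also return other content types.

```python
"""Turn an MCP tools/call response into plain text for backend code.

Sketch assumptions: the response is an already-parsed JSON-RPC message;
only "text" content items are kept and other types (e.g. images) are
ignored. ToolCallError is a hypothetical exception name.
"""

import json
from typing import Any, Dict


class ToolCallError(RuntimeError):
    """Raised when the call failed at the protocol or tool level."""


def extract_text(response: Dict[str, Any]) -> str:
    # Protocol-level failure: a JSON-RPC error object instead of a result.
    if "error" in response:
        err = response["error"]
        raise ToolCallError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response.get("result", {})
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    # Tool-level failure: the server reports it inside the result via isError.
    if result.get("isError"):
        raise ToolCallError("tool reported an error: " + " ".join(texts))
    return "\n".join(texts)


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    raw = ('{"jsonrpc":"2.0","id":3,"result":{"content":'
           '[{"type":"text","text":"sample caption"}],"isError":false}}')
    print(extract_text(json.loads(raw)))
```

Separating protocol errors from tool errors lets the caller handle a broken session differently from, say, an image the tool could not read.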
How it compares
A local Florence-2 vision MCP server, not a generic image CDN skill or a browser screenshot automation server.
Common Questions / FAQ
Who is io.github.jkawamoto/mcp-florence2 for?
Developers building multimodal or vision-assisted features who want agents to call Florence-2 through MCP instead of custom scripts.
When should I use io.github.jkawamoto/mcp-florence2?
During the Build phase, when you need image understanding inside the agent loop for prototypes, internal tools, or feature spikes.
How do I add io.github.jkawamoto/mcp-florence2 to my agent?
Install the v0.3.9 mcpb bundle from the jkawamoto/mcp-florence2 releases, configure it as a stdio server in your MCP client, satisfy the Florence-2 runtime prerequisites listed in the repo, then verify that the vision tools appear in a test session.
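Many MCP clients, including Claude Desktop and Cursor, read stdio servers from a JSON file with a top-level `mcpServers` object whose entries give a `command`, `args`, and optional `env`. The sketch below merges one such entry without clobbering other servers. The `add_stdio_server` helper, the server key `florence-2`, and the `mcp-florence2` launch command are hypothetical placeholders; substitute the launch command documented in the repo's README. Clients that install the mcpb bundle directly may write this entry for you.

```python
"""Merge a stdio server entry into an MCP client config file.

Assumptions: the client reads the common {"mcpServers": {...}} JSON layout;
the server key "florence-2" and the command "mcp-florence2" are hypothetical
placeholders, not confirmed values from the mcp-florence2 repo.
"""

import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def add_stdio_server(config_path: Path, name: str, command: str,
                     args: Optional[List[str]] = None,
                     env: Optional[Dict[str, str]] = None) -> dict:
    """Add or replace one server entry, keeping any other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    entry: Dict[str, object] = {"command": command, "args": args or []}
    if env:
        entry["env"] = env  # only written when environment variables are given
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file rather than a real client config.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "mcp.json"
        add_stdio_server(path, "florence-2", "mcp-florence2")
        print(path.read_text())
```

After restarting the client, a `tools/list` round trip in a test session is a quick way to confirm the server launched and its vision tools are visible.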