
Can See
Give your agent eyes on TUI and CLI programs by capturing PNG screenshots of terminal sessions for debugging and guided interaction.
Overview
can-see is an MCP server for the Build phase that captures PNG screenshots of terminal and CLI apps so AI agents can see and interact with TUIs.
What is this MCP server?
- Captures PNG screenshots of running terminal and CLI applications for agents
- Terminal size is configurable through DEFAULT_COLS (default 120) and DEFAULT_ROWS (default 30)
- IDLE_TIMEOUT_MS closes idle sessions automatically (default 300000 ms, i.e. 5 minutes)
- Distributed on npm as can-see v0.4.0, using the stdio MCP transport
- Lets text-only agents see interactive TUIs they would otherwise have to work with blind
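The documented settings and their defaults can be pictured with a small sketch. This is not can-see's actual source: the names TerminalSettings, positiveIntOr, and resolveSettings are hypothetical, and the fallback-on-invalid-input behavior is an assumption. Only the variable names and default values come from the list above.

```typescript
// Hypothetical sketch, NOT can-see's real implementation. It only illustrates
// how the documented environment variables and defaults could be resolved.
interface TerminalSettings {
  cols: number;
  rows: number;
  idleTimeoutMs: number;
}

// Parse a positive integer from an optional string; use the fallback when the
// value is missing or invalid (assumed behavior, not confirmed by the source).
function positiveIntOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveSettings(env: Record<string, string | undefined>): TerminalSettings {
  return {
    cols: positiveIntOr(env.DEFAULT_COLS, 120),             // documented default: 120
    rows: positiveIntOr(env.DEFAULT_ROWS, 30),              // documented default: 30
    idleTimeoutMs: positiveIntOr(env.IDLE_TIMEOUT_MS, 300000), // documented default: 300000 ms
  };
}

// With nothing set, the documented defaults apply.
const defaults = resolveSettings({});
// Overriding only the dimensions leaves the idle timeout at its default.
const custom = resolveSettings({ DEFAULT_COLS: "160", DEFAULT_ROWS: "48" });
```

A wider terminal tends to help when the TUI under test has multi-pane layouts that wrap or truncate at 120 columns.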
What problem does it solve?
Agents debugging terminal UIs only get partial text streams and cannot see layout, prompts, or visual state on the screen.
Who is it for?
Indie devs building or hardening CLI and TUI tools who want vision-backed agent loops in the terminal.
Skip if: you do pure web frontend work, run headless batch jobs with no interactive UI, or are unwilling to run local terminal capture processes.
What do I get? / Deliverables
Your agent receives PNG snapshots of CLI sessions so it can guide fixes and interactions on interactive terminal apps.
- PNG visual captures of terminal sessions for the agent
- Configurable terminal dimensions and idle session cleanup
- Closer feedback loops when fixing interactive shell UX
Journey fit
Terminal-heavy builds (CLIs, installers, and curses UIs) need visual feedback loops while you iterate locally with an agent partner. Agent-tooling fits because the server exists to extend what the model can perceive and steer in shell-driven workflows, not to ship end-user features.
How it compares
can-see is a terminal vision MCP server for CLIs and TUIs. It is not a browser automation tool or a Playwright-style web skill.
Common Questions / FAQ
Who is can-see for?
Solo builders using AI agents to develop, test, or repair interactive command-line and terminal user interfaces.
When should I use can-see?
Use it when text-only tool output is insufficient and the agent needs screenshots to understand TUI layout, menus, or errors.
How do I add can-see to my agent?
Install the npm package can-see, register it as a stdio MCP server, optionally set DEFAULT_COLS, DEFAULT_ROWS, and IDLE_TIMEOUT_MS, and then run terminal apps through the server's tools.
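As a rough sketch of the registration step, the snippet below builds a client config entry and prints it as JSON. Several assumptions apply: the "mcpServers" layout is the shape many MCP clients use for stdio servers, and your client may differ; launching through `npx -y can-see` assumes the package exposes a runnable binary, which the text above does not confirm; StdioServerEntry is a hypothetical type name. The environment variable names and values are the documented ones.

```typescript
// Assumed config shape: the "mcpServers" map that many MCP clients read for
// stdio servers. Check your own client's documentation before copying this.
interface StdioServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

const canSeeEntry: StdioServerEntry = {
  command: "npx",          // assumption: the server is launched through npx
  args: ["-y", "can-see"], // assumption: the npm package exposes a runnable bin
  env: {
    // Optional overrides; the values shown are the documented defaults.
    DEFAULT_COLS: "120",
    DEFAULT_ROWS: "30",
    IDLE_TIMEOUT_MS: "300000",
  },
};

const clientConfig = { mcpServers: { "can-see": canSeeEntry } };

// Serialize for pasting into the client's JSON config file.
const configJson = JSON.stringify(clientConfig, null, 2);
console.log(configJson);
```

The env block can be left out entirely if the defaults suit your terminal app.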