
Vision Analysis
Turn screenshots, mockups, charts, and photos into structured descriptions or OCR text via the MiniMax vision MCP tool during design and research.
Install
npx skills add https://github.com/minimax-ai/skills --skill vision-analysisWhat is this skill?
- Routes image paths or URLs to MiniMax `MiniMax_understand_image` MCP for vision understanding
- Triggers on common image extensions (.jpg, .png, .webp, .svg, etc.) plus analyze/describe/OCR phrasing
- Supports mockup review, wireframe critique, chart data extraction, and object or activity identification
- Documents setup when MCP is missing: MiniMax Token Plan and `MINIMAX_API_KEY` configuration
- Metadata version 1.0, category ai-vision, MIT license
Adoption & trust: 1.6k installs on skills.sh; 12.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
Recommended Skills
Microsoft Foundrymicrosoft/azure-skills
Azure Aimicrosoft/azure-skills
Azure Hosted Copilot Sdkmicrosoft/azure-skills
Lark Eventlarksuite/cli
Running Claude Code Via Litellm Copilotxixu-me/skills
Setup Matt Pocock Skillsmattpocock/skills
Journey fit
Primary fit
Canonical shelf is Build → frontend because UI mockup, wireframe, and screenshot review is the most common solo-builder trigger in SKILL.md. Frontend subphase fits design critique, wireframe analysis, and visual spec extraction before implementation.
Common Questions / FAQ
Is Vision Analysis safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Vision Analysis
# Vision Analysis Analyze images using the MiniMax `MiniMax_understand_image` MCP tool available in the MiniMax Token Plan. ## Prerequisites - MiniMax Token Plan subscription with valid `MINIMAX_API_KEY` - MiniMax MCP configured (`MiniMax_understand_image` tool available) ### If MCP tool is not configured **Step 1:** The agent should fetch setup instructions from: **https://platform.minimaxi.com/docs/token-plan/mcp-guide** **Step 2:** Detect the user's environment (OpenCode, Cursor, Claude Code, etc.) and output the exact commands needed. Common examples: **OpenCode** — add to `~/.config/opencode/opencode.json` or `package.json`: ```json { "mcp": { "MiniMax": { "type": "local", "command": ["uvx", "minimax-coding-plan-mcp", "-y"], "environment": { "MINIMAX_API_KEY": "YOUR_TOKEN_PLAN_KEY", "MINIMAX_API_HOST": "https://api.minimaxi.com" }, "enabled": true } } } ``` **Claude Code**: ```bash claude mcp add -s user MiniMax --env MINIMAX_API_KEY=your-key --env MINIMAX_API_HOST=https://api.minimaxi.com -- uvx minimax-coding-plan-mcp -y ``` **Cursor** — add to MCP settings: ```json { "mcpServers": { "MiniMax": { "command": "uvx", "args": ["minimax-coding-plan-mcp"], "env": { "MINIMAX_API_KEY": "your-key", "MINIMAX_API_HOST": "https://api.minimaxi.com" } } } } ``` **Step 3:** After configuration, tell the user to restart their app and verify with `/mcp`. **Important:** If the user does not have a MiniMax Token Plan subscription, inform them that the `understand_image` tool requires one — it cannot be used with free or other tier API keys. ## Analysis Modes | Mode | When to use | Prompt strategy | |---|---|---| | `describe` | General image understanding | Ask for detailed description | | `ocr` | Text extraction from screenshots, documents | Ask to extract all text verbatim | | `ui-review` | UI mockups, wireframes, design files | Ask for design critique with suggestions | | `chart-data` | Charts, graphs, data visualizations | Ask to extract data points and trends | | `object-detect` | Identify objects, people, activities | Ask to list and locate all elements | ## Workflow ### Step 1: Auto-detect image The skill triggers automatically when a message contains an image file path or URL with extensions: `.jpg`, `.jpeg`, `.png`, `.gif`, `.webp`, `.bmp`, `.svg` Extract the image path from the message. ### Step 2: Select analysis mode and call MCP tool Use the `MiniMax_understand_image` tool with a mode-specific prompt: **describe:** ``` Provide a detailed description of this image. Include: main subject, setting/background, colors/style, any text visible, notable objects, and overall composition. ``` **ocr:** ``` Extract all text visible in this image verbatim. Preserve structu