Minimax Image Understanding

Name: Minimax Image Understanding
Author: imsus

imsus/pi-extension-minimax-coding-plan-mcp

Call MiniMax VLM understand_image on screenshots, diagrams, and UI photos when text context is missing or you need OCR-style extraction.

Overview

MiniMax Image Understanding is an agent skill, used mostly in the Build phase and sometimes in Ship, that analyzes images through the MiniMax understand_image VLM tool.

Install

npx skills add https://github.com/imsus/pi-extension-minimax-coding-plan-mcp --skill minimax-image-understanding

What is this skill?

Single understand_image tool: prompt plus image_url (HTTP/HTTPS URL, local file path, or data:image base64)
POST {api_host}/v1/coding_plan/vlm with a JSON body containing prompt and image_url; this is the only endpoint path documented in SKILL.md
Documented use cases: error screenshots, UI/UX review, charts, OCR, visual debugging
Explicit anti-patterns: skip when the image is already described, is a simple icon, has an inaccessible URL, or is redundant with file context

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 520 installs on skills.sh; 13 GitHub stars; 1/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your agent cannot see the screenshot, diagram, or UI bug you are staring at, so debugging and design feedback stall.

Who is it for?

Developers using the pi-extension MiniMax MCP who routinely paste screenshots and want consistent tool invocation rules.

Skip if: the image is already fully described, the asset is a trivial icon, or no valid image_url is available.

When should I use this skill?

You need to analyze, describe, or extract information from an image using the understand_image tool.

What do I get? / Deliverables

You get a text analysis from MiniMax VLM for the provided image URL or base64 payload, usable for fixes, OCR, or UX notes.

The response's content field holds the VLM's text summary of the image, or its answer to your question about it.

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Build › Integrations & version control

Canonical shelf is Build › integrations because the skill is a thin wrapper around the MiniMax coding_plan VLM HTTP tool inside an MCP extension stack. Integrations subphase is where agent skills bind to hosted multimodal APIs rather than local-only logic.

Also useful

Ship › Code review

Where it fits

Example use: Build › UI/UX & frontend

Upload a layout screenshot and ask which component broke responsive spacing.

Example use: Build › Integrations & version control

Parse a diagram of a third-party webhook flow before wiring handlers.

Example use: Ship › Code review

Interpret a production error dialog image during pre-release QA.

How it compares

The skill adds agent guardrails around a hosted VLM call. It does not replace a prose description when the conversation already provides complete context.

Common Questions / FAQ

Who is minimax-image-understanding for?

Solo builders running Claude Code, Cursor, or similar agents with the MiniMax coding-plan MCP who need vision on demand.

When should I use minimax-image-understanding?

During Build for UI and integration debugging from screenshots, and during Ship review when visual regressions or error dialogs need interpretation.

Is minimax-image-understanding safe to install?

Review the Security Audits panel on this Prism page; the skill sends image data to MiniMax APIs so confirm keys, data policy, and MCP extension source.

SKILL.md

# MiniMax Image Understanding Skill

Use this skill when you need to analyze, describe, or extract information from images.

## How to Use

Call the `understand_image` tool directly with a prompt and image URL:

```
understand_image({
  prompt: "Your question about the image",
  image_url: "https://example.com/image.png"
})
```

## When to Use

Use `understand_image` when:

- **Screenshots**: Error messages, UI issues, code in screenshots
- **Visual content**: Photos, diagrams, charts, graphs
- **Documents**: Extracting text from images (OCR), understanding layouts
- **UI/UX analysis**: Evaluating designs, identifying components
- **Visual debugging**: Understanding visual bugs or layout issues

## When NOT to Use

Do NOT use `understand_image` when:

- **Image is already described** in the conversation
- **The image is a simple icon** or emoji you recognize
- **No image is provided** or the image URL is inaccessible
- **Redundant with existing context** (e.g., file contents already visible)

## Usage

```
understand_image({
  prompt: "What do you see in this image?",
  image_url: "https://example.com/screenshot.png"
})
```

## API Details

**Endpoint**: `POST {api_host}/v1/coding_plan/vlm`

**Request Body**:
```json
{
  "prompt": "Your question about the image",
  "image_url": "data:image/jpeg;base64,/9j/4AAQ..."
}
```

**Response Format**:
```json
{
  "content": "AI analysis of the image...",
  "base_resp": {
    "status_code": 0,
    "status_msg": "success"
  }
}
```
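
As a rough illustration of the request and response shapes above, the following Python sketch builds the POST request and unwraps the response. The helper names, the `api_host` value, and the bearer-token `Authorization` header are assumptions for illustration; SKILL.md does not state how the endpoint authenticates. Treating a nonzero `status_code` as a failure is inferred from the success example, not documented.

```python
import json
import urllib.request

VLM_PATH = "/v1/coding_plan/vlm"  # endpoint path from the API Details above


def build_vlm_request(api_host: str, api_key: str, prompt: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the VLM endpoint."""
    body = json.dumps({"prompt": prompt, "image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=api_host.rstrip("/") + VLM_PATH,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; not confirmed by SKILL.md.
            "Authorization": f"Bearer {api_key}",
        },
    )


def parse_vlm_response(raw: bytes) -> str:
    """Return the `content` text, raising if base_resp reports a nonzero status."""
    payload = json.loads(raw)
    base = payload.get("base_resp", {})
    # Assumption: status_code 0 means success, as in the sample response.
    if base.get("status_code", 0) != 0:
        raise RuntimeError(f"VLM call failed: {base.get('status_msg', 'unknown error')}")
    return payload["content"]
```

To send the request, pass the object returned by `build_vlm_request` to `urllib.request.urlopen` and hand the response bytes to `parse_vlm_response`. In normal use the `understand_image` tool makes this call for you.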

## Image Processing

The tool automatically handles three types of image inputs:

1. **HTTP/HTTPS URLs**: Downloads the image and converts to base64
   - Example: `https://example.com/image.jpg`

2. **Local file paths**: Reads local files and converts to base64
   - Absolute: `/Users/username/Documents/image.png`
   - Relative: `images/photo.png`
   - Removes `@` prefix if present

3. **Base64 data URLs**: Passes through existing base64 data
   - Example: `data:image/png;base64,iVBORw0KGgo...`
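
The three input types can be told apart with simple prefix checks. The sketch below is one possible way to do that and to turn a local file into a base64 data URL. It is not the extension's actual implementation, and the function names are hypothetical. Inferring the MIME type from the file extension is also an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def classify_image_input(image_url: str) -> str:
    """Return 'data_url', 'remote', or 'local' for an image_url value."""
    if image_url.startswith("data:"):
        return "data_url"
    if image_url.startswith(("http://", "https://")):
        return "remote"
    return "local"


def local_file_to_data_url(path_str: str) -> str:
    """Read a local image file and encode it as a base64 data URL."""
    # Drop a leading "@", as the list above describes.
    path = Path(path_str.removeprefix("@"))
    # Assumption: the MIME type is guessed from the file extension.
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None:
        raise ValueError(f"Cannot infer image type from {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

A remote URL would additionally need a download step before encoding. That step is omitted here to keep the sketch free of network access.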

## Image Formats

Supported:
- **JPEG** (.jpg, .jpeg)
- **PNG** (.png)
- **WebP** (.webp)

Not supported:
- PDF, GIF, PSD, SVG, and other formats
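
A cheap pre-flight check can reject unsupported formats before any call is made. The sketch below is a heuristic that looks only at the file extension or the data URL's MIME type. The function name is hypothetical, and the mapping of the three supported formats to `image/jpeg`, `image/png`, and `image/webp` is an assumption based on standard MIME types rather than anything stated in SKILL.md.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: standard MIME types for the three supported formats.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}


def is_supported_image(image_url: str) -> bool:
    """Heuristically check whether an image_url points at a supported format."""
    if image_url.startswith("data:"):
        # "data:image/png;base64,..." -> "image/png"
        header = image_url[len("data:"):].split(",", 1)[0]
        mime = header.split(";", 1)[0].lower()
        return mime in SUPPORTED_MIME_TYPES
    # For remote URLs, ignore the query string and fragment.
    path = urlparse(image_url).path if "://" in image_url else image_url
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

An extension check cannot catch a mislabeled file, so a stricter version would inspect the file's leading bytes instead.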

## Crafting Effective Prompts

### For Descriptions
- "Describe what's in this image in detail"
- "What is the main subject of this image?"
- "Describe the visual style and composition"

### For Code/Technical
- "What code is shown in this screenshot?"
- "Extract all text from this image"
- "Identify the UI framework/components used"

### For Analysis
- "Analyze this UI design. What is working well and what could be improved?"
- "What emotions or mood does this image convey?"
- "Compare this design to Material Design principles"

### For OCR/Text Extraction
- "Extract all text from this image"
- "Read the error message in this screenshot"
- "What does the label say in this image?"
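
Prompts like those above can be assembled from a question plus optional task context and a requested output format. The helper below is a minimal sketch of that idea. Its name and the `Context:` and `Respond as:` labels are illustrative choices, not part of the skill.

```python
from typing import Optional


def build_image_prompt(
    question: str,
    context: Optional[str] = None,
    output_format: Optional[str] = None,
) -> str:
    """Combine a question with optional context and output-format hints."""
    parts = []
    if context:
        parts.append(f"Context: {context.strip()}")
    parts.append(question.strip())
    if output_format:
        parts.append(f"Respond as: {output_format.strip()}")
    return "\n".join(parts)
```

For example, `build_image_prompt("Read the error message in this screenshot", context="Login page of a web app", output_format="a bulleted list")` yields a three-line prompt that could be passed as the `prompt` argument of `understand_image`.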

## Examples

### Error Analysis
```
understand_image({
  prompt: "What is the error message and where is it located in this screenshot?",
  image_url: "./error-screenshot.png"
})
```

### Code Screenshot
```
understand_image({
  prompt: "What code is shown in this screenshot? Please transcribe it exactly.",
  image_url: "https://example.com/code.png"
})
```

### Design Review
```
understand_image({
  prompt: "Analyze this UI design. What is working well and what could be improved?",
  image_url: "https://example.com/mockup.png"
})
```

### OCR
```
understand_image({
  prompt: "Extract all text from this image",
  image_url: "/Users/username/Documents/scan.png"
})
```

## Tips

1. **Be specific** in your prompt about what you want to know
2. **Mention format** if you need structured output (e.g., "list all elements")
3. **Include context** if the image is part of a larger task
4. **For screenshots**, specify if you need full-page or just a specific area
5. **Complex analysis** may trigger a confirmation prompt (analyze, extract, describe, r
