Baoyu Image Gen

Name: Baoyu Image Gen
Author: jimliu

jimliu/baoyu-skills

30.4k installs
24.2k repo stars
Updated July 4, 2026

baoyu-image-gen is a skill that generates images via OpenAI, Google, Azure, and other APIs with batch processing.

About

AI image generation skill supporting OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, and Replicate APIs. Handles text-to-image, reference images, aspect ratios, and batch generation from saved prompt files.

Supports 9+ image generation APIs (OpenAI, Google, Azure, OpenRouter, DashScope, etc.)
Batch generation mode with parallel worker control
Marked deprecated - use baoyu-imagine instead

Baoyu Image Gen by the numbers

30,446 all-time installs (skills.sh)
+491 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #65 of 1,340 Generative Media skills by installs in the Skillselion catalog
Security screen: HIGH risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

baoyu-image-gen capabilities & compatibility

Works with: openai · gcp · azure
Pricing: Bring your own API key

From the docs

What baoyu-image-gen says it does

[Deprecated: use baoyu-imagine] AI image generation with OpenAI, Azure OpenAI, Google, OpenRouter, DashScope

SKILL.md

Sequential by default; use batch parallel generation when the user already has multiple prompts

SKILL.md

Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files

SKILL.md

npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-image-gen

Installs	30.4k
repo stars	★ 24.2k
Security audit	3 / 3 scanners passed
Last updated	July 4, 2026
Repository	jimliu/baoyu-skills ↗

How do you generate images from a coding agent?

AI image generation skill supporting OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, and Replicate APIs. Handles text-to-image, reference images, aspect ratios, and batch generation from saved prompt files.

Who is it for?

Developers generating app icons, mockups, or marketing images from agent sessions using multiple commercial image API providers.

Skip if: you need video generation, you run a production design system that requires the baoyu-imagine successor, or your environment has neither bun nor npx available.

When should I use this skill?

User asks to generate, create, draw, or batch-produce images with AI providers from the coding agent.

What you get

Generated image files from text prompts, reference images, or batch prompt files with chosen aspect ratios

Generated image files
Batch-generated image sets

By the numbers

baoyu-image-gen version 1.56.4 from jimliu/baoyu-skills
Integrates 9 named commercial image API providers

Files

references/
- config/
- providers/
scripts/
- providers/

SKILL.md

Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

1. Prefer built-in user-input tools exposed by the current agent runtime, e.g. AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
2. Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
3. Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if it supports only one question per call, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.

Script Directory

{baseDir} = this SKILL.md's directory. Main script: {baseDir}/scripts/main.ts. Resolve ${BUN_X}: prefer bun; else npx -y bun; else suggest brew install oven-sh/bun/bun.

Step 0: Load Preferences ⛔ BLOCKING

This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.

Check these paths in order; first hit wins:

Path	Scope
`.baoyu-skills/baoyu-image-gen/EXTEND.md`	Project
`${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md`	XDG
`$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md`	User home

Found → load, parse, apply. If default_model.[provider] is null → ask model only.
Not found → run first-time setup (references/config/first-time-setup.md) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.

EXTEND.md keys: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: references/config/preferences-schema.md.
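
The "first hit wins" lookup above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names and the injected `exists` callback are hypothetical, and only the three paths come from the table.

```typescript
type Env = Record<string, string | undefined>;

const SKILL_DIR = "baoyu-skills/baoyu-image-gen";

// Candidate EXTEND.md locations in lookup order: project, XDG, user home.
function extendMdCandidates(cwd: string, env: Env): string[] {
  const home = env["HOME"] ?? "";
  // Same fallback as ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty uses $HOME/.config.
  const xdg = env["XDG_CONFIG_HOME"] || `${home}/.config`;
  return [
    `${cwd}/.${SKILL_DIR}/EXTEND.md`,
    `${xdg}/${SKILL_DIR}/EXTEND.md`,
    `${home}/.${SKILL_DIR}/EXTEND.md`,
  ];
}

// `exists` is injected (hypothetical helper) so the lookup runs without disk access.
function findExtendMd(cwd: string, env: Env, exists: (path: string) => boolean): string | null {
  for (const candidate of extendMdCandidates(cwd, env)) {
    if (exists(candidate)) return candidate; // first hit wins
  }
  return null; // nothing found: the caller runs first-time setup
}
```

A `null` result corresponds to the "Not found" branch, where setup must finish before any image is generated.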

Usage

Minimum working examples — see references/usage-examples.md for the full set including per-provider invocations and batch mode.

# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro

# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4

Options

Option	Description
`--prompt <text>`, `-p`	Prompt text
`--promptfiles <files...>`	Read prompt from files (concatenated)
`--image <path>`	Output image path (required in single-image mode)
`--batchfile <path>`	JSON batch file for multi-image generation
`--jobs <count>`	Worker count for batch mode (default: auto, max from config, built-in default 10)
`--provider google|openai|…`	Provider to use; see Provider Selection for the provider names and auto-selection order
`--model <id>`, `-m`	Model ID — see provider references for defaults and allowed values
`--ar <ratio>`	Aspect ratio (`16:9`, `1:1`, `4:3`, …)
`--size <WxH>`	Explicit size (e.g., `1024x1024`)
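`--quality normal|2k`	Quality preset (default: `2k`)
`--imageSize 1K|2K|4K`	Image size for Google/OpenRouter; overrides the quality preset
`--imageApiDialect openai-native|ratio-metadata`	Image API dialect for OpenAI-compatible gateways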
`--ref <files...>`	Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0
`--n <count>`	Number of images. Replicate requires `--n 1` (single-output save semantics)
`--json`	JSON output

Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key
`AZURE_OPENAI_API_KEY`	Azure OpenAI API key
`OPENROUTER_API_KEY`	OpenRouter API key
`GOOGLE_API_KEY`	Google API key
`DASHSCOPE_API_KEY`	DashScope API key
`ZAI_API_KEY` (alias `BIGMODEL_API_KEY`)	Z.AI API key
`MINIMAX_API_KEY`	MiniMax API key
`REPLICATE_API_TOKEN`	Replicate API token
`JIMENG_ACCESS_KEY_ID`, `JIMENG_SECRET_ACCESS_KEY`	Jimeng (即梦) Volcengine credentials
`ARK_API_KEY`	Seedream (豆包) Volcengine ARK API key
`<PROVIDER>_IMAGE_MODEL`	Per-provider model override (`OPENAI_IMAGE_MODEL`, `GOOGLE_IMAGE_MODEL`, `DASHSCOPE_IMAGE_MODEL`, `ZAI_IMAGE_MODEL`/`BIGMODEL_IMAGE_MODEL`, `MINIMAX_IMAGE_MODEL`, `OPENROUTER_IMAGE_MODEL`, `REPLICATE_IMAGE_MODEL`, `JIMENG_IMAGE_MODEL`, `SEEDREAM_IMAGE_MODEL`)
`AZURE_OPENAI_DEPLOYMENT` (alias `AZURE_OPENAI_IMAGE_MODEL`)	Azure default deployment
`<PROVIDER>_BASE_URL`	Per-provider endpoint override
`AZURE_API_VERSION`	Azure image API version (default `2025-04-01-preview`)
`JIMENG_REGION`	Jimeng region (default `cn-north-1`)
`OPENAI_IMAGE_API_DIALECT`	`openai-native` | `ratio-metadata`
`OPENROUTER_HTTP_REFERER`, `OPENROUTER_TITLE`	Optional OpenRouter attribution
`BAOYU_IMAGE_GEN_MAX_WORKERS`	Override batch worker cap
`BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY`	Per-provider concurrency (e.g., `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY`)
`BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS`	Per-provider start-gap

Load priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env

Model Resolution

Priority (highest → lowest) applies to every provider:

1. CLI flag --model <id>
2. EXTEND.md default_model.[provider]
3. Env var <PROVIDER>_IMAGE_MODEL
4. Built-in default

For Azure, --model / default_model.azure is the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var; AZURE_OPENAI_IMAGE_MODEL is kept as a backward-compatible alias.

EXTEND.md overrides env vars: if EXTEND.md sets default_model.google: "gemini-3-pro-image-preview" and the env var sets GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview, EXTEND.md wins.
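
A minimal sketch of this priority chain, assuming the parsed EXTEND.md and the environment are passed in as plain objects. `resolveModel`, `modelEnvVars`, and the `builtInDefault` parameter are illustrative names; the env var names and aliases are the ones listed under Environment Variables.

```typescript
type Env = Record<string, string | undefined>;

// Env var names to try for a provider, preferred name first, alias second.
function modelEnvVars(provider: string): string[] {
  if (provider === "azure") return ["AZURE_OPENAI_DEPLOYMENT", "AZURE_OPENAI_IMAGE_MODEL"];
  if (provider === "zai") return ["ZAI_IMAGE_MODEL", "BIGMODEL_IMAGE_MODEL"];
  return [`${provider.toUpperCase()}_IMAGE_MODEL`];
}

interface ModelInputs {
  provider: string;
  cliModel?: string;                                   // --model <id>
  extendDefaultModel?: Record<string, string | null>;  // EXTEND.md default_model
  env: Env;
  builtInDefault: string;                              // supplied by the caller in this sketch
}

function resolveModel(input: ModelInputs): { model: string; source: string } {
  if (input.cliModel) return { model: input.cliModel, source: "--model" };
  const fromExtend = input.extendDefaultModel?.[input.provider];
  if (fromExtend) return { model: fromExtend, source: "EXTEND.md" };
  for (const name of modelEnvVars(input.provider)) {
    const value = input.env[name];
    if (value) return { model: value, source: `env ${name}` };
  }
  return { model: input.builtInDefault, source: "built-in default" };
}
```

With `default_model.google` set in EXTEND.md and `GOOGLE_IMAGE_MODEL` set in the environment, the function returns the EXTEND.md value, matching the example above.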

Display model info before each generation:

Using [provider] / [model]
Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL

OpenAI-Compatible Gateway Dialects

provider=openai means the auth and routing entrypoint is OpenAI-compatible. It does not guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set default_image_api_dialect in EXTEND.md, OPENAI_IMAGE_API_DIALECT, or --imageApiDialect:

openai-native: pixel size (1536x1024) and native OpenAI quality fields
ratio-metadata: aspect-ratio size (16:9) plus metadata.resolution (1K|2K|4K) and metadata.orientation

Use openai-native for the OpenAI native API or strict clones; try ratio-metadata for compatibility gateways in front of Gemini or similar models. Current limitation: ratio-metadata applies only to text-to-image; reference-image edits still need openai-native or a provider with first-class edit support.
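
To make the difference concrete, here is a rough sketch of how the size-related fields might be built for each dialect. The field names `size`, `metadata.resolution`, and `metadata.orientation` come from the description above; the orientation labels and the `sizeFields` helper are assumptions for illustration, and the native quality fields are left out.

```typescript
type Dialect = "openai-native" | "ratio-metadata";
type Resolution = "1K" | "2K" | "4K";

interface SizeFields {
  size: string;
  metadata?: { resolution: Resolution; orientation: string };
}

function gcd(a: number, b: number): number {
  return b === 0 ? a : gcd(b, a % b);
}

function sizeFields(dialect: Dialect, width: number, height: number, resolution: Resolution): SizeFields {
  if (dialect === "openai-native") {
    return { size: `${width}x${height}` }; // pixel size, e.g. 1536x1024
  }
  const d = gcd(width, height);
  // Orientation labels are hypothetical: the source names the field, not its values.
  const orientation = width === height ? "square" : width > height ? "landscape" : "portrait";
  return { size: `${width / d}:${height / d}`, metadata: { resolution, orientation } };
}
```

For 1920 by 1080 pixels, the `ratio-metadata` branch reduces the size to `16:9` and moves the resolution into `metadata`.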

Provider-Specific Guides

Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:

Provider	Reference
DashScope (Qwen-Image families, custom sizes)	`references/providers/dashscope.md`
Z.AI (GLM-Image, cogview-4)	`references/providers/zai.md`
MiniMax (image-01, subject-reference)	`references/providers/minimax.md`
OpenRouter (multimodal models, `/chat/completions` flow)	`references/providers/openrouter.md`
Replicate (nano-banana, Seedream, Wan)	`references/providers/replicate.md`

Provider Selection

1. --ref provided + no --provider → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax (MiniMax's subject reference is more specialized toward character/portrait consistency)
2. --provider specified → use it (if --ref, must be google/openai/azure/openrouter/replicate/seedream/minimax)
3. Only one API key present → use that provider
4. Multiple keys → default priority: Google → OpenAI → Azure → OpenRouter → DashScope → Z.AI → MiniMax → Replicate → Jimeng → Seedream
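
The selection rules can be sketched as one function. This assumes that auto-selection only considers providers whose API key is present; `selectProvider` and the `available` list are illustrative names, and the two priority orders are copied from the rules above.

```typescript
type Provider =
  | "google" | "openai" | "azure" | "openrouter" | "dashscope"
  | "zai" | "minimax" | "replicate" | "jimeng" | "seedream";

// Providers that accept --ref, in auto-selection order.
const REF_PRIORITY: Provider[] = ["google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax"];
// Default order when several API keys are present.
const DEFAULT_PRIORITY: Provider[] = [
  "google", "openai", "azure", "openrouter", "dashscope",
  "zai", "minimax", "replicate", "jimeng", "seedream",
];

function firstAvailable(order: Provider[], available: Provider[]): Provider | undefined {
  for (const p of order) {
    if (available.indexOf(p) !== -1) return p;
  }
  return undefined;
}

function selectProvider(opts: { provider?: Provider; hasRef: boolean; available: Provider[] }): Provider {
  if (opts.provider) {
    if (opts.hasRef && REF_PRIORITY.indexOf(opts.provider) === -1) {
      throw new Error(`--ref is not supported by provider "${opts.provider}"`);
    }
    return opts.provider; // an explicit --provider is used as given
  }
  const order = opts.hasRef ? REF_PRIORITY : DEFAULT_PRIORITY;
  const pick = firstAvailable(order, opts.available);
  if (!pick) throw new Error("No usable provider: set an API key or pass --provider");
  return pick; // a single available key is simply the first match
}
```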

Quality Presets

Preset	Google imageSize	OpenAI size	OpenRouter size	Replicate resolution	Use case
`normal`	1K	1024px	1K	1K	Quick previews
`2k` (default)	2K	2048px	2K	2K	Covers, illustrations, infographics

Google/OpenRouter imageSize can be overridden with --imageSize 1K|2K|4K.

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1.

Google multimodal: imageConfig.aspectRatio
OpenAI: closest supported size
OpenRouter: imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, the ratio is inferred
Replicate: behavior is model-specific — google/nano-banana* uses aspect_ratio, bytedance/seedream-* uses documented Replicate ratios, Wan 2.7 maps --ar to a concrete size
MiniMax: official aspect_ratio values; if --size <WxH> is given without --ar, sends width/height for image-01
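
For the "closest supported size" rule, one plausible approach is to compare ratios in log space and pick the nearest candidate. The sketch below is an assumption about how such a match could work, not the skill's implementation; the candidate list uses the three fixed sizes documented for OpenAI's GPT Image models, and the function names are hypothetical.

```typescript
interface Size { width: number; height: number }

// Parse "16:9" or "2.35:1" into a width/height ratio; null when malformed.
function parseAspectRatio(ar: string): number | null {
  const parts = ar.split(":");
  if (parts.length !== 2) return null;
  const w = Number(parts[0]);
  const h = Number(parts[1]);
  if (!isFinite(w) || !isFinite(h) || w <= 0 || h <= 0) return null;
  return w / h;
}

// Assumed candidate list for this sketch.
const OPENAI_SIZES: Size[] = [
  { width: 1024, height: 1024 },
  { width: 1536, height: 1024 },
  { width: 1024, height: 1536 },
];

function closestSize(ar: string, candidates: Size[], fallback: Size): Size {
  const target = parseAspectRatio(ar);
  if (target === null) return fallback; // invalid ratio: warn and keep the default
  let best = fallback;
  let bestDistance = Infinity;
  for (const c of candidates) {
    // Log space makes 2:1 and 1:2 equally far from 1:1.
    const distance = Math.abs(Math.log(c.width / c.height) - Math.log(target));
    if (distance < bestDistance) {
      best = c;
      bestDistance = distance;
    }
  }
  return best;
}
```

With this candidate list, `16:9`, `4:3`, and `2.35:1` all map to 1536x1024, while `9:16` and `3:4` map to 1024x1536.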

Generation Mode

Default: sequential. Batch parallel: enabled automatically when --batchfile contains 2+ pending tasks.

Situation	Prefer	Why
One image, or 1-2 simple images	Sequential	Lower coordination overhead, easier debugging
Multiple images with saved prompt files	Batch (`--batchfile`)	Reuses finalized prompts, applies shared throttling/retries, predictable throughput
Each image still needs its own reasoning / prompt writing / style exploration	Subagents	Work is still exploratory, each needs independent analysis
Input is `outline.md` + `prompts/` (e.g. from `baoyu-article-illustrator`)	Batch — use `scripts/build-batch.ts` to assemble the payload	The outline + prompt files already contain everything needed

Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.

Parallel behavior:

Default worker count is automatic, capped by config, built-in default 10
Provider-specific throttling applies only in batch mode; defaults are tuned for throughput while avoiding RPM bursts
Override with --jobs <count>
Each image retries up to 3 attempts
Final output includes success count, failure count, and per-image failure reasons
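
The worker cap, the three-attempt retry, and the final summary can be sketched as below. Several points are assumptions: "auto" is modelled as one worker per pending task, `--jobs` is treated as also subject to the cap, and the retry loop is synchronous for brevity where a real generation call would be asynchronous. All names are illustrative.

```typescript
// Worker count: --jobs if given, otherwise one per pending task (assumed meaning
// of "auto"), always capped by the configured maximum (built-in default 10).
function effectiveWorkers(pending: number, jobs?: number, maxWorkers = 10): number {
  const requested = jobs ?? pending;
  return Math.max(1, Math.min(requested, maxWorkers, pending));
}

interface Outcome { id: string; ok: boolean; attempts: number; reason?: string }

// Runs one image task, retrying until it succeeds or maxAttempts is reached.
function runWithRetries(id: string, generate: () => void, maxAttempts = 3): Outcome {
  let reason = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      generate();
      return { id, ok: true, attempts: attempt };
    } catch (err) {
      reason = err instanceof Error ? err.message : String(err);
    }
  }
  return { id, ok: false, attempts: maxAttempts, reason };
}

// Success count, failure count, and one failure reason per failed image.
function summarize(outcomes: Outcome[]): { succeeded: number; failed: number; reasons: string[] } {
  const failures = outcomes.filter((o) => !o.ok);
  return {
    succeeded: outcomes.length - failures.length,
    failed: failures.length,
    reasons: failures.map((o) => `${o.id}: ${o.reason ?? "unknown"}`),
  };
}
```

A task that fails twice and then succeeds is reported as a success with `attempts: 3`; a task that fails all three attempts contributes its last error message to `reasons`.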

Error Handling

Missing API key → error with setup instructions
Generation failure → auto-retry up to 3 attempts per image
Invalid aspect ratio → warning, proceed with default
Reference images with unsupported provider/model → error with fix hint

References

File	Content
`references/usage-examples.md`	Extended CLI examples across providers and batch mode
`references/providers/dashscope.md`	DashScope families, sizes, limits
`references/providers/zai.md`	Z.AI GLM-image / cogview-4
`references/providers/minimax.md`	MiniMax image-01 + subject reference
`references/providers/openrouter.md`	OpenRouter multimodal flow
`references/providers/replicate.md`	Replicate supported families + guardrails
`references/config/preferences-schema.md`	EXTEND.md schema
`references/config/first-time-setup.md`	First-time setup flow

Extension Support

Custom configurations via EXTEND.md. See Step 0 for paths and schema.

First-Time Setup

Overview

Triggered when:

1. No EXTEND.md found → full setup (provider + model + preferences)
2. EXTEND.md found but default_model.[provider] is null → model selection only

Setup Flow

No EXTEND.md found          EXTEND.md found, model null
        │                            │
        ▼                            ▼
┌─────────────────────┐    ┌──────────────────────┐
│ AskUserQuestion     │    │ AskUserQuestion      │
│ (full setup)        │    │ (model only)         │
└─────────────────────┘    └──────────────────────┘
        │                            │
        ▼                            ▼
┌─────────────────────┐    ┌──────────────────────┐
│ Create EXTEND.md    │    │ Update EXTEND.md     │
└─────────────────────┘    └──────────────────────┘
        │                            │
        ▼                            ▼
    Continue                     Continue

Flow 1: No EXTEND.md (Full Setup)

Language: Use user's input language or saved language preference.

Use AskUserQuestion with ALL questions in ONE call:

Question 1: Default Provider

header: "Provider"
question: "Default image generation provider?"
options:
  - label: "Google (Recommended)"
    description: "Gemini multimodal - high quality, reference images, flexible sizes"
  - label: "OpenAI"
    description: "GPT Image - consistent quality, reliable output"
  - label: "Azure OpenAI"
    description: "Azure-hosted GPT Image deployments with resource-specific routing"
  - label: "OpenRouter"
    description: "Router for Gemini/FLUX/OpenAI-compatible image models"
  - label: "DashScope"
    description: "Alibaba Cloud - Qwen-Image, strong Chinese/English text rendering"
  - label: "MiniMax"
    description: "MiniMax image generation with subject-reference character workflows"
  - label: "Replicate"
    description: "Community models - nano-banana-pro, flexible model selection"
  - label: "Z.AI"
    description: "GLM-Image - text-to-image with recommended aspect sizes"

Question 2: Default Google Model

Only show if user selected Google or auto-detect (no explicit provider).

header: "Google Model"
question: "Default Google image generation model?"
options:
  - label: "gemini-3-pro-image-preview (Recommended)"
    description: "Highest quality, best for production use"
  - label: "gemini-3.1-flash-image-preview"
    description: "Fast generation, good quality, lower cost"
  - label: "gemini-3-flash-preview"
    description: "Fast generation, balanced quality and speed"

Question 2b: Default OpenRouter Model

Only show if user selected OpenRouter.

header: "OpenRouter Model"
question: "Default OpenRouter image generation model?"
options:
  - label: "google/gemini-3.1-flash-image-preview (Recommended)"
    description: "Best general-purpose OpenRouter image model with reference-image workflows"
  - label: "google/gemini-2.5-flash-image-preview"
    description: "Fast Gemini preview model on OpenRouter"
  - label: "black-forest-labs/flux.2-pro"
    description: "Strong text-to-image quality through OpenRouter"

Question 2c: Default Azure Deployment

Only show if user selected Azure OpenAI.

header: "Azure Deploy"
question: "Default Azure image deployment name?"
options:
  - label: "gpt-image-1.5 (Recommended)"
    description: "Best default if your Azure deployment uses the same name"
  - label: "gpt-image-1"
    description: "Previous GPT Image deployment name"

Question 2d: Default MiniMax Model

Only show if user selected MiniMax.

header: "MiniMax Model"
question: "Default MiniMax image generation model?"
options:
  - label: "image-01 (Recommended)"
    description: "Best default, supports aspect ratios and custom width/height"
  - label: "image-01-live"
    description: "Faster variant, use aspect ratio instead of custom size"

Question 2e: Default Z.AI Model

Only show if user selected Z.AI.

header: "Z.AI Model"
question: "Default Z.AI image generation model?"
options:
  - label: "glm-image (Recommended)"
    description: "Latest GLM-Image, best aspect-ratio coverage and text rendering"
  - label: "cogview-4-250304"
    description: "Legacy CogView-4 model with 16-pixel size stepping"
  - label: "cogview-4"
    description: "Previous CogView-4 snapshot for compatibility"

Question 3: Default Quality

header: "Quality"
question: "Default image quality?"
options:
  - label: "2k (Recommended)"
    description: "2048px - covers, illustrations, infographics"
  - label: "normal"
    description: "1024px - quick previews, drafts"

Question 4: Save Location

header: "Save"
question: "Where to save preferences?"
options:
  - label: "Project (Recommended)"
    description: ".baoyu-skills/ (this project only)"
  - label: "User"
    description: "~/.baoyu-skills/ (all projects)"

Save Locations

Choice	Path	Scope
Project	`.baoyu-skills/baoyu-image-gen/EXTEND.md`	Current project
User	`$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md`	All projects

EXTEND.md Template

---
version: 1
default_provider: [selected provider or null]
default_quality: [selected quality]
default_aspect_ratio: null
default_image_size: null
default_model:
  google: [selected google model or null]
  openai: null
  azure: [selected azure deployment or null]
  openrouter: [selected openrouter model or null]
  dashscope: null
  minimax: [selected minimax model or null]
  replicate: null
  zai: [selected zai model or null]
---

Flow 2: EXTEND.md Exists, Model Null

When EXTEND.md exists but default_model.[current_provider] is null, ask ONLY the model question for the current provider.

Google Model Selection

header: "Google Model"
question: "Choose a default Google image generation model?"
options:
  - label: "gemini-3-pro-image-preview (Recommended)"
    description: "Highest quality, best for production use"
  - label: "gemini-3.1-flash-image-preview"
    description: "Fast generation, good quality, lower cost"
  - label: "gemini-3-flash-preview"
    description: "Fast generation, balanced quality and speed"

OpenAI Model Selection

header: "OpenAI Model"
question: "Choose a default OpenAI image generation model?"
options:
  - label: "gpt-image-1.5 (Recommended)"
    description: "Latest GPT Image model, high quality"
  - label: "gpt-image-1"
    description: "Previous generation GPT Image model"

Azure Deployment Selection

header: "Azure Deploy"
question: "Choose a default Azure image deployment name?"
options:
  - label: "gpt-image-1.5 (Recommended)"
    description: "Use when your Azure deployment name matches the GPT-image-1.5 model"
  - label: "gpt-image-1"
    description: "Use when your Azure deployment name matches GPT-image-1"

Notes for Azure setup:

In baoyu-image-gen, Azure --model / default_model.azure should be the Azure deployment name, not just the underlying model family.
If the deployment name is custom, save that exact deployment name in default_model.azure.

OpenRouter Model Selection

header: "OpenRouter Model"
question: "Choose a default OpenRouter image generation model?"
options:
  - label: "google/gemini-3.1-flash-image-preview (Recommended)"
    description: "Recommended for image output and reference-image edits"
  - label: "google/gemini-2.5-flash-image-preview"
    description: "Fast preview-oriented image generation"
  - label: "black-forest-labs/flux.2-pro"
    description: "High-quality text-to-image through OpenRouter"

DashScope Model Selection

header: "DashScope Model"
question: "Choose a default DashScope image generation model?"
options:
  - label: "qwen-image-2.0-pro (Recommended)"
    description: "Best DashScope model for text rendering and custom sizes"
  - label: "qwen-image-2.0"
    description: "Faster 2.0 variant with flexible output size"
  - label: "qwen-image-max"
    description: "Legacy Qwen model with five fixed output sizes"
  - label: "qwen-image-plus"
    description: "Legacy Qwen model, same current capability as qwen-image"
  - label: "z-image-turbo"
    description: "Legacy DashScope model for compatibility"
  - label: "z-image-ultra"
    description: "Legacy DashScope model, higher quality but slower"

Notes for DashScope setup:

Prefer qwen-image-2.0-pro when the user needs custom --size, uncommon ratios like 21:9, or strong Chinese/English text rendering.
qwen-image-max / qwen-image-plus / qwen-image only support five fixed sizes: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664.
In baoyu-image-gen, quality is a compatibility preset. It is not a native DashScope parameter.

Replicate Model Selection

header: "Replicate Model"
question: "Choose a default Replicate image generation model?"
options:
  - label: "google/nano-banana-pro (Recommended)"
    description: "Google's fast image model on Replicate"
  - label: "google/nano-banana"
    description: "Google's base image model on Replicate"

MiniMax Model Selection

header: "MiniMax Model"
question: "Choose a default MiniMax image generation model?"
options:
  - label: "image-01 (Recommended)"
    description: "Best general-purpose MiniMax image model with custom width/height support"
  - label: "image-01-live"
    description: "Lower-latency MiniMax image model using aspect ratios"

Notes for MiniMax setup:

image-01 is the safest default. It supports official aspect_ratio values and documented custom width / height output sizes.
image-01-live is useful when the user prefers faster generation and can work with aspect-ratio-based sizing.
MiniMax subject reference currently uses subject_reference[].type = character; docs recommend front-facing portrait references in JPG/JPEG/PNG under 10MB.

Z.AI Model Selection

header: "Z.AI Model"
question: "Choose a default Z.AI image generation model?"
options:
  - label: "glm-image (Recommended)"
    description: "Latest GLM-Image; pixels round to multiples of 32 and cap at 2^22"
  - label: "cogview-4-250304"
    description: "Legacy CogView-4 snapshot with 16-pixel size stepping"
  - label: "cogview-4"
    description: "Earlier CogView-4 snapshot for compatibility"

Notes for Z.AI setup:

Set ZAI_API_KEY (or legacy BIGMODEL_API_KEY) from https://docs.z.ai/.
glm-image supports recommended aspect sizes (1280x1280, 1728x960, 1568x1056, …); uncommon ratios auto-fit to the 2^22 pixel budget on multiples of 32.
Legacy CogView models use 16-pixel stepping and cap at 2^21 pixels per image.
Z.AI does not accept reference images or n > 1 in baoyu-image-gen; use Google/OpenAI providers for those workflows.
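
The pixel-budget rule in these notes can be illustrated with a small calculation: take the largest height the budget allows for the requested ratio, then round both sides down to the step. This is a sketch of one way to do the "auto-fit", not the skill's actual algorithm. `fitToBudget` is a hypothetical helper and it does not model any per-side limits the API may impose.

```typescript
interface SizeRule { step: number; maxPixels: number }

const GLM_IMAGE: SizeRule = { step: 32, maxPixels: 2 ** 22 };  // multiples of 32, 2^22 budget
const COGVIEW_4: SizeRule = { step: 16, maxPixels: 2 ** 21 };  // 16-pixel stepping, 2^21 budget

function fitToBudget(ratioW: number, ratioH: number, rule: SizeRule): { width: number; height: number } {
  // Largest height for this ratio that stays within the pixel budget.
  const idealHeight = Math.sqrt((rule.maxPixels * ratioH) / ratioW);
  // Rounding down keeps width * height at or below the budget.
  const height = Math.floor(idealHeight / rule.step) * rule.step;
  const width = Math.floor((height * ratioW) / ratioH / rule.step) * rule.step;
  return { width, height };
}
```

For a 1:1 ratio this yields 2048x2048 under the glm-image rule and 1440x1440 under the CogView rule.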

Update EXTEND.md

After user selects a model:

1. Read existing EXTEND.md
2. If default_model: section exists → update the provider-specific key
3. If default_model: section missing → add the full section:

default_model:
  google: [value or null]
  openai: [value or null]
  azure: [value or null]
  openrouter: [value or null]
  dashscope: [value or null]
  minimax: [value or null]
  replicate: [value or null]
  zai: [value or null]

Only set the selected provider's model; leave others as their current value or null.
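
A minimal sketch of that update on an already-parsed `default_model` object. Reading and writing the YAML frontmatter is out of scope here, and `updateDefaultModel` is an illustrative name.

```typescript
type Provider =
  | "google" | "openai" | "azure" | "openrouter"
  | "dashscope" | "minimax" | "replicate" | "zai";

type DefaultModel = Record<Provider, string | null>;

const PROVIDERS: Provider[] = [
  "google", "openai", "azure", "openrouter",
  "dashscope", "minimax", "replicate", "zai",
];

function updateDefaultModel(
  existing: Partial<DefaultModel> | undefined,
  provider: Provider,
  model: string,
): DefaultModel {
  const next = {} as DefaultModel;
  for (const p of PROVIDERS) {
    // Keep each provider's current value; a missing key becomes null.
    next[p] = existing?.[p] ?? null;
  }
  next[provider] = model; // only the selected provider changes
  return next;
}
```

Passing `undefined` for `existing` covers the case where the `default_model:` section is missing and the full section has to be added.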

After Setup

1. Create directory if needed
2. Write/update EXTEND.md with frontmatter
3. Confirm: "Preferences saved to [path]"
4. Continue with image generation

Preferences Schema

Full Schema

---
version: 1

default_provider: null      # google|openai|azure|openrouter|dashscope|minimax|replicate|zai|null (null = auto-detect)

default_quality: null       # normal|2k|null (null = use default: 2k)

default_aspect_ratio: null  # "16:9"|"1:1"|"4:3"|"3:4"|"2.35:1"|null

default_image_size: null    # 1K|2K|4K|null (Google/OpenRouter, overrides quality)

default_model:
  google: null              # e.g., "gemini-3-pro-image-preview", "gemini-3.1-flash-image-preview"
  openai: null              # e.g., "gpt-image-1.5", "gpt-image-1"
  azure: null               # Azure deployment name, e.g., "gpt-image-1.5" or "image-prod"
  openrouter: null          # e.g., "google/gemini-3.1-flash-image-preview"
  dashscope: null           # e.g., "qwen-image-2.0-pro"
  minimax: null             # e.g., "image-01"
  replicate: null           # e.g., "google/nano-banana-pro"
  zai: null                 # e.g., "glm-image", "cogview-4-250304"

batch:
  max_workers: 10
  provider_limits:
    replicate:
      concurrency: 5
      start_interval_ms: 700
    google:
      concurrency: 3
      start_interval_ms: 1100
    openai:
      concurrency: 3
      start_interval_ms: 1100
    azure:
      concurrency: 3
      start_interval_ms: 1100
    openrouter:
      concurrency: 3
      start_interval_ms: 1100
    dashscope:
      concurrency: 3
      start_interval_ms: 1100
    minimax:
      concurrency: 3
      start_interval_ms: 1100
    zai:
      concurrency: 3
      start_interval_ms: 1100
---

Field Reference

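Field	Type	Default	Description
`version`	int	1	Schema version
`default_provider`	string|null	null	Default provider (null = auto-detect)
`default_quality`	string|null	null	Default quality preset (null = 2k)
`default_aspect_ratio`	string|null	null	Default aspect ratio
`default_image_size`	string|null	null	Google/OpenRouter image size (overrides quality)
`default_model.google`	string|null	null	Default Google model
`default_model.openai`	string|null	null	Default OpenAI model
`default_model.azure`	string|null	null	Default Azure deployment name
`default_model.openrouter`	string|null	null	Default OpenRouter model
`default_model.dashscope`	string|null	null	Default DashScope model
`default_model.minimax`	string|null	null	Default MiniMax model
`default_model.replicate`	string|null	null	Default Replicate model
`default_model.zai`	string|null	null	Default Z.AI model
`batch.max_workers`	int|null	10	Batch worker cap
`batch.provider_limits.<provider>.concurrency`	int|null	provider default	Per-provider concurrency
`batch.provider_limits.<provider>.start_interval_ms`	int|null	provider default	Per-provider start gap in milliseconds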

Examples

Minimal:

---
version: 1
default_provider: google
default_quality: 2k
---

Full:

---
version: 1
default_provider: google
default_quality: 2k
default_aspect_ratio: "16:9"
default_image_size: 2K
default_model:
  google: "gemini-3-pro-image-preview"
  openai: "gpt-image-1.5"
  azure: "gpt-image-1.5"
  openrouter: "google/gemini-3.1-flash-image-preview"
  dashscope: "qwen-image-2.0-pro"
  minimax: "image-01"
  replicate: "google/nano-banana-pro"
  zai: "glm-image"
batch:
  max_workers: 10
  provider_limits:
    replicate:
      concurrency: 5
      start_interval_ms: 700
    azure:
      concurrency: 3
      start_interval_ms: 1100
    openrouter:
      concurrency: 3
      start_interval_ms: 1100
    minimax:
      concurrency: 3
      start_interval_ms: 1100
    zai:
      concurrency: 3
      start_interval_ms: 1100
---

DashScope (阿里通义万象)

Read when the user picks --provider dashscope, sets default_model.dashscope, or asks for Qwen-Image behavior. The SKILL.md only names the default — this file covers model families, sizing rules, and limits.

Model Families

qwen-image-2.0* (recommended modern family). Members: qwen-image-2.0-pro, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0, qwen-image-2.0-2026-03-03.

Free-form size in width*height format
Total pixels must be between 512*512 and 2048*2048
Default ≈ 1024*1024
Best choice for custom ratios (e.g. 21:9) and text-heavy Chinese/English layouts

Fixed-size family — qwen-image-max, qwen-image-max-2025-12-30, qwen-image-plus, qwen-image-plus-2026-01-09, qwen-image.

Only five sizes allowed: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664
Default is 1664*928
qwen-image currently has the same capability as qwen-image-plus

Legacy — z-image-turbo, z-image-ultra, wanx-v1. Only use when the user explicitly asks for legacy behavior.

Size Resolution

--size wins over --ar
For qwen-image-2.0*: prefer explicit --size; otherwise infer from --ar using the recommended table below
For qwen-image-max/plus/image: only use the five fixed sizes; if the requested ratio doesn't fit, switch to qwen-image-2.0-pro
--quality is a baoyu-imagine preset, not an official DashScope field. The mapping of normal/2k onto the qwen-image-2.0* table is an implementation choice, not an API guarantee

Recommended `qwen-image-2.0*` sizes

Ratio	`normal`	`2k`
`1:1`	`1024*1024`	`1536*1536`
`2:3`	`768*1152`	`1024*1536`
`3:2`	`1152*768`	`1536*1024`
`3:4`	`960*1280`	`1080*1440`
`4:3`	`1280*960`	`1440*1080`
`9:16`	`720*1280`	`1080*1920`
`16:9`	`1280*720`	`1920*1080`
`21:9`	`1344*576`	`2048*872`
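
The size rules and the table above can be combined into one lookup, sketched below. The recommended sizes are copied from the table. The ratio assigned to each of the five fixed sizes and the `2k` default preset are assumptions of this sketch, and `resolveDashScopeSize` is an illustrative name. It does not check the total-pixel bounds and does not cover the legacy models.

```typescript
type Quality = "normal" | "2k";

// Recommended qwen-image-2.0* sizes, copied from the table above.
const RECOMMENDED: Record<string, Record<Quality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

// The five fixed sizes, keyed by the ratio each one approximates (assumed mapping).
const FIXED_BY_RATIO: Record<string, string> = {
  "16:9": "1664*928",
  "4:3": "1472*1104",
  "1:1": "1328*1328",
  "3:4": "1104*1472",
  "9:16": "928*1664",
};
const FIXED_SIZES = Object.keys(FIXED_BY_RATIO).map((ratio) => FIXED_BY_RATIO[ratio]);

function isFixedFamily(model: string): boolean {
  return model === "qwen-image"
    || model.indexOf("qwen-image-max") === 0
    || model.indexOf("qwen-image-plus") === 0;
}

function resolveDashScopeSize(model: string, opts: { size?: string; ar?: string; quality?: Quality }): string {
  // The CLI takes WxH; DashScope expects W*H.
  const explicit = opts.size ? opts.size.replace("x", "*") : undefined;
  if (model.indexOf("qwen-image-2.0") === 0) {
    if (explicit) return explicit; // --size wins over --ar
    const row = RECOMMENDED[opts.ar ?? "1:1"];
    if (!row) throw new Error(`No recommended size for ${opts.ar}; pass --size`);
    return row[opts.quality ?? "2k"];
  }
  if (!isFixedFamily(model)) throw new Error(`Model ${model} is not covered by this sketch`);
  // No --size and no --ar: the documented default is 1664*928.
  const wanted = explicit ?? (opts.ar ? FIXED_BY_RATIO[opts.ar] : "1664*928");
  if (wanted === undefined || FIXED_SIZES.indexOf(wanted) === -1) {
    throw new Error("Size or ratio not available on this model; switch to qwen-image-2.0-pro");
  }
  return wanted;
}
```

A `21:9` request on `qwen-image-max` throws with a hint to switch models, which mirrors the rule that ratios outside the five fixed sizes belong on `qwen-image-2.0-pro`.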

Not Exposed

DashScope APIs also support negative_prompt, prompt_extend, and watermark, but baoyu-imagine does not expose them as CLI flags today.

Replicate

Read when the user picks --provider replicate. Replicate support is intentionally scoped to model families baoyu-imagine can validate locally and save without dropping outputs.

Supported Families

`google/nano-banana*` (default: `google/nano-banana-2`)

Supports prompt-only and reference-image generation
Uses Replicate aspect_ratio, resolution, and output_format
--size <WxH> is accepted only as a shorthand for a documented aspect_ratio plus 1K / 2K

`bytedance/seedream-4.5`

Supports prompt-only and reference-image generation
Uses Replicate size, aspect_ratio, and image_input
Local validation blocks unsupported 1K requests before the API call

`bytedance/seedream-5-lite`

Supports prompt-only and reference-image generation
Uses Replicate size, aspect_ratio, and image_input
Local validation currently accepts 2K / 3K only

`wan-video/wan-2.7-image`

Supports prompt-only and reference-image generation
Uses Replicate size and images
Max output is 2K

`wan-video/wan-2.7-image-pro`

Supports prompt-only and reference-image generation
Uses Replicate size and images
4K is allowed only for text-to-image; local validation blocks 4K + --ref

Guardrails

Replicate currently supports only single-output save semantics in this tool — keep --n 1
If a model is outside the compatibility list above, baoyu-imagine treats it as prompt-only and rejects advanced local options instead of guessing a nano-banana-style schema
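
The local checks described for each family can be sketched as a single validation function. This is a simplified illustration: it takes a resolution label directly rather than deriving it from `--size` or `--quality`, the `3K` and `4K` labels for the Wan models are assumed, and `validateReplicate` is a hypothetical name.

```typescript
type Resolution = "1K" | "2K" | "3K" | "4K";

interface ReplicateRequest {
  model: string;
  resolution: Resolution;
  hasRef: boolean; // true when --ref is given
  n: number;       // value of --n
}

// Returns a list of problems; an empty list means the request can be sent.
function validateReplicate(req: ReplicateRequest): string[] {
  const errors: string[] = [];
  if (req.n !== 1) errors.push("Replicate saves a single output; use --n 1");
  switch (req.model) {
    case "bytedance/seedream-4.5":
      if (req.resolution === "1K") errors.push("seedream-4.5 does not support 1K");
      break;
    case "bytedance/seedream-5-lite":
      if (req.resolution !== "2K" && req.resolution !== "3K") {
        errors.push("seedream-5-lite accepts 2K or 3K only");
      }
      break;
    case "wan-video/wan-2.7-image":
      if (req.resolution === "3K" || req.resolution === "4K") {
        errors.push("wan-2.7-image output is capped at 2K");
      }
      break;
    case "wan-video/wan-2.7-image-pro":
      if (req.resolution === "4K" && req.hasRef) {
        errors.push("wan-2.7-image-pro allows 4K for text-to-image only");
      }
      break;
  }
  return errors;
}
```

Running the checks before the API call means an unsupported combination fails fast with a readable message instead of a provider-side error.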

Examples

# Default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Explicit model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

Usage Examples

Extended CLI examples. SKILL.md shows the minimum set; read this file when the user asks about provider-specific invocation, batch generation, or less-common flags.

Core Patterns

# Basic text-to-image
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (any provider family that supports refs)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

Per-Provider

# OpenAI
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# Azure OpenAI (model = deployment name)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider azure --model gpt-image-1.5

# Google with explicit model
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# OpenRouter (recommended default)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

# OpenRouter with reference
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

# DashScope (default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# DashScope Qwen-Image 2.0 Pro (custom size, Chinese text)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报，包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

# DashScope legacy fixed-size
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

# Z.AI GLM-image
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张带清晰中文标题的科技海报" --image out.png --provider zai

# Z.AI with custom size
${BUN_X} {baseDir}/scripts/main.ts --prompt "A science illustration with labels" --image out.png --provider zai --model glm-image --size 1472x1088

# MiniMax
${BUN_X} {baseDir}/scripts/main.ts --prompt "A fashion editorial portrait" --image out.jpg --provider minimax

# MiniMax with subject reference (character/portrait consistency)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A girl by the library window" --image out.jpg --provider minimax --model image-01 --ref portrait.png --ar 16:9

# Replicate (default: google/nano-banana-2)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate Seedream 4.5
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cinematic portrait" --image out.png --provider replicate --model bytedance/seedream-4.5 --ar 3:2

# Replicate Wan 2.7 Image Pro
${BUN_X} {baseDir}/scripts/main.ts --prompt "A concept frame" --image out.png --provider replicate --model wan-video/wan-2.7-image-pro --size 2048x1152

Batch Mode

# Batch from saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

# Batch with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json

Batch File Format

{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-2",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}

Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). A top-level array without the jobs wrapper is also accepted.
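
The three rules in the paragraph above (batch-relative paths, optional `jobs` overridden by the CLI, and the bare-array form) can be sketched as one normalization step. This is an illustration under assumptions: `normalizeBatch` and its types are hypothetical names, and the tool's own loader is `loadBatchTasks` in main.ts, which may differ in detail.

```typescript
import path from "node:path";

// Hypothetical types for illustration; the real definitions live in types.ts.
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image?: string;
  ref?: string[];
}

interface NormalizedBatch {
  jobs: number | null;
  tasks: BatchTask[];
}

function normalizeBatch(
  parsed: unknown,
  batchFilePath: string,
  cliJobs: number | null,
): NormalizedBatch {
  const batchDir = path.dirname(path.resolve(batchFilePath));

  // Accept either { jobs, tasks } or a bare top-level array of tasks.
  const wrapper: { jobs?: unknown; tasks?: unknown } = Array.isArray(parsed)
    ? { tasks: parsed }
    : typeof parsed === "object" && parsed !== null
      ? (parsed as { jobs?: unknown; tasks?: unknown })
      : {};
  if (!Array.isArray(wrapper.tasks)) {
    throw new Error("Batch file must contain a tasks array");
  }

  // Only a positive integer counts as a usable jobs value from the file.
  const fileJobs =
    typeof wrapper.jobs === "number" && Number.isInteger(wrapper.jobs) && wrapper.jobs > 0
      ? wrapper.jobs
      : null;

  // Resolve every path field against the batch file's directory, not the cwd.
  const resolve = (p: string): string => path.resolve(batchDir, p);
  const tasks = (wrapper.tasks as BatchTask[]).map((task) => ({
    ...task,
    promptFiles: task.promptFiles?.map(resolve),
    image: task.image ? resolve(task.image) : undefined,
    ref: task.ref?.map(resolve),
  }));

  // The CLI --jobs value wins over the value stored in the file.
  return { jobs: cliJobs ?? fileJobs, tasks };
}
```

Resolving against the batch file's directory means a batch file can be moved together with its prompts and references without editing any paths.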

import assert from "node:assert/strict";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import test, { type TestContext } from "node:test";

import type { CliArgs, ExtendConfig } from "./types.ts";
import {
  createTaskArgs,
  detectProvider,
  getConfiguredMaxWorkers,
  getConfiguredProviderRateLimits,
  getWorkerCount,
  isRetryableGenerationError,
  loadBatchTasks,
  mergeConfig,
  normalizeOutputImagePath,
  parseArgs,
  parseSimpleYaml,
} from "./main.ts";

function makeArgs(overrides: Partial<CliArgs> = {}): CliArgs {
  return {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
    ...overrides,
  };
}

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

async function makeTempDir(prefix: string): Promise<string> {
  return fs.mkdtemp(path.join(os.tmpdir(), prefix));
}

test("parseArgs parses the main image-gen CLI flags", () => {
  const args = parseArgs([
    "--promptfiles",
    "prompts/system.md",
    "prompts/content.md",
    "--image",
    "out/hero",
    "--provider",
    "openai",
    "--quality",
    "2k",
    "--imageSize",
    "4k",
    "--ref",
    "ref/one.png",
    "ref/two.jpg",
    "--n",
    "3",
    "--jobs",
    "5",
    "--json",
  ]);

  assert.deepEqual(args.promptFiles, ["prompts/system.md", "prompts/content.md"]);
  assert.equal(args.imagePath, "out/hero");
  assert.equal(args.provider, "openai");
  assert.equal(args.quality, "2k");
  assert.equal(args.imageSize, "4K");
  assert.deepEqual(args.referenceImages, ["ref/one.png", "ref/two.jpg"]);
  assert.equal(args.n, 3);
  assert.equal(args.jobs, 5);
  assert.equal(args.json, true);
});

test("parseArgs falls back to positional prompt and rejects invalid provider", () => {
  const positional = parseArgs(["draw", "a", "cat"]);
  assert.equal(positional.prompt, "draw a cat");

  assert.throws(
    () => parseArgs(["--provider", "stability"]),
    /Invalid provider/,
  );
});

test("parseSimpleYaml parses nested defaults and provider limits", () => {
  const yaml = `
version: 2
default_provider: openrouter
default_quality: normal
default_aspect_ratio: '16:9'
default_image_size: 2K
default_model:
  google: gemini-3-pro-image-preview
  openai: gpt-image-1.5
  azure: image-prod
  minimax: image-01
batch:
  max_workers: 8
  provider_limits:
    google:
      concurrency: 2
      start_interval_ms: 900
    openai:
      concurrency: 4
    minimax:
      concurrency: 2
      start_interval_ms: 1400
    azure:
      concurrency: 1
      start_interval_ms: 1500
`;

  const config = parseSimpleYaml(yaml);

  assert.equal(config.version, 2);
  assert.equal(config.default_provider, "openrouter");
  assert.equal(config.default_quality, "normal");
  assert.equal(config.default_aspect_ratio, "16:9");
  assert.equal(config.default_image_size, "2K");
  assert.equal(config.default_model?.google, "gemini-3-pro-image-preview");
  assert.equal(config.default_model?.openai, "gpt-image-1.5");
  assert.equal(config.default_model?.azure, "image-prod");
  assert.equal(config.default_model?.minimax, "image-01");
  assert.equal(config.batch?.max_workers, 8);
  assert.deepEqual(config.batch?.provider_limits?.google, {
    concurrency: 2,
    start_interval_ms: 900,
  });
  assert.deepEqual(config.batch?.provider_limits?.openai, {
    concurrency: 4,
  });
  assert.deepEqual(config.batch?.provider_limits?.minimax, {
    concurrency: 2,
    start_interval_ms: 1400,
  });
  assert.deepEqual(config.batch?.provider_limits?.azure, {
    concurrency: 1,
    start_interval_ms: 1500,
  });
});

test("mergeConfig only fills values missing from CLI args", () => {
  const merged = mergeConfig(
    makeArgs({
      provider: "openai",
      quality: null,
      aspectRatio: null,
      imageSize: "4K",
    }),
    {
      default_provider: "google",
      default_quality: "2k",
      default_aspect_ratio: "3:2",
      default_image_size: "2K",
    } satisfies Partial<ExtendConfig>,
  );

  assert.equal(merged.provider, "openai");
  assert.equal(merged.quality, "2k");
  assert.equal(merged.aspectRatio, "3:2");
  assert.equal(merged.imageSize, "4K");
});

test("detectProvider rejects non-ref-capable providers and prefers Google first when multiple keys exist", (t) => {
  assert.throws(
    () =>
      detectProvider(
        makeArgs({
          provider: "dashscope",
          referenceImages: ["ref.png"],
        }),
      ),
    /Reference images require a ref-capable provider/,
  );

  useEnv(t, {
    GOOGLE_API_KEY: "google-key",
    OPENAI_API_KEY: "openai-key",
    OPENROUTER_API_KEY: null,
    DASHSCOPE_API_KEY: null,
    MINIMAX_API_KEY: null,
    REPLICATE_API_TOKEN: null,
    JIMENG_ACCESS_KEY_ID: null,
    JIMENG_SECRET_ACCESS_KEY: null,
    ARK_API_KEY: null,
  });
  assert.equal(detectProvider(makeArgs()), "google");
});

test("detectProvider selects an available ref-capable provider for reference-image tasks", (t) => {
  useEnv(t, {
    GOOGLE_API_KEY: null,
    OPENAI_API_KEY: "openai-key",
    AZURE_OPENAI_API_KEY: null,
    AZURE_OPENAI_BASE_URL: null,
    OPENROUTER_API_KEY: null,
    DASHSCOPE_API_KEY: null,
    MINIMAX_API_KEY: null,
    REPLICATE_API_TOKEN: null,
    JIMENG_ACCESS_KEY_ID: null,
    JIMENG_SECRET_ACCESS_KEY: null,
    ARK_API_KEY: null,
  });
  assert.equal(
    detectProvider(makeArgs({ referenceImages: ["ref.png"] })),
    "openai",
  );
});

test("detectProvider selects Azure when only Azure credentials are configured", (t) => {
  useEnv(t, {
    GOOGLE_API_KEY: null,
    OPENAI_API_KEY: null,
    AZURE_OPENAI_API_KEY: "azure-key",
    AZURE_OPENAI_BASE_URL: "https://example.openai.azure.com",
    OPENROUTER_API_KEY: null,
    DASHSCOPE_API_KEY: null,
    MINIMAX_API_KEY: null,
    REPLICATE_API_TOKEN: null,
    JIMENG_ACCESS_KEY_ID: null,
    JIMENG_SECRET_ACCESS_KEY: null,
    ARK_API_KEY: null,
  });

  assert.equal(detectProvider(makeArgs()), "azure");
  assert.equal(
    detectProvider(makeArgs({ referenceImages: ["ref.png"] })),
    "azure",
  );
});

test("detectProvider infers Seedream from model id and allows Seedream reference-image workflows", (t) => {
  useEnv(t, {
    GOOGLE_API_KEY: null,
    OPENAI_API_KEY: null,
    OPENROUTER_API_KEY: null,
    DASHSCOPE_API_KEY: null,
    MINIMAX_API_KEY: null,
    REPLICATE_API_TOKEN: null,
    JIMENG_ACCESS_KEY_ID: null,
    JIMENG_SECRET_ACCESS_KEY: null,
    ARK_API_KEY: "ark-key",
  });

  assert.equal(
    detectProvider(
      makeArgs({
        model: "doubao-seedream-4-5-251128",
        referenceImages: ["ref.png"],
      }),
    ),
    "seedream",
  );

  assert.equal(
    detectProvider(
      makeArgs({
        provider: "seedream",
        referenceImages: ["ref.png"],
      }),
    ),
    "seedream",
  );
});

test("detectProvider selects MiniMax when only MiniMax credentials are configured or the model id matches", (t) => {
  useEnv(t, {
    GOOGLE_API_KEY: null,
    OPENAI_API_KEY: null,
    AZURE_OPENAI_API_KEY: null,
    AZURE_OPENAI_BASE_URL: null,
    OPENROUTER_API_KEY: null,
    DASHSCOPE_API_KEY: null,
    MINIMAX_API_KEY: "minimax-key",
    REPLICATE_API_TOKEN: null,
    JIMENG_ACCESS_KEY_ID: null,
    JIMENG_SECRET_ACCESS_KEY: null,
    ARK_API_KEY: null,
  });

  assert.equal(detectProvider(makeArgs()), "minimax");
  assert.equal(detectProvider(makeArgs({ referenceImages: ["ref.png"] })), "minimax");
  assert.equal(detectProvider(makeArgs({ model: "image-01-live" })), "minimax");
});

test("batch worker and provider-rate-limit configuration prefer env over EXTEND config", (t) => {
  useEnv(t, {
    BAOYU_IMAGE_GEN_MAX_WORKERS: "12",
    BAOYU_IMAGE_GEN_GOOGLE_CONCURRENCY: "5",
    BAOYU_IMAGE_GEN_GOOGLE_START_INTERVAL_MS: "450",
  });

  const extendConfig: Partial<ExtendConfig> = {
    batch: {
      max_workers: 7,
      provider_limits: {
        google: {
          concurrency: 2,
          start_interval_ms: 900,
        },
        minimax: {
          concurrency: 1,
          start_interval_ms: 1500,
        },
      },
    },
  };

  assert.equal(getConfiguredMaxWorkers(extendConfig), 12);
  assert.deepEqual(getConfiguredProviderRateLimits(extendConfig).google, {
    concurrency: 5,
    startIntervalMs: 450,
  });
  assert.deepEqual(getConfiguredProviderRateLimits(extendConfig).minimax, {
    concurrency: 1,
    startIntervalMs: 1500,
  });
});

test("loadBatchTasks and createTaskArgs resolve batch-relative paths", async (t) => {
  const root = await makeTempDir("baoyu-image-gen-batch-");
  t.after(() => fs.rm(root, { recursive: true, force: true }));

  const batchFile = path.join(root, "jobs", "batch.json");
  await fs.mkdir(path.dirname(batchFile), { recursive: true });
  await fs.writeFile(
    batchFile,
    JSON.stringify({
      jobs: 2,
      tasks: [
        {
          id: "hero",
          promptFiles: ["prompts/hero.md"],
          image: "out/hero",
          ref: ["refs/hero.png"],
          ar: "16:9",
        },
      ],
    }),
  );

  const loaded = await loadBatchTasks(batchFile);
  assert.equal(loaded.jobs, 2);
  assert.equal(loaded.batchDir, path.dirname(batchFile));
  assert.equal(loaded.tasks[0]?.id, "hero");

  const taskArgs = createTaskArgs(
    makeArgs({
      provider: "replicate",
      quality: "2k",
      json: true,
    }),
    loaded.tasks[0]!,
    loaded.batchDir,
  );

  assert.deepEqual(taskArgs.promptFiles, [
    path.join(loaded.batchDir, "prompts/hero.md"),
  ]);
  assert.equal(taskArgs.imagePath, path.join(loaded.batchDir, "out/hero"));
  assert.deepEqual(taskArgs.referenceImages, [
    path.join(loaded.batchDir, "refs/hero.png"),
  ]);
  assert.equal(taskArgs.provider, "replicate");
  assert.equal(taskArgs.aspectRatio, "16:9");
  assert.equal(taskArgs.quality, "2k");
  assert.equal(taskArgs.json, true);
});

test("path normalization, worker count, and retry classification follow expected rules", () => {
  assert.match(normalizeOutputImagePath("out/sample"), /out[\\/]+sample\.png$/);
  assert.match(normalizeOutputImagePath("out/sample", ".jpg"), /out[\\/]+sample\.jpg$/);
  assert.match(normalizeOutputImagePath("out/sample.webp"), /out[\\/]+sample\.webp$/);

  assert.equal(getWorkerCount(8, null, 3), 3);
  assert.equal(getWorkerCount(2, 6, 5), 2);
  assert.equal(getWorkerCount(5, 0, 4), 1);

  assert.equal(isRetryableGenerationError(new Error("API error (401): denied")), false);
  assert.equal(isRetryableGenerationError(new Error("socket hang up")), true);
});
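
The `getWorkerCount` assertions in the last test pin down a clamping rule, but the implementation is not shown in this section. One sketch that satisfies those three assertions is given here, assuming the parameters are the task count, the requested `--jobs` value, and the configured maximum. The name `clampWorkerCount` and that parameter order are assumptions, not the tool's actual code.

```typescript
// Hypothetical reconstruction, consistent with the three assertions in the test above.
function clampWorkerCount(
  taskCount: number,
  jobs: number | null,
  maxWorkers: number,
): number {
  // Without an explicit request, use as many workers as tasks, up to the cap.
  const requested = jobs ?? Math.min(taskCount, maxWorkers);
  // Never exceed the task count or the cap, and never drop below one worker.
  return Math.max(1, Math.min(requested, taskCount, maxWorkers));
}
```

The lower bound of one worker is what turns a requested count of `0` into `1` rather than an empty pool.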

import path from "node:path";
import process from "node:process";
import { homedir } from "node:os";
import { fileURLToPath } from "node:url";
import { access, mkdir, readFile, writeFile } from "node:fs/promises";
import type {
  BatchFile,
  BatchTaskInput,
  CliArgs,
  ExtendConfig,
  Provider,
} from "./types";

type ProviderModule = {
  getDefaultModel: () => string;
  generateImage: (prompt: string, model: string, args: CliArgs) => Promise<Uint8Array>;
  validateArgs?: (model: string, args: CliArgs) => void;
  getDefaultOutputExtension?: (model: string, args: CliArgs) => string;
};

type PreparedTask = {
  id: string;
  prompt: string;
  args: CliArgs;
  provider: Provider;
  model: string;
  outputPath: string;
  providerModule: ProviderModule;
};

type TaskResult = {
  id: string;
  provider: Provider;
  model: string;
  outputPath: string;
  success: boolean;
  attempts: number;
  error: string | null;
};

type ProviderRateLimit = {
  concurrency: number;
  startIntervalMs: number;
};

type LoadedBatchTasks = {
  tasks: BatchTaskInput[];
  jobs: number | null;
  batchDir: string;
};

const MAX_ATTEMPTS = 3;
const DEFAULT_MAX_WORKERS = 10;
const POLL_WAIT_MS = 250;
const DEFAULT_PROVIDER_RATE_LIMITS: Record<Provider, ProviderRateLimit> = {
  replicate: { concurrency: 5, startIntervalMs: 700 },
  google: { concurrency: 3, startIntervalMs: 1100 },
  openai: { concurrency: 3, startIntervalMs: 1100 },
  openrouter: { concurrency: 3, startIntervalMs: 1100 },
  dashscope: { concurrency: 3, startIntervalMs: 1100 },
  minimax: { concurrency: 3, startIntervalMs: 1100 },
  jimeng: { concurrency: 3, startIntervalMs: 1100 },
  seedream: { concurrency: 3, startIntervalMs: 1100 },
  azure: { concurrency: 3, startIntervalMs: 1100 },
  zai: { concurrency: 3, startIntervalMs: 1100 },
};

function printUsage(): void {
  console.log(`Usage:
  npx -y bun scripts/main.ts --prompt "A cat" --image cat.png
  npx -y bun scripts/main.ts --promptfiles system.md content.md --image out.png
  npx -y bun scripts/main.ts --batchfile batch.json

Options:
  -p, --prompt <text>       Prompt text
  --promptfiles <files...>  Read prompt from files (concatenated)
  --image <path>            Output image path (required in single-image mode)
  --batchfile <path>        JSON batch file for multi-image generation
  --jobs <count>            Worker count for batch mode (default: auto, max from config, built-in default 10)
  --provider google|openai|openrouter|dashscope|minimax|replicate|jimeng|seedream|azure|zai  Force provider (auto-detect by default)
  -m, --model <id>          Model ID
  --ar <ratio>              Aspect ratio (e.g., 16:9, 1:1, 4:3)
  --size <WxH>              Size (e.g., 1024x1024)
  --quality normal|2k       Quality preset (default: 2k)
  --imageSize 1K|2K|4K      Image size for Google/OpenRouter (default: from quality)
  --ref <files...>          Reference images (Google, OpenAI, Azure, OpenRouter, Replicate, MiniMax, or Seedream 4.0/4.5/5.0)
  --n <count>               Number of images for the current task (default: 1)
  --json                    JSON output
  -h, --help                Show help

Batch file format:
  {
    "jobs": 4,
    "tasks": [
      {
        "id": "hero",
        "promptFiles": ["prompts/hero.md"],
        "image": "out/hero.png",
        "provider": "replicate",
        "model": "google/nano-banana-pro",
        "ar": "16:9"
      }
    ]
  }

Behavior:
  - Batch mode automatically runs in parallel when pending tasks >= 2
  - Each image retries automatically up to 3 attempts
  - Batch summary reports success count, failure count, and per-image errors

Environment variables:
  OPENAI_API_KEY            OpenAI API key
  OPENROUTER_API_KEY        OpenRouter API key
  GOOGLE_API_KEY            Google API key
  GEMINI_API_KEY            Gemini API key (alias for GOOGLE_API_KEY)
  DASHSCOPE_API_KEY         DashScope API key
  MINIMAX_API_KEY           MiniMax API key
  REPLICATE_API_TOKEN       Replicate API token
  JIMENG_ACCESS_KEY_ID      Jimeng Access Key ID
  JIMENG_SECRET_ACCESS_KEY  Jimeng Secret Access Key
  ARK_API_KEY               Seedream/Ark API key
  ZAI_API_KEY               Z.AI API key (alias: BIGMODEL_API_KEY)
  BIGMODEL_API_KEY          Z.AI API key alias (legacy BigModel credentials)
  OPENAI_IMAGE_MODEL        Default OpenAI model (gpt-image-1.5)
  OPENROUTER_IMAGE_MODEL    Default OpenRouter model (google/gemini-3.1-flash-image-preview)
  GOOGLE_IMAGE_MODEL        Default Google model (gemini-3-pro-image-preview)
  DASHSCOPE_IMAGE_MODEL     Default DashScope model (qwen-image-2.0-pro)
  MINIMAX_IMAGE_MODEL       Default MiniMax model (image-01)
  REPLICATE_IMAGE_MODEL     Default Replicate model (google/nano-banana-pro)
  JIMENG_IMAGE_MODEL        Default Jimeng model (jimeng_t2i_v40)
  SEEDREAM_IMAGE_MODEL      Default Seedream model (doubao-seedream-5-0-260128)
  ZAI_IMAGE_MODEL           Default Z.AI model (glm-image)
  BIGMODEL_IMAGE_MODEL      Z.AI model alias (legacy BigModel variable)
  OPENAI_BASE_URL           Custom OpenAI endpoint
  OPENAI_IMAGE_USE_CHAT     Use /chat/completions instead of /images/generations (true|false)
  OPENROUTER_BASE_URL       Custom OpenRouter endpoint
  OPENROUTER_HTTP_REFERER   Optional app URL for OpenRouter attribution
  OPENROUTER_TITLE          Optional app name for OpenRouter attribution
  GOOGLE_BASE_URL           Custom Google endpoint
  DASHSCOPE_BASE_URL        Custom DashScope endpoint
  MINIMAX_BASE_URL          Custom MiniMax endpoint
  REPLICATE_BASE_URL        Custom Replicate endpoint
  JIMENG_BASE_URL           Custom Jimeng endpoint
  AZURE_OPENAI_API_KEY      Azure OpenAI API key
  AZURE_OPENAI_BASE_URL     Azure OpenAI resource or deployment endpoint
  AZURE_OPENAI_DEPLOYMENT   Default Azure deployment name
  AZURE_API_VERSION         Azure API version (default: 2025-04-01-preview)
  AZURE_OPENAI_IMAGE_MODEL  Backward-compatible Azure deployment/model alias (defaults to gpt-image-1.5)
  SEEDREAM_BASE_URL         Custom Seedream endpoint
  ZAI_BASE_URL              Custom Z.AI endpoint (defaults to https://api.z.ai/api/paas/v4)
  BIGMODEL_BASE_URL         Z.AI endpoint alias (legacy BigModel variable)
  BAOYU_IMAGE_GEN_MAX_WORKERS  Override batch worker cap
  BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY  Override provider concurrency
  BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS  Override provider start gap in ms

Env file load order: CLI args > EXTEND.md > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env`);
}

export function parseArgs(argv: string[]): CliArgs {
  const out: CliArgs = {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
  };

  const positional: string[] = [];

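  // Collect every value after index i up to the next flag (any token starting with "-").
  // Returns the collected items and the index of the last consumed token.
  const takeMany = (i: number): { items: string[]; next: number } => {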
    const items: string[] = [];
    let j = i + 1;
    while (j < argv.length) {
      const v = argv[j]!;
      if (v.startsWith("-")) break;
      items.push(v);
      j++;
    }
    return { items, next: j - 1 };
  };

  for (let i = 0; i < argv.length; i++) {
    const a = argv[i]!;

    if (a === "--help" || a === "-h") {
      out.help = true;
      continue;
    }

    if (a === "--json") {
      out.json = true;
      continue;
    }

    if (a === "--prompt" || a === "-p") {
      const v = argv[++i];
      if (!v) throw new Error(`Missing value for ${a}`);
      out.prompt = v;
      continue;
    }

    if (a === "--promptfiles") {
      const { items, next } = takeMany(i);
      if (items.length === 0) throw new Error("Missing files for --promptfiles");
      out.promptFiles.push(...items);
      i = next;
      continue;
    }

    if (a === "--image") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --image");
      out.imagePath = v;
      continue;
    }

    if (a === "--batchfile") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --batchfile");
      out.batchFile = v;
      continue;
    }

    if (a === "--jobs") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --jobs");
      out.jobs = parseInt(v, 10);
      if (isNaN(out.jobs) || out.jobs < 1) throw new Error(`Invalid worker count: ${v}`);
      continue;
    }

    if (a === "--provider") {
      const v = argv[++i];
      if (
        v !== "google" &&
        v !== "openai" &&
        v !== "openrouter" &&
        v !== "dashscope" &&
        v !== "minimax" &&
        v !== "replicate" &&
        v !== "jimeng" &&
        v !== "seedream" &&
        v !== "azure" &&
        v !== "zai"
      ) {
        throw new Error(`Invalid provider: ${v}`);
      }
      out.provider = v;
      continue;
    }

    if (a === "--model" || a === "-m") {
      const v = argv[++i];
      if (!v) throw new Error(`Missing value for ${a}`);
      out.model = v;
      continue;
    }

    if (a === "--ar") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --ar");
      out.aspectRatio = v;
      continue;
    }

    if (a === "--size") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --size");
      out.size = v;
      continue;
    }

    if (a === "--quality") {
      const v = argv[++i];
      if (v !== "normal" && v !== "2k") throw new Error(`Invalid quality: ${v}`);
      out.quality = v;
      continue;
    }

    if (a === "--imageSize") {
      const v = argv[++i]?.toUpperCase();
      if (v !== "1K" && v !== "2K" && v !== "4K") throw new Error(`Invalid imageSize: ${v}`);
      out.imageSize = v;
      continue;
    }

    if (a === "--ref" || a === "--reference") {
      const { items, next } = takeMany(i);
      if (items.length === 0) throw new Error(`Missing files for ${a}`);
      out.referenceImages.push(...items);
      i = next;
      continue;
    }

    if (a === "--n") {
      const v = argv[++i];
      if (!v) throw new Error("Missing value for --n");
      out.n = parseInt(v, 10);
      if (isNaN(out.n) || out.n < 1) throw new Error(`Invalid count: ${v}`);
      continue;
    }

    if (a.startsWith("-")) {
      throw new Error(`Unknown option: ${a}`);
    }

    positional.push(a);
  }

  if (!out.prompt && out.promptFiles.length === 0 && positional.length > 0) {
    out.prompt = positional.join(" ");
  }

  return out;
}

async function loadEnvFile(p: string): Promise<Record<string, string>> {
  try {
    const content = await readFile(p, "utf8");
    const env: Record<string, string> = {};
    for (const line of content.split("\n")) {
      const trimmed = line.trim();
      if (!trimmed || trimmed.startsWith("#")) continue;
      const idx = trimmed.indexOf("=");
      if (idx === -1) continue;
      const key = trimmed.slice(0, idx).trim();
      let val = trimmed.slice(idx + 1).trim();
      if ((val.startsWith('"') && val.endsWith('"')) || (val.startsWith("'") && val.endsWith("'"))) {
        val = val.slice(1, -1);
      }
      env[key] = val;
    }
    return env;
  } catch {
    return {};
  }
}

async function loadEnv(): Promise<void> {
  const home = homedir();
  const cwd = process.cwd();

  const homeEnv = await loadEnvFile(path.join(home, ".baoyu-skills", ".env"));
  const cwdEnv = await loadEnvFile(path.join(cwd, ".baoyu-skills", ".env"));

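  // Apply the project-level file first so it wins over the home-level file, matching the
  // documented order: process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env.
  for (const [k, v] of Object.entries(cwdEnv)) {
    if (!process.env[k]) process.env[k] = v;
  }
  for (const [k, v] of Object.entries(homeEnv)) {
    if (!process.env[k]) process.env[k] = v;
  }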
}

export function extractYamlFrontMatter(content: string): string | null {
  const match = content.match(/^---\s*\n([\s\S]*?)\n---\s*$/m);
  return match ? match[1] : null;
}

export function parseSimpleYaml(yaml: string): Partial<ExtendConfig> {
  const config: Partial<ExtendConfig> = {};
  const lines = yaml.split("\n");
  let currentKey: string | null = null;
  let currentProvider: Provider | null = null;

  for (const line of lines) {
    const trimmed = line.trim();
    const indent = line.match(/^\s*/)?.[0].length ?? 0;
    if (!trimmed || trimmed.startsWith("#")) continue;

    if (trimmed.includes(":") && !trimmed.startsWith("-")) {
      const colonIdx = trimmed.indexOf(":");
      const key = trimmed.slice(0, colonIdx).trim();
      let value = trimmed.slice(colonIdx + 1).trim();

      if (value === "null" || value === "") {
        value = "null";
      }

      if (key === "version") {
        config.version = value === "null" ? 1 : parseInt(value, 10);
      } else if (key === "default_provider") {
        config.default_provider = value === "null" ? null : (value as Provider);
      } else if (key === "default_quality") {
        config.default_quality = value === "null" ? null : value as "normal" | "2k";
      } else if (key === "default_aspect_ratio") {
        const cleaned = value.replace(/['"]/g, "");
        config.default_aspect_ratio = cleaned === "null" ? null : cleaned;
      } else if (key === "default_image_size") {
        config.default_image_size = value === "null" ? null : value as "1K" | "2K" | "4K";
      } else if (key === "default_model") {
        config.default_model = {
          google: null,
          openai: null,
          openrouter: null,
          dashscope: null,
          minimax: null,
          replicate: null,
          jimeng: null,
          seedream: null,
          azure: null,
          zai: null,
        };
        currentKey = "default_model";
        currentProvider = null;
      } else if (key === "batch") {
        config.batch = {};
        currentKey = "batch";
        currentProvider = null;
      } else if (currentKey === "batch" && indent >= 2 && key === "max_workers") {
        config.batch ??= {};
        config.batch.max_workers = value === "null" ? null : parseInt(value, 10);
      } else if (currentKey === "batch" && indent >= 2 && key === "provider_limits") {
        config.batch ??= {};
        config.batch.provider_limits ??= {};
        currentKey = "provider_limits";
        currentProvider = null;
      } else if (
        currentKey === "provider_limits" &&
        indent >= 4 &&
        (
          key === "google" ||
          key === "openai" ||
          key === "openrouter" ||
          key === "dashscope" ||
          key === "minimax" ||
          key === "replicate" ||
          key === "jimeng" ||
          key === "seedream" ||
          key === "azure" ||
          key === "zai"
        )
      ) {
        config.batch ??= {};
        config.batch.provider_limits ??= {};
        config.batch.provider_limits[key] ??= {};
        currentProvider = key;
      } else if (
        currentKey === "default_model" &&
        (
          key === "google" ||
          key === "openai" ||
          key === "openrouter" ||
          key === "dashscope" ||
          key === "minimax" ||
          key === "replicate" ||
          key === "jimeng" ||
          key === "seedream" ||
          key === "azure" ||
          key === "zai"
        )
      ) {
        const cleaned = value.replace(/['"]/g, "");
        config.default_model![key] = cleaned === "null" ? null : cleaned;
      } else if (
        currentKey === "provider_limits" &&
        currentProvider &&
        indent >= 6 &&
        (key === "concurrency" || key === "start_interval_ms")
      ) {
        config.batch ??= {};
        config.batch.provider_limits ??= {};
        const providerLimit = (config.batch.provider_limits[currentProvider] ??= {});
        if (key === "concurrency") {
          providerLimit.concurrency = value === "null" ? null : parseInt(value, 10);
        } else {
          providerLimit.start_interval_ms = value === "null" ? null : parseInt(value, 10);
        }
      }
    }
  }

  return config;
}

async function loadExtendConfig(): Promise<Partial<ExtendConfig>> {
  const home = homedir();
  const cwd = process.cwd();

  const paths = [
    path.join(cwd, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md"),
    path.join(home, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md"),
  ];

  for (const p of paths) {
    try {
      const content = await readFile(p, "utf8");
      const yaml = extractYamlFrontMatter(content);
      if (!yaml) continue;
      return parseSimpleYaml(yaml);
    } catch {
      continue;
    }
  }

  return {};
}

export function mergeConfig(args: CliArgs, extend: Partial<ExtendConfig>): CliArgs {
  return {
    ...args,
    provider: args.provider ?? extend.default_provider ?? null,
    quality: args.quality ?? extend.default_quality ?? null,
    aspectRatio: args.aspectRatio ?? extend.default_aspect_ratio ?? null,
    imageSize: args.imageSize ?? extend.default_image_size ?? null,
  };
}

export function parsePositiveInt(value: string | undefined): number | null {
  if (!value) return null;
  const parsed = parseInt(value, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : null;
}

export function parsePositiveBatchInt(value: unknown): number | null {
  if (value === null || value === undefined) return null;
  if (typeof value === "number") {
    return Number.isInteger(value) && value > 0 ? value : null;
  }
  if (typeof value === "string") {
    return parsePositiveInt(value);
  }
  return null;
}

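/** Precedence: BAOYU_IMAGE_GEN_MAX_WORKERS, then EXTEND.md batch.max_workers, then DEFAULT_MAX_WORKERS. */
export function getConfiguredMaxWorkers(extendConfig: Partial<ExtendConfig>): number {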
  const envValue = parsePositiveInt(process.env.BAOYU_IMAGE_GEN_MAX_WORKERS);
  const configValue = extendConfig.batch?.max_workers ?? null;
  return Math.max(1, envValue ?? configValue ?? DEFAULT_MAX_WORKERS);
}

export function getConfiguredProviderRateLimits(
  extendConfig: Partial<ExtendConfig>
): Record<Provider, ProviderRateLimit> {
  const configured: Record<Provider, ProviderRateLimit> = {
    replicate: { ...DEFAULT_PROVIDER_RATE_LIMITS.replicate },
    google: { ...DEFAULT_PROVIDER_RATE_LIMITS.google },
    openai: { ...DEFAULT_PROVIDER_RATE_LIMITS.openai },
    openrouter: { ...DEFAULT_PROVIDER_RATE_LIMITS.openrouter },
    dashscope: { ...DEFAULT_PROVIDER_RATE_LIMITS.dashscope },
    minimax: { ...DEFAULT_PROVIDER_RATE_LIMITS.minimax },
    jimeng: { ...DEFAULT_PROVIDER_RATE_LIMITS.jimeng },
    seedream: { ...DEFAULT_PROVIDER_RATE_LIMITS.seedream },
    azure: { ...DEFAULT_PROVIDER_RATE_LIMITS.azure },
    zai: { ...DEFAULT_PROVIDER_RATE_LIMITS.zai },
  };

  for (const provider of ["replicate", "google", "openai", "openrouter", "dashscope", "minimax", "jimeng", "seedream", "azure", "zai"] as Provider[]) {
    const envPrefix = `BAOYU_IMAGE_GEN_${provider.toUpperCase()}`;
    const extendLimit = extendConfig.batch?.provider_limits?.[provider];
    configured[provider] = {
      concurrency:
        parsePositiveInt(process.env[`${envPrefix}_CONCURRENCY`]) ??
        extendLimit?.concurrency ??
        configured[provider].concurrency,
      startIntervalMs:
        parsePositiveInt(process.env[`${envPrefix}_START_INTERVAL_MS`]) ??
        extendLimit?.start_interval_ms ??
        configured[provider].startIntervalMs,
    };
  }

  return configured;
}

async function readPromptFromFiles(files: string[]): Promise<string> {
  const parts: string[] = [];
  for (const f of files) {
    parts.push(await readFile(f, "utf8"));
  }
  return parts.join("\n\n");
}

async function readPromptFromStdin(): Promise<string | null> {
  if (process.stdin.isTTY) return null;
  try {
    const chunks: Buffer[] = [];
    for await (const chunk of process.stdin) {
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
    const value = Buffer.concat(chunks).toString("utf8").trim();
    return value.length > 0 ? value : null;
  } catch {
    return null;
  }
}

export function normalizeOutputImagePath(p: string, defaultExtension = ".png"): string {
  const full = path.resolve(p);
  const ext = path.extname(full);
  if (ext) return full;
  return `${full}${defaultExtension}`;
}

function inferProviderFromModel(model: string | null): Provider | null {
  if (!model) return null;
  const normalized = model.trim();
  if (normalized.includes("seedream") || normalized.includes("seededit")) return "seedream";
  if (normalized === "image-01" || normalized === "image-01-live") return "minimax";
  return null;
}

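/**
 * Resolves the provider for a task. Order of precedence: an explicit provider
 * (rejected if reference images are present and it cannot use them), a provider
 * inferred from the model name, the first ref-capable provider with credentials
 * when reference images are present, then the first provider with credentials.
 */
export function detectProvider(args: CliArgs): Provider {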
  if (
    args.referenceImages.length > 0 &&
    args.provider &&
    args.provider !== "google" &&
    args.provider !== "openai" &&
    args.provider !== "azure" &&
    args.provider !== "openrouter" &&
    args.provider !== "replicate" &&
    args.provider !== "seedream" &&
    args.provider !== "minimax"
  ) {
    throw new Error(
      "Reference images require a ref-capable provider. Use --provider google (Gemini multimodal), --provider openai (GPT Image edits), --provider azure (Azure OpenAI), --provider openrouter (OpenRouter multimodal), --provider replicate, --provider seedream for supported Seedream models, or --provider minimax for MiniMax subject-reference workflows."
    );
  }

  if (args.provider) return args.provider;

  const hasGoogle = !!(process.env.GOOGLE_API_KEY || process.env.GEMINI_API_KEY);
  const hasAzure = !!(process.env.AZURE_OPENAI_API_KEY && process.env.AZURE_OPENAI_BASE_URL);
  const hasOpenai = !!process.env.OPENAI_API_KEY;
  const hasOpenrouter = !!process.env.OPENROUTER_API_KEY;
  const hasDashscope = !!process.env.DASHSCOPE_API_KEY;
  const hasMinimax = !!process.env.MINIMAX_API_KEY;
  const hasReplicate = !!process.env.REPLICATE_API_TOKEN;
  const hasJimeng = !!(process.env.JIMENG_ACCESS_KEY_ID && process.env.JIMENG_SECRET_ACCESS_KEY);
  const hasSeedream = !!process.env.ARK_API_KEY;
  const hasZai = !!(process.env.ZAI_API_KEY || process.env.BIGMODEL_API_KEY);
  const modelProvider = inferProviderFromModel(args.model);

  if (modelProvider === "seedream") {
    if (!hasSeedream) {
      throw new Error("Model looks like a Volcengine ARK image model, but ARK_API_KEY is not set.");
    }
    return "seedream";
  }

  if (modelProvider === "minimax") {
    if (!hasMinimax) {
      throw new Error("Model looks like a MiniMax image model, but MINIMAX_API_KEY is not set.");
    }
    return "minimax";
  }

  if (args.referenceImages.length > 0) {
    if (hasGoogle) return "google";
    if (hasOpenai) return "openai";
    if (hasAzure) return "azure";
    if (hasOpenrouter) return "openrouter";
    if (hasReplicate) return "replicate";
    if (hasSeedream) return "seedream";
    if (hasMinimax) return "minimax";
    throw new Error(
      "Reference images require Google, OpenAI, Azure, OpenRouter, Replicate, supported Seedream models, or MiniMax. Set GOOGLE_API_KEY/GEMINI_API_KEY, OPENAI_API_KEY, AZURE_OPENAI_API_KEY+AZURE_OPENAI_BASE_URL, OPENROUTER_API_KEY, REPLICATE_API_TOKEN, ARK_API_KEY, or MINIMAX_API_KEY, or remove --ref."
    );
  }

  const available = [
    hasGoogle && "google",
    hasOpenai && "openai",
    hasAzure && "azure",
    hasOpenrouter && "openrouter",
    hasDashscope && "dashscope",
    hasMinimax && "minimax",
    hasReplicate && "replicate",
    hasJimeng && "jimeng",
    hasSeedream && "seedream",
    hasZai && "zai",
  ].filter(Boolean) as Provider[];

  // With several providers configured, the first one in the list above wins.
  if (available.length > 0) return available[0]!;

  throw new Error(
    "No API key found. Set GOOGLE_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, AZURE_OPENAI_API_KEY+AZURE_OPENAI_BASE_URL, OPENROUTER_API_KEY, DASHSCOPE_API_KEY, MINIMAX_API_KEY, REPLICATE_API_TOKEN, JIMENG keys, ARK_API_KEY, or ZAI_API_KEY/BIGMODEL_API_KEY.\n" +
      "Create ~/.baoyu-skills/.env or <cwd>/.baoyu-skills/.env with your keys."
  );
}

export async function validateReferenceImages(referenceImages: string[]): Promise<void> {
  for (const refPath of referenceImages) {
    const fullPath = path.resolve(refPath);
    try {
      await access(fullPath);
    } catch {
      throw new Error(`Reference image not found: ${fullPath}`);
    }
  }
}

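/**
 * Treats an error as retryable unless its message contains a marker for a
 * configuration, validation, or 4xx client failure.
 */
export function isRetryableGenerationError(error: unknown): boolean {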
  const msg = error instanceof Error ? error.message : String(error);
  const nonRetryableMarkers = [
    "Reference image",
    "not supported",
    "only supported",
    "No API key found",
    "is required",
    "Invalid ",
    "Unexpected ",
    "API error (400)",
    "API error (401)",
    "API error (402)",
    "API error (403)",
    "API error (404)",
    "temporarily disabled",
  ];
  return !nonRetryableMarkers.some((marker) => msg.includes(marker));
}

async function loadProviderModule(provider: Provider): Promise<ProviderModule> {
  if (provider === "google") return (await import("./providers/google")) as ProviderModule;
  if (provider === "dashscope") return (await import("./providers/dashscope")) as ProviderModule;
  if (provider === "minimax") return (await import("./providers/minimax")) as ProviderModule;
  if (provider === "replicate") return (await import("./providers/replicate")) as ProviderModule;
  if (provider === "openrouter") return (await import("./providers/openrouter")) as ProviderModule;
  if (provider === "jimeng") return (await import("./providers/jimeng")) as ProviderModule;
  if (provider === "seedream") return (await import("./providers/seedream")) as ProviderModule;
  if (provider === "azure") return (await import("./providers/azure")) as ProviderModule;
  if (provider === "zai") return (await import("./providers/zai")) as ProviderModule;
  return (await import("./providers/openai")) as ProviderModule;
}

async function loadPromptForArgs(args: CliArgs): Promise<string | null> {
  let prompt: string | null = args.prompt;
  if (!prompt && args.promptFiles.length > 0) {
    prompt = await readPromptFromFiles(args.promptFiles);
  }
  return prompt;
}

function getModelForProvider(
  provider: Provider,
  requestedModel: string | null,
  extendConfig: Partial<ExtendConfig>,
  providerModule: ProviderModule
): string {
  if (requestedModel) return requestedModel;
  if (extendConfig.default_model) {
    if (provider === "google" && extendConfig.default_model.google) return extendConfig.default_model.google;
    if (provider === "openai" && extendConfig.default_model.openai) return extendConfig.default_model.openai;
    if (provider === "openrouter" && extendConfig.default_model.openrouter) {
      return extendConfig.default_model.openrouter;
    }
    if (provider === "dashscope" && extendConfig.default_model.dashscope) return extendConfig.default_model.dashscope;
    if (provider === "minimax" && extendConfig.default_model.minimax) return extendConfig.default_model.minimax;
    if (provider === "replicate" && extendConfig.default_model.replicate) return extendConfig.default_model.replicate;
    if (provider === "jimeng" && extendConfig.default_model.jimeng) return extendConfig.default_model.jimeng;
    if (provider === "seedream" && extendConfig.default_model.seedream) return extendConfig.default_model.seedream;
    if (provider === "azure" && extendConfig.default_model.azure) return extendConfig.default_model.azure;
    if (provider === "zai" && extendConfig.default_model.zai) return extendConfig.default_model.zai;
  }
  return providerModule.getDefaultModel();
}

async function prepareSingleTask(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<PreparedTask> {
  if (!args.quality) args.quality = "2k";

  const prompt = (await loadPromptForArgs(args)) ?? (await readPromptFromStdin());
  if (!prompt) throw new Error("Prompt is required");
  if (!args.imagePath) throw new Error("--image is required");
  if (args.referenceImages.length > 0) await validateReferenceImages(args.referenceImages);

  const provider = detectProvider(args);
  const providerModule = await loadProviderModule(provider);
  const model = getModelForProvider(provider, args.model, extendConfig, providerModule);
  providerModule.validateArgs?.(model, args);
  const defaultOutputExtension = providerModule.getDefaultOutputExtension?.(model, args) ?? ".png";

  return {
    id: "single",
    prompt,
    args,
    provider,
    model,
    outputPath: normalizeOutputImagePath(args.imagePath, defaultOutputExtension),
    providerModule,
  };
}

export async function loadBatchTasks(batchFilePath: string): Promise<LoadedBatchTasks> {
  const resolvedBatchFilePath = path.resolve(batchFilePath);
  const content = await readFile(resolvedBatchFilePath, "utf8");
  const parsed = JSON.parse(content.replace(/^\uFEFF/, "")) as BatchFile;
  const batchDir = path.dirname(resolvedBatchFilePath);
  if (Array.isArray(parsed)) {
    return {
      tasks: parsed,
      jobs: null,
      batchDir,
    };
  }
  if (parsed && typeof parsed === "object" && Array.isArray(parsed.tasks)) {
    const jobs = parsePositiveBatchInt(parsed.jobs);
    if (parsed.jobs !== undefined && parsed.jobs !== null && jobs === null) {
      throw new Error("Invalid batch file. jobs must be a positive integer when provided.");
    }
    return {
      tasks: parsed.tasks,
      jobs,
      batchDir,
    };
  }
  throw new Error("Invalid batch file. Expected an array of tasks or an object with a tasks array.");
}

export function resolveBatchPath(batchDir: string, filePath: string): string {
  return path.isAbsolute(filePath) ? filePath : path.resolve(batchDir, filePath);
}

export function createTaskArgs(baseArgs: CliArgs, task: BatchTaskInput, batchDir: string): CliArgs {
  return {
    ...baseArgs,
    prompt: task.prompt ?? null,
    promptFiles: task.promptFiles ? task.promptFiles.map((filePath) => resolveBatchPath(batchDir, filePath)) : [],
    imagePath: task.image ? resolveBatchPath(batchDir, task.image) : null,
    provider: task.provider ?? baseArgs.provider ?? null,
    model: task.model ?? baseArgs.model ?? null,
    aspectRatio: task.ar ?? baseArgs.aspectRatio ?? null,
    size: task.size ?? baseArgs.size ?? null,
    quality: task.quality ?? baseArgs.quality ?? null,
    imageSize: task.imageSize ?? baseArgs.imageSize ?? null,
    referenceImages: task.ref ? task.ref.map((filePath) => resolveBatchPath(batchDir, filePath)) : [],
    n: task.n ?? baseArgs.n,
    batchFile: null,
    jobs: baseArgs.jobs,
    json: baseArgs.json,
    help: false,
  };
}

async function prepareBatchTasks(
  args: CliArgs,
  extendConfig: Partial<ExtendConfig>
): Promise<{ tasks: PreparedTask[]; jobs: number | null }> {
  if (!args.batchFile) throw new Error("--batchfile is required in batch mode");
  const { tasks: taskInputs, jobs: batchJobs, batchDir } = await loadBatchTasks(args.batchFile);
  if (taskInputs.length === 0) throw new Error("Batch file does not contain any tasks.");

  const prepared: PreparedTask[] = [];
  for (let i = 0; i < taskInputs.length; i++) {
    const task = taskInputs[i]!;
    const taskArgs = createTaskArgs(args, task, batchDir);
    const prompt = await loadPromptForArgs(taskArgs);
    if (!prompt) throw new Error(`Task ${i + 1} is missing prompt or promptFiles.`);
    if (!taskArgs.imagePath) throw new Error(`Task ${i + 1} is missing image output path.`);
    if (taskArgs.referenceImages.length > 0) await validateReferenceImages(taskArgs.referenceImages);

    const provider = detectProvider(taskArgs);
    const providerModule = await loadProviderModule(provider);
    const model = getModelForProvider(provider, taskArgs.model, extendConfig, providerModule);
    providerModule.validateArgs?.(model, taskArgs);
    const defaultOutputExtension = providerModule.getDefaultOutputExtension?.(model, taskArgs) ?? ".png";
    prepared.push({
      id: task.id || `task-${String(i + 1).padStart(2, "0")}`,
      prompt,
      args: taskArgs,
      provider,
      model,
      outputPath: normalizeOutputImagePath(taskArgs.imagePath, defaultOutputExtension),
      providerModule,
    });
  }

  return {
    tasks: prepared,
    jobs: args.jobs ?? batchJobs,
  };
}

async function writeImage(outputPath: string, imageData: Uint8Array): Promise<void> {
  await mkdir(path.dirname(outputPath), { recursive: true });
  await writeFile(outputPath, imageData);
}

async function generatePreparedTask(task: PreparedTask): Promise<TaskResult> {
  console.error(`Using ${task.provider} / ${task.model} for ${task.id}`);
  console.error(
    `Switch model: --model <id> | EXTEND.md default_model.${task.provider} | env ${task.provider.toUpperCase()}_IMAGE_MODEL`
  );

  let attempts = 0;
  while (attempts < MAX_ATTEMPTS) {
    attempts += 1;
    try {
      const imageData = await task.providerModule.generateImage(task.prompt, task.model, task.args);
      await writeImage(task.outputPath, imageData);
      return {
        id: task.id,
        provider: task.provider,
        model: task.model,
        outputPath: task.outputPath,
        success: true,
        attempts,
        error: null,
      };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      const canRetry = attempts < MAX_ATTEMPTS && isRetryableGenerationError(error);
      if (canRetry) {
        console.error(`[${task.id}] Attempt ${attempts}/${MAX_ATTEMPTS} failed, retrying...`);
        continue;
      }
      return {
        id: task.id,
        provider: task.provider,
        model: task.model,
        outputPath: task.outputPath,
        success: false,
        attempts,
        error: message,
      };
    }
  }

  return {
    id: task.id,
    provider: task.provider,
    model: task.model,
    outputPath: task.outputPath,
    success: false,
    attempts: MAX_ATTEMPTS,
    error: "Unknown failure",
  };
}

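/**
 * Returns an acquire function that polls until the provider has a free
 * concurrency slot and startIntervalMs has elapsed since its last start.
 * The callback it resolves to releases the slot.
 */
function createProviderGate(providerRateLimits: Record<Provider, ProviderRateLimit>) {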
  const state = new Map<Provider, { active: number; lastStartedAt: number }>();

  return async function acquire(provider: Provider): Promise<() => void> {
    const limit = providerRateLimits[provider];
    while (true) {
      const current = state.get(provider) ?? { active: 0, lastStartedAt: 0 };
      const now = Date.now();
      const enoughCapacity = current.active < limit.concurrency;
      const enoughGap = now - current.lastStartedAt >= limit.startIntervalMs;
      if (enoughCapacity && enoughGap) {
        state.set(provider, { active: current.active + 1, lastStartedAt: now });
        return () => {
          const latest = state.get(provider) ?? { active: 1, lastStartedAt: now };
          state.set(provider, {
            active: Math.max(0, latest.active - 1),
            lastStartedAt: latest.lastStartedAt,
          });
        };
      }
      await new Promise((resolve) => setTimeout(resolve, POLL_WAIT_MS));
    }
  };
}

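/** Uses jobs when set, but never fewer than 1 worker and never more than taskCount or maxWorkers. */
export function getWorkerCount(taskCount: number, jobs: number | null, maxWorkers: number): number {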
  const requested = jobs ?? Math.min(taskCount, maxWorkers);
  return Math.max(1, Math.min(requested, taskCount, maxWorkers));
}

async function runBatchTasks(
  tasks: PreparedTask[],
  jobs: number | null,
  extendConfig: Partial<ExtendConfig>
): Promise<TaskResult[]> {
  if (tasks.length === 1) {
    return [await generatePreparedTask(tasks[0]!)];
  }

  const maxWorkers = getConfiguredMaxWorkers(extendConfig);
  const providerRateLimits = getConfiguredProviderRateLimits(extendConfig);
  const acquireProvider = createProviderGate(providerRateLimits);
  const workerCount = getWorkerCount(tasks.length, jobs, maxWorkers);
  console.error(`Batch mode: ${tasks.length} tasks, ${workerCount} workers, parallel mode enabled.`);
  for (const provider of ["replicate", "google", "openai", "openrouter", "dashscope", "minimax", "jimeng", "seedream", "azure", "zai"] as Provider[]) {
    const limit = providerRateLimits[provider];
    console.error(`- ${provider}: concurrency=${limit.concurrency}, startIntervalMs=${limit.startIntervalMs}`);
  }

  let nextIndex = 0;
  const results: TaskResult[] = new Array(tasks.length);

  const worker = async (): Promise<void> => {
    while (true) {
      const currentIndex = nextIndex;
      nextIndex += 1;
      if (currentIndex >= tasks.length) return;

      const task = tasks[currentIndex]!;
      const release = await acquireProvider(task.provider);
      try {
        results[currentIndex] = await generatePreparedTask(task);
      } finally {
        release();
      }
    }
  };

  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

function printBatchSummary(results: TaskResult[]): void {
  const successCount = results.filter((result) => result.success).length;
  const failureCount = results.length - successCount;

  console.error("");
  console.error("Batch generation summary:");
  console.error(`- Total: ${results.length}`);
  console.error(`- Succeeded: ${successCount}`);
  console.error(`- Failed: ${failureCount}`);

  if (failureCount > 0) {
    console.error("Failure reasons:");
    for (const result of results.filter((item) => !item.success)) {
      console.error(`- ${result.id}: ${result.error}`);
    }
  }
}

function emitJson(payload: unknown): void {
  console.log(JSON.stringify(payload, null, 2));
}

async function runSingleMode(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<void> {
  const task = await prepareSingleTask(args, extendConfig);
  const result = await generatePreparedTask(task);
  if (!result.success) {
    throw new Error(result.error || "Generation failed");
  }

  if (args.json) {
    emitJson({
      savedImage: result.outputPath,
      provider: result.provider,
      model: result.model,
      attempts: result.attempts,
      prompt: task.prompt.slice(0, 200),
    });
    return;
  }

  console.log(result.outputPath);
}

async function runBatchMode(args: CliArgs, extendConfig: Partial<ExtendConfig>): Promise<void> {
  const { tasks, jobs } = await prepareBatchTasks(args, extendConfig);
  const results = await runBatchTasks(tasks, jobs, extendConfig);
  printBatchSummary(results);

  if (args.json) {
    emitJson({
      mode: "batch",
      total: results.length,
      succeeded: results.filter((item) => item.success).length,
      failed: results.filter((item) => !item.success).length,
      results,
    });
  }

  if (results.some((item) => !item.success)) {
    process.exitCode = 1;
  }
}

async function main(): Promise<void> {
  const args = parseArgs(process.argv.slice(2));
  if (args.help) {
    printUsage();
    return;
  }

  await loadEnv();
  const extendConfig = await loadExtendConfig();
  const mergedArgs = mergeConfig(args, extendConfig);
  if (!mergedArgs.quality) mergedArgs.quality = "2k";

  if (mergedArgs.batchFile) {
    await runBatchMode(mergedArgs, extendConfig);
    return;
  }

  await runSingleMode(mergedArgs, extendConfig);
}

function isDirectExecution(metaUrl: string): boolean {
  const entryPath = process.argv[1];
  if (!entryPath) return false;

  try {
    return path.resolve(entryPath) === fileURLToPath(metaUrl);
  } catch {
    return false;
  }
}

if (isDirectExecution(import.meta.url)) {
  main().catch((error) => {
    const message = error instanceof Error ? error.message : String(error);
    console.error(message);
    process.exit(1);
  });
}
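
The batch scheduling in `runBatchTasks` above combines a clamped worker count with workers that pull from a shared index. The following is a minimal standalone sketch of that pattern, not the skill's actual module: `runPool` and its `handler` parameter are hypothetical names introduced for illustration, and the per-provider rate gate is left out.

```typescript
// Same clamping logic as getWorkerCount in the source above.
function getWorkerCount(taskCount: number, jobs: number | null, maxWorkers: number): number {
  const requested = jobs ?? Math.min(taskCount, maxWorkers);
  return Math.max(1, Math.min(requested, taskCount, maxWorkers));
}

// Hypothetical helper: runs handler over items with a fixed number of workers.
// Each worker claims the next unclaimed index, so results keep the input order
// even when tasks finish out of order.
async function runPool<T, R>(
  items: T[],
  workerCount: number,
  handler: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  const worker = async (): Promise<void> => {
    while (true) {
      // Claiming the index is synchronous, so two workers never take the same task.
      const currentIndex = nextIndex;
      nextIndex += 1;
      if (currentIndex >= items.length) return;
      results[currentIndex] = await handler(items[currentIndex]!, currentIndex);
    }
  };

  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With these definitions, `getWorkerCount(20, 8, 4)` evaluates to 4, because the requested 8 jobs are capped by `maxWorkers`. Calling `runPool(items, 4, handler)` would then process 20 items four at a time.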

import assert from "node:assert/strict";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import test, { type TestContext } from "node:test";

import type { CliArgs } from "../types.ts";
import {
  generateImage,
  getDefaultModel,
  parseAzureBaseURL,
  validateArgs,
} from "./azure.ts";

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

function makeArgs(overrides: Partial<CliArgs> = {}): CliArgs {
  return {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
    ...overrides,
  };
}

async function makeTempDir(prefix: string): Promise<string> {
  return fs.mkdtemp(path.join(os.tmpdir(), prefix));
}

test("Azure endpoint parsing and default deployment selection follow env precedence", (t) => {
  assert.deepEqual(parseAzureBaseURL("https://example.openai.azure.com"), {
    resourceBaseURL: "https://example.openai.azure.com/openai",
    deployment: null,
  });
  assert.deepEqual(
    parseAzureBaseURL("https://example.openai.azure.com/openai/deployments/from-url"),
    {
      resourceBaseURL: "https://example.openai.azure.com/openai",
      deployment: "from-url",
    },
  );

  useEnv(t, {
    AZURE_OPENAI_BASE_URL: "https://example.openai.azure.com/openai/deployments/from-url",
    AZURE_OPENAI_DEPLOYMENT: "explicit-deploy",
    AZURE_OPENAI_IMAGE_MODEL: "env-fallback",
  });
  assert.equal(getDefaultModel(), "explicit-deploy");
});

test("Azure validateArgs rejects unsupported edit input formats before the API call", () => {
  assert.doesNotThrow(() =>
    validateArgs("demo-deployment", makeArgs({ referenceImages: ["hero.png", "photo.jpeg"] })),
  );
  assert.throws(
    () => validateArgs("demo-deployment", makeArgs({ referenceImages: ["hero.webp"] })),
    /PNG or JPG\/JPEG/,
  );
});

test("Azure image generation routes model to deployment and sends mapped quality", async (t) => {
  useEnv(t, {
    AZURE_OPENAI_API_KEY: "azure-key",
    AZURE_OPENAI_BASE_URL: "https://example.openai.azure.com/openai/deployments/default-deploy",
    AZURE_API_VERSION: null,
    AZURE_OPENAI_DEPLOYMENT: null,
    AZURE_OPENAI_IMAGE_MODEL: null,
  });

  const originalFetch = globalThis.fetch;
  t.after(() => {
    globalThis.fetch = originalFetch;
  });

  const calls: Array<{ url: string; body: string }> = [];
  globalThis.fetch = async (input, init) => {
    calls.push({
      url: String(input),
      body: String(init?.body ?? ""),
    });
    return Response.json({
      data: [{ b64_json: Buffer.from("azure-image").toString("base64") }],
    });
  };

  const bytes = await generateImage(
    "A calm lake at sunset",
    "custom-deploy",
    makeArgs({ quality: "normal" }),
  );

  assert.equal(Buffer.from(bytes).toString("utf8"), "azure-image");
  assert.equal(
    calls[0]?.url,
    "https://example.openai.azure.com/openai/deployments/custom-deploy/images/generations?api-version=2025-04-01-preview",
  );

  const body = JSON.parse(calls[0]!.body) as Record<string, string>;
  assert.equal(body.quality, "medium");
  assert.equal(body.size, "1024x1024");
});

test("Azure image edits include quality in multipart requests", async (t) => {
  const root = await makeTempDir("baoyu-image-gen-azure-");
  t.after(() => fs.rm(root, { recursive: true, force: true }));

  const pngPath = path.join(root, "ref.png");
  const jpgPath = path.join(root, "ref.jpg");
  await fs.writeFile(pngPath, "png-bytes");
  await fs.writeFile(jpgPath, "jpg-bytes");

  useEnv(t, {
    AZURE_OPENAI_API_KEY: "azure-key",
    AZURE_OPENAI_BASE_URL: "https://example.openai.azure.com",
    AZURE_API_VERSION: "2025-04-01-preview",
    AZURE_OPENAI_DEPLOYMENT: null,
    AZURE_OPENAI_IMAGE_MODEL: null,
  });

  const originalFetch = globalThis.fetch;
  t.after(() => {
    globalThis.fetch = originalFetch;
  });

  const calls: Array<{ url: string; form: FormData }> = [];
  globalThis.fetch = async (input, init) => {
    calls.push({
      url: String(input),
      form: init?.body as FormData,
    });
    return Response.json({
      data: [{ b64_json: Buffer.from("edited-image").toString("base64") }],
    });
  };

  const bytes = await generateImage(
    "Add warm lighting",
    "edit-deploy",
    makeArgs({
      quality: "2k",
      referenceImages: [pngPath, jpgPath],
    }),
  );

  assert.equal(Buffer.from(bytes).toString("utf8"), "edited-image");
  assert.equal(
    calls[0]?.url,
    "https://example.openai.azure.com/openai/deployments/edit-deploy/images/edits?api-version=2025-04-01-preview",
  );
  assert.equal(calls[0]?.form.get("quality"), "high");
  assert.equal(calls[0]?.form.get("size"), "1024x1024");
  assert.equal(calls[0]?.form.getAll("image[]").length, 2);
});

import path from "node:path";
import { readFile } from "node:fs/promises";
import type { CliArgs } from "../types";
import { getOpenAISize, extractImageFromResponse } from "./openai.ts";

type OpenAIImageResponse = { data: Array<{ url?: string; b64_json?: string }> };
type AzureEndpoint = {
  resourceBaseURL: string;
  deployment: string | null;
};

const DEFAULT_AZURE_API_VERSION = "2025-04-01-preview";
const AZURE_EDIT_IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg"]);

export function parseAzureBaseURL(url: string): AzureEndpoint {
  const parsed = new URL(url);
  const trimmedPath = parsed.pathname.replace(/\/+$/, "");
  const deploymentMatch = trimmedPath.match(/^(.*?)(?:\/openai)?\/deployments\/([^/]+)$/);

  if (deploymentMatch) {
    parsed.pathname = `${deploymentMatch[1] || ""}/openai`;
    return {
      resourceBaseURL: parsed.toString().replace(/\/+$/, ""),
      deployment: decodeURIComponent(deploymentMatch[2]!),
    };
  }

  parsed.pathname = trimmedPath.endsWith("/openai") ? trimmedPath : `${trimmedPath}/openai`;
  return {
    resourceBaseURL: parsed.toString().replace(/\/+$/, ""),
    deployment: null,
  };
}

export function getDefaultModel(): string {
  const explicitDeployment = process.env.AZURE_OPENAI_DEPLOYMENT?.trim();
  if (explicitDeployment) return explicitDeployment;

  const baseURL = process.env.AZURE_OPENAI_BASE_URL;
  if (baseURL) {
    try {
      const { deployment } = parseAzureBaseURL(baseURL);
      if (deployment) return deployment;
    } catch {
      // Ignore invalid URLs here so the required-env check can raise the user-facing error later.
    }
  }

  return process.env.AZURE_OPENAI_IMAGE_MODEL || "gpt-image-1.5";
}

function getEndpoint(): AzureEndpoint {
  const url = process.env.AZURE_OPENAI_BASE_URL;
  if (!url) {
    throw new Error(
      "AZURE_OPENAI_BASE_URL is required. Set it to your Azure resource or deployment endpoint, e.g.: https://your-resource.openai.azure.com or https://your-resource.openai.azure.com/openai/deployments/your-deployment"
    );
  }
  return parseAzureBaseURL(url);
}

function getApiKey(): string {
  const key = process.env.AZURE_OPENAI_API_KEY;
  if (!key) {
    throw new Error(
      "AZURE_OPENAI_API_KEY is required. Get it from Azure Portal → your OpenAI resource → Keys and Endpoint."
    );
  }
  return key;
}

function getApiVersion(): string {
  return process.env.AZURE_API_VERSION || DEFAULT_AZURE_API_VERSION;
}

function getDeployment(model: string): string {
  const deployment = model.trim();
  if (!deployment) {
    throw new Error(
      "Azure deployment name is required. Use --model <deployment>, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_IMAGE_MODEL, or embed the deployment in AZURE_OPENAI_BASE_URL."
    );
  }
  return deployment;
}

function buildURL(deployment: string, pathSuffix: string): string {
  const { resourceBaseURL } = getEndpoint();
  return `${resourceBaseURL}/deployments/${encodeURIComponent(deployment)}${pathSuffix}?api-version=${getApiVersion()}`;
}

function authHeaders(): Record<string, string> {
  return { "api-key": getApiKey() };
}

function getAzureQuality(quality: CliArgs["quality"]): "medium" | "high" {
  return quality === "2k" ? "high" : "medium";
}

export function validateArgs(_model: string, args: CliArgs): void {
  for (const refPath of args.referenceImages) {
    const ext = path.extname(refPath).toLowerCase();
    if (!AZURE_EDIT_IMAGE_EXTENSIONS.has(ext)) {
      throw new Error(
        `Azure OpenAI reference images must be PNG or JPG/JPEG. Unsupported file: ${refPath}`
      );
    }
  }
}

export async function generateImage(
  prompt: string,
  model: string,
  args: CliArgs
): Promise<Uint8Array> {
  const deployment = getDeployment(model);
  const size = args.size || getOpenAISize(model, args.aspectRatio, args.quality);

  if (args.referenceImages.length > 0) {
    return generateWithAzureEdits(prompt, deployment, size, args.referenceImages, args.quality);
  }

  return generateWithAzureGenerations(prompt, deployment, size, args.quality);
}

async function generateWithAzureGenerations(
  prompt: string,
  deployment: string,
  size: string,
  quality: CliArgs["quality"]
): Promise<Uint8Array> {
  const body: Record<string, any> = {
    prompt,
    size,
    n: 1,
    quality: getAzureQuality(quality),
  };

  const res = await fetch(buildURL(deployment, "/images/generations"), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...authHeaders(),
    },
    body: JSON.stringify(body),
  });

  if (!res.ok) {
    const err = await res.text();
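    // Include the HTTP status so isRetryableGenerationError can match its "API error (4xx)" markers.
    throw new Error(`Azure OpenAI API error (${res.status}): ${err}`);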
  }

  const result = (await res.json()) as OpenAIImageResponse;
  return extractImageFromResponse(result);
}

async function generateWithAzureEdits(
  prompt: string,
  deployment: string,
  size: string,
  referenceImages: string[],
  quality: CliArgs["quality"]
): Promise<Uint8Array> {
  const form = new FormData();
  form.append("prompt", prompt);
  form.append("size", size);
  form.append("n", "1");
  form.append("quality", getAzureQuality(quality));

  for (const refPath of referenceImages) {
    const bytes = await readFile(refPath);
    const filename = path.basename(refPath);
    const mimeType = path.extname(filename).toLowerCase() === ".png" ? "image/png" : "image/jpeg";
    const blob = new Blob([bytes], { type: mimeType });
    form.append("image[]", blob, filename);
  }

  const res = await fetch(buildURL(deployment, "/images/edits"), {
    method: "POST",
    headers: {
      ...authHeaders(),
    },
    body: form,
  });

  if (!res.ok) {
    const err = await res.text();
    throw new Error(`Azure OpenAI edits API error: ${err}`);
  }

  const result = (await res.json()) as OpenAIImageResponse;
  return extractImageFromResponse(result);
}
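
The Azure provider above scopes every request to a deployment and maps the CLI quality preset onto two Azure values. The following is a minimal standalone sketch of that URL shape and mapping, not the provider's own code: `buildAzureImageUrl`, `toAzureQuality`, the example resource URL, and the `placeholder-version` string are all hypothetical stand-ins for values the provider resolves internally.

```typescript
// Sketch only: names and values here are illustrative assumptions.
type Quality = "normal" | "2k" | null;

// "2k" maps to "high"; every other value (including unset) maps to "medium".
function toAzureQuality(quality: Quality): "medium" | "high" {
  return quality === "2k" ? "high" : "medium";
}

// Builds a deployment-scoped URL. `resourceBaseURL` and `apiVersion` are
// passed in explicitly here; the provider derives them from its own helpers.
function buildAzureImageUrl(
  resourceBaseURL: string,
  deployment: string,
  pathSuffix: "/images/generations" | "/images/edits",
  apiVersion: string,
): string {
  const base = resourceBaseURL.replace(/\/+$/, "");
  const name = encodeURIComponent(deployment);
  return `${base}/deployments/${name}${pathSuffix}?api-version=${encodeURIComponent(apiVersion)}`;
}

const url = buildAzureImageUrl(
  "https://example-resource.openai.azure.com/openai",
  "my image/deploy",
  "/images/generations",
  "placeholder-version",
);
console.log(url);
```

Encoding the deployment name keeps a name containing spaces or slashes from changing the request path.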

import assert from "node:assert/strict";
import test, { type TestContext } from "node:test";

import {
  getDefaultModel,
  getModelFamily,
  getQwen2SizeFromAspectRatio,
  getSizeFromAspectRatio,
  normalizeSize,
  parseAspectRatio,
  parseSize,
  resolveSizeForModel,
} from "./dashscope.ts";

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

test("DashScope default model prefers env override and otherwise uses qwen-image-2.0-pro", (t) => {
  useEnv(t, { DASHSCOPE_IMAGE_MODEL: null });
  assert.equal(getDefaultModel(), "qwen-image-2.0-pro");

  process.env.DASHSCOPE_IMAGE_MODEL = "qwen-image-max";
  assert.equal(getDefaultModel(), "qwen-image-max");
});

test("DashScope aspect-ratio parsing accepts numeric ratios only", () => {
  assert.deepEqual(parseAspectRatio("3:2"), { width: 3, height: 2 });
  assert.equal(parseAspectRatio("square"), null);
  assert.equal(parseAspectRatio("-1:2"), null);
});

test("DashScope model family routing distinguishes qwen-2.0, fixed-size qwen, and legacy models", () => {
  assert.equal(getModelFamily("qwen-image-2.0-pro"), "qwen2");
  assert.equal(getModelFamily("qwen-image"), "qwenFixed");
  assert.equal(getModelFamily("z-image-turbo"), "legacy");
  assert.equal(getModelFamily("wanx-v1"), "legacy");
});

test("Legacy DashScope size selection keeps the previous quality-based heuristic", () => {
  assert.equal(getSizeFromAspectRatio(null, "normal"), "1024*1024");
  assert.equal(getSizeFromAspectRatio("16:9", "normal"), "1280*720");
  assert.equal(getSizeFromAspectRatio("16:9", "2k"), "2048*1152");
  assert.equal(getSizeFromAspectRatio("invalid", "2k"), "1536*1536");
});

test("Qwen 2.0 recommended sizes follow the official common-ratio table", () => {
  assert.equal(getQwen2SizeFromAspectRatio(null, "normal"), "1024*1024");
  assert.equal(getQwen2SizeFromAspectRatio(null, "2k"), "1536*1536");
  assert.equal(getQwen2SizeFromAspectRatio("16:9", "normal"), "1280*720");
  assert.equal(getQwen2SizeFromAspectRatio("21:9", "2k"), "2048*872");
});

test("Qwen 2.0 derives free-form sizes within pixel budget for uncommon ratios", () => {
  const size = getQwen2SizeFromAspectRatio("5:2", "normal");
  const parsed = parseSize(size);
  assert.ok(parsed);
  assert.ok(parsed.width * parsed.height >= 512 * 512);
  assert.ok(parsed.width * parsed.height <= 2048 * 2048);
  assert.ok(Math.abs(parsed.width / parsed.height - 2.5) < 0.08);
});

test("resolveSizeForModel validates explicit qwen-image-2.0 sizes by total pixels", () => {
  assert.equal(
    resolveSizeForModel("qwen-image-2.0-pro", {
      size: "2048x872",
      aspectRatio: null,
      quality: "2k",
    }),
    "2048*872",
  );

  assert.throws(
    () =>
      resolveSizeForModel("qwen-image-2.0-pro", {
        size: "4096x4096",
        aspectRatio: null,
        quality: "2k",
      }),
    /total pixels between/,
  );
});

test("resolveSizeForModel enforces fixed sizes for qwen-image-max/plus/image", () => {
  assert.equal(
    resolveSizeForModel("qwen-image-max", {
      size: null,
      aspectRatio: "1:1",
      quality: "2k",
    }),
    "1328*1328",
  );

  assert.equal(
    resolveSizeForModel("qwen-image", {
      size: "1664x928",
      aspectRatio: "9:16",
      quality: "normal",
    }),
    "1664*928",
  );

  assert.throws(
    () =>
      resolveSizeForModel("qwen-image-max", {
        size: null,
        aspectRatio: "21:9",
        quality: "2k",
      }),
    /supports only fixed ratios/,
  );

  assert.throws(
    () =>
      resolveSizeForModel("qwen-image-plus", {
        size: "1024x1024",
        aspectRatio: null,
        quality: "2k",
      }),
    /support only these sizes/,
  );
});

test("DashScope size normalization converts WxH into provider format", () => {
  assert.equal(normalizeSize("1024x1024"), "1024*1024");
  assert.equal(normalizeSize("2048*1152"), "2048*1152");
});

import type { CliArgs, Quality } from "../types";

type DashScopeModelFamily = "qwen2" | "qwenFixed" | "legacy";

type DashScopeModelSpec = {
  family: DashScopeModelFamily;
  defaultSize: string;
};

const DEFAULT_MODEL = "qwen-image-2.0-pro";
const MIN_QWEN_2_TOTAL_PIXELS = 512 * 512;
const MAX_QWEN_2_TOTAL_PIXELS = 2048 * 2048;
const SIZE_STEP = 16;
const QWEN_NEGATIVE_PROMPT =
  "低分辨率，低画质，肢体畸形，手指畸形，画面过饱和，蜡像感，人脸无细节，过度光滑，画面具有AI感，构图混乱，文字模糊，扭曲";

const QWEN_2_TARGET_PIXELS: Record<Quality, number> = {
  normal: 1024 * 1024,
  "2k": 1536 * 1536,
};

const QWEN_2_RECOMMENDED: Record<string, Record<Quality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

const QWEN_FIXED_SIZES_BY_RATIO: Record<string, string> = {
  "16:9": "1664*928",
  "4:3": "1472*1104",
  "1:1": "1328*1328",
  "3:4": "1104*1472",
  "9:16": "928*1664",
};

const QWEN_FIXED_SIZES = Object.values(QWEN_FIXED_SIZES_BY_RATIO);

const LEGACY_STANDARD_SIZES: [number, number][] = [
  [1024, 1024],
  [1280, 720],
  [720, 1280],
  [1024, 768],
  [768, 1024],
  [1536, 1024],
  [1024, 1536],
  [1536, 864],
  [864, 1536],
];

const LEGACY_STANDARD_SIZES_2K: [number, number][] = [
  [1536, 1536],
  [2048, 1152],
  [1152, 2048],
  [1536, 1024],
  [1024, 1536],
  [1536, 864],
  [864, 1536],
  [2048, 2048],
];

const QWEN_2_SPEC: DashScopeModelSpec = {
  family: "qwen2",
  defaultSize: "1024*1024",
};

const QWEN_FIXED_SPEC: DashScopeModelSpec = {
  family: "qwenFixed",
  defaultSize: QWEN_FIXED_SIZES_BY_RATIO["16:9"],
};

const LEGACY_SPEC: DashScopeModelSpec = {
  family: "legacy",
  defaultSize: "1536*1536",
};

const MODEL_SPEC_ALIASES: Record<string, DashScopeModelSpec> = {
  "qwen-image-2.0-pro": QWEN_2_SPEC,
  "qwen-image-2.0-pro-2026-03-03": QWEN_2_SPEC,
  "qwen-image-2.0": QWEN_2_SPEC,
  "qwen-image-2.0-2026-03-03": QWEN_2_SPEC,
  "qwen-image-max": QWEN_FIXED_SPEC,
  "qwen-image-max-2025-12-30": QWEN_FIXED_SPEC,
  "qwen-image-plus": QWEN_FIXED_SPEC,
  "qwen-image-plus-2026-01-09": QWEN_FIXED_SPEC,
  "qwen-image": QWEN_FIXED_SPEC,
};

export function getDefaultModel(): string {
  return process.env.DASHSCOPE_IMAGE_MODEL || DEFAULT_MODEL;
}

function getApiKey(): string | null {
  return process.env.DASHSCOPE_API_KEY || null;
}

function getBaseUrl(): string {
  const base = process.env.DASHSCOPE_BASE_URL || "https://dashscope.aliyuncs.com";
  return base.replace(/\/+$/g, "");
}

function getModelSpec(model: string): DashScopeModelSpec {
  return MODEL_SPEC_ALIASES[model.trim().toLowerCase()] || LEGACY_SPEC;
}

export function getModelFamily(model: string): DashScopeModelFamily {
  return getModelSpec(model).family;
}

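// Only an explicit "normal" selects the normal preset; any other value,
// including an unset quality, is treated as "2k".
function normalizeQuality(quality: CliArgs["quality"]): Quality {
  return quality === "normal" ? "normal" : "2k";
}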

export function parseAspectRatio(ar: string): { width: number; height: number } | null {
  const match = ar.match(/^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/);
  if (!match) return null;
  const w = parseFloat(match[1]!);
  const h = parseFloat(match[2]!);
  if (w <= 0 || h <= 0) return null;
  return { width: w, height: h };
}

export function normalizeSize(size: string): string {
  return size.replace("x", "*");
}

export function parseSize(size: string): { width: number; height: number } | null {
  const match = normalizeSize(size).match(/^(\d+)\*(\d+)$/);
  if (!match) return null;
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (!Number.isFinite(width) || !Number.isFinite(height) || width <= 0 || height <= 0) {
    return null;
  }
  return { width, height };
}

function formatSize(width: number, height: number): string {
  return `${width}*${height}`;
}

function getRatioValue(ar: string): number | null {
  const parsed = parseAspectRatio(ar);
  if (!parsed) return null;
  return parsed.width / parsed.height;
}

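// Return the candidate ratio closest to `ar`, or null when even the closest one differs by more than `tolerance`.
function findKnownRatioKey(ar: string, candidates: string[], tolerance = 0.02): string | null {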
  const targetRatio = getRatioValue(ar);
  if (targetRatio == null) return null;

  let bestKey: string | null = null;
  let bestDiff = Infinity;

  for (const candidate of candidates) {
    const candidateRatio = getRatioValue(candidate);
    if (candidateRatio == null) continue;
    const diff = Math.abs(candidateRatio - targetRatio);
    if (diff < bestDiff) {
      bestDiff = diff;
      bestKey = candidate;
    }
  }

  return bestDiff <= tolerance ? bestKey : null;
}

function roundToStep(value: number): number {
  return Math.max(SIZE_STEP, Math.round(value / SIZE_STEP) * SIZE_STEP);
}

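// Scale a width/height pair into [minPixels, maxPixels], snap both sides to SIZE_STEP,
// then adjust one step at a time until the rounded result is back within the budget.
function fitToPixelBudget(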
  width: number,
  height: number,
  minPixels: number,
  maxPixels: number,
): { width: number; height: number } {
  let nextWidth = width;
  let nextHeight = height;
  let pixels = nextWidth * nextHeight;

  if (pixels > maxPixels) {
    const scale = Math.sqrt(maxPixels / pixels);
    nextWidth *= scale;
    nextHeight *= scale;
  } else if (pixels < minPixels) {
    const scale = Math.sqrt(minPixels / pixels);
    nextWidth *= scale;
    nextHeight *= scale;
  }

  let roundedWidth = roundToStep(nextWidth);
  let roundedHeight = roundToStep(nextHeight);
  pixels = roundedWidth * roundedHeight;

  while (pixels > maxPixels && (roundedWidth > SIZE_STEP || roundedHeight > SIZE_STEP)) {
    if (roundedWidth >= roundedHeight && roundedWidth > SIZE_STEP) {
      roundedWidth -= SIZE_STEP;
    } else if (roundedHeight > SIZE_STEP) {
      roundedHeight -= SIZE_STEP;
    } else {
      break;
    }
    pixels = roundedWidth * roundedHeight;
  }

  while (pixels < minPixels) {
    if (roundedWidth <= roundedHeight) {
      roundedWidth += SIZE_STEP;
    } else {
      roundedHeight += SIZE_STEP;
    }
    pixels = roundedWidth * roundedHeight;
  }

  return { width: roundedWidth, height: roundedHeight };
}

export function getSizeFromAspectRatio(ar: string | null, quality: CliArgs["quality"]): string {
  const normalizedQuality = normalizeQuality(quality);
  const sizes = normalizedQuality === "2k" ? LEGACY_STANDARD_SIZES_2K : LEGACY_STANDARD_SIZES;
  const defaultSize = normalizedQuality === "2k" ? "1536*1536" : "1024*1024";

  if (!ar) return defaultSize;

  const parsed = parseAspectRatio(ar);
  if (!parsed) return defaultSize;

  const targetRatio = parsed.width / parsed.height;
  let best = defaultSize;
  let bestDiff = Infinity;

  for (const [width, height] of sizes) {
    const diff = Math.abs(width / height - targetRatio);
    if (diff < bestDiff) {
      bestDiff = diff;
      best = formatSize(width, height);
    }
  }

  return best;
}

export function getQwen2SizeFromAspectRatio(ar: string | null, quality: CliArgs["quality"]): string {
  const normalizedQuality = normalizeQuality(quality);

  if (!ar) {
    return QWEN_2_RECOMMENDED["1:1"][normalizedQuality];
  }

  const recommendedRatio = findKnownRatioKey(ar, Object.keys(QWEN_2_RECOMMENDED));
  if (recommendedRatio) {
    return QWEN_2_RECOMMENDED[recommendedRatio][normalizedQuality];
  }

  const parsed = parseAspectRatio(ar);
  if (!parsed) {
    return QWEN_2_RECOMMENDED["1:1"][normalizedQuality];
  }

  const targetRatio = parsed.width / parsed.height;
  const targetPixels = QWEN_2_TARGET_PIXELS[normalizedQuality];
  const rawWidth = Math.sqrt(targetPixels * targetRatio);
  const rawHeight = Math.sqrt(targetPixels / targetRatio);
  const fitted = fitToPixelBudget(
    rawWidth,
    rawHeight,
    MIN_QWEN_2_TOTAL_PIXELS,
    MAX_QWEN_2_TOTAL_PIXELS,
  );

  return formatSize(fitted.width, fitted.height);
}

function getQwenFixedSizeFromAspectRatio(ar: string | null, quality: CliArgs["quality"]): string {
  if (quality === "normal") {
    console.warn(
      "DashScope qwen-image-max/plus/image models use fixed output sizes; --quality normal does not change the generated resolution."
    );
  }

  if (!ar) return QWEN_FIXED_SPEC.defaultSize;

  const ratioKey = findKnownRatioKey(ar, Object.keys(QWEN_FIXED_SIZES_BY_RATIO));
  if (!ratioKey) {
    throw new Error(
      `DashScope model supports only fixed ratios ${Object.keys(QWEN_FIXED_SIZES_BY_RATIO).join(", ")}. ` +
      `For custom ratios like "${ar}", use --model qwen-image-2.0-pro.`
    );
  }

  return QWEN_FIXED_SIZES_BY_RATIO[ratioKey]!;
}

function validateSizeFormat(size: string): { width: number; height: number } {
  const parsed = parseSize(size);
  if (!parsed) {
    throw new Error(`Invalid DashScope size "${size}". Expected <width>x<height> or <width>*<height>.`);
  }
  return parsed;
}

function validateQwen2Size(size: string): string {
  const normalized = normalizeSize(size);
  const parsed = validateSizeFormat(normalized);
  const totalPixels = parsed.width * parsed.height;
  if (totalPixels < MIN_QWEN_2_TOTAL_PIXELS || totalPixels > MAX_QWEN_2_TOTAL_PIXELS) {
    throw new Error(
      `DashScope qwen-image-2.0* models require total pixels between ${MIN_QWEN_2_TOTAL_PIXELS} ` +
      `and ${MAX_QWEN_2_TOTAL_PIXELS}. Received ${normalized} (${totalPixels} pixels).`
    );
  }
  return normalized;
}

function validateQwenFixedSize(size: string): string {
  const normalized = normalizeSize(size);
  validateSizeFormat(normalized);
  if (!QWEN_FIXED_SIZES.includes(normalized)) {
    throw new Error(
      `DashScope qwen-image-max/plus/image models support only these sizes: ${QWEN_FIXED_SIZES.join(", ")}. ` +
      `Received ${normalized}.`
    );
  }
  return normalized;
}

export function resolveSizeForModel(
  model: string,
  args: Pick<CliArgs, "size" | "aspectRatio" | "quality">,
): string {
  const spec = getModelSpec(model);

  if (args.size) {
    if (spec.family === "qwen2") return validateQwen2Size(args.size);
    if (spec.family === "qwenFixed") return validateQwenFixedSize(args.size);
    validateSizeFormat(args.size);
    return normalizeSize(args.size);
  }

  if (spec.family === "qwen2") {
    return getQwen2SizeFromAspectRatio(args.aspectRatio, args.quality);
  }

  if (spec.family === "qwenFixed") {
    return getQwenFixedSizeFromAspectRatio(args.aspectRatio, args.quality);
  }

  return getSizeFromAspectRatio(args.aspectRatio, args.quality);
}

function buildParameters(
  family: DashScopeModelFamily,
  size: string,
): Record<string, unknown> {
  const parameters: Record<string, unknown> = {
    prompt_extend: false,
    size,
  };

  if (family === "qwen2" || family === "qwenFixed") {
    parameters.watermark = false;
    parameters.negative_prompt = QWEN_NEGATIVE_PROMPT;
  }

  return parameters;
}

type DashScopeResponse = {
  output?: {
    result_image?: string;
    choices?: Array<{
      message?: {
        content?: Array<{ image?: string }>;
      };
    }>;
  };
};

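// Read the image from `output.result_image` or from the first `image` entry of the first choice.
// A URL is downloaded; any other value is decoded as base64.
async function extractImageFromResponse(result: DashScopeResponse): Promise<Uint8Array> {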
  let imageData: string | null = null;

  if (result.output?.result_image) {
    imageData = result.output.result_image;
  } else if (result.output?.choices?.[0]?.message?.content) {
    const content = result.output.choices[0].message.content;
    for (const item of content) {
      if (item.image) {
        imageData = item.image;
        break;
      }
    }
  }

  if (!imageData) {
    console.error("Response:", JSON.stringify(result, null, 2));
    throw new Error("No image in response");
  }

  if (imageData.startsWith("http://") || imageData.startsWith("https://")) {
    const imgRes = await fetch(imageData);
    if (!imgRes.ok) throw new Error("Failed to download image");
    const buf = await imgRes.arrayBuffer();
    return new Uint8Array(buf);
  }

  return Uint8Array.from(Buffer.from(imageData, "base64"));
}

export async function generateImage(
  prompt: string,
  model: string,
  args: CliArgs
): Promise<Uint8Array> {
  const apiKey = getApiKey();
  if (!apiKey) throw new Error("DASHSCOPE_API_KEY is required");

  if (args.referenceImages.length > 0) {
    throw new Error(
      "Reference images are not supported with DashScope provider in baoyu-image-gen. Use --provider google with a Gemini multimodal model."
    );
  }

  const spec = getModelSpec(model);
  const size = resolveSizeForModel(model, args);
  const url = `${getBaseUrl()}/api/v1/services/aigc/multimodal-generation/generation`;

  const body = {
    model,
    input: {
      messages: [
        {
          role: "user",
          content: [{ text: prompt }],
        },
      ],
    },
    parameters: buildParameters(spec.family, size),
  };

  console.log(`Generating image with DashScope (${model})...`, { family: spec.family, size });

  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });

  if (!res.ok) {
    const err = await res.text();
    throw new Error(`DashScope API error (${res.status}): ${err}`);
  }

  const result = await res.json() as DashScopeResponse;
  return extractImageFromResponse(result);
}
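
For ratios outside the recommended table, the qwen-image-2.0 path above derives a size from a target pixel count and snaps each side to a multiple of 16. The following is a condensed sketch of that arithmetic for the common case where the rounded result already sits inside the pixel budget; `sizeForRatio` is a hypothetical name, and the min/max adjustment loops from `fitToPixelBudget` are deliberately left out.

```typescript
const SIZE_STEP = 16;

// Round to the nearest multiple of SIZE_STEP, never below one step.
function roundToStep(value: number): number {
  return Math.max(SIZE_STEP, Math.round(value / SIZE_STEP) * SIZE_STEP);
}

// Derive "<width>*<height>" for a ratio at a target pixel count.
// width = sqrt(pixels * ratio), height = sqrt(pixels / ratio),
// so width * height is close to the target and width / height is close to the ratio.
function sizeForRatio(ratioWidth: number, ratioHeight: number, targetPixels: number): string {
  const ratio = ratioWidth / ratioHeight;
  const width = roundToStep(Math.sqrt(targetPixels * ratio));
  const height = roundToStep(Math.sqrt(targetPixels / ratio));
  return `${width}*${height}`;
}

// 5:2 at the "normal" target of 1024 * 1024 pixels.
const wide = sizeForRatio(5, 2, 1024 * 1024);
console.log(wide); // 1616*640
```

Here 1616 / 640 is 2.525, which is within the 0.08 tolerance of 2.5 that the test for uncommon ratios checks.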

import assert from "node:assert/strict";
import test, { type TestContext } from "node:test";

import type { CliArgs } from "../types.ts";
import {
  addAspectRatioToPrompt,
  buildGoogleUrl,
  buildPromptWithAspect,
  extractInlineImageData,
  extractPredictedImageData,
  getGoogleImageSize,
  isGoogleImagen,
  isGoogleMultimodal,
  normalizeGoogleModelId,
} from "./google.ts";

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

function makeArgs(overrides: Partial<CliArgs> = {}): CliArgs {
  return {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
    ...overrides,
  };
}

test("Google provider helpers normalize model IDs and select image size defaults", () => {
  assert.equal(
    normalizeGoogleModelId("models/gemini-3.1-flash-image-preview"),
    "gemini-3.1-flash-image-preview",
  );
  assert.equal(isGoogleMultimodal("models/gemini-3-pro-image-preview"), true);
  assert.equal(isGoogleImagen("imagen-3.0-generate-002"), true);
  assert.equal(getGoogleImageSize(makeArgs({ imageSize: null, quality: "2k" })), "2K");
  assert.equal(getGoogleImageSize(makeArgs({ imageSize: "4K", quality: "normal" })), "4K");
});

test("Google URL builder appends v1beta when the base URL does not already include it", (t) => {
  useEnv(t, { GOOGLE_BASE_URL: "https://generativelanguage.googleapis.com" });
  assert.equal(
    buildGoogleUrl("models/demo:generateContent"),
    "https://generativelanguage.googleapis.com/v1beta/models/demo:generateContent",
  );
});

test("Google URL and prompt helpers preserve existing v1beta paths and aspect hints", (t) => {
  useEnv(t, { GOOGLE_BASE_URL: "https://example.com/custom/v1beta/" });
  assert.equal(
    buildGoogleUrl("/models/demo:predict"),
    "https://example.com/custom/v1beta/models/demo:predict",
  );

  assert.equal(
    addAspectRatioToPrompt("A city skyline", "16:9"),
    "A city skyline Aspect ratio: 16:9.",
  );
  assert.equal(
    buildPromptWithAspect("A city skyline", "16:9", "2k"),
    "A city skyline Aspect ratio: 16:9. High resolution 2048px.",
  );
});

test("Google response extractors find inline and predicted image payloads", () => {
  assert.equal(
    extractInlineImageData({
      candidates: [
        {
          content: {
            parts: [{ inlineData: { data: "inline-base64" } }],
          },
        },
      ],
    }),
    "inline-base64",
  );

  assert.equal(
    extractPredictedImageData({
      predictions: [{ image: { imageBytes: "predicted-base64" } }],
    }),
    "predicted-base64",
  );

  assert.equal(
    extractPredictedImageData({
      generatedImages: [{ bytesBase64Encoded: "generated-base64" }],
    }),
    "generated-base64",
  );
});
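
The two URL tests above pin down one behavior: `v1beta` is appended only when the base URL does not already end with it, and stray slashes on either side are collapsed. The following sketch is one way to satisfy those two cases. `joinGoogleUrl` is a hypothetical helper written for illustration and is not the `buildGoogleUrl` implementation, which reads its base URL from the environment.

```typescript
// Hypothetical helper; takes the base URL as a parameter instead of reading GOOGLE_BASE_URL.
function joinGoogleUrl(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const suffix = path.replace(/^\/+/, ""); // drop leading slashes
  const versioned = /\/v1beta$/.test(base) ? base : `${base}/v1beta`;
  return `${versioned}/${suffix}`;
}

const defaultHost = joinGoogleUrl(
  "https://generativelanguage.googleapis.com",
  "models/demo:generateContent",
);
const customProxy = joinGoogleUrl(
  "https://example.com/custom/v1beta/",
  "/models/demo:predict",
);
console.log(defaultHost);
console.log(customProxy);
```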

import assert from "node:assert/strict";
import test, { type TestContext } from "node:test";

import type { CliArgs } from "../types.ts";
import { generateImage } from "./jimeng.ts";

function makeArgs(overrides: Partial<CliArgs> = {}): CliArgs {
  return {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
    ...overrides,
  };
}

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

test("Jimeng submit request uses prompt field expected by current API", async (t) => {
  useEnv(t, {
    JIMENG_ACCESS_KEY_ID: "test-access-key",
    JIMENG_SECRET_ACCESS_KEY: "test-secret-key",
    JIMENG_BASE_URL: null,
    JIMENG_REGION: null,
  });

  const originalFetch = globalThis.fetch;
  t.after(() => {
    globalThis.fetch = originalFetch;
  });

  const calls: Array<{
    input: string;
    init?: RequestInit;
  }> = [];

  globalThis.fetch = async (input, init) => {
    calls.push({
      input: String(input),
      init,
    });

    if (calls.length === 1) {
      return Response.json({
        code: 10000,
        data: {
          task_id: "task-123",
        },
      });
    }

    return Response.json({
      code: 10000,
      data: {
        status: "done",
        binary_data_base64: [Buffer.from("jimeng-image").toString("base64")],
      },
    });
  };

  const image = await generateImage(
    "A quiet bamboo forest",
    "jimeng_t2i_v40",
    makeArgs({ quality: "normal" }),
  );

  assert.equal(Buffer.from(image).toString("utf8"), "jimeng-image");
  assert.equal(calls.length, 2);
  assert.equal(
    calls[0]?.input,
    "https://visual.volcengineapi.com/?Action=CVSync2AsyncSubmitTask&Version=2022-08-31",
  );

  const submitBody = JSON.parse(String(calls[0]?.init?.body)) as Record<string, unknown>;
  assert.equal(submitBody.req_key, "jimeng_t2i_v40");
  assert.equal(submitBody.prompt, "A quiet bamboo forest");
  assert.ok(!("prompt_text" in submitBody));
  assert.equal(submitBody.width, 1024);
  assert.equal(submitBody.height, 1024);
});
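
The Jimeng test above mocks two calls: a submit that returns a `task_id`, then a follow-up that returns `status: "done"` with the image payload. The following is a generic sketch of that submit-then-poll shape. `pollUntilDone`, `TaskState`, and the interval and attempt options are hypothetical; the real provider's polling action, retry limits, and status values other than `"done"` are not shown in the source.

```typescript
type TaskState<T> = { status: "pending" } | { status: "done"; result: T };

// Calls `check` until it reports "done", waiting `intervalMs` between attempts.
async function pollUntilDone<T>(
  check: () => Promise<TaskState<T>>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const state = await check();
    if (state.status === "done") return state.result;
    if (attempt < options.maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    }
  }
  throw new Error(`Task did not finish after ${options.maxAttempts} attempts`);
}

// Stand-in for a status request: pending twice, then done.
let checks = 0;
const demo = pollUntilDone<string>(
  async (): Promise<TaskState<string>> =>
    ++checks < 3 ? { status: "pending" } : { status: "done", result: "image-bytes" },
  { intervalMs: 1, maxAttempts: 5 },
);
demo.then((value) => console.log(value, checks));
```

Bounding the number of attempts keeps a stuck task from blocking a batch run indefinitely.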

import assert from "node:assert/strict";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import test, { type TestContext } from "node:test";

import type { CliArgs } from "../types.ts";
import {
  buildMinimaxUrl,
  buildRequestBody,
  buildSubjectReference,
  extractImageFromResponse,
  parsePixelSize,
  validateArgs,
} from "./minimax.ts";

function useEnv(
  t: TestContext,
  values: Record<string, string | null>,
): void {
  const previous = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(values)) {
    previous.set(key, process.env[key]);
    if (value == null) {
      delete process.env[key];
    } else {
      process.env[key] = value;
    }
  }

  t.after(() => {
    for (const [key, value] of previous.entries()) {
      if (value == null) {
        delete process.env[key];
      } else {
        process.env[key] = value;
      }
    }
  });
}

function makeArgs(overrides: Partial<CliArgs> = {}): CliArgs {
  return {
    prompt: null,
    promptFiles: [],
    imagePath: null,
    provider: null,
    model: null,
    aspectRatio: null,
    size: null,
    quality: null,
    imageSize: null,
    referenceImages: [],
    n: 1,
    batchFile: null,
    jobs: null,
    json: false,
    help: false,
    ...overrides,
  };
}

test("MiniMax URL builder uses documented default and normalizes /v1 suffixes", (t) => {
  useEnv(t, { MINIMAX_BASE_URL: null });
  assert.equal(buildMinimaxUrl(), "https://api.minimaxi.com/v1/image_generation");

  process.env.MINIMAX_BASE_URL = "https://api.minimax.io";
  assert.equal(buildMinimaxUrl(), "https://api.minimax.io/v1/image_generation");

  process.env.MINIMAX_BASE_URL = "https://proxy.example.com/custom/v1/";
  assert.equal(buildMinimaxUrl(), "https://proxy.example.com/custom/v1/image_generation");
});

test("MiniMax size parsing and validation follow documented constraints", () => {
  assert.deepEqual(parsePixelSize("1536x1024"), { width: 1536, height: 1024 });
  assert.deepEqual(parsePixelSize("1536*1024"), { width: 1536, height: 1024 });
  assert.equal(parsePixelSize("wide"), null);

  validateArgs("image-01", makeArgs({ size: "1536x1024", n: 9 }));

  assert.throws(
    () => validateArgs("image-01-live", makeArgs({ size: "1536x1024" })),
    /only supported with model image-01/,
  );
  assert.throws(
    () => validateArgs("image-01", makeArgs({ size: "1537x1024" })),
    /divisible by 8/,
  );
  assert.throws(
    () => validateArgs("image-01", makeArgs({ aspectRatio: "2.35:1" })),
    /aspect_ratio must be one of/,
  );
  assert.throws(
    () => validateArgs("image-01", makeArgs({ n: 10 })),
    /at most 9 images/,
  );
});

test("MiniMax request body maps aspect ratio, size, n, and subject references", async (t) => {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "minimax-test-"));
  t.after(() => fs.rm(dir, { recursive: true, force: true }));

  const refPath = path.join(dir, "portrait.png");
  await fs.writeFile(refPath, Buffer.from("portrait"));

  const ratioBody = await buildRequestBody(
    "A portrait by the window",
    "image-01",
    makeArgs({ aspectRatio: "16:9", n: 2, referenceImages: [refPath] }),
  );
  assert.equal(ratioBody.aspect_ratio, "16:9");
  assert.equal(ratioBody.n, 2);
  assert.equal(ratioBody.response_format, "base64");
  assert.match(ratioBody.subject_reference?.[0]?.image_file || "", /^data:image\/png;base64,/);

  const sizeBody = await buildRequestBody(
    "A portrait by the window",
    "image-01",
    makeArgs({ size: "1536x1024" }),
  );
  assert.equal(sizeBody.width, 1536);
  assert.equal(sizeBody.height, 1024);
  assert.equal(sizeBody.aspect_ratio, undefined);
});

test("MiniMax subject references require supported file types", async (t) => {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "minimax-ref-"));
  t.after(() => fs.rm(dir, { recursive: true, force: true }));

  const good = path.join(dir, "portrait.jpg");
  const bad = path.join(dir, "portrait.webp");
  await fs.writeFile(good, Buffer.from("portrait"));
  await fs.writeFile(bad, Buffer.from("portrait"));

  const subjectReference = await buildSubjectReference([good]);
  assert.equal(subjectReference?.[0]?.type, "character");

  await assert.rejects(
    () => buildSubjectReference([bad]),
    /only supports JPG, JPEG, or PNG/,
  );
});

test("MiniMax response extraction supports base64 and URL payloads", async (t) => {
  const originalFetch = globalThis.fetch;
  t.after(() => {
    globalThis.fetch = originalFetch;
  });

  const fromBase64 = await extractImageFromResponse({
    data: {
      image_base64: [Buffer.from("hello").toString("base64")],
    },
  });
  assert.equal(Buffer.from(fromBase64).toString("utf8"), "hello");

  globalThis.fetch = async () =>
    new Response(Uint8Array.from([1, 2, 3]), {
      status: 200,
      headers: { "Content-Type": "image/jpeg" },
    });

  const fromUrl = await extractImageFromResponse({
    data: {
      image_urls: ["https://example.com/output.jpg"],
    },
  });
  assert.deepEqual([...fromUrl], [1, 2, 3]);

  await assert.rejects(
    () => extractImageFromResponse({ base_resp: { status_code: 1001, status_msg: "blocked" } }),
    /blocked/,
  );
});
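
The MiniMax validation test above asserts three constraints: sizes parse from either `WxH` or `W*H`, both sides must be divisible by 8, and a request may ask for at most 9 images. The following sketch covers only those checks. `parseWidthHeight` and `checkMinimaxRequest` are hypothetical names, and the model-specific and aspect-ratio rules that the real `validateArgs` also enforces are omitted.

```typescript
// Parse "1536x1024" or "1536*1024" into numbers; return null for anything else.
function parseWidthHeight(value: string): { width: number; height: number } | null {
  const match = value.trim().match(/^(\d+)[x*](\d+)$/i);
  if (!match) return null;
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Throw when the request violates one of the constraints exercised by the tests.
function checkMinimaxRequest(size: string | null, n: number): void {
  if (n > 9) throw new Error("MiniMax generates at most 9 images per request");
  if (!size) return;
  const parsed = parseWidthHeight(size);
  if (!parsed) throw new Error(`Invalid size "${size}"`);
  if (parsed.width % 8 !== 0 || parsed.height % 8 !== 0) {
    throw new Error("MiniMax width and height must be divisible by 8");
  }
}

checkMinimaxRequest("1536x1024", 9); // passes: 1536 and 1024 are both multiples of 8
```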

Related skills

Remotion Best Practices: Get Remotion-specific coding guidance that prevents common video rendering mistakes when creating animated React videos. (442k · 4.1k)

Remotion Render: Generate high-quality MP4 videos from React code using Remotion inside an AI coding agent. (363k · 648)

Ai Video Generation: Turn written prompts into short videos using AI video generation models directly from Cursor or Claude. (363k · 648)

Ai Avatar Video: Generate short talking-head videos of custom AI avatars from text prompts. (363k · 648)

Ai Image Generation: Let a coding agent generate, iterate on, and insert high-quality images directly into web apps, marketing assets, or product features. (363k · 648)

Video Edit: Intelligently route video editing requests to the best RunComfy model without trial and error. (357k · 31)

How it compares

Prefer baoyu-imagine as the successor; use baoyu-image-gen only when explicitly targeting the deprecated multi-provider script workflow.

FAQ

Which image APIs does baoyu-image-gen support?

baoyu-image-gen supports OpenAI, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream, and Replicate for text-to-image and reference-image generation with aspect ratio control.

Is baoyu-image-gen still the recommended skill?

No. baoyu-image-gen version 1.56.4 is marked deprecated in favor of baoyu-imagine. It still runs via bun or npx, and it supports parallel batch generation when multiple prompts are ready.
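
Batch generation with worker control comes down to running a list of prompt tasks with a cap on how many are in flight at once. The following is a minimal sketch of that pattern, not the skill's batch runner: `runWithWorkers` and the fake tasks are hypothetical, and the `jobs` parameter only mirrors the name of the `jobs` field in the skill's CLI arguments.

```typescript
// Run tasks with at most `jobs` in flight; results keep the input order.
async function runWithWorkers<T>(
  tasks: Array<() => Promise<T>>,
  jobs: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]!();
    }
  }

  const workerCount = Math.max(1, Math.min(jobs, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Fake "generation" tasks that record how many run at the same time.
let active = 0;
let peak = 0;
const demoTasks = ["a", "b", "c", "d", "e"].map((prompt) => async () => {
  active++;
  peak = Math.max(peak, active);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  active--;
  return prompt.toUpperCase();
});

const batch = runWithWorkers(demoTasks, 2);
batch.then((out) => console.log(out.join(","), "peak:", peak));
```

Claiming an index before the first `await` keeps two workers from picking the same task, and a failed task rejects the whole batch in this sketch; a real runner may prefer to collect per-task errors instead.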

Is Baoyu Image Gen safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Generative Media · automation
