Gpt Image 2

Name: Gpt Image 2
Author: agentspace-so

agentspace-so/runcomfy-agent-skills

30.4k installs
31 repo stars
Updated May 15, 2026

This is a copy of gpt-image-2 by doany-ai - installs and ranking accrue to the original listing.

gpt-image-2 is an agent skill for developers building visual assets in agent workflows. It generates and edits images with OpenAI GPT Image 2 through the RunComfy CLI, with no direct OpenAI API key required.

About

gpt-image-2 from agentspace-so/runcomfy-agent-skills routes image generation and editing to OpenAI GPT Image 2 (ChatGPT Images 2.0) via the local RunComfy CLI, using runcomfy run openai/gpt-image-2/text-to-image or /edit. The skill documents GPT Image 2's strengths (embedded text, logos, multilingual typography, and instruction precision), its three fixed output sizes, and edit-with-preservation prompting patterns. Developers reach for gpt-image-2 when triggers include gpt image 2, gpt-image-2, ChatGPT Images 2, or explicit generate-or-edit requests for this model. The skill also explains when to route to sibling models Flux 2, Nano Banana Pro, or Seedream instead. MIT-licensed and hosted on runcomfy.com, it suits agent pipelines that need branded visuals, UI mock imagery, or localized text-in-image assets without managing OpenAI API credentials directly. Requires the RunComfy CLI installed locally.

Generates images with embedded text, logos, and multilingual typography
Supports 3 fixed output sizes with precise instruction following
Offers edit-with-preservation mode that maintains original elements
Routes intelligently to sibling models (Flux 2, Nano Banana Pro, Seedream) when better suited
Triggers on phrases like "gpt image 2", "ChatGPT Images 2", or explicit model requests

Gpt Image 2 by the numbers

30,438 all-time installs (skills.sh)
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill gpt-image-2


Installs	30.4k
repo stars	★ 31
Security audit	2 / 3 scanners passed
Last updated	May 15, 2026
Repository	agentspace-so/runcomfy-agent-skills ↗

How do you generate images with GPT Image 2 from the CLI?

Generate and edit images using OpenAI's GPT Image 2 model directly through the RunComfy CLI without needing an OpenAI API key.

Who is it for?

Developers using RunComfy agents who need GPT Image 2 text-to-image or edit workflows without direct OpenAI API setup.

Skip if: Teams requiring arbitrary image dimensions beyond GPT Image 2's three fixed sizes or self-hosted diffusion pipelines.

When should I use this skill?

User mentions gpt image 2, gpt-image-2, ChatGPT Images 2, or asks to generate or edit images with this model via RunComfy.

What you get

Generated or edited image files from GPT Image 2 via RunComfy CLI with size and preservation guidance.

Generated image files
Edited image outputs

By the numbers

Documents three fixed GPT Image 2 output sizes
Supports text-to-image and /edit RunComfy CLI subcommands
MIT-licensed skill in agentspace-so/runcomfy-agent-skills

Files

SKILL.md · Markdown · GitHub ↗

GPT Image 2 — Pro Pack on RunComfy

runcomfy.com · Text-to-image · Edit · GitHub

OpenAI GPT Image 2 (ChatGPT Images 2.0) hosted on the RunComfy Model API — no OpenAI key, async REST.

npx skills add agentspace-so/runcomfy-agent-skills --skill gpt-image-2 -g

When to pick this model (vs siblings)

GPT Image 2's distinct strength is directive precision: it follows multi-element prompts, layout cues, and embedded-text instructions more reliably than its peers. Pick it when what's on the canvas matters more than how stylized it looks.

You want	Use
Embedded text, logos, signage, multilingual typography	GPT Image 2
Brand-safe, e-commerce / ad / UI mockup imagery	GPT Image 2
Iterative refinement that holds composition stable	GPT Image 2
Heavy stylization, painterly look	Flux 2
Hyperrealistic portrait	Nano Banana Pro
Cinematic / aesthetic-first hero shots	Seedream 5

If the user explicitly asked for GPT Image 2 / ChatGPT Image 2 / Image 2, route here regardless — don't second-guess the model choice.
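The routing table above can be sketched as a small shell lookup. This is only an illustration: the function name `pick_model` and the keyword labels are hypothetical, and only the model names come from the table.

```shell
# Hypothetical helper: map a coarse need label to the model the table suggests.
# $1 = need label (illustrative keywords, not part of the skill)
# $2 = "explicit" when the user named GPT Image 2 themselves
pick_model() {
  if [ "${2:-}" = "explicit" ]; then
    echo "GPT Image 2"   # explicit requests route here regardless of the need
    return 0
  fi
  case "$1" in
    text|logo|signage|typography|brand|ecommerce|ad|ui-mockup|iterative)
      echo "GPT Image 2" ;;
    stylized|painterly)
      echo "Flux 2" ;;
    portrait)
      echo "Nano Banana Pro" ;;
    cinematic|hero)
      echo "Seedream 5" ;;
    *)
      echo "unknown"
      return 1 ;;
  esac
}
```

For example, `pick_model portrait` prints `Nano Banana Pro`, while `pick_model portrait explicit` prints `GPT Image 2`.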

Prerequisites

1. RunComfy CLI: npm i -g @runcomfy/cli
2. RunComfy account: runcomfy login opens a browser device-code flow.
3. CI / containers: set RUNCOMFY_TOKEN=<token> instead of runcomfy login.
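A preflight check for these prerequisites might look like the sketch below. The function names are hypothetical; the token file path is the one given under Security & Privacy in this document, and the check assumes the file's presence means a login has happened (it does not verify that the token is still valid).

```shell
# Report where RunComfy credentials would come from: "env", "file", or "none".
runcomfy_auth_source() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "env"
  elif [ -f "${HOME}/.config/runcomfy/token.json" ]; then
    echo "file"
  else
    echo "none"
  fi
}

# Return 0 only if the CLI is on PATH and some credential source exists.
runcomfy_preflight() {
  if ! command -v runcomfy >/dev/null 2>&1; then
    echo "runcomfy CLI not found; install with: npm i -g @runcomfy/cli" >&2
    return 1
  fi
  if [ "$(runcomfy_auth_source)" = "none" ]; then
    echo "no credentials; run 'runcomfy login' or set RUNCOMFY_TOKEN" >&2
    return 1
  fi
}
```

Calling `runcomfy_preflight` before the first `runcomfy run` lets an agent fail early with a readable message instead of waiting for a rejected request.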

Endpoints + input schema

Two endpoints, same model.

`openai/gpt-image-2/text-to-image`

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	The positive prompt
`size`	enum	no	`1024_1024`	`1024_1024` (1:1), `1024_1536` (2:3 portrait), `1536_1024` (3:2 landscape) — only these three

`openai/gpt-image-2/edit`

Field	Type	Required	Default	Notes
`prompt`	string	yes	—	Natural-language edit instruction
`images`	string[]	yes	—	Up to 10 reference image URLs (publicly fetchable HTTPS)
`size`	enum	no	`auto`	`auto` (preserve input ratio), or one of the three fixed sizes above

size=auto on edit preserves the input aspect ratio — strongly recommended unless the edit explicitly changes framing.
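Because the size field is a closed enum, it can be checked locally before a request is submitted. The following is a minimal sketch of that check; `valid_size` and `default_size` are hypothetical helper names, and the accepted values are copied from the two schema tables above.

```shell
# Return 0 if $2 is an accepted size for endpoint $1 ("text-to-image" or "edit").
valid_size() {
  case "$1:$2" in
    text-to-image:1024_1024|text-to-image:1024_1536|text-to-image:1536_1024)
      return 0 ;;
    edit:auto|edit:1024_1024|edit:1024_1536|edit:1536_1024)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print the documented default size for an endpoint.
default_size() {
  case "$1" in
    text-to-image) echo "1024_1024" ;;
    edit)          echo "auto" ;;
    *)             return 1 ;;
  esac
}
```

With this, `valid_size text-to-image auto` fails (auto is only listed for edit), and `valid_size edit 2048_2048` fails for either endpoint.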

How to invoke

Text-to-image:

runcomfy run openai/gpt-image-2/text-to-image \
  --input '{"prompt": "<user prompt>", "size": "1024_1536"}' \
  --output-dir <absolute/path>

Edit (single ref):

runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "<edit instruction>",
    "images": ["https://..."]
  }' \
  --output-dir <absolute/path>

Edit (multi-ref, up to 10):

runcomfy run openai/gpt-image-2/edit \
  --input '{
    "prompt": "compose subject from image 1 into the room from image 2; match the lighting of image 2",
    "images": ["https://...subject.jpg", "https://...room.jpg"]
  }' \
  --output-dir <absolute/path>

The CLI submits, polls every 2s until terminal, then downloads any *.runcomfy.net / *.runcomfy.com URL from the result into --output-dir. Stdout is the result JSON. Stderr is progress.

For pipe-friendly usage:

runcomfy --output json run openai/gpt-image-2/text-to-image \
  --input '{"prompt":"..."}' --no-wait | jq -r .request_id
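The inline JSON in the commands above breaks as soon as a prompt contains a double quote, which is common when quoting embedded text. One way around this, sketched here with jq (already used above), is to let jq do the escaping. The function names are hypothetical, and the `$ARGS.positional` form assumes jq 1.6 or newer.

```shell
# Build the --input JSON for text-to-image. $1 = prompt, $2 = size (optional).
build_t2i_input() {
  if [ -n "${2:-}" ]; then
    jq -cn --arg prompt "$1" --arg size "$2" '{prompt: $prompt, size: $size}'
  else
    jq -cn --arg prompt "$1" '{prompt: $prompt}'
  fi
}

# Build the --input JSON for edit. $1 = prompt, remaining args = image URLs.
# The body runs in a subshell so the helper variable does not leak.
build_edit_input() (
  prompt=$1
  shift
  jq -cn --arg prompt "$prompt" \
    '{prompt: $prompt, images: $ARGS.positional}' --args "$@"
)

# Example use with the invocation shown above:
#   runcomfy run openai/gpt-image-2/text-to-image \
#     --input "$(build_t2i_input 'a sign that reads "OPEN"' 1024_1536)" \
#     --output-dir <absolute/path>
```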

Prompting — what actually works

These are model-specific patterns that empirically improve output quality. Apply to text-to-image and edit alike.

Be explicit on subject + setting + mood. "A close-up of a matte ceramic water bottle on warm linen, soft window light, neutral background" — three concrete directives — beats "nice product photo of a bottle".

Quote embedded text exactly. Keep it short. GPT Image 2 is the strongest text-rendering model in this class, but only when you put the literal characters in quotes. Long blocks of text degrade. For multilingual text, name the script: "Japanese kana", "Cyrillic", "Arabic right-to-left".

Use compositional cues directly. "rule of thirds", "close-up", "aerial view", "centered subject", "shallow depth of field": the model has learned what these terms mean.

Iterate one attribute at a time. When refining, change one thing per iteration (lighting OR background OR pose OR text) and keep the rest of the prompt verbatim. The model holds composition stable across iterations when only one knob moves.

Don't give conflicting instructions. "no text" + "the word 'AQUA+' on the label" is incoherent — the model will pick one and you don't control which.

Don't pile up styles. "ukiyo-e + watercolor + 8K + cinematic + minimalist" cancels out. Pick one or two style anchors max.

For the edit endpoint specifically:

State preservation goals. "keep the person's pose and face identity unchanged", "keep the brand mark and typography on the package", "keep the overall framing". The model needs to know what NOT to change.
Use directional language for spatial edits. "Move the headline from top-right to bottom-center", not "reposition the headline".
Multi-ref: number the images in the prompt — "subject from image 1, lighting and background from image 2" — and the model will route the cues correctly.
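The preservation advice above can be made mechanical by appending explicit "keep ... unchanged" clauses to the edit instruction. This is a sketch only: `build_edit_prompt` is a hypothetical helper, and the "; keep X unchanged" phrasing is one plausible wording modeled on the examples in this section, not a format the model requires.

```shell
# Join an edit instruction with explicit preservation clauses.
# $1 = edit instruction, remaining args = things the model should leave alone.
# The body runs in a subshell so the helper variables do not leak.
build_edit_prompt() (
  prompt=$1
  shift
  for keep in "$@"; do
    prompt="$prompt; keep $keep unchanged"
  done
  printf '%s\n' "$prompt"
)
```

For example, `build_edit_prompt 'Move the headline from top-right to bottom-center' 'the overall framing'` prints `Move the headline from top-right to bottom-center; keep the overall framing unchanged`.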

Where it shines

Use case	Why GPT Image 2
E-commerce product photography	Reliable text on labels, brand-safe lighting, consistent across SKUs
High-conversion ads	Headline + visual integration in one pass
Brand asset localization	One source asset → many language variants of the same headline
Signage, posters, packaging mock-ups	Text rendering accuracy at multiple scales
UI mockups, scientific illustrations	Layout precision and label legibility

Sample prompts (verified to produce strong results)

Text-to-image — product hero:

A minimal hero product still life: a matte ceramic water bottle on warm linen,
soft window light, the word "AQUA+" in clean sans-serif on the label,
subtle rim highlights, e-commerce ready, 8K detail, neutral background

Text-to-image — multilingual signage:

A small Tokyo café storefront at dusk, warm interior glow,
the sign reads "コーヒー" in bold Japanese kana on a wooden plaque,
shallow depth of field, rule of thirds, cinematic

Edit — background swap with preservation:

Turn the background into a bright minimal white-to-soft-gray studio sweep
with gentle floor shadow; add a large headline in-image that reads
"OPEN STUDIO" in a bold clean sans-serif, high contrast, centered;
keep the main person or product, pose, and face identity unchanged

Limitations

Only 3 fixed sizes on text-to-image (and the same 3 + auto on edit). Extreme aspect ratios are auto-resized to the nearest supported one.
Prompt length tops out at roughly a few thousand tokens. Long blocks of embedded text degrade output.
Edit's multi-image support is "guidance from up to 10 refs", not ControlNet-style stacks. The first image is treated as the primary; the rest provide auxiliary cues.
Photorealism on portraits is not its strongest suit — Nano Banana Pro wins that head-to-head.
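The reference-image constraints (up to 10 URLs, HTTPS, first image primary) can be checked before calling the edit endpoint. The sketch below uses a hypothetical `check_edit_images` helper and assumes at least one image is needed because the field is required. It checks only the URL prefix, not whether the URL is actually publicly fetchable.

```shell
# Validate the reference list for the edit endpoint: 1 to 10 URLs, each https://.
# On success, prints the first URL, which is the one treated as primary.
check_edit_images() (
  if [ "$#" -lt 1 ] || [ "$#" -gt 10 ]; then
    echo "expected 1 to 10 image URLs, got $#" >&2
    return 1
  fi
  for url in "$@"; do
    case "$url" in
      https://?*) ;;   # looks like an HTTPS URL, keep going
      *)
        echo "not an HTTPS URL: $url" >&2
        return 1 ;;
    esac
  done
  printf '%s\n' "$1"
)
```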

Exit codes

The runcomfy CLI uses sysexits-style codes:

code	meaning
0	success
64	bad CLI args
65	bad input JSON / schema mismatch (e.g. `size: "2048_2048"` is rejected with HTTP 422)
69	upstream 5xx
75	retryable: timeout / 429
77	not signed in or token rejected

Full reference: docs.runcomfy.com/cli/troubleshooting.
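Since only code 75 is marked retryable in the table above, a wrapper script can retry on that code and surface everything else immediately. The following is a sketch under stated assumptions: `classify_exit`, `run_with_retry`, and the `RETRY_MAX` / `RETRY_DELAY` variables are hypothetical names for this example, not CLI features, and the doubling backoff is an arbitrary choice.

```shell
# Translate a runcomfy exit code into a short action label (labels are illustrative).
classify_exit() {
  case "$1" in
    0)  echo "ok" ;;
    64) echo "fix-args" ;;
    65) echo "fix-input" ;;
    69) echo "upstream-error" ;;
    75) echo "retry" ;;
    77) echo "login" ;;
    *)  echo "unknown" ;;
  esac
}

# Run a command, retrying only when it exits with 75 (timeout / 429).
# RETRY_MAX (default 3) caps attempts; RETRY_DELAY (default 2) is the first wait in seconds.
run_with_retry() (
  attempt=1
  max=${RETRY_MAX:-3}
  delay=${RETRY_DELAY:-2}
  while :; do
    code=0
    "$@" || code=$?
    if [ "$code" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      return "$code"
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # simple doubling backoff
  done
)
```

A call such as `run_with_retry runcomfy run openai/gpt-image-2/text-to-image --input "$json" --output-dir "$dir"` would then return the final exit code, which `classify_exit` can turn into a next step.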

How it works

1. The skill invokes runcomfy run openai/gpt-image-2/<endpoint> with a JSON body matching the schema above.
2. The CLI POSTs to https://model-api.runcomfy.net/v1/models/openai/gpt-image-2/<endpoint> with the user's bearer token.
3. The Model API returns a request_id; the CLI polls GET .../requests/<id>/status every 2 seconds.
4. On terminal status, the CLI fetches GET .../requests/<id>/result and downloads any URL whose host ends with .runcomfy.net or .runcomfy.com into --output-dir. Other URLs are listed but not fetched.
5. Ctrl-C while polling sends POST .../requests/<id>/cancel so you don't get billed for GPU you stopped.
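The download rule described above (only hosts ending in .runcomfy.net or .runcomfy.com are fetched) can be approximated with plain parameter expansion. This sketch is not the CLI's actual implementation: `is_downloadable_url` is a hypothetical name, and the parsing is deliberately simple (it requires a scheme, is case-sensitive, and reads "ends with .runcomfy.net" literally, so the bare host runcomfy.net does not match).

```shell
# Return 0 if the URL's host ends with .runcomfy.net or .runcomfy.com.
# The body runs in a subshell so the helper variables do not leak.
is_downloadable_url() (
  rest=${1#*://}                 # drop the scheme
  [ "$rest" != "$1" ] || exit 1  # no scheme present: reject
  host=${rest%%[/?#]*}           # cut at the first path, query, or fragment delimiter
  host=${host##*@}               # drop any userinfo
  host=${host%%:*}               # drop any port
  case "$host" in
    *.runcomfy.net|*.runcomfy.com) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

Matching on the parsed host rather than the whole URL is what keeps a link like https://evil.com/x.runcomfy.net from passing the check.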

What this skill is not

Not a direct OpenAI API client. Not a capability grant — depends on a working RunComfy account. Not multi-tenant.

Security & Privacy

Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.

Related skills

Remotion Best Practices: Get Remotion-specific coding guidance that prevents common video rendering mistakes when creating animated React videos. (442k · 4.1k)

Remotion Render: Generate high-quality MP4 videos from React code using Remotion inside an AI coding agent. (363k · 648)

Ai Video Generation: Turn written prompts into short videos using AI video generation models directly from Cursor or Claude. (363k · 648)

Ai Avatar Video: Generate short talking-head videos of custom AI avatars from text prompts. (363k · 648)

Ai Image Generation: Let a coding agent generate, iterate on, and insert high-quality images directly into web apps, marketing assets, or product features. (363k · 648)

Video Edit: Intelligently route video editing requests to the best RunComfy model without trial-and-error. (357k · 31)

How it compares

Use gpt-image-2 for logos, embedded text, and multilingual typography; route to Flux 2 or Seedream siblings for other visual styles per skill guidance.

FAQ

Does gpt-image-2 need an OpenAI API key?

gpt-image-2 routes generation through the RunComfy CLI Pro Pack, so developers call runcomfy run openai/gpt-image-2/text-to-image or /edit locally without configuring a direct OpenAI API key.

What CLI commands does gpt-image-2 use?

gpt-image-2 invokes runcomfy run openai/gpt-image-2/text-to-image for new images and runcomfy run openai/gpt-image-2/edit for edits, with guidance on preservation language and three fixed output sizes.

When should gpt-image-2 route elsewhere?

gpt-image-2 documents when to switch to sibling RunComfy models Flux 2, Nano Banana Pro, or Seedream—for example when a task needs capabilities outside GPT Image 2's embedded-text and fixed-size strengths.

Is Gpt Image 2 safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Generative Media · agents · automation