
Image Generation
Generate marketing and product images from a prompt file plus validated reference images, with a configurable aspect ratio (16:9 by default).
Overview
Image-generation is an agent skill for the Build phase that runs a Python pipeline to create images from a prompt file and validated reference images.
Install
npx skills add https://github.com/bytedance/deer-flow --skill image-generation
What is this skill?
- Reads generation prompts from a UTF-8 prompt file path
- Validates reference images with PIL open, verify, and full load before use
- Skips corrupted references with explicit console warnings instead of failing silently
- Supports configurable aspect ratio defaulting to 16:9
- Writes output to a caller-specified image file path
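The aspect ratio is passed as a plain string such as "16:9". A minimal sketch of how a caller might check that string before handing it to the skill is shown below. Both `parse_aspect_ratio` and `output_height` are hypothetical helpers for illustration; the skill's script forwards the string to the API without this check.

```python
def parse_aspect_ratio(value: str = "16:9") -> tuple[int, int]:
    """Split a 'W:H' string into two positive integers.

    Hypothetical helper: not part of the skill's script.
    Raises ValueError for anything that is not 'W:H' with positive parts.
    """
    width_text, separator, height_text = value.partition(":")
    if not separator:
        raise ValueError(f"Expected 'W:H', got {value!r}")
    width, height = int(width_text), int(height_text)  # int() raises ValueError on non-numbers
    if width <= 0 or height <= 0:
        raise ValueError(f"Aspect ratio parts must be positive, got {value!r}")
    return width, height


def output_height(width_px: int, aspect_ratio: str = "16:9") -> int:
    """Height in pixels that matches width_px at the given ratio, rounded."""
    ratio_w, ratio_h = parse_aspect_ratio(aspect_ratio)
    return round(width_px * ratio_h / ratio_w)


if __name__ == "__main__":
    print(parse_aspect_ratio())   # default ratio
    print(output_height(1920))    # height for a 1920-pixel-wide 16:9 frame
```

Checking the string up front means a typo surfaces locally rather than as an API error after the reference images have already been encoded and uploaded.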
Adoption & trust: 1.8k installs on skills.sh; 70.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have text prompts and reference art but no reliable, scriptable step that validates the inputs and writes finished image files ready to ship.
Who is it for?
Indie builders automating blog heroes, app-store screenshots, or campaign stills inside an agent-driven repo.
Skip if: you need a full design system, a vector brand kit, or manual Figma iteration without Python tooling.
When should I use this skill?
You need scriptable image generation from a prompt file with optional reference images and filesystem output.
What do I get? / Deliverables
You get a generated image file on disk with invalid references filtered out and aspect ratio applied to the request.
- Generated image file at output_file path
- Console logs for skipped invalid references
Journey fit
Primary value is wiring a Python image pipeline into your agent workflow during product and content build. The skill is an integration layer (requests, PIL, API-style generation) rather than UI polish alone.
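To make the "API-style generation" step concrete, here is a small sketch of the response handling, modeled on the script shown in SKILL.md on this page. The response shape (`candidates[0].content.parts[*].inlineData.data`) is taken from that script rather than from the API documentation, and `extract_image_bytes` is a hypothetical name.

```python
import base64


def extract_image_bytes(response_json: dict) -> bytes:
    """Return the decoded bytes of the single image part in a response.

    Assumes the response shape used by the skill's script; raises ValueError
    when there is not exactly one part carrying inlineData.
    """
    parts = response_json["candidates"][0]["content"]["parts"]
    image_parts = [part for part in parts if part.get("inlineData")]
    if len(image_parts) != 1:
        raise ValueError(f"Expected exactly one image part, got {len(image_parts)}")
    return base64.b64decode(image_parts[0]["inlineData"]["data"])


# A fabricated response used only to exercise the function offline.
fake_response = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {"text": "Here is your image."},
                    {
                        "inlineData": {
                            "mimeType": "image/jpeg",
                            "data": base64.b64encode(b"not-a-real-image").decode("utf-8"),
                        }
                    },
                ]
            }
        }
    ]
}

if __name__ == "__main__":
    image_bytes = extract_image_bytes(fake_response)
    print(len(image_bytes), "bytes decoded")
```

Keeping the parsing in its own function makes it possible to test the decode path without a network call or an API key.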
How it compares
Use it as a code-first generator skill with local reference validation, not as a hosted Canva-style MCP that has no local validation layer.
Common Questions / FAQ
Who is image-generation for?
Solo developers and content builders who want Pillow-validated reference images and file-based prompts in an agent workflow.
When should I use image-generation?
During Build, when integrating generative visuals for landing pages, docs, or growth creatives, once you have prompt and reference paths ready.
Is image-generation safe to install?
Check the Security Audits panel on this page; the skill uses network requests and local filesystem paths you must scope to trusted dirs.
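One way to scope filesystem paths to trusted directories is to resolve each path and check that it falls under an allowed root before passing it to the skill. The sketch below is an assumption about how a caller might do this; `is_within` and `filter_trusted` are hypothetical helpers, and the skill's script performs no such check itself.

```python
from pathlib import Path


def is_within(path: str, trusted_root: str) -> bool:
    """True if path resolves to trusted_root or to something beneath it.

    Resolving first collapses '..' segments and follows symlinks, so
    '/trusted/../etc/passwd' is rejected rather than matched by prefix.
    """
    resolved = Path(path).resolve()
    root = Path(trusted_root).resolve()
    return resolved == root or root in resolved.parents


def filter_trusted(paths: list[str], trusted_root: str) -> list[str]:
    """Keep only the paths located under trusted_root."""
    return [path for path in paths if is_within(path, trusted_root)]


if __name__ == "__main__":
    candidates = [
        "/tmp/trusted/ref1.jpg",           # hypothetical paths
        "/tmp/trusted/../other/ref2.jpg",
    ]
    print(filter_trusted(candidates, "/tmp/trusted"))
```

Comparing resolved `Path` objects rather than string prefixes also avoids treating a sibling such as `/tmp/trusted-evil` as if it were inside `/tmp/trusted`.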
SKILL.md - Image Generation
import base64
import os

import requests
from PIL import Image


def validate_image(image_path: str) -> bool:
    """
    Validate if an image file can be opened and is not corrupted.

    Args:
        image_path: Path to the image file

    Returns:
        True if the image is valid and can be opened, False otherwise
    """
    try:
        with Image.open(image_path) as img:
            img.verify()  # Verify that it's a valid image
        # Re-open to check if it can be fully loaded (verify() may not catch all issues)
        with Image.open(image_path) as img:
            img.load()  # Force load the image data
        return True
    except Exception as e:
        print(f"Warning: Image '{image_path}' is invalid or corrupted: {e}")
        return False


def generate_image(
    prompt_file: str,
    reference_images: list[str],
    output_file: str,
    aspect_ratio: str = "16:9",
) -> str:
    with open(prompt_file, "r", encoding="utf-8") as f:
        prompt = f.read()
    parts = []

    # Filter out invalid reference images
    valid_reference_images = []
    for ref_img in reference_images:
        if validate_image(ref_img):
            valid_reference_images.append(ref_img)
        else:
            print(f"Skipping invalid reference image: {ref_img}")

    if len(valid_reference_images) < len(reference_images):
        print(f"Note: {len(reference_images) - len(valid_reference_images)} reference image(s) were skipped due to validation failure.")

    for reference_image in valid_reference_images:
        with open(reference_image, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")
        parts.append(
            {
                "inlineData": {
                    "mimeType": "image/jpeg",
                    "data": image_b64,
                }
            }
        )

    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        return "GEMINI_API_KEY is not set"

    response = requests.post(
        "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent",
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        json={
            "generationConfig": {"imageConfig": {"aspectRatio": aspect_ratio}},
            "contents": [{"parts": [*parts, {"text": prompt}]}],
        },
    )
    response.raise_for_status()
    response_json = response.json()
    response_parts: list[dict] = response_json["candidates"][0]["content"]["parts"]
    image_parts = [part for part in response_parts if part.get("inlineData", False)]
    if len(image_parts) == 1:
        base64_image = image_parts[0]["inlineData"]["data"]
        # Save the image to a file
        with open(output_file, "wb") as f:
            f.write(base64.b64decode(base64_image))
        return f"Successfully generated image to {output_file}"
    else:
        raise Exception("Failed to generate image")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Generate images using Gemini API")
    parser.add_argument(
        "--prompt-file",
        required=True,
        help="Absolute path to JSON prompt file",
    )
    parser.add_argument(
        "--reference-images",
        nargs="*",
        default=[],
        help="Absolute paths to reference images (space-separated)",
    )
    parser.add_argument(
        "--output-file",
        required=True,
        help="Output path for generated image",
    )
    parser.add_argument(
        "--aspect-ratio",
        required=False,
        default="16:9",
        help="Aspect ratio of the generated image",
    )
    args = parser.parse_args()

    try:
        print(
            generate_image(
                args.prompt_file,
                args.reference_images,
                args.output_file,
                args.aspect_ratio,
            )
        )
    except Exception as e:
        print(f"Error