
Video Generation
Generate short videos from a text prompt file and optional reference images using the Google Veo 3.1 long-running predict API, authenticated with a GEMINI_API_KEY.
Overview
video-generation is an agent skill for the Build phase that generates videos via the Gemini Veo 3.1 API from prompts and optional reference images using a Python requests workflow.
Install
npx skills add https://github.com/bytedance/deer-flow --skill video-generation
What is this skill?
- Python helper posts to veo-3.1-generate-preview predictLongRunning endpoint
- Accepts prompt file plus optional reference images as base64 JPEG assets
- Polls operation name until the long-running video job completes
- Configurable aspect ratio (default 16:9) and file-based output path
- Requires GEMINI_API_KEY environment variable
- Uses model veo-3.1-generate-preview via predictLongRunning
- Default video aspect ratio 16:9
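The request body these bullets describe can be sketched as follows. This is a minimal sketch that reuses the field names from the skill's own script (`instances`, `referenceImages`, `bytesBase64Encoded`, `referenceType`). `build_payload` is a hypothetical helper name, and sending the aspect ratio under a top-level `parameters` object is an assumption about the Veo request schema rather than something the listing confirms.

```python
import base64


def build_payload(prompt: str, images: list[bytes], aspect_ratio: str = "16:9") -> dict:
    """Build a predictLongRunning request body (hypothetical helper)."""
    instance: dict = {"prompt": prompt}
    if images:
        # Each reference image is sent inline as a base64-encoded JPEG asset.
        instance["referenceImages"] = [
            {
                "image": {
                    "mimeType": "image/jpeg",
                    "bytesBase64Encoded": base64.b64encode(data).decode("utf-8"),
                },
                "referenceType": "asset",
            }
            for data in images
        ]
    # Assumption: the aspect ratio travels in a top-level "parameters" object.
    return {"instances": [instance], "parameters": {"aspectRatio": aspect_ratio}}


# Placeholder bytes stand in for the contents of a real JPEG file.
payload = build_payload("A paper boat drifting down a rainy street", [b"placeholder-jpeg-bytes"])
print(sorted(payload.keys()))
```

Keeping the payload construction in a pure function makes it possible to inspect or unit-test the body without spending API quota.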
Adoption & trust: 1.9k installs on skills.sh; 70.7k GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need repeatable programmatic video generation in an agent repo but only have ad-hoc API snippets without polling, reference image encoding, or env-key handling.
Who is it for?
Indie builders automating short-form or demo videos inside Deer Flow–style Python agents with a Gemini API key.
Skip if: Non-developers needing a GUI video editor, teams without Google Gemini/Veo API access, or production video pipelines requiring legal review workflows.
When should I use this skill?
When implementing programmatic video generation with Gemini Veo in a Python agent workflow.
What do I get? / Deliverables
A long-running Veo job is submitted and polled to completion, and the resulting video, generated from your prompt and reference assets, is saved to the output file you specify.
- Generated video file at output_file path
- Completed long-running API operation handling
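The submit-then-poll handling behind these deliverables might look roughly like the sketch below. It assumes the operation JSON has the shape the skill's script reads (`done`, then `response.generateVideoResponse.generatedSamples[0].video.uri`). The names `wait_for_operation` and `video_uri`, the 600-second timeout, and the injectable `fetch` and `sleep` callables are illustrative choices, not part of the skill. The shipped script polls every 3 seconds with no timeout.

```python
import time
from typing import Callable


def wait_for_operation(
    fetch: Callable[[], dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the operation reports done, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch()
        if operation.get("done", False):
            # A finished long-running operation carries either a response or an error.
            if "error" in operation:
                raise RuntimeError(f"Video generation failed: {operation['error']}")
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("Video generation did not finish in time")
        sleep(interval)


def video_uri(operation: dict) -> str:
    """Pull the download URI out of a completed operation."""
    samples = operation["response"]["generateVideoResponse"]["generatedSamples"]
    return samples[0]["video"]["uri"]


# Simulated responses stand in for GET requests to the operation URL.
responses = iter([
    {"name": "operations/demo"},
    {
        "name": "operations/demo",
        "done": True,
        "response": {"generateVideoResponse": {"generatedSamples": [
            {"video": {"uri": "https://example.invalid/video.mp4"}}
        ]}},
    },
])
finished = wait_for_operation(lambda: next(responses), sleep=lambda _: None)
print(video_uri(finished))
```

In real use, `fetch` would wrap a `requests.get` call against the operation URL with the `x-goog-api-key` header.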
Journey fit
The skill is an API integration script wired into an agent workflow for producing media assets during product or content build-out. Implementation is HTTP calls to generativelanguage.googleapis.com with polling—not ASO, not infra ops—so it sits under build integrations.
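To make the HTTP surface concrete, here is a small sketch of how the two endpoints are composed. The base URL, model name, and header name are taken from the skill's script. The helper names, and the example operation name in the last line, are hypothetical.

```python
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "veo-3.1-generate-preview"


def submit_url(model: str = MODEL) -> str:
    """URL that accepts the POST which starts a long-running job."""
    return f"{BASE_URL}/models/{model}:predictLongRunning"


def operation_url(operation_name: str) -> str:
    """URL polled with GET; operation_name is the "name" field of the POST response."""
    return f"{BASE_URL}/{operation_name.lstrip('/')}"


def auth_headers(api_key: str) -> dict[str, str]:
    """Header used by the submit, poll, and download requests alike."""
    return {"x-goog-api-key": api_key}


# Hypothetical operation name, shown only to illustrate the URL shape.
print(operation_url("models/veo-3.1-generate-preview/operations/example"))
```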
How it compares
Use as a code-first Veo API generator inside agent repos, not as a journey-wide creative brief or storyboarding methodology skill.
Common Questions / FAQ
Who is video-generation for?
Developers running ByteDance Deer Flow or similar Python agent stacks who want Veo-backed video files from prompts without building the HTTP polling boilerplate from scratch.
When should I use video-generation?
Use it during Build (integrations) when wiring generative video into a tool, and optionally in Grow (content) when batch-producing channel assets—always with GEMINI_API_KEY set and reference images ready on disk.
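Both preconditions can be checked before any request is sent. The sketch below is one possible preflight check. `preflight` is a hypothetical helper that is not part of the skill, and the `env` parameter exists only so the check can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def preflight(reference_images: list[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for path in reference_images:
        # Only existence is checked here; the script itself assumes JPEG content.
        if not Path(path).is_file():
            problems.append(f"Reference image not found: {path}")
    return problems


# With an empty environment and no images, only the missing key is reported.
print(preflight([], env={}))
```

Returning a list of problems rather than raising on the first one lets an agent report everything that needs fixing in a single pass.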
Is video-generation safe to install?
The skill runs network calls and reads local images; review Security Audits on this page and keep GEMINI_API_KEY out of committed code.
SKILL.md
```python
import base64
import os
import time

import requests

API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "veo-3.1-generate-preview"


def generate_video(
    prompt_file: str,
    reference_images: list[str],
    output_file: str,
    aspect_ratio: str = "16:9",
) -> str:
    # Check for the key first so nothing is read or encoded needlessly.
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        return "GEMINI_API_KEY is not set"

    with open(prompt_file, "r", encoding="utf-8") as f:
        prompt = f.read()

    # Encode each reference image as an inline base64 asset.
    # The MIME type is hard-coded, so inputs are expected to be JPEG files.
    encoded_images = []
    for reference_image in reference_images:
        with open(reference_image, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")
        encoded_images.append(
            {
                "image": {"mimeType": "image/jpeg", "bytesBase64Encoded": image_b64},
                "referenceType": "asset",
            }
        )

    payload = {
        "instances": [{"prompt": prompt}],
        # Assumption: Veo reads the aspect ratio from a top-level "parameters" object.
        "parameters": {"aspectRatio": aspect_ratio},
    }
    if encoded_images:
        payload["instances"][0]["referenceImages"] = encoded_images

    headers = {"x-goog-api-key": api_key}

    # Start the long-running job; the response carries the operation name.
    response = requests.post(
        f"{API_BASE}/models/{MODEL}:predictLongRunning",
        headers={**headers, "Content-Type": "application/json"},
        json=payload,
    )
    response.raise_for_status()
    operation_name = response.json()["name"]

    # Poll the operation every 3 seconds until it reports completion.
    while True:
        response = requests.get(f"{API_BASE}/{operation_name}", headers=headers)
        response.raise_for_status()
        operation = response.json()
        if operation.get("done", False):
            # A finished operation holds either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"Video generation failed: {operation['error']}")
            sample = operation["response"]["generateVideoResponse"]["generatedSamples"][0]
            download(sample["video"]["uri"], output_file, api_key)
            break
        time.sleep(3)
    return f"The video has been generated successfully to {output_file}"


def download(url: str, output_file: str, api_key: str) -> None:
    # The video URI is fetched with the same API key header as the other calls.
    response = requests.get(url, headers={"x-goog-api-key": api_key})
    response.raise_for_status()
    with open(output_file, "wb") as f:
        f.write(response.content)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Generate videos using Gemini API")
    parser.add_argument(
        "--prompt-file",
        required=True,
        help="Absolute path to JSON prompt file",
    )
    parser.add_argument(
        "--reference-images",
        nargs="*",
        default=[],
        help="Absolute paths to reference images (space-separated)",
    )
    parser.add_argument(
        "--output-file",
        required=True,
        help="Output path for the generated video",
    )
    parser.add_argument(
        "--aspect-ratio",
        required=False,
        default="16:9",
        help="Aspect ratio of the generated video",
    )
    args = parser.parse_args()

    try:
        print(
            generate_video(
                args.prompt_file,
                args.reference_images,
                args.output_file,
                args.aspect_ratio,
            )
        )
    except Exception as e:
        print(f"Error while generating video: {e}")
```

---
name: video-generation
description: Use this skill when the user requests to generate, create, or imagine videos. Supports structured prompts and a reference image for guided generation.
---

# Video Generation Skill

## Overview

This skill generates high-quality videos using structured prompts and a Python script. The workflow includes creating JSON-formatted prompts and executing video generation with an optional reference image.

## Core Capabilities

- Create structured JSON prompts for AIGC video generation
- Support a reference image as guidance or as the first/last frame of the video
- Generate videos through automated Python script execution

## Workflow

### Step 1: Understand Requirements

When a user requests video generation, identify:

- Subject/content: