Video Generation

Name: Video Generation
Author: bytedance

bytedance/deer-flow

Generate short videos from a text prompt file and optional reference images using Google's Veo 3.1 long-running predict API, authenticated with the GEMINI_API_KEY environment variable.

Overview

video-generation is an agent skill for the Build phase that generates videos via the Gemini Veo 3.1 API from prompts and optional reference images using a Python requests workflow.

Install

npx skills add https://github.com/bytedance/deer-flow --skill video-generation

What is this skill?

Python helper posts to veo-3.1-generate-preview predictLongRunning endpoint
Accepts prompt file plus optional reference images as base64 JPEG assets
Polls operation name until the long-running video job completes
Configurable aspect ratio (default 16:9) and file-based output path
Requires GEMINI_API_KEY environment variable
Uses model veo-3.1-generate-preview via predictLongRunning
Default video aspect ratio 16:9
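
To make the request shape concrete, here is a minimal sketch of how the prompt and optional reference images could be packed into a predictLongRunning body. The field names mirror the script shown under SKILL.md; the helper name `build_veo_payload` and its bytes-based signature are assumptions made for illustration, not part of the skill.

```python
import base64


def build_veo_payload(prompt: str, image_bytes_list: list[bytes]) -> dict:
    """Build a request body from a prompt and optional JPEG image bytes.

    Hypothetical helper: the skill's script builds this dict inline.
    """
    instance = {"prompt": prompt}
    reference_images = [
        {
            "image": {
                "mimeType": "image/jpeg",
                "bytesBase64Encoded": base64.b64encode(data).decode("utf-8"),
            },
            "referenceType": "asset",
        }
        for data in image_bytes_list
    ]
    # Attach the key only when at least one reference image was supplied.
    if reference_images:
        instance["referenceImages"] = reference_images
    return {"instances": [instance]}
```

Keeping the builder free of file and network access makes it easy to inspect the payload before anything is sent.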

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.9k installs on skills.sh; 70.7k GitHub stars; 0/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need repeatable programmatic video generation in an agent repo but only have ad-hoc API snippets without polling, reference image encoding, or env-key handling.
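
As an illustration of the polling piece, the sketch below wraps the wait loop in a helper with a deadline. The 3-second interval matches the script on this page; the 600-second timeout, the helper name `poll_until_done`, and the injected `fetch_operation` callable are assumptions, since the script itself polls without a time limit.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_operation: Callable[[], dict],
    interval_seconds: float = 3.0,
    timeout_seconds: float = 600.0,
) -> dict:
    """Call fetch_operation repeatedly until it reports done or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        operation = fetch_operation()
        if operation.get("done", False):
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError("video operation did not finish before the timeout")
        time.sleep(interval_seconds)
```

In real use, `fetch_operation` would wrap the GET request against the operation name; injecting it keeps the loop testable without network access.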

Who is it for?

Indie builders automating short-form or demo videos inside Deer Flow–style Python agents with a Gemini API key.

Skip if: Non-developers needing a GUI video editor, teams without Google Gemini/Veo API access, or production video pipelines requiring legal review workflows.

When should I use this skill?

When implementing programmatic video generation with Gemini Veo in a Python agent workflow.

What do I get? / Deliverables

A long-running Veo job is submitted and polled to completion, and the resulting video is saved to the specified output file using your prompt and reference assets.

Generated video file at output_file path
Completed long-running API operation handling
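
The following sketch shows one way to pull the video URI out of a completed operation. The `response.generateVideoResponse.generatedSamples[0].video.uri` path is taken from the script on this page, and `error` is the standard failure field on a long-running operation; the function name `extract_video_uri` and the specific exceptions are assumptions.

```python
def extract_video_uri(operation: dict) -> str:
    """Return the first generated video's URI from a finished operation dict."""
    if not operation.get("done", False):
        raise ValueError("operation is not finished yet")
    # A finished operation carries either an error or a response.
    if "error" in operation:
        raise RuntimeError(f"video generation failed: {operation['error']}")
    samples = operation["response"]["generateVideoResponse"]["generatedSamples"]
    if not samples:
        raise RuntimeError("operation finished without any generated samples")
    return samples[0]["video"]["uri"]
```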


Journey fit

Primary fit

Build: Integrations & version control

The skill is an API integration script wired into an agent workflow for producing media assets during product or content build-out. Implementation is HTTP calls to generativelanguage.googleapis.com with polling—not ASO, not infra ops—so it sits under build integrations.

Also useful

Grow: Content & marketing
Launch: Distribution & launch channels

How it compares

Use as a code-first Veo API generator inside agent repos, not as a journey-wide creative brief or storyboarding methodology skill.

Common Questions / FAQ

Who is video-generation for?

Developers running ByteDance Deer Flow or similar Python agent stacks who want Veo-backed video files from prompts without building the HTTP polling boilerplate from scratch.

When should I use video-generation?

Use it during Build (integrations) when wiring generative video into a tool, and optionally in Grow (content) when batch-producing channel assets—always with GEMINI_API_KEY set and reference images ready on disk.
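
A small preflight check can confirm those preconditions before any API call is made. This is a sketch under assumptions: the function name `preflight_problems` and the optional `env` mapping are hypothetical and not part of the skill.

```python
import os
from pathlib import Path


def preflight_problems(prompt_file: str, reference_images: list[str], env=None) -> list[str]:
    """Return human-readable problems; an empty list means it is safe to proceed."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    if not Path(prompt_file).is_file():
        problems.append(f"prompt file not found: {prompt_file}")
    for image in reference_images:
        if not Path(image).is_file():
            problems.append(f"reference image not found: {image}")
    return problems
```

Returning a list rather than raising on the first issue lets an agent report every missing input in one pass.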

Is video-generation safe to install?

The skill runs network calls and reads local images; review Security Audits on this page and keep GEMINI_API_KEY out of committed code.

SKILL.md


import base64
import os
import time

import requests


def generate_video(
    prompt_file: str,
    reference_images: list[str],
    output_file: str,
    aspect_ratio: str = "16:9",
) -> str:
    # Check the API key first so no files are read when it is missing.
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        return "GEMINI_API_KEY is not set"
    with open(prompt_file, "r", encoding="utf-8") as f:
        prompt = f.read()
    # Send the requested aspect ratio along with the prompt.
    payload = {
        "instances": [{"prompt": prompt}],
        "parameters": {"aspectRatio": aspect_ratio},
    }
    # Encode each reference image as a base64 JPEG asset.
    encoded_images = []
    for reference_image in reference_images:
        with open(reference_image, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")
        encoded_images.append(
            {
                "image": {"mimeType": "image/jpeg", "bytesBase64Encoded": image_b64},
                "referenceType": "asset",
            }
        )
    if encoded_images:
        payload["instances"][0]["referenceImages"] = encoded_images
    # Submit the long-running generation job.
    response = requests.post(
        "https://generativelanguage.googleapis.com/v1beta/models/veo-3.1-generate-preview:predictLongRunning",
        headers={
            "x-goog-api-key": api_key,
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    operation_name = response.json()["name"]
    # Poll the operation every 3 seconds until it reports completion.
    while True:
        response = requests.get(
            f"https://generativelanguage.googleapis.com/v1beta/{operation_name}",
            headers={
                "x-goog-api-key": api_key,
            },
            timeout=60,
        )
        response.raise_for_status()
        operation = response.json()
        if operation.get("done", False):
            # A finished operation carries either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"Video generation failed: {operation['error']}")
            sample = operation["response"]["generateVideoResponse"]["generatedSamples"][0]
            url = sample["video"]["uri"]
            download(url, output_file)
            break
        time.sleep(3)
    return f"The video has been generated successfully to {output_file}"


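def download(url: str, output_file: str) -> None:
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        # Raise instead of returning a message the caller would ignore.
        raise RuntimeError("GEMINI_API_KEY is not set")
    response = requests.get(
        url,
        headers={
            "x-goog-api-key": api_key,
        },
        timeout=300,
    )
    # Avoid writing an error body to disk as if it were a video.
    response.raise_for_status()
    with open(output_file, "wb") as f:
        f.write(response.content)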


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Generate videos using Gemini API")
    parser.add_argument(
        "--prompt-file",
        required=True,
        help="Absolute path to JSON prompt file",
    )
    parser.add_argument(
        "--reference-images",
        nargs="*",
        default=[],
        help="Absolute paths to reference images (space-separated)",
    )
    parser.add_argument(
        "--output-file",
        required=True,
        help="Output path for generated video",
    )
    parser.add_argument(
        "--aspect-ratio",
        required=False,
        default="16:9",
        help="Aspect ratio of the generated video",
    )

    args = parser.parse_args()

    try:
        print(
            generate_video(
                args.prompt_file,
                args.reference_images,
                args.output_file,
                args.aspect_ratio,
            )
        )
    except Exception as e:
        print(f"Error while generating video: {e}")


---
name: video-generation
description: Use this skill when the user requests to generate, create, or imagine videos. Supports structured prompts and reference images for guided generation.
---

# Video Generation Skill

## Overview

This skill generates high-quality videos using structured prompts and a Python script. The workflow includes creating JSON-formatted prompts and running video generation with optional reference images.

## Core Capabilities

- Create structured JSON prompts for AIGC video generation
- Support a reference image as guidance or as the first/last frame of the video
- Generate videos through automated Python script execution
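
As a rough illustration of the first capability, the sketch below writes a structured prompt to a JSON file that could then be passed via `--prompt-file`. The keys in `example_prompt` and the helper name `write_prompt_file` are hypothetical; this excerpt does not define a prompt schema, and the script sends the file's text to the API as-is.

```python
import json
from pathlib import Path


def write_prompt_file(path: str, prompt: dict) -> str:
    """Serialize a structured prompt to a UTF-8 JSON file and return the text."""
    text = json.dumps(prompt, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return text


# Hypothetical prompt structure, for illustration only.
example_prompt = {
    "subject": "a paper boat drifting down a rain-soaked street",
    "camera": "slow tracking shot at ground level",
    "style": "soft overcast light, shallow depth of field",
}
```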

## Workflow

### Step 1: Understand Requirements

When a user requests video generation, identify:

- Subject/content:
