ComfyUI Video Pipeline

Name: ComfyUI Video Pipeline
Author: mckruz

mckruz/comfyui-expert

Orchestrate ComfyUI video generation from your agent, picking Wan 2.2, FramePack, or AnimateDiff for image-to-video, text-to-video, and motion-controlled clips.

Overview

ComfyUI Video Pipeline is an agent skill for the Build phase that selects and runs ComfyUI video engines (Wan 2.2, FramePack, AnimateDiff) for image-to-video and text-to-video generation.

Install

npx skills add https://github.com/mckruz/comfyui-expert --skill comfyui-video-pipeline

What is this skill?

Decision tree routes requests to Wan 2.2 MoE 14B, Wan 2.2 1.3B, FramePack, AnimateDiff Lightning, AnimateDiff V3, or def
Supports image-to-video, text-to-video, talking heads, and motion-controlled animation
VRAM-aware routing: 24GB+ for MoE quality, 8GB for 1.3B, FramePack for 60s on ~6GB, Lightning for 4–8 step iteration
Wan 2.2 MoE called out for exclusive first+last frame control
OpenClaw metadata expects COMFYUI_URL plus curl or wget on darwin, linux, or win32
Engine tree includes Wan 2.2 MoE 14B (24GB+ VRAM), Wan 2.2 1.3B (8GB VRAM), FramePack (~60 seconds on 6GB), and AnimateD

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 588 installs on skills.sh; 69 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You want agent-generated video from images or prompts but do not know which ComfyUI engine, checkpoints, or VRAM budget fit your clip length and quality bar.

Who is it for?

Indie builders with ComfyUI already running who need repeatable routing between Wan, FramePack, and AnimateDiff for content and product demos.

Skip if: Teams without a ComfyUI instance or GPU budget, static image-only generation, or workflows that do not involve video diffusion at all.

When should I use this skill?

Creating any video content from character images or text descriptions with ComfyUI, including image-to-video, text-to-video, talking heads, and motion-controlled animation.

What do I get? / Deliverables

You get a concrete engine choice, model prerequisites, and pipeline steps aligned to your hardware instead of a single default workflow that fails on memory or duration.

Selected engine and pipeline variant for the request
Checkpoint and model path checklist
Rendered video output via ComfyUI workflow execution

Journey fit

Primary fit

Build: Integrations & version control

Video pipeline work happens in Build when you wire generative engines, model paths, and ComfyUI workflows into something shippable. Integrations is the right shelf because the skill selects engines and prerequisites against a remote or local COMFYUI_URL rather than teaching generic frontend UI.

Also useful

Launch: Distribution & launch channels

How it compares

Use this skill instead of one-off ComfyUI JSON dumps when you need VRAM-aware engine selection across Wan 2.2, FramePack, and AnimateDiff. It is not a hosted video API integration skill.

Common Questions / FAQ

Who is comfyui-video-pipeline for?

Solo builders and small teams using ComfyUI who want an agent to pick the right video stack for quality, length, speed, or motion-control requirements.

When should I use comfyui-video-pipeline?

Use it in Build when creating product videos, character animation from stills, or long clips; use it in Launch when producing distribution-ready motion content from the same ComfyUI setup.

Is comfyui-video-pipeline safe to install?

It needs network access to your ComfyUI URL and download tooling; review the Security Audits panel on this page and lock down COMFYUI_URL to trusted hosts.
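
One way to enforce the trusted-host advice above is to check the host of COMFYUI_URL before any workflow is submitted. The sketch below is an assumption, not part of the skill: `build_prompt_request` and `trusted_hosts` are hypothetical names, and the example address uses ComfyUI's default port 8188. It relies only on ComfyUI's `POST /prompt` endpoint, which accepts a JSON body with a `prompt` key, and it builds the request without sending it.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_prompt_request(comfyui_url, workflow, trusted_hosts):
    """Build (but do not send) a POST /prompt request for a trusted ComfyUI host.

    `trusted_hosts` is a hypothetical allowlist of host names or IPs.
    """
    host = urlparse(comfyui_url).hostname
    if host not in trusted_hosts:
        raise ValueError(f"COMFYUI_URL host {host!r} is not in the allowlist")
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        comfyui_url.rstrip("/") + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a local instance is allowed, anything else is rejected.
request = build_prompt_request("http://127.0.0.1:8188/", {}, {"127.0.0.1", "localhost"})
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually queue the workflow; keeping the build and send steps separate makes the allowlist check easy to test.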

SKILL.md


# ComfyUI Video Pipeline

Orchestrates video generation across three engines, selecting the best one based on requirements and available resources.

## Engine Selection

```
VIDEO REQUEST
    |
    |-- Need film-level quality?
    |   |-- Yes + 24GB+ VRAM → Wan 2.2 MoE 14B
    |   |-- Yes + 8GB VRAM → Wan 2.2 1.3B
    |
    |-- Need long video (>10 seconds)?
    |   |-- Yes → FramePack (60 seconds on 6GB)
    |
    |-- Need fast iteration?
    |   |-- Yes → AnimateDiff Lightning (4-8 steps)
    |
    |-- Need camera/motion control?
    |   |-- Yes → AnimateDiff V3 + Motion LoRAs
    |
    |-- Need first+last frame control?
    |   |-- Yes → Wan 2.2 MoE (exclusive feature)
    |
    |-- Default → Wan 2.2 (best general quality)
```
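
The tree above can be read as an ordered series of checks. The following is a minimal sketch under that assumption (branches evaluated top to bottom, first match wins); `VideoRequest` and `select_engine` are hypothetical names introduced for illustration, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class VideoRequest:
    """Hypothetical description of a video request (illustrative only)."""
    vram_gb: float
    film_quality: bool = False
    duration_s: float = 5.0
    fast_iteration: bool = False
    motion_control: bool = False
    first_last_frame: bool = False


def select_engine(req: VideoRequest) -> str:
    """Walk the decision tree from top to bottom and return the engine name."""
    if req.film_quality:
        if req.vram_gb >= 24:
            return "Wan 2.2 MoE 14B"
        if req.vram_gb >= 8:
            return "Wan 2.2 1.3B"
        # Below 8GB the tree gives no quality branch, so fall through.
    if req.duration_s > 10:
        return "FramePack"
    if req.fast_iteration:
        return "AnimateDiff Lightning"
    if req.motion_control:
        return "AnimateDiff V3 + Motion LoRAs"
    if req.first_last_frame:
        return "Wan 2.2 MoE"
    return "Wan 2.2"


print(select_engine(VideoRequest(vram_gb=6, duration_s=60)))
```

Because the checks are ordered, a request that sets several flags resolves to the first matching branch; a real router might instead score or combine requirements.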

## Pipeline 1: Wan 2.2 MoE (Highest Quality)

### Image-to-Video

**Prerequisites:**
- `wan2.1_i2v_720p_14b_bf16.safetensors` in `models/diffusion_models/`
- `umt5_xxl_fp8_e4m3fn_scaled.safetensors` in `models/clip/`
- `open_clip_vit_h_14.safetensors` in `models/clip_vision/`
- `wan_2.1_vae.safetensors` in `models/vae/`
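
The prerequisite list can be checked before a workflow is queued. This is a sketch that assumes the paths above are relative to the ComfyUI install directory; `missing_models` and the example root path are hypothetical.

```python
from pathlib import Path

# (directory, filename) pairs copied from the prerequisite list above.
WAN_I2V_PREREQUISITES = [
    ("models/diffusion_models", "wan2.1_i2v_720p_14b_bf16.safetensors"),
    ("models/clip", "umt5_xxl_fp8_e4m3fn_scaled.safetensors"),
    ("models/clip_vision", "open_clip_vit_h_14.safetensors"),
    ("models/vae", "wan_2.1_vae.safetensors"),
]


def missing_models(comfy_root, prerequisites=WAN_I2V_PREREQUISITES):
    """Return the relative paths of prerequisite files not found under comfy_root."""
    root = Path(comfy_root)
    return [
        f"{directory}/{filename}"
        for directory, filename in prerequisites
        if not (root / directory / filename).is_file()
    ]


# Example with a hypothetical install location.
for path in missing_models("/opt/ComfyUI"):
    print("missing:", path)
```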

**Settings:**
| Parameter | Value | Notes |
|-----------|-------|-------|
| Resolution | 1280x720 (landscape) or 720x1280 (portrait) | Native training resolution |
| Frames | 81 (~5 seconds at 16fps) | Must be a multiple of 4, plus 1 (4n + 1) |
| Steps | 30-50 | Higher = better quality |
| CFG | 5-7 | |
| Sampler | uni_pc | Recommended for Wan |
| Scheduler | normal | |
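
As an illustration, the sampler rows of the table map onto the inputs of ComfyUI's built-in `KSampler` node in API-format workflow JSON. The sketch below is an assumption about how one might assemble that node: `wan_ksampler_node` is a hypothetical helper, the node IDs in the example are placeholders, and the range checks simply mirror the table.

```python
def wan_ksampler_node(model, positive, negative, latent, steps=30, cfg=6.0, seed=0):
    """Return a KSampler node dict using the table's recommended settings.

    Each link argument is an API-format reference: [source_node_id, output_index].
    """
    if not 30 <= steps <= 50:
        raise ValueError("steps should be in the 30-50 range from the table")
    if not 5 <= cfg <= 7:
        raise ValueError("cfg should be in the 5-7 range from the table")
    return {
        "class_type": "KSampler",
        "inputs": {
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": "uni_pc",
            "scheduler": "normal",
            "denoise": 1.0,
        },
    }


# Placeholder node IDs; a real workflow would define nodes "1" to "4".
node = wan_ksampler_node(["1", 0], ["2", 0], ["3", 0], ["4", 0], steps=40)
print(node["inputs"]["sampler_name"], node["inputs"]["steps"])
```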

**Frame count guide:**
| Duration | Frames (16fps) |
|----------|----------------|
| 1 second | 17 |
| 3 seconds | 49 |
| 5 seconds | 81 |
| 10 seconds | 161 |
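
Every row in this table follows `frames = 16 * seconds + 1`, which always satisfies the 4n + 1 constraint for whole seconds. A small sketch of that arithmetic follows; `frames_for_duration` is a hypothetical helper, and rounding fractional durations up to the next valid count is an assumption rather than documented behaviour.

```python
import math


def frames_for_duration(seconds, fps=16):
    """Return the smallest frame count of the form 4n + 1 covering `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    n = math.ceil(seconds * fps / 4)
    return 4 * n + 1


for duration in (1, 3, 5, 10):
    print(duration, "s ->", frames_for_duration(duration), "frames")
```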

**VRAM optimization:**
- FP8 quantization: halves VRAM with minimal quality loss
- SageAttention: faster attention computation
- Reduce frames if OOM
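
To see why FP8 roughly halves memory, consider the weights alone: bf16 stores each parameter in 2 bytes and FP8 in 1. The sketch below is a back-of-the-envelope estimate that covers only the diffusion model weights; it deliberately ignores activations, the text encoder, and the VAE, so actual VRAM use will be higher. `weight_gib` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gib(params_billion, dtype):
    """Approximate size of the model weights in GiB for a given dtype."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("bf16", "fp8"):
    print(f"14B weights in {dtype}: {weight_gib(14, dtype):.1f} GiB")
```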

### Text-to-Video

Same as I2V but uses `wan2.1_t2v_14b_bf16.safetensors` and `EmptySD3LatentImage` instead of image conditioning.

### First+Last Frame Control (Wan 2.2 Exclusive)

Wan 2.2 MoE allows specifying both the first and last frame, enabling precise video planning:
1. Generate two hero images with consistent character
2. Use first as start frame, second as end frame
3. Wan interpolates the motion between them

## Pipeline 2: FramePack (Long Videos, Low VRAM)

### Key Innovation

VRAM usage is **invariant to video length**: FramePack generates 60-second videos at 30fps on just 6GB of VRAM.

**How it works:**
- Dynamic context compression: 1536 tokens for key frames, 192 for transitions
- Bidirectional memory with reverse generation prevents drift
- Frame-by-frame generation with context window
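
The length-invariance claim can be illustrated with a toy model. Assume, purely for illustration, that the most recent frame gets the full 1536-unit budget and each older frame gets half the budget of the one after it; this halving schedule is an assumption for the sketch, not FramePack's documented schedule. The total context then stays bounded no matter how many past frames there are, which is why memory does not grow with video length.

```python
def context_budget(num_past_frames, newest=1536):
    """Total context size when each older frame gets half the previous budget."""
    total = 0
    budget = newest
    for _ in range(num_past_frames):
        total += budget
        budget //= 2  # integer halving: very old frames eventually cost nothing
    return total


for frames in (1, 4, 30, 1800):
    print(frames, "past frames ->", context_budget(frames))
```

Under this toy schedule the fourth-newest frame costs 192 units (1536 / 8), and the total never exceeds twice the newest frame's budget.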

### Settings

| Parameter | Value | Notes |
|-----------|-------|-------|
| Resolution | 640x384 to 1280x720 | Depends on VRAM |
| Duration | Up to 60 seconds | VRAM-invariant |
| Quality | High (comparable to Wan) | Uses same base models |

### When to Use

- Videos longer than 10 seconds
- Limited-VRAM systems (a 32GB card such as the RTX 5090 does not need it)
- When VRAM is needed for parallel operations
- Batch video generation

## Pipeline 3: AnimateDiff V3 (Fast, Controllable)

### Strengths

- Motion LoRAs for camera control (pan, zoom, tilt, roll)
- Effect LoRAs (shatter, smoke, explosion, liquid)
- Sliding context window for infinite length
- Very fast with Lightning model (4-8 steps)
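
The sliding context window mentioned above can be pictured as overlapping frame ranges that together cover a clip of any length. The sketch below shows only the windowing arithmetic; `sliding_windows` is a hypothetical helper, and the window size and overlap in the example are arbitrary illustrative values, not the skill's settings.

```python
def sliding_windows(total_frames, context_length, overlap):
    """Return (start, end) frame ranges, end exclusive, covering total_frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if context_length <= 0 or not 0 <= overlap < context_length:
        raise ValueError("need context_length > 0 and 0 <= overlap < context_length")
    stride = context_length - overlap
    windows = []
    start = 0
    while True:
        end = min(start + context_length, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


# Arbitrary example values: 20 frames, windows of 8 with 2 frames shared.
print(sliding_windows(20, context_length=8, overlap=2))
```

The shared frames between neighbouring windows are what let each window be denoised with a fixed memory cost while keeping motion continuous across the seams.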

### Settings

| Parameter | Value (Standard) | Value (Lightning) |
|-----------|-----------------|-------------------|
| Motion Module | `v3_sd15_mm.ckpt` | `animatediff_lightning_4step.safetensors` |
| Steps | 20-25 | 4-8 |
| CFG | 7-8 | 1.5-2.0 |
| Sampler | euler_ancestral | lcm |
| Resolution | 512x512 | 512x512 |
| Context Length
