
Ffmpeg Analyse Video
Turn screen recordings and tutorial videos into timestamped text summaries without flooding your main agent context with image tokens.
Overview
FFmpeg video analysis is an agent skill most often used in Build (also Idea, Grow) that extracts video frames with ffmpeg and uses batched sub-agent vision to produce timestamped text summaries.
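As a rough illustration of the frame-extraction step, the skill's SKILL.md chooses an ffmpeg strategy from the video's duration. The tiers are one frame every 2 s up to 60 s, scene detection up to 10 min, keyframes up to 30 min, and the thumbnail filter beyond that, with `SEGMENT_FRAMES = total_frames / 60` for the last tier. The bash sketch below encodes that table. The function names `pick_strategy` and `segment_frames` are hypothetical. Assigning each boundary (60 s, 600 s, 1800 s) to the shorter tier is our assumption, because the table's ranges overlap at their edges.

```bash
# Sketch only: map a duration in seconds (ffprobe reports values such as
# "754.120000") to the extraction tiers listed in SKILL.md.
# Assumption: 60, 600 and 1800 s belong to the shorter tier.
pick_strategy() {
  local secs="${1%.*}"     # drop the fractional part for integer tests
  secs="${secs:-0}"        # ".5" would otherwise become an empty string
  if [ "$secs" -le 60 ]; then
    echo "interval"        # fps=1/2: one frame every 2 s
  elif [ "$secs" -le 600 ]; then
    echo "scene"           # select='gt(scene,0.3)'
  elif [ "$secs" -le 1800 ]; then
    echo "keyframe"        # -skip_frame nokey
  else
    echo "thumbnail"       # thumbnail=SEGMENT_FRAMES
  fi
}

# SEGMENT_FRAMES = total_frames / 60 caps thumbnail output at about 60 frames.
# Clamping to 1 is an added assumption so very short inputs stay valid.
segment_frames() {
  local n=$(( $1 / 60 ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

Consider a one-hour clip at 30 fps (108,000 frames). For that clip, `pick_strategy 3600` returns `thumbnail`, and `segment_frames 108000` returns 1800, the value that would replace `SEGMENT_FRAMES` in the filter.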
Install
npx skills add https://github.com/fabriqaai/ffmpeg-analyse-video-skill --skill ffmpeg-analyse-video
What is this skill?
- 3-phase pipeline: ffprobe metadata, ffmpeg frame extraction, batched sub-agent vision analysis
- Sub-agents write batch_N_analysis.md text reports so the main agent only reads prose, not images
- Produces structured timestamped step-by-step summaries from tutorials, presentations, and footage
- Triggers on requests such as "analyse this video" or "summarise this recording", and on similar visual-understanding prompts
- Delegates frame reading to disposable sub-agent contexts to avoid exhausting the primary context window
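Before the sub-agent hand-off, the extracted frames have to be grouped into batches. The sketch below applies the SKILL.md fallback of evenly subsampling to 80 frames when more than 100 were extracted. It then writes one list file per batch, so each sub-agent knows which images to read before writing its `batch_N_analysis.md`. Several details are hypothetical, because the visible skill text does not specify them: the batch size of 10, the `all_frames.txt`, `keep.txt` and `batch_N_frames.txt` file names, and both function names.

```bash
# Sketch: build per-sub-agent batch lists from the extracted frames.
# Hypothetical choices (not specified by the skill): batch size 10 and the
# all_frames.txt / keep.txt / batch_N_frames.txt file names.

# Print TARGET evenly spaced 1-based indices out of TOTAL, or 1..TOTAL when
# there are no more than TARGET frames (the "subsample evenly to 80" fallback).
subsample_indices() {
  local total="$1" target="$2" i=0
  if [ "$total" -le "$target" ]; then
    seq 1 "$total"
    return 0
  fi
  while [ "$i" -lt "$target" ]; do
    # floor(i * total / target) + 1 spreads the picks across the whole video
    echo $(( i * total / target + 1 ))
    i=$(( i + 1 ))
  done
}

# Write DIR/batch_N_frames.txt files holding BATCH_SIZE frame paths each.
prepare_batches() {
  local dir="$1" batch_size="${2:-10}" total
  # Zero-padded %04d names sort correctly as plain strings.
  find "$dir" -maxdepth 1 -name '*.jpg' | sort > "$dir/all_frames.txt"
  total=$(( $(wc -l < "$dir/all_frames.txt") ))
  if [ "$total" -eq 0 ]; then
    echo "no frames found in $dir" >&2
    return 1
  fi

  # More than 100 frames: keep 80 evenly spaced ones; otherwise keep all.
  if [ "$total" -gt 100 ]; then
    subsample_indices "$total" 80 > "$dir/keep.txt"
  else
    subsample_indices "$total" "$total" > "$dir/keep.txt"
  fi

  # Filter the sorted list to the kept line numbers, then assign each kept
  # frame to batch int((position - 1) / size) + 1.
  awk 'NR == FNR { keep[$1]; next } FNR in keep' \
      "$dir/keep.txt" "$dir/all_frames.txt" |
    awk -v size="$batch_size" -v dir="$dir" '
      { b = int((NR - 1) / size) + 1
        print > (dir "/batch_" b "_frames.txt") }'
}
```

Take 120 extracted frames and the default size as an example. In that case, `prepare_batches "$TMPDIR"` keeps 80 frames and writes `batch_1_frames.txt` through `batch_8_frames.txt`.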
Adoption & trust: 656 installs on skills.sh; 12 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a long screen recording or tutorial video but cannot quickly turn its on-screen steps into searchable, timestamped notes in your agent chat.
Who is it for?
Builders documenting UI flows, reverse-engineering tutorial videos, or summarising demos with ffmpeg already available locally.
Skip if: you need real-time video moderation at scale, or frame-perfect editing decisions without human review of the extracted summaries.
When should I use this skill?
When the user provides a video file and wants visual understanding of it, for example "analyse this video", "what happens in this video", or "summarise this recording", including tutorials, presentations, and animations.
What do I get? / Deliverables
You get a structured timestamped summary built from text-only sub-agent reports so the main session stays lean and you can reuse the breakdown in docs or tasks.
- Timestamped step-by-step video summary
- Batch text analysis files from sub-agents
- ffprobe-derived metadata used in the synthesis
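The synthesis step needs two small pieces, sketched here in bash under stated assumptions. `frame_timestamp` converts a frame's sequence number into an approximate `HH:MM:SS` offset. It assumes fixed-interval extraction (for example one frame every 2 s) with frame 1 at t = 0. Scene- and keyframe-based outputs are not evenly spaced, so this arithmetic does not apply to them. `collect_reports` concatenates `batch_1_analysis.md`, `batch_2_analysis.md`, and so on in numeric order, because a plain glob sorts `batch_10` before `batch_2`. This leaves the main agent a single text-only file to read. All three function names are hypothetical.

```bash
# Sketch: helpers for the synthesis step. Function names are hypothetical.

# Frame number from a name like DIR/frame_0008.jpg. awk converts "0008" to 8,
# whereas bash $(( 0008 )) fails because a leading 0 means octal.
frame_number() {
  local base="${1##*/}"   # frame_0008.jpg
  base="${base%.*}"       # frame_0008
  base="${base##*_}"      # 0008
  printf '%s\n' "$base" | awk '{ print $1 + 0 }'
}

# Approximate HH:MM:SS offset of frame N when one frame was extracted every
# INTERVAL seconds, assuming frame 1 is at t = 0. Only valid for
# fixed-interval extraction, not scene or keyframe output.
frame_timestamp() {
  local n="$1" interval="$2" t
  t=$(( (n - 1) * interval ))
  printf '%02d:%02d:%02d\n' $(( t / 3600 )) $(( t % 3600 / 60 )) $(( t % 60 ))
}

# Concatenate DIR/batch_1_analysis.md, batch_2_analysis.md, ... into OUT in
# numeric order, stopping at the first missing number; prints the count.
collect_reports() {
  local dir="$1" out="$2" i=1
  : > "$out"
  while [ -f "$dir/batch_${i}_analysis.md" ]; do
    cat "$dir/batch_${i}_analysis.md" >> "$out"
    i=$(( i + 1 ))
  done
  echo $(( i - 1 ))
}
```

For a 2-second interval, `frame_timestamp "$(frame_number "$TMPDIR/frame_0031.jpg")" 2` prints `00:01:00`.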
Journey fit
Spans multiple journey phases: one primary shelf plus the alternate fits listed below.
Video understanding is most often invoked while building (documenting flows, learning from recordings, or tooling agent pipelines), although the same summaries also support research and content reuse. The agent-tooling fit comes from the sub-agent batching architecture, which is designed to preserve the main context window during vision-heavy work.
Where it fits
Summarise a competitor demo video into timestamped feature notes before you scope your own product.
Run the 3-phase sub-agent pipeline on a codebase walkthrough recording to produce implementation checklists.
Convert an internal how-to screencast into step-by-step documentation with time markers.
Draft blog or changelog outlines from webinar recordings without manual transcription.
How it compares
An agent workflow for vision summarisation from ffmpeg-extracted frames, not a hosted video CDN or an automated video editor.
Common Questions / FAQ
Who is ffmpeg-analyse-video for?
Solo and indie builders using agentic coding tools who need to understand or document visual content in local video files via ffmpeg plus AI vision.
When should I use ffmpeg-analyse-video?
Use it in Build when turning recordings into agent-tooling notes or docs; in Idea when researching competitors or tutorials from video; and in Grow when repurposing webinar or demo footage into written content.
Is ffmpeg-analyse-video safe to install?
Review the Security Audits panel on this Prism page; the skill runs shell ffmpeg commands and reads local video files, so only analyse media you are allowed to process.
SKILL.md
# FFmpeg Video Analysis

Extract frames from video files with ffmpeg. Delegate frame reading to sub-agents to preserve the main context window. Synthesise a structured timestamped summary from text-only sub-agent reports.

## Architecture: Context-Efficient Sub-Agent Pipeline

**Problem**: Reading dozens of images into the main conversation context consumes most of the context window, leaving little room for synthesis and follow-up.

**Solution**: A 3-phase pipeline:

```
Main Agent                             Sub-Agents (disposable context)
──────────                             ──────────────────────────────
1. ffprobe metadata            ───►
2. ffmpeg frame extraction     ───►
3. Split frames into batches   ──►     4. Read images (vision)
                                          Write text descriptions
                                          to batch_N_analysis.md
5. Read text files only        ◄───       (context discarded)
6. Synthesise final output
```

Images only ever exist inside sub-agent contexts. The main agent only reads lightweight text files. This cuts context usage by ~90%.

## 1. Prerequisites

```bash
which ffmpeg && which ffprobe
```

If either is missing, show platform-specific install instructions and STOP:

- **macOS**: `brew install ffmpeg`
- **Ubuntu/Debian**: `sudo apt install ffmpeg`
- **Windows**: `choco install ffmpeg` or `winget install ffmpeg`

## 2. Set Up Temp Directory

```bash
# macOS/Linux
TMPDIR="/tmp/video-analysis-$(date +%s)"
mkdir -p "$TMPDIR"

# Windows (PowerShell)
# $TMPDIR = "$env:TEMP\video-analysis-$(Get-Date -UFormat %s)"
# New-Item -ItemType Directory -Path $TMPDIR
```

## 3. Extract Video Metadata

```bash
ffprobe -v quiet -print_format json -show_format -show_streams "VIDEO_PATH"
```

Extract and report: duration, resolution (width x height), fps, codec, file size, and whether audio is present. If no video stream is found, report "audio-only file" and STOP. If the file size exceeds 2 GB, warn the user and suggest analysing a time range with `-ss START -to END`.

## 4. Extract Frames

Choose a strategy based on duration:

| Duration | Strategy | Command |
|----------|----------|---------|
| 0-60s | 1 frame every 2s | `ffmpeg -hide_banner -y -i INPUT -vf "fps=1/2,scale='min(1280,iw)':-2" -q:v 5 DIR/frame_%04d.jpg` |
| 1-10min | Scene detection (threshold 0.3) | `ffmpeg -hide_banner -y -i INPUT -vf "select='gt(scene,0.3)',scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/scene_%04d.jpg` |
| 10-30min | Keyframe extraction | `ffmpeg -hide_banner -y -skip_frame nokey -i INPUT -vf "scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/key_%04d.jpg` |
| 30min+ | Thumbnail filter | `ffmpeg -hide_banner -y -i INPUT -vf "thumbnail=SEGMENT_FRAMES,scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/thumb_%04d.jpg` |

For the thumbnail filter, calculate `SEGMENT_FRAMES = total_frames / 60` to cap output at ~60 frames.

**Fallbacks:**

- Scene detection yields 0 frames → retry with interval at 1 frame/5s
- More than 100 frames extracted → subsample evenly to 80
- Frame extraction fails → try the next simpler strategy (scene → interval, keyframe → interval)

**Time range analysis:** When the user specifies a range, prepend `-ss START -to END` before `-i`.

**Higher detail mode:** If requested, double the fps rate and lower the scene threshold to 0.2.

After extraction, list all frame files and calculate each frame's timestamp from its sequence number and the extraction rate.

## 5. Delegate Frame Analysis to Sub-Agents

**This is the critical context-saving step.** Do NOT read frame images in th