
Video Use
Edit talking-head, tutorial, or montage footage through conversation—transcribe, cut, grade, subtitles—without learning a non-linear editor preset maze.
Overview
video-use is an agent skill most often used in Grow (also Launch distribution and Validate prototype demos) that edits video through confirmed conversational plans anchored on a packed transcript.
Install
npx skills add https://github.com/browser-use/video-use --skill video-useWhat is this skill?
- Phrase-level packed transcript (takes_packed.md) as the core derived artifact; other tags derived at decision time
- Hard workflow: ask questions, confirm plan in plain English, then execute—never cut before confirmation
- Audio-first editing: speech boundaries and silence gaps drive cuts; visuals drilled at decision points only
- Covers transcribe, cut, color grade, overlay animations, and subtitle burn for varied formats (tutorials, interviews, tr
- Production-correctness rules are mandatory; stylistic choices remain artistic freedom per material
- Core derived artifact: packed phrase-level transcript (takes_packed.md)
Adoption & trust: 926 installs on skills.sh; 9.3k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have raw footage and a goal edit but NLE presets and agent guesswork risk wrong cuts before you agree on strategy.
Who is it for?
Indie creators and founders who ship their own video marketing, courses, or demos via an agent and want hard confirmation gates plus transcript-driven editing.
Skip if: Pure code-only projects, batch automation without human creative sign-off, or workflows that require touching the timeline before the user confirms the plan in plain English.
When should I use this skill?
Edit any video by conversation when you need transcribe, cut, grade, overlays, or subtitles and must confirm the plan before executing.
What do I get? / Deliverables
You get an iterated, production-safe cut with subtitles and grades after an explicit user-confirmed plan, with persistent artifacts for the next revision pass.
- takes_packed.md phrase-level transcript
- Iterated cut with optional subtitles and color grade per confirmed plan
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Grow → content is the primary shelf because solo builders most often invoke this skill to ship polished video for audience and distribution, not to scaffold app code. Content subphase fits conversational post-production that produces publish-ready cuts, overlays, and burned subtitles for channels and landing pages.
Where it fits
Turn a 45-minute tutorial recording into a tight lesson with burned subtitles for YouTube.
Polish a product demo montage and color grade for a launch week social clip.
Cut a prototype screen recording into a 90-second validation video for early testers.
Finalize release notes companion video after confirming retake removal against the packed transcript.
How it compares
Skill workflow for conversational editorial discipline, not a preset-based SaaS video template generator or a single-shot ffmpeg one-liner.
Common Questions / FAQ
Who is video-use for?
Solo builders and small teams who edit marketing, tutorial, or interview video with AI agents and need structured ask-confirm-execute loops instead of blind cuts.
When should I use video-use?
In Grow content production for channel cuts, Launch distribution when polishing launch reels, Validate when trimming prototype demos, or Ship launch prep when finalizing demo assets—always after confirming the edit strategy.
Is video-use safe to install?
It implies filesystem and media processing on your projects; review the Security Audits panel on this Prism page before granting broad file or network access to agent tooling.
SKILL.md
READMESKILL.md - Video Use
# Video Use ## Principle 1. **LLM reasons from raw transcript + on-demand visuals.** The only derived artifact that earns its keep is a packed phrase-level transcript (`takes_packed.md`). Everything else — filler tagging, retake detection, shot classification, emphasis scoring — you derive at decision time. 2. **Audio is primary, visuals follow.** Cut candidates come from speech boundaries and silence gaps. Drill into visuals only at decision points. 3. **Ask → confirm → execute → iterate → persist.** Never touch the cut until the user has confirmed the strategy in plain English. 4. **Generalize.** Do not assume what kind of video this is. Look at the material, ask the user, then edit. 5. **Artistic freedom is the default.** Every specific value, preset, font, color, duration, pitch structure, and technique in this document is a *worked example* from one proven video — not a mandate. Read them to understand what's possible and why each worked. Then make your own taste calls based on what the material actually is and what the user actually wants. **The only things you MUST do are in the Hard Rules section below.** Everything else is yours. 6. **Invent freely.** If the material calls for a technique not described here — split-screen, picture-in-picture, lower-third identity cards, reaction cuts, speed ramps, freeze frames, crossfades, match cuts, L-cuts, J-cuts, speed ramps over breath, whatever — build it. The helpers are ffmpeg and PIL. They can do anything the format supports. Do not wait for permission. 7. **Verify your own output before showing it to the user.** If you wouldn't ship it, don't present it. ## Hard Rules (production correctness — non-negotiable) These are the things where deviation produces silent failures or broken output. They are not taste, they are correctness. Memorize them. 1. **Subtitles are applied LAST in the filter chain**, after every overlay. Otherwise overlays hide captions. Silent failure. 2. **Per-segment extract → lossless `-c copy` concat**, not single-pass filtergraph. Otherwise you double-encode every segment when overlays are added. 3. **30ms audio fades at every segment boundary** (`afade=t=in:st=0:d=0.03,afade=t=out:st={dur-0.03}:d=0.03`). Otherwise audible pops at every cut. 4. **Overlays use `setpts=PTS-STARTPTS+T/TB`** to shift the overlay's frame 0 to its window start. Otherwise you see the middle of the animation during the overlay window. 5. **Master SRT uses output-timeline offsets**: `output_time = word.start - segment_start + segment_offset`. Otherwise captions misalign after segment concat. 6. **Never cut inside a word.** Snap every cut edge to a word boundary from the Scribe transcript. 7. **Pad every cut edge.** Working window: 30–200ms. Scribe timestamps drift 50–100ms — padding absorbs the drift. Tighter for fast-paced, looser for cinematic. 8. **Word-level verbatim ASR only.** Never SRT/phrase mode (loses sub-second gap data). Never normalized fillers (loses editorial signal). 9. **Cache transcripts per source.** Never re-transcribe unless the source file itself changed. 10. **Parallel sub-agents for multiple animations.** Never sequential. Spawn N at once via the `Agent` tool; total wall time ≈ slowest one. 11. **Strategy confirmation before execution.** Never touch the cut until the user has approved the plain-English plan. 12. **All session outputs in `<videos_dir>/edit/`.** Never write inside the `video-use/` project directory. Everything else in this document is a worked example. Deviate whenever the material calls for it. ## Directory layout The skill lives in `video-use/`. User footage lives wherever t