Hyperframes

Name: Hyperframes
Author: heygen-com

heygen-com/hyperframes

286k installs
38.1k repo stars
Updated July 28, 2026

HyperFrames is a video composition framework that uses HTML as the source of truth, with GSAP-driven animations, media timing, and audio reactivity.

About

HyperFrames skill for creating video compositions with HTML, GSAP animations, captions, and audio-reactive effects. Supports text-to-speech narration, caption syncing to audio, beat-synchronized animations, and shader-based scene transitions. Framework handles clip visibility, media playback, and timeline synchronization.

HTML as the source of truth, with GSAP animations and data-* timing attributes
Audio-reactive animation: frequency bands and amplitude mapped to GSAP properties
Caption syncing with tone-adaptive styling, per-word styling, and audio integration

Hyperframes by the numbers

285,550 all-time installs (skills.sh)
+42,999 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #39 of 1,340 Generative Media skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

hyperframes capabilities & compatibility

Use cases: video generation

From the docs

What hyperframes says it does

Audio-reactive animation: map frequency bands and amplitude to GSAP properties. Read when visuals should respond to music, voice, or sound.

SKILL.md

npx skills add https://github.com/heygen-com/hyperframes --skill hyperframes


Security audit	2 / 3 scanners passed
Repository	heygen-com/hyperframes

How do you generate HTML video with synced captions?

Create animated video compositions with audio-reactive effects, caption syncing, and production-ready output.

Who is it for?

Video production workflows; animated explainer videos; music-reactive content; broadcast and social video creation.

Skip if: Live streaming; real-time video editing; non-linear editing tasks.

When should I use this skill?

User wants to create video compositions, add animations, sync captions to audio, or build audio-reactive visuals.

What you get

HyperFrames HTML compositions, timed scene transitions, synchronized captions, voiceover tracks, and audio-reactive animation layers.

Rendered MP4 video
HTML composition source
Timing and animation maps

Files

palettes/
references/
- transitions/
scripts/
templates/

SKILL.md

HyperFrames

HTML is the source of truth for video. A composition is an HTML file with data-* attributes for timing, a GSAP timeline for animation, and CSS for appearance. The framework handles clip visibility, media playback, and timeline sync.

Approach

Discovery (exploratory requests only)

For open-ended requests ("make me a product launch video", "create something for our brand") where the user hasn't committed to a direction, understand intent before picking colors:

Audience — who watches this? Developers? Executives? General consumers?
Platform — where does it play? Social (15s), website hero, product demo, internal?
Priority — what matters most? Motion quality? Content accuracy? Brand fidelity? Speed?
Variations — does the user want options, or a single best shot?

For specific requests ("add a title card", "fix the timing on scene 3"), skip discovery.

For exploratory requests, consider offering 2-3 variations that differ meaningfully: not just color swaps, but different pacing, energy levels, or structural approaches (for example, one safe and expected, one ambitious). This is optional, not a mandate; use it when appropriate.

Step 1: Design system

If design.md or DESIGN.md exists in the project, read it first (check both casings — they're different files on Linux). It's the source of truth for brand colors, fonts, and constraints. Use its exact values — don't invent colors or substitute fonts. Any format works (YAML frontmatter, prose, tables — just extract the values).

If it names fonts you can't find locally (no fonts/ directory with .woff2 files, not a built-in font), warn the user before writing HTML: "design.md specifies [font name] but no font files found. Please add .woff2 files to fonts/ or I'll fall back to [closest built-in alternative]."

If no design.md exists, offer the user a choice:

1. User named a style or mood? → Read visual-styles.md for the 8 named presets. Pick the closest match.
2. Want to browse options visually? → Run the design picker: read references/design-picker.md for the full workflow. This serves a visual picker page. The user configures mood, palette, typography, and motion in the browser, then copies the generated design.md and pastes it back into the conversation.
3. Want to skip and go fast? → Ask: mood, light or dark, any brand colors/fonts? Then pick a palette from house-style.md.

design.md defines the brand. It does not define video composition rules. Those come from references/video-composition.md and house-style.md. Use brand colors at video-appropriate scale — not at web-UI opacity.

Step 2: Prompt expansion

Run this step on every composition except single-scene pieces and trivial edits. It grounds the user's intent against design.md and house-style.md and produces a consistent intermediate that every downstream agent reads the same way.

Read references/prompt-expansion.md for the full process and output format.

Step 3: Plan

Before writing HTML, think at a high level:

1. What: what should the viewer experience? Identify the narrative arc, key moments, and emotional beats.
2. Structure: how many compositions, which are sub-compositions vs inline, and what tracks carry what (video, audio, overlays, captions).
3. Rhythm: declare your scene rhythm before implementing. Which scenes are quick hits, which are holds, where do shaders land, where does energy peak? Name the pattern, for example fast-fast-SLOW-fast-SHADER-hold. Read references/beat-direction.md for rhythm templates.
4. Timing: which clips drive the duration, where do transitions land, and what is the pacing?
5. Layout: build the end state first. See "Layout Before Animation" below.
6. Animate: then add motion using the rules below.

Build what was asked. A request for "a title card" is not a request for "a title card + 3 supporting scenes + ambient music + captions." Every scene, every element, every tween should earn its place. If additional scenes or elements would genuinely improve the piece, propose them — don't add them.

For small edits (fix a color, adjust timing, add one element), skip straight to the rules.

<HARD-GATE> Before writing ANY composition HTML — verify you have a visual identity from Step 1. If you're reaching for #333, #3b82f6, or Roboto, you skipped it. </HARD-GATE>

Layout Before Animation

Position every element where it should be at its most visible moment — the frame where it's fully entered, correctly placed, and not yet exiting. Write this as static HTML+CSS first. No GSAP yet.

Why this matters: If you position elements at their animated start state (offscreen, scaled to 0, opacity 0) and tween them to where you think they should land, you're guessing the final layout. Overlaps are invisible until the video renders. By building the end state first, you can see and fix layout problems before adding any motion.

The process

1. Identify the hero frame for each scene: the moment when the most elements are simultaneously visible. This is the layout you build.
2. Write static CSS for that frame. The .scene-content container MUST fill the full scene using width: 100%; height: 100%; padding: Npx; with display: flex; flex-direction: column; gap: Npx; box-sizing: border-box. Use padding to push content inward; NEVER use position: absolute; top: Npx on a content container. Absolute-positioned content containers overflow when content is taller than the remaining space. Reserve position: absolute for decoratives only.
3. Add entrances with `gsap.from()`: animate FROM offscreen/invisible TO the CSS position. The CSS position is the ground truth; the tween describes the journey to get there. (In sub-compositions loaded via data-composition-src, prefer gsap.fromTo(); see the load-bearing GSAP rules in references/motion-principles.md.)
4. Add exits with `gsap.to()`: animate TO offscreen/invisible FROM the CSS position. In multi-scene compositions, only the final scene gets exit tweens; see Scene Transitions below.

Example

/* scene-content fills the scene, padding positions content */
.scene-content {
  display: flex;
  flex-direction: column;
  justify-content: center;
  width: 100%;
  height: 100%;
  padding: 120px 160px;
  gap: 24px;
  box-sizing: border-box;
}
.title {
  font-size: 120px;
}
.subtitle {
  font-size: 42px;
}
/* Container fills any scene size (1920x1080, 1080x1920, etc).
   Padding positions content. Flex + gap handles spacing. */

WRONG — hardcoded dimensions and absolute positioning:

.scene-content {
  position: absolute;
  top: 200px;
  left: 160px;
  width: 1920px;
  height: 1080px;
  display: flex; /* ... */
}

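Then animate the correct layout (tl is the composition's paused timeline):

const tl = gsap.timeline({ paused: true });

// Step 3: Animate INTO those positions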
tl.from(".title", { y: 60, opacity: 0, duration: 0.6, ease: "power3.out" }, 0);
tl.from(".subtitle", { y: 40, opacity: 0, duration: 0.5, ease: "power3.out" }, 0.2);
tl.from(".logo", { scale: 0.8, opacity: 0, duration: 0.4, ease: "power2.out" }, 0.3);

// Step 4: Animate OUT from those positions
tl.to(".title", { y: -40, opacity: 0, duration: 0.4, ease: "power2.in" }, 3);
tl.to(".subtitle", { y: -30, opacity: 0, duration: 0.3, ease: "power2.in" }, 3.1);
tl.to(".logo", { scale: 0.9, opacity: 0, duration: 0.3, ease: "power2.in" }, 3.2);

When elements share space across time

If element A exits before element B enters in the same area, both should have correct CSS positions for their respective hero frames. The timeline ordering guarantees they never visually coexist — but if you skip the layout step, you won't catch the case where they accidentally overlap due to a timing error.

What counts as intentional overlap

Layered effects (glow behind text, shadow elements, background patterns) and z-stacked designs (card stacks, depth layers) are intentional. The layout step is about catching unintentional overlap — two headlines landing on top of each other, a stat covering a label, content bleeding off-frame.

Data Attributes

All Clips

Attribute	Required	Values
`id`	Yes	Unique identifier
`data-start`	Yes	Seconds or clip ID reference (`"el-1"`, `"intro + 2"`)
`data-duration`	Required for img/div/compositions	Seconds. Video/audio defaults to media duration.
`data-track-index`	Yes	Integer. Same-track clips cannot overlap.
`data-media-start`	No	Trim offset into source (seconds)
`data-volume`	No	0-1 (default 1)

data-track-index does not affect visual layering — use CSS z-index.
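The "same-track clips cannot overlap" rule is easy to check before rendering. The sketch below is a hypothetical pre-flight helper, not part of the HyperFrames CLI, which already runs its own lint and validate passes. It assumes every clip's data-start has already been resolved to seconds, so clip-ID references like "intro + 2" are out of scope:

```javascript
// Hypothetical pre-flight check, not part of the HyperFrames CLI.
// Each clip: { id, start, duration, track } with times already resolved to seconds.
function findTrackOverlaps(clips) {
  const byTrack = new Map();
  for (const clip of clips) {
    if (!byTrack.has(clip.track)) byTrack.set(clip.track, []);
    byTrack.get(clip.track).push(clip);
  }
  const overlaps = [];
  for (const [track, list] of byTrack) {
    const sorted = [...list].sort((a, b) => a.start - b.start);
    let furthest = sorted[0]; // clip whose end time reaches furthest so far
    for (const cur of sorted.slice(1)) {
      const furthestEnd = furthest.start + furthest.duration;
      // Back-to-back clips (cur.start === furthestEnd) are allowed.
      if (cur.start < furthestEnd) overlaps.push({ track, a: furthest.id, b: cur.id });
      if (cur.start + cur.duration > furthestEnd) furthest = cur;
    }
  }
  return overlaps;
}

const clips = [
  { id: "el-v", start: 0, duration: 30, track: 0 },
  { id: "el-a", start: 0, duration: 30, track: 2 },
  { id: "outro", start: 29, duration: 3, track: 0 },
];
console.log(findTrackOverlaps(clips)); // el-v and outro collide on track 0
```

The helper reports every clip that starts before an earlier clip on the same track has ended. Clips on different tracks never conflict, which is why el-v and el-a can share the same time range.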

Composition Clips

Attribute	Required	Values
`data-composition-id`	Yes	Unique composition ID
`data-start`	Yes	Start time (root composition: use `"0"`)
`data-duration`	Yes	Takes precedence over GSAP timeline duration
`data-width` / `data-height`	Yes	Pixel dimensions (1920x1080 or 1080x1920)
`data-composition-src`	No	Path to external HTML file
`data-variable-values`	No	JSON object of per-instance variable overrides on a sub-comp host

On the root <html> element:

Attribute	Required	Values
`data-composition-variables`	No	JSON array of declared variables (id/type/label/default) — drives Studio editing UI and provides defaults for `getVariables()`

Composition Structure

Sub-compositions loaded via data-composition-src use a <template> wrapper. Standalone compositions (the main index.html) do NOT use `<template>` — they put the data-composition-id div directly in <body>. Using <template> on a standalone file hides all content from the browser and breaks rendering.

Sub-composition structure:

<template id="my-comp-template">
  <div data-composition-id="my-comp" data-width="1920" data-height="1080">
    <!-- content -->
    <style>
      [data-composition-id="my-comp"] {
        /* scoped styles */
      }
    </style>
    <script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/gsap.min.js"></script>
    <script>
      window.__timelines = window.__timelines || {};
      const tl = gsap.timeline({ paused: true });
      // tweens...
      window.__timelines["my-comp"] = tl;
    </script>
  </div>
</template>

Load in root: <div id="el-1" data-composition-id="my-comp" data-composition-src="compositions/my-comp.html" data-start="0" data-duration="10" data-track-index="1"></div>

Variables (Parametrized Compositions)

Render the same composition with different content — title, theme color, prices, captions — without editing the source HTML.

Three-step pattern:

1. Declare variables on the composition's <html> root with data-composition-variables. Each entry needs id, type (one of string, number, color, boolean, enum), label, and default. Enum entries also need options: [{value, label}, ...].
2. Read the resolved values inside the composition's script with window.__hyperframes.getVariables(). It returns the merged result of declared defaults + per-instance overrides + CLI overrides.
3. Override at render time with npx hyperframes render --variables '{...}' (top-level) or with data-variable-values='{...}' on the host element (per-instance for sub-comps).

<!doctype html>
<html
  data-composition-variables='[
  {"id":"title","type":"string","label":"Title","default":"Hello"},
  {"id":"theme","type":"enum","label":"Theme","default":"light","options":[
    {"value":"light","label":"Light"},
    {"value":"dark","label":"Dark"}
  ]}
]'
>
  <body>
    <div data-composition-id="root" data-width="1920" data-height="1080">
      <h1 id="hero" class="clip" data-start="0" data-duration="3" data-track-index="0"></h1>
      <script>
        const { title, theme } = window.__hyperframes.getVariables();
        document.getElementById("hero").textContent = title;
        document.body.dataset.theme = theme;
      </script>
    </div>
  </body>
</html>

# Dev preview uses declared defaults
npx hyperframes preview

# Render with overrides
npx hyperframes render --variables '{"title":"Q4 Report","theme":"dark"}' --output q4.mp4

# Or from a JSON file
npx hyperframes render --variables-file ./vars.json

Sub-composition per-instance values: the same getVariables() works inside sub-comps loaded via data-composition-src. Each host element passes its own values:

<div
  data-composition-id="card-pro"
  data-composition-src="compositions/card.html"
  data-variable-values='{"title":"Pro","price":"$29"}'
></div>
<div
  data-composition-id="card-enterprise"
  data-composition-src="compositions/card.html"
  data-variable-values='{"title":"Enterprise","price":"Custom"}'
></div>

The runtime layers each host's data-variable-values over the sub-comp's declared defaults on a per-instance basis, so the same source can be embedded multiple times with different content.
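You can think of that layering as a shallow object merge in which later sources win. This is a minimal sketch, not the runtime's code. `resolveVariables` is a hypothetical name. The precedence (declared defaults, then per-instance data-variable-values, then CLI --variables) is assumed from the order in which getVariables() is described above:

```javascript
// Hypothetical illustration of variable layering; not the HyperFrames runtime.
// `declared` mirrors data-composition-variables: [{ id, type, label, default }, ...]
function resolveVariables(declared, instanceValues = {}, cliValues = {}) {
  const defaults = Object.fromEntries(declared.map((v) => [v.id, v.default]));
  // Later spreads overwrite earlier keys: defaults < per-instance < CLI.
  return { ...defaults, ...instanceValues, ...cliValues };
}

const declared = [
  { id: "title", type: "string", label: "Title", default: "Hello" },
  { id: "price", type: "string", label: "Price", default: "$0" },
];

// Two hosts embedding compositions/card.html with different values.
const pro = resolveVariables(declared, { title: "Pro", price: "$29" });
const enterprise = resolveVariables(declared, { title: "Enterprise", price: "Custom" });
const preview = resolveVariables(declared); // dev preview: defaults only

console.log(pro.title, enterprise.price, preview.title); // Pro Custom Hello
```

The preview case shows why every declared variable needs a sensible default: with no overrides, the defaults are all the composition gets.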

Rules of thumb:

Always provide a sensible default for every declared variable. Dev preview uses defaults — without them, the composition won't render correctly until --variables is provided.
Read variables once at the top of the script (const { title } = ...), not inside frame loops or event handlers — getVariables() allocates a fresh object per call.
Use --strict-variables in CI to fail fast on undeclared keys or type mismatches.
Variable types are validated at render time. string, number, boolean, and color (hex string) check typeof; enum checks the value is in the declared options.
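The sketch below shows what those type checks might look like if you want to pre-validate a vars.json before rendering. It is an illustration, not the CLI's validator. The function name, the error strings, and the accepted hex forms (#RGB and #RRGGBB) are assumptions. Treating undeclared keys as errors only in strict mode is also a guess, modeled on --strict-variables:

```javascript
// Hypothetical pre-render check mirroring the documented type rules.
const HEX_COLOR = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/; // assumed accepted forms

function validateVariables(declared, values, { strict = false } = {}) {
  const byId = new Map(declared.map((v) => [v.id, v]));
  const errors = [];
  for (const [id, value] of Object.entries(values)) {
    const decl = byId.get(id);
    if (!decl) {
      if (strict) errors.push(`undeclared variable "${id}"`); // assumed strict-only
      continue;
    }
    switch (decl.type) {
      case "string":
      case "number":
      case "boolean":
        if (typeof value !== decl.type) errors.push(`"${id}" must be a ${decl.type}`);
        break;
      case "color":
        if (typeof value !== "string" || !HEX_COLOR.test(value)) errors.push(`"${id}" must be a hex color`);
        break;
      case "enum":
        if (!decl.options.some((o) => o.value === value)) errors.push(`"${id}" is not a declared option`);
        break;
      default:
        errors.push(`"${id}" has unknown type "${decl.type}"`);
    }
  }
  return errors;
}

const decls = [
  { id: "theme", type: "enum", label: "Theme", default: "light",
    options: [{ value: "light", label: "Light" }, { value: "dark", label: "Dark" }] },
  { id: "accent", type: "color", label: "Accent", default: "#FF006E" },
];
console.log(validateVariables(decls, { theme: "sepia", accent: "#FF006E", extra: 1 }, { strict: true }));
// two errors: "sepia" is not an option, and "extra" is undeclared
```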

Video and Audio

Video must be muted and playsinline. Audio is always a separate <audio> element, which may point at the same source file:

<video
  id="el-v"
  data-start="0"
  data-duration="30"
  data-track-index="0"
  src="video.mp4"
  muted
  playsinline
></video>
<audio
  id="el-a"
  data-start="0"
  data-duration="30"
  data-track-index="2"
  src="video.mp4"
  data-volume="1"
></audio>

Timeline Contract

All timelines start { paused: true } — the player controls playback
Register every timeline: window.__timelines["<composition-id>"] = tl
Framework auto-nests sub-timelines — do NOT manually add them
Duration comes from data-duration, not from GSAP timeline length
Never create empty tweens to set duration

Rules (Non-Negotiable)

Deterministic: No Math.random(), Date.now(), or time-based logic. Use a seeded PRNG if you need pseudo-random values (e.g. mulberry32).
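For reference, mulberry32 is a tiny 32-bit generator. The version below follows its widely circulated public-domain form; the seed value and the offset example are illustrative. Because the output depends only on the seed, every preview, render, and seek sees the same sequence:

```javascript
// mulberry32: seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0; // keep state as an unsigned 32-bit integer
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Deterministic "random" x-offsets for five ghost-text elements.
const rand = mulberry32(42);
const offsets = Array.from({ length: 5 }, () => Math.round(rand() * 200 - 100));
console.log(offsets); // same five numbers on every run with seed 42
```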

GSAP: Only animate visual properties (opacity, x, y, scale, rotation, color, backgroundColor, borderRadius, transforms). Do NOT animate visibility, display, or call video.play()/audio.play().

Animation conflicts: Never animate the same property on the same element from multiple timelines simultaneously.

No `repeat: -1`: Infinite-repeat timelines break the capture engine. Calculate the exact repeat count from composition duration: repeat: Math.ceil(duration / cycleDuration) - 1.
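Here is a worked example of that formula. GSAP's repeat counts additional plays after the first one. So a 30 s composition with a 4 s loop needs Math.ceil(30 / 4) - 1 = 7 repeats, which gives 8 plays covering 32 s. The helper name is hypothetical:

```javascript
// Finite repeat count for an ambient loop (GSAP's `repeat` counts plays after the first).
function finiteRepeat(compositionDuration, cycleDuration) {
  return Math.ceil(compositionDuration / cycleDuration) - 1;
}

const cycle = 4;
const repeat = finiteRepeat(30, cycle); // 7 -> 8 plays x 4s = 32s, covers the 30s composition
// e.g. tl.to(".ring", { rotation: 360, duration: cycle, ease: "none", repeat }, 0);
console.log(repeat); // 7
```

If the tween uses yoyo: true, each direction counts as one repeat. In that case, pass the single-direction tween duration as cycleDuration.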

Synchronous timeline construction: Never build timelines inside async/await, setTimeout, or Promises. The capture engine reads window.__timelines synchronously after page load. Fonts are embedded by the compiler, so they're available immediately — no need to wait for font loading.

Never do:

1. Forget window.__timelines registration
2. Use video for audio; always use muted video + a separate <audio>
3. Nest video inside a timed div; use a non-timed wrapper
4. Use data-layer (use data-track-index) or data-end (use data-duration)
5. Animate video element dimensions; animate a wrapper div instead
6. Call play/pause/seek on media; the framework owns playback
7. Create a top-level container without data-composition-id
8. Use repeat: -1 on any timeline or tween; always use finite repeats
9. Build timelines asynchronously (inside async, setTimeout, Promise)
10. Use gsap.set() on clip elements from later scenes: they don't exist in the DOM at page load. Use tl.set(selector, vars, timePosition) inside the timeline at or after the clip's data-start time instead.
11. Use <br> in content text: forced line breaks don't account for actual rendered font width. Text that wraps naturally plus a <br> produces an extra unwanted break, causing overlap. Let text wrap via max-width instead. Exception: short display titles where each word is deliberately on its own line (e.g., "THE\nIMMORTAL\nGAME" at 130px).

Scene Transitions (Non-Negotiable)

Every multi-scene composition MUST follow ALL of these rules. Violating any one of them is a broken composition.

1. ALWAYS use transitions between scenes. No jump cuts. No exceptions.
2. ALWAYS use entrance animations on every scene. Every element animates IN via gsap.from(). No element may appear fully formed. If a scene has 5 elements, it needs 5 entrance tweens.
3. NEVER use exit animations except on the final scene. This means: NO gsap.to() that animates opacity to 0, y offscreen, scale to 0, or any other "out" animation before a transition fires. The transition IS the exit. The outgoing scene's content MUST be fully visible at the moment the transition starts.
4. Final scene only: the last scene may fade elements out (e.g., fade to black). This is the ONLY scene where gsap.to(..., { opacity: 0 }) is allowed.

WRONG — exit animation before transition:

// BANNED — this empties the scene before the transition can use it
tl.to("#s1-title", { opacity: 0, y: -40, duration: 0.4 }, 6.5);
tl.to("#s1-subtitle", { opacity: 0, duration: 0.3 }, 6.7);
// transition fires on empty frame

RIGHT — entrance only, transition handles exit:

// Scene 1 entrance animations
tl.from("#s1-title", { y: 50, opacity: 0, duration: 0.7, ease: "power3.out" }, 0.3);
tl.from("#s1-subtitle", { y: 30, opacity: 0, duration: 0.5, ease: "power2.out" }, 0.6);
// NO exit tweens — transition at 7.2s handles the scene change
// Scene 2 entrance animations
tl.from("#s2-heading", { x: -40, opacity: 0, duration: 0.6, ease: "expo.out" }, 8.0);

Animation Guardrails

Offset first animation 0.1-0.3s (not t=0)
Vary eases across entrance tweens — use at least 3 different eases per scene
Don't repeat an entrance pattern within a scene
Avoid full-screen linear gradients on dark backgrounds (H.264 banding — use radial or solid + localized glow)
60px+ headlines, 20px+ body, 16px+ data labels for rendered video
font-variant-numeric: tabular-nums on number columns
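Two of these guardrails, the first-entrance offset and the ease variety, are mechanical enough to lint. This is a minimal sketch. It assumes one scene's entrance tweens have been collected into plain objects, each with a position in seconds relative to the scene start. The collection step and `checkSceneGuardrails` are hypothetical:

```javascript
// Hypothetical lint for one scene; not a HyperFrames command.
// Each entrance: { target, position (seconds from scene start), ease }.
function checkSceneGuardrails(entrances) {
  if (entrances.length === 0) return ["scene has no entrance tweens"];
  const warnings = [];
  const first = Math.min(...entrances.map((t) => t.position));
  if (first < 0.1 || first > 0.3) {
    warnings.push(`first entrance at ${first}s; offset it 0.1-0.3s`);
  }
  const eases = new Set(entrances.map((t) => t.ease));
  if (eases.size < 3) {
    warnings.push(`only ${eases.size} distinct ease(s); use at least 3`);
  }
  return warnings;
}

// Scene 1 from the RIGHT example under Scene Transitions: two entrances, two eases.
const scene1 = [
  { target: "#s1-title", position: 0.3, ease: "power3.out" },
  { target: "#s1-subtitle", position: 0.6, ease: "power2.out" },
];
console.log(checkSceneGuardrails(scene1)); // flags the ease-variety rule only
```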

If no design.md exists, follow house-style.md for aesthetic defaults.

Typography and Assets

Built-in fonts: Write the font-family you want in CSS — the compiler embeds supported fonts automatically.
Custom fonts: If design.md names a font that isn't built-in, the user must provide .woff2 files in a fonts/ directory. If missing, warn before writing HTML. When files exist, add @font-face declarations pointing to the local files.
Add crossorigin="anonymous" to external media
For dynamic text overflow, use window.__hyperframes.fitTextFontSize(text, { maxWidth, fontFamily, fontWeight })
All files live at the project root alongside index.html; sub-compositions (for example, files under compositions/) reference them with ../ paths

Editing Existing Compositions

Read actual files, don't guess. When editing, extending, or creating companion compositions, read the existing source. Don't reconstruct hex codes from memory. Don't guess GSAP easing patterns. The composition IS the spec — extract exact values from it.
Match existing fonts, colors, animation patterns from what you read
Only change what was requested
Preserve timing of unrelated clips

Output Checklist

Fast (run immediately, block on results):

[ ] npx hyperframes lint and npx hyperframes validate both pass
[ ] Design adherence verified if design.md exists

Slow (run in parallel while presenting the preview to the user):

[ ] npx hyperframes inspect passes, or every reported overflow is intentionally marked
[ ] Contrast warnings addressed (see Quality Checks below)
[ ] Animation choreography verified (see Quality Checks below)

Quality Checks

Visual Inspect

hyperframes inspect runs the composition in headless Chrome, seeks through the timeline, and maps visual layout issues with timestamps, selectors, bounding boxes, and fix hints. Run it after lint and validate:

npx hyperframes inspect
npx hyperframes inspect --json

Failures usually mean text is spilling out of a bubble/card, a fixed-size label is clipping dynamic copy, or text has moved off the canvas. Fix by increasing container size or padding, reducing font size or letter spacing, adding a real max-width so text wraps inside the container, or using window.__hyperframes.fitTextFontSize(...) for dynamic copy.
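window.__hyperframes.fitTextFontSize(...) is the documented tool for dynamic copy. The sketch below only illustrates the general idea and is not the framework's implementation: a binary search finds the largest font size whose measured width still fits. The `measure` callback is an assumption, and it must grow with font size. In a browser it could wrap CanvasRenderingContext2D.measureText. The crude per-glyph estimate below just keeps the sketch runnable outside a browser:

```javascript
// Illustration only; the framework's fitTextFontSize may work differently.
// measure(text, fontSizePx) must return the rendered width in px.
function fitFontSize(text, maxWidth, measure, { min = 12, max = 200 } = {}) {
  let lo = min;
  let hi = max;
  let best = min; // falls back to min if even the smallest size overflows
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (measure(text, mid) <= maxWidth) {
      best = mid; // fits: try larger
      lo = mid + 1;
    } else {
      hi = mid - 1; // too wide: try smaller
    }
  }
  return best;
}

// Crude stand-in measurer: assumes every glyph is about 0.6em wide.
const approxMeasure = (text, size) => text.length * size * 0.6;
console.log(fitFontSize("Quarterly revenue up 42%", 1600, approxMeasure)); // 111
```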

Use --samples 15 for dense videos and --at 1.5,4,7.25 for specific hero frames. Repeated static issues are collapsed by default to avoid flooding agent context. If overflow is intentional for an entrance/exit animation, mark the element or ancestor with data-layout-allow-overflow. If a decorative element should never be audited, mark it with data-layout-ignore.

hyperframes layout is the compatibility alias for the same check.

Contrast

hyperframes validate runs a WCAG contrast audit by default. It seeks to 5 timestamps, screenshots the page, samples background pixels behind every text element, and computes contrast ratios. Failures appear as warnings:

⚠ WCAG AA contrast warnings (3):
  · .subtitle "secondary text" — 2.67:1 (need 4.5:1, t=5.3s)

If warnings appear:

On dark backgrounds: brighten the failing color until it clears 4.5:1 (normal text) or 3:1 (large text, 24px+ or 19px+ bold)
On light backgrounds: darken it
Stay within the palette family — don't invent a new color, adjust the existing one
Re-run hyperframes validate until clean
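To try a fix before re-running validate, you can compute the ratio by hand. This sketch uses the WCAG 2.x relative-luminance formula for opaque #RRGGBB colors, and the function names are illustrative. Unlike the real audit, it compares declared colors rather than sampled background pixels. Gradients and glows behind text are therefore not accounted for:

```javascript
// WCAG 2.x relative luminance of an opaque #RRGGBB color.
function relativeLuminance(hex) {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1.
function contrastRatio(fg, bg) {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// AA thresholds from the list above: 4.5:1 normal text, 3:1 large text (24px+, or 19px+ bold).
function passesAA(fg, bg, { fontSizePx, bold = false }) {
  const large = fontSizePx >= 24 || (bold && fontSizePx >= 19);
  return contrastRatio(fg, bg) >= (large ? 3 : 4.5);
}

console.log(contrastRatio("#FFFFFF", "#000000").toFixed(1)); // 21.0
console.log(passesAA("#777777", "#FFFFFF", { fontSizePx: 20 })); // false (about 4.48:1)
console.log(passesAA("#777777", "#FFFFFF", { fontSizePx: 24 })); // true (large text needs 3:1)
```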

Use --no-contrast to skip if iterating rapidly and you'll check later.

Design Adherence

If a design.md exists, verify the composition follows it after authoring. Read the HTML and check:

1. Colors: every hex value in the composition appears in design.md's palette section (however the user labeled it: Colors, Palette, Theme, etc.). Flag any invented colors.
2. Typography: font families and weights match design.md's type spec. No substitutions.
3. Corners: border-radius values match the declared corner style, if specified.
4. Spacing: padding and gap values fall within the declared density range, if specified.
5. Depth: shadow usage matches the declared depth level, if specified (flat = none, subtle = light, layered = glows).
6. Avoidance rules: if design.md has a section listing things to avoid (commonly "What NOT to Do", "Don'ts", "Anti-patterns", or "Do's and Don'ts"), verify none are present.

Report violations as a checklist. Fix each one before serving.
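The color check is easy to automate for literal hex values. This is a minimal sketch that assumes the palette has already been extracted from design.md into an array. It ignores rgb(), hsl(), oklch(), and named colors. It may also produce false positives on URL fragments such as #abc-section. `findInventedColors` is a hypothetical helper:

```javascript
// Hypothetical helper: list hex colors in composition source that are not in the palette.
function normalizeHex(hex) {
  let h = hex.toLowerCase();
  if (h.length === 4) h = "#" + [...h.slice(1)].map((c) => c + c).join(""); // #abc -> #aabbcc
  return h;
}

function findInventedColors(source, palette) {
  const allowed = new Set(palette.map(normalizeHex));
  // Match #RGB or #RRGGBB not followed by more hex digits (so #RRGGBBAA is skipped).
  const matches = source.match(/#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})(?![0-9a-fA-F])/g) || [];
  return [...new Set(matches.map(normalizeHex))].filter((h) => !allowed.has(h));
}

const html = `<style>.title { color: #F5F0E8; } .accent { background: #ff006e; } .x { border-color: #333; }</style>`;
console.log(findInventedColors(html, ["#f5f0e8", "#FF006E", "#1A1A2E"])); // [ '#333333' ]
```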

If no design.md exists (house-style-only path), verify:

1. Palette consistency: the same bg, fg, and accent colors are used across all scenes. No per-scene color invention.
2. No lazy defaults: check the composition against house-style.md's "Lazy Defaults to Question" list. If any appear, they must be a deliberate choice for the content, not a default.

Animation Map

After authoring animations, run the animation map to verify choreography:

node skills/hyperframes/scripts/animation-map.mjs <composition-dir> \
  --out <composition-dir>/.hyperframes/anim-map

Outputs a single animation-map.json with:

Per-tween summaries: "#card1 animates opacity+y over 0.50s. moves 23px up. fades in. ends at (120, 200)"
ASCII timeline: Gantt chart of all tweens across the composition duration
Stagger detection: reports actual intervals ("3 elements stagger at 120ms")
Dead zones: periods over 1s with no animation — intentional hold or missing entrance?
Element lifecycles: first/last animation time, final visibility
Scene snapshots: visible element state at 5 key timestamps
Flags: offscreen, collision, invisible, paced-fast (under 0.2s), paced-slow (over 2s)
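The dead-zone and pacing flags boil down to interval arithmetic. The sketch below mirrors only that arithmetic, using the thresholds listed above. It is not animation-map.mjs, and the tween shape ({ target, start, duration } in seconds) is an assumption:

```javascript
// Sketch of two animation-map-style checks; not the real script.
function pacingFlags(tweens) {
  return tweens.flatMap((t) => {
    if (t.duration < 0.2) return [`${t.target}: paced-fast (${t.duration}s)`];
    if (t.duration > 2) return [`${t.target}: paced-slow (${t.duration}s)`];
    return [];
  });
}

// Gaps longer than minGap seconds in which no tween is running.
function deadZones(tweens, totalDuration, minGap = 1) {
  const spans = tweens
    .map((t) => [t.start, t.start + t.duration])
    .sort((a, b) => a[0] - b[0]);
  const zones = [];
  let cursor = 0; // end of the animated region seen so far
  for (const [start, end] of spans) {
    if (start - cursor > minGap) zones.push([cursor, start]);
    cursor = Math.max(cursor, end);
  }
  if (totalDuration - cursor > minGap) zones.push([cursor, totalDuration]);
  return zones;
}

const tweens = [
  { target: "#s1-title", start: 0.3, duration: 0.7 },
  { target: "#s1-subtitle", start: 0.6, duration: 0.5 },
  { target: "#s2-heading", start: 8.0, duration: 0.6 },
];
console.log(pacingFlags(tweens)); // no pacing flags
console.log(deadZones(tweens, 10)); // two gaps: about 1.1s to 8s, and 8.6s to 10s
```

A gap like 1.1s to 8s is exactly the "intentional hold or missing entrance?" question the real report raises.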

Read the JSON. Scan summaries for anything unexpected. Check every flag — fix or justify. Verify the timeline shows the intended choreography rhythm. Re-run after fixes.

Skip on small edits (fixing a color, adjusting one duration). Run on new compositions and significant animation changes.

---

References (loaded on demand)

[references/captions.md](references/captions.md) — Captions, subtitles, lyrics, karaoke synced to audio. Tone-adaptive style detection, per-word styling, text overflow prevention, caption exit guarantees, word grouping. Read when adding any text synced to audio timing.
[references/audio-reactive.md](references/audio-reactive.md) — Audio-reactive animation: map frequency bands and amplitude to GSAP properties. Read when visuals should respond to music, voice, or sound.
[references/css-patterns.md](references/css-patterns.md) — CSS+GSAP marker highlighting: highlight, circle, burst, scribble, sketchout. Deterministic, fully seekable. Read when adding visual emphasis to text.
[references/video-composition.md](references/video-composition.md) — Video-medium rules: density, color presence, scale, frame composition, design.md as brand not layout. Always read — these override web instincts.
[references/beat-direction.md](references/beat-direction.md) — Beat planning: concept, mood, choreography verbs, rhythm templates, transition decisions, depth layers. Always read for multi-scene compositions.
[references/typography.md](references/typography.md) — Typography: font pairing, OpenType features, dark-background adjustments, font discovery script. Always read — every composition has text.
[references/motion-principles.md](references/motion-principles.md) — Motion design principles, image motion treatment, load-bearing GSAP rules. Always read — every composition has motion.
[references/techniques.md](references/techniques.md) — 11 visual techniques with code patterns: SVG drawing, Canvas 2D, CSS 3D, kinetic type, Lottie, video compositing, typing effect, variable fonts, MotionPath, velocity transitions, audio-reactive. Read when planning techniques per beat.
[references/narration.md](references/narration.md) — Pacing, tone, script structure, number pronunciation, opening line patterns. Read when the composition includes voiceover or TTS.
[references/design-picker.md](references/design-picker.md) — Create a design.md via visual picker. Read when no design.md exists and the user wants to create one.
[visual-styles.md](visual-styles.md) — 8 named visual styles with hex palettes, GSAP easing signatures, and shader pairings. Read when user names a style or when generating design.md.
[house-style.md](house-style.md) — Default motion, sizing, and color palettes when no design.md is specified.
[patterns.md](patterns.md) — PiP, title cards, slide show patterns.
[data-in-motion.md](data-in-motion.md) — Data, stats, and infographic patterns.
[references/transcript-guide.md](references/transcript-guide.md) — Caption-side transcript handling: input formats, mandatory quality check, cleaning JS, OpenAI/Groq API fallback, "if no transcript exists" flow. (For the transcribe CLI invocation, model selection rules, and the .en gotcha, see the hyperframes-media skill.)
[references/dynamic-techniques.md](references/dynamic-techniques.md) — Dynamic caption animation techniques (karaoke, clip-path, slam, scatter, elastic, 3D).

[references/transitions.md](references/transitions.md) — Scene transitions: crossfades, wipes, reveals, shader transitions. Energy/mood selection, CSS vs WebGL guidance. Always read for multi-scene compositions — scenes without transitions feel like jump cuts.
transitions/catalog.md — Hard rules, scene template, and routing to per-type implementation code.
Shader transitions are in @hyperframes/shader-transitions (packages/shader-transitions/) — read package source, not skill files.

GSAP patterns and effects are in the /gsap skill.

House Style

Creative direction for compositions when no design.md is provided. These are starting points — override anything that doesn't serve the content. When a design.md exists, its brand values take precedence; house-style fills gaps.

Before Writing HTML

1. Interpret the prompt. Generate real content: a recipe lists real ingredients; a HUD has real readouts.
2. Pick a palette. Light or dark? Declare bg, fg, accent before writing code.
3. Pick typefaces. Run the font discovery script in references/typography.md, or pick a font you already know that fits the theme. The script broadens your options; it's not the only source.

Lazy Defaults to Question

These patterns are AI design tells — the first thing every LLM reaches for. If you're about to use one, pause and ask: is this a deliberate choice for THIS content, or am I defaulting?

Gradient text (background-clip: text + gradient)
Left-edge accent stripes on cards/callouts
Cyan-on-dark / purple-to-blue gradients / neon accents
Pure #000 or #fff (tint toward your accent hue instead)
Identical card grids (same-size cards repeated)
Everything centered with equal weight (lead the eye somewhere)
Banned fonts (see references/typography.md for full list)

If the content genuinely calls for one of these — centered layout for a solemn closing, cards for a real product UI mockup, a banned font because it's the perfect thematic match — use it. The goal is intentionality, not avoidance.

Color

Match light/dark to content: food, wellness, kids → light. Tech, cinema, finance → dark.
One accent hue. Same background across all scenes.
Tint neutrals toward your accent (even subtle warmth/coolness beats dead gray).
Contrast: enforced by hyperframes validate (WCAG AA). Text must be readable with decoratives removed.
Declare palette up front. Don't invent colors per-element.

Background Layer

Every scene needs visual depth — persistent decorative elements that stay visible while content animates in. Without these, scenes feel empty during entrance staggering.

Ideas (mix and match, 2-5 per scene):

Radial glows (accent-tinted, low opacity, breathing scale)
Ghost text (theme words at 3-8% opacity, very large, slow drift)
Accent lines (hairline rules, subtle pulse)
Grain/noise overlay, geometric shapes, grid patterns
Thematic decoratives (orbit rings for space, vinyl grooves for music, grid lines for data)

All decoratives should have slow ambient GSAP animation — breathing, drift, pulse. Static decoratives feel dead.

Decorative count vs motion count. The "2-5 per scene" count refers to decorative _elements_. If a project's design.md says "single ambient motion per scene", it means one looping motion applied to these decoratives (a shared breath/drift/pulse) — not one element total. A scene with 4 decoratives sharing one breathing motion is correct; a scene with 1 decorative is under-dressed.

Motion

See references/motion-principles.md for full rules. Quick: 0.3–0.6s, vary eases, combine transforms on entrances, overlap entries.

Typography

See references/typography.md for full rules. Quick: 700-900 headlines / 300-400 body, serif + sans (not two sans), 60px+ headlines / 20px+ body.

Palettes

Declare one background, one foreground, one accent before writing HTML.

Category	Use for	File
Bold / Energetic	Product launches, social media, announcements	palettes/bold-energetic.md
Warm / Editorial	Storytelling, documentaries, case studies	palettes/warm-editorial.md
Dark / Premium	Tech, finance, luxury, cinematic	palettes/dark-premium.md
Clean / Corporate	Explainers, tutorials, presentations	palettes/clean-corporate.md
Nature / Earth	Sustainability, outdoor, organic	palettes/nature-earth.md
Neon / Electric	Gaming, tech, nightlife	palettes/neon-electric.md
Pastel / Soft	Fashion, beauty, lifestyle, wellness	palettes/pastel-soft.md
Jewel / Rich	Luxury, events, sophisticated	palettes/jewel-rich.md
Monochrome	Dramatic, typography-focused	palettes/monochrome.md

Or derive from OKLCH — pick a hue, build bg/fg/accent at different lightnesses, tint everything toward that hue.
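A minimal sketch of the OKLCH route, assuming the OKLab-to-linear-sRGB conversion Björn Ottosson published with OKLab. The lightness and chroma values in buildPalette are illustrative choices, not HyperFrames defaults, and out-of-gamut colors are simply clamped rather than properly gamut-mapped.

```javascript
// Sketch: OKLCH -> #rrggbb using Ottosson's OKLab matrices. Naive clamping for gamut.
function oklchToHex(L, C, hueDeg) {
  const h = (hueDeg * Math.PI) / 180;
  const a = C * Math.cos(h);
  const b = C * Math.sin(h);

  // OKLab -> LMS (cube of the non-linear LMS values)
  const l = Math.pow(L + 0.3963377774 * a + 0.2158037573 * b, 3);
  const m = Math.pow(L - 0.1055613458 * a - 0.0638541728 * b, 3);
  const s = Math.pow(L - 0.0894841775 * a - 1.291485548 * b, 3);

  // LMS -> linear sRGB
  const linear = [
    4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s,
    -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s,
    -0.0041960863 * l - 0.7034186147 * m + 1.707614701 * s,
  ];

  return (
    "#" +
    linear
      .map((x) => {
        const c = Math.min(1, Math.max(0, x)); // clamp out-of-gamut channels
        // Linear -> gamma-encoded sRGB
        const srgb = c <= 0.0031308 ? 12.92 * c : 1.055 * Math.pow(c, 1 / 2.4) - 0.055;
        return Math.round(srgb * 255).toString(16).padStart(2, "0");
      })
      .join("")
  );
}

// One hue, three lightness levels; everything tinted toward that hue (values are assumptions).
function buildPalette(hue, dark = true) {
  return dark
    ? { bg: oklchToHex(0.18, 0.03, hue), fg: oklchToHex(0.95, 0.02, hue), accent: oklchToHex(0.72, 0.16, hue) }
    : { bg: oklchToHex(0.97, 0.015, hue), fg: oklchToHex(0.25, 0.03, hue), accent: oklchToHex(0.6, 0.18, hue) };
}
```

Keeping a small non-zero chroma on bg and fg is what tints the neutrals toward the accent hue instead of leaving them dead gray.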

Bold / Energetic

Product launches, social media, announcements, high-energy content.

#FFBE0B #FB5607 #FF006E #8338EC #3A86FF
#F72585 #7209B7 #3A0CA3 #4361EE #4CC9F0
#EF476F #FFD166 #06D6A0 #118AB2 #073B4C
#FF595E #FFCA3A #8AC926 #1982C4 #6A4C93
#9B5DE5 #F15BB5 #FEE440 #00BBF9 #00F5D4
#390099 #9E0059 #FF0054 #FF5400 #FFBD00
#3D348B #7678ED #F7B801 #F18701 #F35B04
#FFBC42 #D81159 #8F2D56 #218380 #73D2DE

Clean / Corporate

Explainers, tutorials, presentations, professional content.

#FFFCF2 #CCC5B9 #403D39 #252422 #EB5E28
#22223B #4A4E69 #9A8C98 #C9ADA7 #F2E9E4
#3D5A80 #98C1D9 #E0FBFC #EE6C4D #293241
#2B2D42 #8D99AE #EDF2F4 #EF233C #D90429
#353535 #3C6E71 #FFFFFF #D9D9D9 #284B63
#E7ECEF #274C77 #6096BA #A3CEF1 #8B8C89
#CFDBD5 #E8EDDF #F5CB5C #242423 #333533
#2F6690 #3A7CA5 #D9DCD6 #16425B #81C3D7

Dark / Premium

Tech, finance, luxury, cinematic content.

#000000 #14213D #FCA311 #E5E5E5 #FFFFFF
#000814 #001D3D #003566 #FFC300 #FFD60A
#0D1B2A #1B263B #415A77 #778DA9 #E0E1DD
#0D1321 #1D2D44 #3E5C76 #748CAB #F0EBD8
#011627 #FDFFFC #2EC4B6 #E71D36 #FF9F1C
#0B090A #161A1D #660708 #A4161A #E5383B
#001427 #708D81 #F4D58D #BF0603 #8D0801
#001524 #15616D #FFECD1 #FF7D00 #78290F

Jewel / Rich

Luxury, events, sophisticated, high-end content.

#5F0F40 #9A031E #FB8B24 #E36414 #0F4C5C
#780000 #C1121F #FDF0D5 #003049 #669BBC
#10002B #240046 #3C096C #5A189A #7B2CBF
#355070 #6D597A #B56576 #E56B6F #EAAC8B
#6F1D1B #BB9457 #432818 #99582A #FFE6A7
#231942 #5E548E #9F86C0 #BE95C4 #E0B1CB
#461220 #8C2F39 #B23A48 #FCB9B2 #FED0BB
#780116 #F7B538 #DB7C26 #D8572A #C32F27

Monochrome

Dramatic, typography-focused, serious content.

#F8F9FA #E9ECEF #DEE2E6 #CED4DA #ADB5BD #6C757D #495057 #343A40 #212529
#0466C8 #0353A4 #023E7D #002855 #001233
#012A4A #013A63 #01497C #2A6F97 #468FAF #89C2D9
#582F0E #7F4F24 #936639 #A68A64 #C2C5AA
#463F3A #8A817C #BCB8B1 #F4F3EE #E0AFA0
#03071E #370617 #6A040F #9D0208 #DC2F02 #F48C06 #FFBA08
#590D22 #800F2F #A4133C #FF4D6D #FF8FA3 #FFCCD5
#220901 #621708 #941B0C #BC3908 #F6AA1C

Nature / Earth

Sustainability, outdoor, organic, wellness content.

#606C38 #283618 #FEFAE0 #DDA15E #BC6C25
#DAD7CD #A3B18A #588157 #3A5A40 #344E41
#386641 #6A994E #A7C957 #F2E8CF #BC4749
#CAD2C5 #84A98C #52796F #354F52 #2F3E46
#F0EAD2 #DDE5B6 #ADC178 #A98467 #6C584C
#132A13 #31572C #4F772D #90A955 #ECF39E
#6B9080 #A4C3B2 #CCE3DE #EAF4F4 #F6FFF8
#233D4D #FE7F2D #FCCA46 #A1C181 #619B8A

Neon / Electric

Gaming, tech, nightlife, Gen Z content.

#F72585 #B5179E #7209B7 #560BAD #3A0CA3
#70D6FF #FF70A6 #FF9770 #FFD670 #E9FF70
#7400B8 #6930C3 #5E60CE #5390D9 #48BFE3
#0B132B #1C2541 #3A506B #5BC0BE #6FFFE9
#540D6E #EE4266 #FFD23F #3BCEAC #0EAD69
#2D00F7 #6A00F4 #8900F2 #A100F2 #F20089
#FF6D00 #FF7900 #FF8500 #FF9100 #240046
#BBFBFF #8DD8FF #4E71FF #5409DA

Pastel / Soft

Fashion, beauty, lifestyle, wellness content.

#CDB4DB #FFC8DD #FFAFCC #BDE0FE #A2D2FF
#CCD5AE #E9EDC9 #FEFAE0 #FAEDCD #D4A373
#FFD6FF #E7C6FF #C8B6FF #B8C0FF #BBD0FF
#FFA69E #FAF3DD #B8F2E6 #AED9E0 #5E6472
#EDAFB8 #F7E1D7 #DEDBD2 #B0C4B1 #4A5759
#555B6E #89B0AE #BEE3DB #FAF9F9 #FFD6BA
#006D77 #83C5BE #EDF6F9 #FFDDD2 #E29578
#0081A7 #00AFB9 #FDFCDC #FED9B7 #F07167

Warm / Editorial

Storytelling, documentaries, case studies, narrative content.

#264653 #2A9D8F #E9C46A #F4A261 #E76F51
#335C67 #FFF3B0 #E09F3E #9E2A2B #540B0E
#F4F1DE #E07A5F #3D405B #81B29A #F2CC8F
#F6BD60 #F7EDE2 #F5CAC3 #84A59D #F28482
#003049 #D62828 #F77F00 #FCBF49 #EAE2B7
#588B8B #FFFFFF #FFD5C2 #F28F3B #C8553D
#283D3B #197278 #EDDDD4 #C44536 #772E25
#0D3B66 #FAF0CA #F4D35E #EE964B #F95738

Composition Patterns

Picture-in-Picture (Video in a Frame)

Animate a wrapper div for position/size. The video fills the wrapper. The wrapper has NO data attributes.

<div
  id="pip-frame"
  style="position:absolute;top:0;left:0;width:1920px;height:1080px;z-index:50;overflow:hidden;"
>
  <video
    id="el-video"
    data-start="0"
    data-duration="60"
    data-track-index="0"
    src="talking-head.mp4"
    muted
    playsinline
    style="width:100%;height:100%;object-fit:cover;"
  ></video>
</div>

tl.to(
  "#pip-frame",
  { top: 700, left: 1360, width: 500, height: 280, borderRadius: 16, duration: 1 },
  10,
);
tl.to("#pip-frame", { left: 40, duration: 0.6 }, 30);

Text Behind Subject (transparent webm overlay)

Put a headline _behind_ a presenter so their silhouette occludes the text. Requires a transparent cutout produced by npx hyperframes remove-background presenter.mp4 -o presenter.webm.

Three layers, plus one critical rule:

<!-- z=1 base — full opaque mp4 (lobby + presenter), always visible -->
<video
  id="cf-base"
  data-start="0"
  data-duration="6"
  data-media-start="0"
  data-track-index="0"
  src="presenter.mp4"
  muted
  playsinline
></video>

<!-- z=2 headline: revealed by the clip-path tween at 0.25s, then visible to the end -->
<h1
  id="cf-headline"
  style="position:absolute;top:50%;left:50%;
     transform:translate(-50%,-50%); z-index:2; font-size:220px; font-weight:900;
     color:#fff; text-shadow:0 6px 32px rgba(0,0,0,.55); clip-path:inset(0 0 100% 0);"
>
  MAKE IT IN HYPERFRAMES
</h1>

<!-- z=3 cutout — same source, alpha around presenter, hidden until the cut -->
<!-- WRAPPER has the opacity, NOT the video itself (see rule below). -->
<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3;opacity:0">
  <video
    id="cf-cutout"
    data-start="0"
    data-duration="6"
    data-media-start="0"
    data-track-index="1"
    src="presenter.webm"
    muted
    playsinline
  ></video>
</div>

const tl = gsap.timeline({ paused: true });
const CUT = 3.3;

// Reveal headline early
tl.to("#cf-headline", { clipPath: "inset(0 0 0% 0)", duration: 0.6, ease: "expo.out" }, 0.25);

// At the cut, flip the cutout wrapper visible — the presenter's silhouette
// punches through the headline.
tl.set(".cutout-wrap", { opacity: 1 }, CUT);

// Sentinel: extend timeline to the composition's full duration so the
// renderer doesn't bail past the last meaningful tween.
tl.set({}, {}, 6);

window.__timelines["cover-flip"] = tl;

Why a wrapper div, not opacity on the video itself?

The framework forces opacity: 1 on any element with data-start/data-duration while it's "active" — that's how it manages clip lifecycles. A CSS opacity: 0 on the video element is silently overwritten. Wrap the video in a div with no data-* attributes; the wrapper is owned by your CSS/GSAP.

Why both videos at `data-start="0"`?

So both decode in sync from t=0. Late-mounting the cutout (data-start=3.3) makes Chrome do a seek + decoder warm-up at mount, which can land a frame off the base mp4 — visible as a one-frame jitter at the cut.

Color match: remove-background defaults to --quality balanced (crf 18), which keeps the cutout's RGB nearly identical to the source mp4, so edge halos and color shifts stay minimal when it is overlaid. Use --quality best (crf 12) for hero shots; drop to --quality fast (crf 30) only when the cutout sits over a _different_ background and file size matters.

Title Card with Fade

<div
  id="title-card"
  data-start="0"
  data-duration="5"
  data-track-index="5"
  style="display:flex;align-items:center;justify-content:center;background:#111;z-index:60;"
>
  <h1 style="font-size:64px;color:#fff;opacity:0;">My Video Title</h1>
</div>

tl.to("#title-card h1", { opacity: 1, duration: 0.6 }, 0.3);
tl.to("#title-card", { opacity: 0, duration: 0.5 }, 4);

Slide Show with Section Headers

Use separate elements on the same track, each with its own time range. Slides auto-mount/unmount based on data-start/data-duration.

<div class="slide" data-start="0" data-duration="30" data-track-index="3">...</div>
<div class="slide" data-start="30" data-duration="25" data-track-index="3">...</div>
<div class="slide" data-start="55" data-duration="20" data-track-index="3">...</div>

Top-Level Composition Example

<div
  id="comp-1"
  data-composition-id="my-video"
  data-start="0"
  data-duration="60"
  data-width="1920"
  data-height="1080"
>
  <!-- Primitive clips -->
  <video
    id="el-1"
    data-start="0"
    data-duration="10"
    data-track-index="0"
    src="..."
    muted
    playsinline
  ></video>
  <video
    id="el-2"
    data-start="el-1"
    data-duration="8"
    data-track-index="0"
    src="..."
    muted
    playsinline
  ></video>
  <img id="el-3" data-start="5" data-duration="4" data-track-index="1" src="..." />
  <audio id="el-4" data-start="0" data-duration="30" data-track-index="2" src="..."></audio>

  <!-- Sub-compositions loaded from files -->
  <div
    id="el-5"
    data-composition-id="intro-anim"
    data-composition-src="compositions/intro-anim.html"
    data-start="0"
    data-track-index="3"
  ></div>

  <div
    id="el-6"
    data-composition-id="captions"
    data-composition-src="compositions/caption-overlay.html"
    data-start="0"
    data-track-index="4"
  ></div>

  <script>
    // Just register the timeline — framework auto-nests sub-compositions
    const tl = gsap.timeline({ paused: true });
    window.__timelines["my-video"] = tl;
  </script>
</div>

Audio-Reactive Animation

Drive visuals from music, voice, or sound. Any GSAP-animatable property can respond to pre-extracted audio data.

Audio Data Format

var AUDIO_DATA = {
  fps: 30,
  totalFrames: 900,
  frames: [{ bands: [0.82, 0.45, 0.31, ...] }, ...]
};

frames[i].bands[] — frequency band amplitudes, 0-1. Index 0 = bass, higher = treble.
Each band normalized independently across the full track.

Mapping Audio to Visuals

Audio signal	Visual property	Effect
Bass (bands[0])	`scale`	Pulse on beat
Treble (bands[12-14])	`textShadow`, `boxShadow`	Glow intensity
Overall amplitude	`opacity`, `y`, `backgroundColor`	Breathe, lift, color shift
Mid-range (bands[4-8])	`borderRadius`, `width`	Shape morphing

Any GSAP-tweenable property works — clipPath, filter, SVG attributes, CSS custom properties.
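One way to structure the draw(frame) callback used in the Sampling Pattern is a pure mapping from a frame's bands to numbers, following the mapping table above. This sketch assumes at least 15 bands (the table references bands[12-14]); the scale factors are assumptions chosen to stay inside the Guidelines (3-6% for text, larger for non-text), and mapFrame/frameAt are hypothetical names.

```javascript
// Sketch: map one pre-extracted audio frame to visual values (names and ranges are assumptions).
function average(values) {
  return values.length ? values.reduce((sum, v) => sum + v, 0) / values.length : 0;
}

function mapFrame(frame) {
  const bands = frame.bands;
  const bass = bands[0] ?? 0;
  const mids = average(bands.slice(4, 9)); // bands[4-8]
  const treble = average(bands.slice(12, 15)); // bands[12-14]
  const overall = average(bands);
  return {
    textScale: 1 + 0.05 * bass, // text stays within the 3-6% guideline
    bgScale: 1 + 0.25 * bass, // non-text shapes can swing 10-30%
    glowBlur: Math.round(4 + 20 * treble), // px, for textShadow on active words
    radius: Math.round(8 + 40 * mids), // px, for borderRadius morphing
    lift: -12 * overall, // px, for y
  };
}

// Deterministic lookup of the frame at timeline time t (seconds), clamped to the data.
function frameAt(audioData, t) {
  // Small epsilon guards against t * fps landing just below an integer.
  const i = Math.floor(t * audioData.fps + 1e-6);
  return audioData.frames[Math.max(0, Math.min(audioData.totalFrames - 1, i))];
}
```

Inside draw(frame), the values could be applied with gsap.set: scale on the caption container, and a textShadow built from glowBlur on the active word only (see the textShadow Gotcha).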

Content, Not Medium

Audio provides timing and intensity. The visual vocabulary comes from the narrative.

Never add: equalizer bars, spectrum analyzers, waveform displays, musical notes clip art, generic particle systems, rainbow color cycling, strobing white on beats, abstract pulsing orbs.

Instead: Let content guide the visual and audio drive its behavior. Bass makes warmth _swell_. Treble sharpens _contrast_. The visual choice comes from "what does this piece feel like?"

Sampling Pattern

Audio reactivity requires per-frame sampling via a for loop with tl.call(), not a single tween:

// ✅ Correct — sample every frame
for (var f = 0; f < AUDIO_DATA.totalFrames; f++) {
  tl.call(
    (function (frame) {
      return function () {
        draw(frame);
      };
    })(AUDIO_DATA.frames[f]),
    [],
    f / AUDIO_DATA.fps,
  );
}

// ❌ Wrong — single tween, doesn't react to audio
gsap.to(".el", { scale: 1.2, duration: totalDuration });

Without per-frame sampling, the composition doesn't actually react to audio.

textShadow Gotcha

textShadow on a parent container with semi-transparent children (e.g., inactive caption words at rgba(255,255,255,0.3)) renders a visible glow rectangle behind all children. Fix: apply scale to the container for beat pulse, but apply textShadow to individual active words only.

Guidelines

Subtlety for text — 3-6% scale variation, soft glow. Heavy pulsing makes text unreadable.
Go bigger on non-text — backgrounds and shapes can handle 10-30% swings.
Match the energy — corporate = subtle; music video = dramatic.
Deterministic — pre-extracted data, no Web Audio API, no runtime analysis.

Constraints

All audio data must be pre-extracted (use extract-audio-data.py from the gsap skill's scripts/)
No Math.random() or Date.now()
Audio reactivity runs on the same GSAP timeline as everything else

Beat Direction

How to plan and direct individual scenes (beats) in a multi-scene composition. Read before writing any multi-scene video.

---

Per-Beat Direction

Each beat is a WORLD, not a layout. Before writing CSS specs and GSAP instructions, describe what the viewer EXPERIENCES. The difference between a great storyboard and a mediocre one:

Mediocre: "Dark navy background. '$1.9T' in white, 280px. Logo top-left. Wave image bottom-right."

Great: "Camera is already mid-flight over a vast dark canvas. The gradient wave sweeps across the frame like aurora borealis — alive, shifting. '$1.9T' SLAMS into existence with such force the wave ripples in response. This isn't a slide — it's a moment."

The first describes pixels. The second describes an experience. Write the second, then figure out the pixels.

Each beat should have:

Concept

The big idea for this beat in 2-3 sentences. What visual WORLD are we in? What metaphor drives it? What should the viewer FEEL? This is the most important part — everything else flows from it.

Mood direction

Cultural and design references, not hex codes:

"Geometric, rhythmic, precise. Think Josef Albers or Bauhaus color studies."
"Warm workspace. Nice notebook energy, not technical blueprint."
"Cinematic title sequence. The kind of opening where you lean forward."

Animation choreography

Specific motion verbs per element — not "it animates in" but HOW:

Energy	Verbs	Example
High impact	SLAMS, CRASHES, PUNCHES, STAMPS, SHATTERS	"$1.9T" SLAMS in from left at -5°
Medium energy	CASCADE, SLIDES, DROPS, FILLS, DRAWS	Three cards CASCADE in staggered 0.3s
Low energy	types on, FLOATS, morphs, COUNTS UP, fades in	Counter COUNTS UP from 0 to 135K

Every element gets a verb. If you can't name the verb, the element is not yet designed.

Transition

How this beat hands off to the next. Specify the type and parameters.

When to pick which:

Choose shader transition for	Choose CSS transition for	Choose hard cut for
Reveals, big reaction shots, product/logo unveils, energy shifts, "wow" moments	Continuous camera-motion beats where the scene feels like one move broken into cuts	Rapid-fire lists, percussive edits on the beat, comedic timing
Any moment the music/VO punctuates with a downbeat or SFX hit	Beats that ease from one composition into the next with shared motion vocabulary	Sequences of 3+ quick tempo-matched switches
Brand moments where the transition itself _is_ the visual	Minimal/editorial pacing	Anytime a 0.3-0.8s transition would feel too slow

Rule of thumb: if the beat is the _centerpiece_ of the video, shader-transition into it. If the beat is connective tissue, CSS-transition. A brand reel of 5-7 beats usually wants 1-2 shader transitions (the hero reveal + the CTA) and the rest CSS or hard cuts — too many shader transitions flatten their impact.

CSS transitions (choose from skills/hyperframes/references/transitions/catalog.md):

Velocity-matched upward: exit y:-150, blur:30px, 0.33s power2.in → entry y:150→0, blur:30px→0, 1.0s power2.out
Whip pan: exit x:-400, blur:24px, 0.3s power3.in → entry x:400→0, blur:24px→0, 0.3s power3.out
Blur through: exit blur:20px, 0.3s → entry blur:20px→0, 0.25s power3.out
Zoom through: exit scale:1→1.2, blur:20px, 0.2s power3.in → entry scale:0.75→1, blur:20px→0, 0.5s expo.out
Hard cut / smash cut (for rapid-fire sequences)
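The CSS presets above can be written as GSAP vars objects and scheduled so the exit ends exactly at the cut and the entry starts there. In this sketch blur is mapped to the CSS filter property; the preset keys, the object shape, and scheduleTransition are assumptions, not a HyperFrames API. Hard cuts need no tween.

```javascript
// Sketch: the CSS transition presets as GSAP vars (keys and shape are assumptions).
const CSS_TRANSITIONS = {
  velocityUp: {
    exit: { y: -150, filter: "blur(30px)", duration: 0.33, ease: "power2.in" },
    entryFrom: { y: 150, filter: "blur(30px)" },
    entryTo: { y: 0, filter: "blur(0px)", duration: 1.0, ease: "power2.out" },
  },
  whipPan: {
    exit: { x: -400, filter: "blur(24px)", duration: 0.3, ease: "power3.in" },
    entryFrom: { x: 400, filter: "blur(24px)" },
    entryTo: { x: 0, filter: "blur(0px)", duration: 0.3, ease: "power3.out" },
  },
  blurThrough: {
    exit: { filter: "blur(20px)", duration: 0.3 }, // the preset names no exit ease
    entryFrom: { filter: "blur(20px)" },
    entryTo: { filter: "blur(0px)", duration: 0.25, ease: "power3.out" },
  },
  zoomThrough: {
    exit: { scale: 1.2, filter: "blur(20px)", duration: 0.2, ease: "power3.in" }, // from scale 1
    entryFrom: { scale: 0.75, filter: "blur(20px)" },
    entryTo: { scale: 1, filter: "blur(0px)", duration: 0.5, ease: "expo.out" },
  },
};

// Exit ends exactly at the cut; entry starts at the cut. Returns fresh copies
// because GSAP may add keys to the vars objects it receives.
function scheduleTransition(name, cut) {
  const preset = CSS_TRANSITIONS[name];
  if (!preset) throw new Error("Unknown transition: " + name);
  return {
    exitAt: cut - preset.exit.duration,
    entryAt: cut,
    exit: { ...preset.exit },
    entryFrom: { ...preset.entryFrom },
    entryTo: { ...preset.entryTo },
  };
}

// Usage with a paused GSAP timeline `tl` (selectors are placeholders):
//   const t = scheduleTransition("whipPan", 4);
//   tl.to("#scene-a", t.exit, t.exitAt);
//   tl.fromTo("#scene-b", t.entryFrom, t.entryTo, t.entryAt);
```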

Shader transitions (choose from packages/shader-transitions/README.md):

Cross-Warp Morph (organic, versatile) — 0.5-0.8s, power2.inOut
Cinematic Zoom (professional momentum) — 0.4-0.6s, power2.inOut
Gravitational Lens (otherworldly) — 0.6-1.0s, power2.inOut
Glitch (aggressive, high energy) — 0.3-0.5s
See packages/shader-transitions/README.md for the full API, available shaders, and setup

Depth layers

What's in foreground, midground, and background. Every beat should have at least 2 layers:

"BG: dark navy fill + subtle radial glow. MG: stat cards with drop shadow. FG: brand logo bottom-right."

SFX cues

What sounds at what moment:

"On the capture pulse — a soft, warm analog shutter click."
"Left side carries a faint low drone. On fold: drone cuts. Silence. Then a single clean chime."

---

Rhythm Planning

Before writing HTML, declare your scene rhythm: which scenes are quick hits, which are holds, where do shaders land, where does energy peak. Name the pattern — fast-fast-SLOW-fast-SHADER-hold — before implementing.

Video type	Typical rhythm pattern
Social ad (15s)	hook-PUNCH-hold-CTA
Product demo (30-60s)	slow-build-BUILD-PEAK-breathe-CTA
Launch teaser (10-20s)	SLAM-proof-SLAM-hold
Brand reel (20-45s)	drift-build-PEAK-drift-resolve

---

Velocity-Matched Transitions

Exit the outgoing beat with an accelerating ease (power2.in or power3.in) plus a blur ramp. Enter the incoming beat with a decelerating ease (power2.out or power3.out) plus blur clear. The fastest point of both easing curves meets at the cut — the viewer perceives continuous camera motion, not two discrete animations. Match exit velocity to entry velocity within ~5% tolerance.
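The ~5% tolerance can be checked numerically if you ignore blur and look only at displacement. GSAP's power eases are polynomials (power2 is cubic, power3 is quartic), so a powerN.in exit reaches (N + 1) times distance / duration at the cut, and a powerN.out entry starts at the same kind of rate. The helper names below are hypothetical.

```javascript
// Sketch: displacement-only speed check at the cut (blur ignored; names are hypothetical).
// GSAP powerN eases are polynomials of degree N + 1, so powerN.in ends at
// (N + 1) * distance / duration and powerN.out starts at that same rate.
function peakSpeed(distancePx, durationS, powerN) {
  return ((powerN + 1) * Math.abs(distancePx)) / durationS;
}

function velocityMatch(exit, entry, tolerance = 0.05) {
  const vExit = peakSpeed(exit.distance, exit.duration, exit.power);
  const vEntry = peakSpeed(entry.distance, entry.duration, entry.power);
  // Relative mismatch measured against the faster side.
  const mismatch = Math.abs(vExit - vEntry) / Math.max(vExit, vEntry);
  return { vExit, vEntry, mismatch, ok: mismatch <= tolerance };
}

// Whip pan: x:-400 over 0.3s power3.in, then x:400 to 0 over 0.3s power3.out.
const whip = velocityMatch(
  { distance: 400, duration: 0.3, power: 3 },
  { distance: 400, duration: 0.3, power: 3 },
);

// Velocity-matched upward: y:-150 over 0.33s power2.in, then y:150 to 0 over 1.0s power2.out.
const upward = velocityMatch(
  { distance: 150, duration: 0.33, power: 2 },
  { distance: 150, duration: 1.0, power: 2 },
);
```

By this measure the whip pan matches exactly (about 5333 px/s on both sides). The velocity-matched upward preset does not: its exit peaks near 1364 px/s while its entry starts at 450 px/s, so if strict matching matters, shorten the entry or increase its travel and confirm the result visually.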

Captions

Language Rule (Non-Negotiable)

Never use `.en` models unless the user explicitly states the audio is English. .en models TRANSLATE non-English audio into English instead of transcribing it.

1. User says the language → --model small --language <code> (no .en)
2. User says English → --model small.en
3. Language unknown → --model small (no .en, no --language); it auto-detects
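The language rule reduces to a small decision function. This sketch returns only the flags named above; how they are passed to the transcription tool is covered in transcript-guide.md, and whisperArgs and its input (an ISO language code such as "es", or null when the language is unknown) are assumptions.

```javascript
// Sketch of the language rule (function name and input shape are assumptions).
function whisperArgs(userLanguage) {
  if (!userLanguage) {
    // Unknown: multilingual model, no --language, let it auto-detect.
    return ["--model", "small"];
  }
  if (userLanguage.toLowerCase() === "en") {
    // The user explicitly said English: only now is the .en model allowed.
    return ["--model", "small.en"];
  }
  // Any other stated language: multilingual model pinned to that language.
  return ["--model", "small", "--language", userLanguage];
}
```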

---

Analyze spoken content to determine caption style. If user specifies a style, use that. Otherwise, detect tone from the transcript.

Transcript Source

[
  { "text": "Hello", "start": 0.0, "end": 0.5 },
  { "text": "world.", "start": 0.6, "end": 1.2 }
]

For transcription commands, whisper models, external APIs, see transcript-guide.md.

Style Detection (When No Style Specified)

Read the full transcript before choosing. Four dimensions:

1. Visual feel — corporate→clean; energetic→bold; storytelling→elegant; technical→precise; social→playful.

2. Color palette — dark+bright for energy; muted for professional; high contrast for clarity; one accent color.

3. Font mood — heavy/condensed for impact; clean sans for modern; rounded for friendly; serif for elegance.

4. Animation character — scale-pop for punchy; gentle fade for calm; word-by-word for emphasis; typewriter for technical.

Per-Word Styling

Scan for words deserving distinct treatment:

Brand/product names — larger size, unique color
ALL CAPS — scale boost, flash, accent color
Numbers/statistics — bold weight, accent color
Emotional keywords — exaggerated animation (overshoot, bounce)
Call-to-action — highlight, underline, color pop
Marker highlight — for beyond-color emphasis, see css-patterns.md
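A sketch of that scan as a classifier over transcript words. The priority order and regex heuristics are assumptions; brand names, CTA phrases, and emotional keywords have to come from the script or the user, since they can't be reliably detected from spelling alone.

```javascript
// Sketch: tag a transcript word for distinct styling (heuristics are assumptions).
function classifyWord(text, lists = {}) {
  // Strip edge punctuation but keep a leading "$" and a trailing "%".
  const bare = text.replace(/^[^\w$]+|[^\w%]+$/g, "");
  const key = bare.toLowerCase();
  const has = (name) => (lists[name] || []).some((w) => w.toLowerCase() === key);

  if (has("brands")) return "brand"; // larger size, unique color
  if (/\d/.test(bare)) return "number"; // bold weight, accent color
  if (has("cta")) return "cta"; // highlight, underline, color pop
  // ALL CAPS; single letters like "I" are excluded.
  if (bare.length > 1 && /[A-Z]/.test(bare) && bare === bare.toUpperCase()) return "caps";
  if (has("emotional")) return "emotional"; // overshoot, bounce
  return null; // default caption styling
}
```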

Script-to-Style Mapping

Tone	Font mood	Animation	Color	Size
Hype/launch	Heavy condensed, 800-900	Scale-pop, back.out(1.7), 0.1-0.2s	Bright on dark	72-96px
Corporate	Clean sans, 600-700	Fade+slide, power3.out, 0.3s	White/neutral, muted accent	56-72px
Tutorial	Mono/clean sans, 500-600	Typewriter/fade, 0.4-0.5s	High contrast, minimal	48-64px
Storytelling	Serif/elegant, 400-500	Slow fade, power2.out, 0.5-0.6s	Warm muted tones	44-56px
Social	Rounded sans, 700-800	Bounce, elastic.out, word-by-word	Playful, colored pills	56-80px

Word Grouping

High energy: 2-3 words. Quick turnover.
Conversational: 3-5 words. Natural phrases.
Measured/calm: 4-6 words. Longer groups.

Break on sentence boundaries, 150ms+ pauses, or max word count.
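The break rules can be applied in one pass over the transcript format shown earlier. This is a sketch: groupWords, its defaults (4 words, 0.15 s), and the output shape are assumptions; set maxWords from the energy level (2-3, 3-5, or 4-6).

```javascript
// Sketch: group { text, start, end } words into caption groups (names/defaults are assumptions).
function groupWords(words, maxWords = 4, pauseS = 0.15) {
  const groups = [];
  let current = [];

  const flush = () => {
    if (current.length === 0) return;
    groups.push({
      text: current.map((w) => w.text).join(" "),
      start: current[0].start,
      end: current[current.length - 1].end,
      words: current,
    });
    current = [];
  };

  words.forEach((word, i) => {
    const prev = words[i - 1];
    // A pause of pauseS or more before this word starts a new group.
    if (prev && word.start - prev.end >= pauseS) flush();
    current.push(word);
    // Sentence-ending punctuation (optionally followed by a quote or bracket)
    // or the word limit closes the group after this word.
    if (/[.!?]["')\]]*$/.test(word.text) || current.length >= maxWords) flush();
  });
  flush();
  return groups;
}
```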

Positioning

Landscape (1920x1080): Bottom 80-120px, centered
Portrait (1080x1920): Lower middle ~600-700px from bottom, centered
Never cover the subject's face
position: absolute — never relative
One caption group visible at a time

Text Overflow Prevention

Use window.__hyperframes.fitTextFontSize():

var result = window.__hyperframes.fitTextFontSize(group.text.toUpperCase(), {
  fontFamily: "Outfit",
  fontWeight: 900,
  maxWidth: 1600,
});
el.style.fontSize = result.fontSize + "px";

Options: maxWidth (1600 landscape, 900 portrait), baseFontSize (78), minFontSize (42), fontWeight, fontFamily, step (2).

CSS safety nets: max-width on container, overflow: visible (not hidden — hidden clips scaled emphasis words and glow effects), position: absolute, explicit height. When per-word styling uses scale > 1.0, compute maxWidth = safeWidth / maxScale to leave headroom.
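The headroom rule is a single division. In this sketch the 1.15 emphasis scale is an example value, and headroomMaxWidth is a hypothetical helper whose result would be passed as maxWidth to window.__hyperframes.fitTextFontSize().

```javascript
// Sketch: shrink the fitting width so a word scaled to maxScale still fits safeWidth.
function headroomMaxWidth(safeWidth, maxScale) {
  // Scales below 1 need no headroom.
  return Math.floor(safeWidth / Math.max(1, maxScale));
}

const landscape = headroomMaxWidth(1600, 1.15); // 1391
const portrait = headroomMaxWidth(900, 1.15); // 782
```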

Container pattern: Full-width absolute container, centered. Do not use left: 50%; transform: translateX(-50%) — causes clipping at composition edges.

Caption Exit Guarantee

Every group must have a hard kill after exit animation:

tl.to(groupEl, { opacity: 0, scale: 0.95, duration: 0.12, ease: "power2.in" }, group.end - 0.12);
tl.set(groupEl, { opacity: 0, visibility: "hidden" }, group.end); // deterministic kill

Self-lint after building the timeline. Place this block before window.__timelines[id] = tl so it runs at composition init:

GROUPS.forEach(function (group, gi) {
  var el = document.getElementById("cg-" + gi);
  if (!el) return;
  tl.seek(group.end + 0.01);
  var computed = window.getComputedStyle(el);
  if (computed.opacity !== "0" && computed.visibility !== "hidden") {
    console.warn(
      "[caption-lint] group " + gi + " still visible at t=" + (group.end + 0.01).toFixed(2) + "s",
    );
  }
});
tl.seek(0);

Pre-Built Caption Components

Before building caption styles from scratch, check the registry — 15 ready-to-use caption components cover the most common styles. Install with npx hyperframes add <name> and use as a sub-composition via data-composition-src.

npx hyperframes catalog --tag caption-style   # list all caption components
npx hyperframes add caption-highlight         # install a specific one

Style	Component	Best for
TikTok-style highlight	`caption-highlight`	Social, high-energy
Karaoke pill	`caption-pill-karaoke`	Music, lyric videos
Cinematic editorial	`caption-editorial-emphasis`	Documentary, storytelling
Glitch / cyber	`caption-glitch-rgb`	Tech, gaming
Full-screen slam	`caption-kinetic-slam`	Hype, announcements
Neon glow	`caption-neon-glow`	Night, club, neon aesthetics
Neon accent (multi-color)	`caption-neon-accent`	Colorful, playful
Wipe reveal	`caption-clip-wipe`	Clean, modern
Gradient fill	`caption-gradient-fill`	Vibrant, eye-catching
Matrix decode	`caption-matrix-decode`	Sci-fi, tech reveals
Emoji pop	`caption-emoji-pop`	Social, casual
Parallax layers	`caption-parallax-layers`	Depth, cinematic
Particle burst	`caption-particle-burst`	Celebration, impact keywords
Lava texture	`caption-texture`	Bold, dramatic
Weight shift	`caption-weight-shift`	Elegant, typographic

Browse all with previews: hyperframes.heygen.com/catalog

Caption components ship with transparent backgrounds — they're pure overlays. If the underlying video is bright or busy, add a contrast layer (e.g. a semi-transparent dark div) in the host composition beneath the caption sub-composition, not inside the component itself.

Further References

dynamic-techniques.md — karaoke, clip-path reveals, slam words, scatter exits, elastic, 3D rotation
transcript-guide.md — transcription commands, whisper models, external APIs
css-patterns.md — CSS+GSAP marker highlighting (deterministic, fully seekable)

Constraints

Deterministic. No Math.random(), no Date.now().
Sync to transcript timestamps.
One group visible at a time.
Every group must have a hard tl.set kill at group.end.
The compiler embeds supported fonts automatically — just declare font-family in CSS.

CSS Patterns for Marker Highlighting

Pure CSS + GSAP implementations of all five MarkerHighlight.js drawing modes. Use these for deterministic rendering in HyperFrames compositions — no external library dependency, full GSAP timeline control.

1. Highlight Mode — Yellow marker sweep behind text
2. Circle Mode — Hand-drawn ellipse around text
3. Burst Mode — Radiating lines from text
4. Scribble Mode — Chaotic scribble over text
5. Sketchout Mode — Rough rectangle outline

1. Highlight Mode

Yellow marker sweep behind text. The most common mode.

<span class="mh-highlight-wrap">
  <span class="mh-highlight-bar" id="hl-1"></span>
  <span class="mh-highlight-text">highlighted text</span>
</span>

.mh-highlight-wrap {
  position: relative;
  display: inline;
}
.mh-highlight-bar {
  position: absolute;
  top: 0;
  left: -6px;
  right: -6px;
  bottom: 0;
  background: #fdd835;
  opacity: 0.35;
  transform: scaleX(0);
  transform-origin: left center;
  border-radius: 3px;
  z-index: 0;
}
.mh-highlight-text {
  position: relative;
  z-index: 1;
}

// Sweep in from left
tl.to("#hl-1", { scaleX: 1, duration: 0.5, ease: "power2.out" }, 0.6);

// Optional: skew for hand-drawn feel
// gsap.set("#hl-1", { skewX: -2 });

Multi-line Highlight

Stagger bars across multiple lines:

tl.to(
  ".mh-highlight-bar",
  {
    scaleX: 1,
    duration: 0.5,
    ease: "power2.out",
    stagger: 0.3,
  },
  0.6,
);

2. Circle Mode

Hand-drawn circle around text. Use border-radius: 50% with a slight rotation for organic feel.

<span class="mh-circle-wrap">
  <span class="mh-circle-text" id="circle-word">IMPORTANT</span>
  <span class="mh-circle-ring" id="circle-1"></span>
</span>

.mh-circle-wrap {
  position: relative;
  display: inline;
}
.mh-circle-text {
  position: relative;
  z-index: 1;
}
.mh-circle-ring {
  position: absolute;
  top: 50%;
  left: 50%;
  width: 130%;
  height: 160%;
  transform: translate(-50%, -50%) rotate(-3deg) scale(0);
  border: 3px solid #e53935;
  border-radius: 50%;
  pointer-events: none;
  z-index: 0;
}

// Circle scales in with a wobble
tl.to(
  "#circle-1",
  {
    scale: 1,
    rotation: -3,
    duration: 0.6,
    ease: "back.out(1.7)",
    transformOrigin: "center center",
  },
  0.7,
);

Variations

/* Tighter circle (for short words) */
.mh-circle-ring.tight {
  width: 150%;
  height: 180%;
}

/* Squared circle (rounded rectangle) */
.mh-circle-ring.rounded {
  border-radius: 30%;
  width: 120%;
  height: 140%;
}

/* Ellipse (wider than tall) */
.mh-circle-ring.ellipse {
  width: 150%;
  height: 130%;
  border-radius: 50%;
}

3. Burst Mode

Radiating lines from text center. Each line is a positioned div rotated to its angle.

<span class="mh-burst-wrap">
  <span class="mh-burst-text">WOW</span>
  <span class="mh-burst-container" id="burst-1">
    <span class="mh-burst-line" style="--angle: 0deg; --len: 70px;"></span>
    <span class="mh-burst-line" style="--angle: 30deg; --len: 55px;"></span>
    <span class="mh-burst-line" style="--angle: 60deg; --len: 80px;"></span>
    <span class="mh-burst-line" style="--angle: 90deg; --len: 45px;"></span>
    <span class="mh-burst-line" style="--angle: 120deg; --len: 65px;"></span>
    <span class="mh-burst-line" style="--angle: 150deg; --len: 75px;"></span>
    <span class="mh-burst-line" style="--angle: 180deg; --len: 50px;"></span>
    <span class="mh-burst-line" style="--angle: 210deg; --len: 60px;"></span>
    <span class="mh-burst-line" style="--angle: 240deg; --len: 80px;"></span>
    <span class="mh-burst-line" style="--angle: 270deg; --len: 40px;"></span>
    <span class="mh-burst-line" style="--angle: 300deg; --len: 70px;"></span>
    <span class="mh-burst-line" style="--angle: 330deg; --len: 55px;"></span>
  </span>
</span>

.mh-burst-wrap {
  position: relative;
  display: inline;
}
.mh-burst-text {
  position: relative;
  z-index: 2;
}
.mh-burst-container {
  position: absolute;
  top: 50%;
  left: 50%;
  width: 0;
  height: 0;
  z-index: 1;
}
.mh-burst-line {
  position: absolute;
  display: block;
  width: 3px;
  height: var(--len);
  background: #1e88e5;
  left: -1.5px;
  top: calc(-1 * var(--len));
  transform: rotate(var(--angle));
  transform-origin: bottom center;
  opacity: 0;
}

// All lines burst outward simultaneously with slight stagger
tl.fromTo(
  "#burst-1 .mh-burst-line",
  { scaleY: 0, opacity: 0 },
  { scaleY: 1, opacity: 1, duration: 0.4, ease: "power2.out", stagger: 0.03 },
  0.7,
);

Vary line lengths (40-80px range) for an organic, hand-drawn feel. Equal lengths look mechanical.

4. Scribble Mode

Wavy SVG underlines and strikethroughs that draw themselves via stroke-dashoffset.

<span class="mh-scribble-wrap">
  <span class="mh-scribble-text">underlined text</span>
  <svg class="mh-scribble-svg" viewBox="0 0 500 24" preserveAspectRatio="none">
    <path
      id="scribble-1"
      d="M0,12 Q31,0 62,12 Q93,24 125,12 Q156,0 187,12 Q218,24 250,12 Q281,0 312,12 Q343,24 375,12 Q406,0 437,12 Q468,24 500,12"
      fill="none"
      stroke="#FDD835"
      stroke-width="3"
      stroke-linecap="round"
    />
  </svg>
</span>

.mh-scribble-wrap {
  position: relative;
  display: inline;
}
.mh-scribble-text {
  position: relative;
  z-index: 1;
}
.mh-scribble-svg {
  position: absolute;
  left: 0;
  bottom: -6px;
  width: 100%;
  height: 24px;
  z-index: 0;
}

// Measure path length and set initial dash state
var path = document.querySelector("#scribble-1");
var len = path.getTotalLength();
gsap.set(path, { strokeDasharray: len, strokeDashoffset: len });

// Draw the line
tl.to(
  "#scribble-1",
  {
    strokeDashoffset: 0,
    duration: 0.8,
    ease: "power1.inOut",
  },
  0.7,
);

Strikethrough Variant

Position the SVG at top: 50%; transform: translateY(-50%) instead of bottom: -6px.

Wavy Path Generator

Scale the path's viewBox width to match the text width. Each Q cx,cy x,y segment ends on the midline (y=12) while its control point alternates between y=0 and y=24, which produces the wobble. Adjust the segment spacing and y range for tighter or looser waves:

Tight waves: smaller x-increments (25px per half-wave)
Loose waves: larger x-increments (50px per half-wave)
Amplitude: change the y range (0-24 for standard, 0-16 for subtle)
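A sketch of a generator for these paths. halfWave is the x distance covered by one Q segment and amplitude is the full y range, so set the viewBox height to match it. The function name is hypothetical; flooring the coordinates happens to reproduce the 500-unit path from the Scribble example exactly.

```javascript
// Sketch: build a wavy SVG path "d" string (hypothetical helper).
function wavyPath(width, halfWave, amplitude) {
  const segments = Math.max(1, Math.round(width / halfWave));
  const mid = amplitude / 2;
  let d = "M0," + mid;
  for (let i = 0; i < segments; i++) {
    // Control point sits halfway along the segment, alternating top (0) and bottom (amplitude).
    const cx = Math.floor(((2 * i + 1) * width) / (2 * segments));
    const cy = i % 2 === 0 ? 0 : amplitude;
    // Each segment ends back on the midline.
    const x = Math.floor(((i + 1) * width) / segments);
    d += " Q" + cx + "," + cy + " " + x + "," + mid;
  }
  return d;
}
```

For a 200px-wide word with tight, subtle waves, wavyPath(200, 25, 16) would pair with viewBox="0 0 200 16".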

5. Sketchout Mode

Cross-hatch lines over de-emphasized text. Multiple angled lines create a "crossed out" effect.

<span class="mh-sketchout-wrap">
  <span class="mh-sketchout-text">old price</span>
  <span class="mh-sketchout-lines" id="sketchout-1">
    <span class="mh-sketchout-line mh-sketchout-fwd"></span>
    <span class="mh-sketchout-line mh-sketchout-bwd"></span>
  </span>
</span>

.mh-sketchout-wrap {
  position: relative;
  display: inline;
}
.mh-sketchout-text {
  position: relative;
  z-index: 0;
}
.mh-sketchout-lines {
  position: absolute;
  top: 0;
  left: -4px;
  right: -4px;
  bottom: 0;
  overflow: hidden;
  z-index: 1;
}
.mh-sketchout-line {
  position: absolute;
  display: block;
  top: 50%;
  left: 0;
  width: 100%;
  height: 2px;
  background: #e53935;
  transform-origin: left center;
  transform: scaleX(0);
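}

// Slash angles are set here, not in CSS. GSAP reads the element's computed
// transform matrix, and a CSS value like `scaleX(0) rotate(-12deg)` collapses
// into a matrix that does not decompose back into a -12deg rotation, so the
// line would draw at the wrong angle. The base rule's plain scaleX(0) parses
// cleanly, and GSAP composes rotate() before scale(), so each line grows along
// its own angle.
gsap.set("#sketchout-1 .mh-sketchout-fwd", { rotation: -12 });
gsap.set("#sketchout-1 .mh-sketchout-bwd", { rotation: 12 });

// Forward slash draws first
tl.to(
  "#sketchout-1 .mh-sketchout-fwd",
  {
    scaleX: 1,
    duration: 0.3,
    ease: "power2.out",
  },
  1.0,
);

// Backward slash follows
tl.to(
  "#sketchout-1 .mh-sketchout-bwd",
  {
    scaleX: 1,
    duration: 0.3,
    ease: "power2.out",
  },
  1.15,
);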

Combining Modes in Captions

Use mode cycling for visual variety across caption groups:

var MODES = ["highlight", "circle", "burst", "scribble"];

GROUPS.forEach(function (group, gi) {
  var mode = MODES[gi % MODES.length];
  // Apply the mode's CSS pattern to emphasis words in this group
  group.emphasisWords.forEach(function (word) {
    applyMode(word.el, mode, tl, word.start);
  });
});

Cycle every 2-3 groups for high energy, every 3-4 for medium, every 4-5 for low.
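The loop above switches mode on every group (gi % MODES.length). To hold each mode for several consecutive groups, as the cadence guidance suggests, divide the group index by the run length before taking the modulus. A sketch; modeForGroup and groupsPerMode are illustrative names, not part of the skill.

```javascript
var MODES = ["highlight", "circle", "burst", "scribble"];

// Returns the marker mode for caption group `gi`, holding each mode for
// `groupsPerMode` consecutive groups before advancing to the next one.
// e.g. 2 for high energy, 3 for medium, 4 for low.
function modeForGroup(gi, groupsPerMode) {
  var run = Math.max(1, Math.floor(groupsPerMode));
  return MODES[Math.floor(gi / run) % MODES.length];
}
```

Inside the loop, var mode = modeForGroup(gi, 3); would then hold each mode for three groups at medium energy.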

Design Picker

Two-phase visual picker: mood boards first (pick a complete direction), then fine-tune individual categories.

Prerequisites

Read these before generating options — they define the rules your options must follow:

typography.md
../house-style.md
video-composition.md
../visual-styles.md
beat-direction.md

Building the picker

1. Generate options that are deeply contextual to the user's prompt. Every category — not just architectures — must reflect the specific product, brand, audience, and mood. Generic options that could appear on any picker are a failure.

Mood boards — as many as the creative space warrants (4-8). Every board must tell a different STORY about the brand, not just reshuffle the same elements. Ask: "what are the genuinely different ways to position this product?" A cat food brand might be: playful chaos, premium positioning, comfort/cozy, social-native, flavor showcase, humor-led, sensory/appetizing. Each is a different narrative, not a different font on the same layout.

Architectures — one per mood board minimum, each visually distinct. Use {{prompt_headline}} and {{prompt_sub}} tokens. If the user provided media assets, use them as background images (use url(path) without quotes — single quotes inside style='...' break the attribute).

Palettes (5-6) — named after the brand's world, not generic moods. The palette names and colors should feel like they belong to THIS specific product. Always mix dark + light + tinted. Every palette must be visually distinct at swatch size. If two palettes share the same background lightness AND a similar accent hue, cut one. Test: would a user see the difference in a 14px swatch chip? If not, they're duplicates.

Type pairings (5-6) — RUN the font discovery script from typography.md BEFORE generating pairings. This is not optional. Download Google Fonts metadata, run the script, and pick from its output. Otherwise you will reach for the same handful of fonts every time (Bricolage Grotesque, Instrument Serif, Fraunces, Archivo Black, DM Serif Display, Space Grotesk, Fredoka) — that's your training data default, not a contextual choice. Match the brand's energy and audience. Pair across categories per typography.md (never two sans-serifs).
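The duplicate-palette test from the palette rule (same background lightness AND a similar accent hue) can be approximated before the options reach the picker. This sketch assumes each palette exposes 6-digit hex strings under hypothetical bg and accent fields; the template's real palette schema is not shown here, and the thresholds are starting guesses to tune by eye, not values from the skill. Hue is unreliable for near-gray accents, so treat a hit as a prompt to look, not an automatic cut.

```javascript
// Converts "#rrggbb" to { h: 0-360, s: 0-1, l: 0-1 }.
function hexToHsl(hex) {
  var n = parseInt(hex.replace("#", ""), 16);
  var r = ((n >> 16) & 255) / 255;
  var g = ((n >> 8) & 255) / 255;
  var b = (n & 255) / 255;
  var max = Math.max(r, g, b);
  var min = Math.min(r, g, b);
  var l = (max + min) / 2;
  var d = max - min;
  var h = 0;
  var s = 0;
  if (d !== 0) {
    s = l > 0.5 ? d / (2 - max - min) : d / (max + min);
    if (max === r) h = ((g - b) / d + (g < b ? 6 : 0)) * 60;
    else if (max === g) h = ((b - r) / d + 2) * 60;
    else h = ((r - g) / d + 4) * 60;
  }
  return { h: h, s: s, l: l };
}

// Shortest angular distance between two hues, in degrees.
function hueDistance(a, b) {
  var diff = Math.abs(a - b) % 360;
  return diff > 180 ? 360 - diff : diff;
}

// Near-duplicate when background lightness AND accent hue are both close.
function isNearDuplicate(p1, p2, maxLightnessDelta, maxHueDelta) {
  var bgDelta = Math.abs(hexToHsl(p1.bg).l - hexToHsl(p2.bg).l);
  var hueDelta = hueDistance(hexToHsl(p1.accent).h, hexToHsl(p2.accent).h);
  return bgDelta <= maxLightnessDelta && hueDelta <= maxHueDelta;
}

// Returns index pairs [i, j] of palettes that look like duplicates.
function findNearDuplicates(palettes, maxLightnessDelta, maxHueDelta) {
  var pairs = [];
  for (var i = 0; i < palettes.length; i++) {
    for (var j = i + 1; j < palettes.length; j++) {
      if (isNearDuplicate(palettes[i], palettes[j], maxLightnessDelta, maxHueDelta)) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```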

2. Run mkdir -p .hyperframes, then copy ../templates/design-picker.html to .hyperframes/pick-design.html.

3. Replace these placeholders using Python (don't hand-escape quotes in sed):

__ARCHITECTURES_JSON__ — array of architecture objects
__PALETTES_JSON__ — array of palette objects
__TYPEPAIRS_JSON__ — array of type pairing objects
__MOODBOARDS_JSON__ — array of mood board objects (see format below)
__PROMPT_JSON__ — object with prompt context (see format below)

Architecture data format

Each architecture object must include a preview_html field — the HTML that renders in the preview panel. Use token placeholders that the template replaces at runtime: {{bg}}, {{fg}}, {{ac}}, {{mt}}, {{hf}}, {{hw}}, {{bf}}, {{bw}}, {{cr}} (corner radius), {{pad}}, {{gap}}, {{shadow}}, {{g}} (grid line color), {{fg3}}/{{fg6}}/{{fg8}}/{{fg15}} (fg at opacity), {{ac3}}/{{ac5}}/{{ac25}} (accent at opacity).

Every token must be used. Apply {{cr}} to all cards, buttons, and containers. Apply {{shadow}} to elevated elements (cards, buttons, code blocks). Apply {{pad}} and {{gap}} to control spacing. If a token isn't used in the preview_html, that option will have no visible effect.
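A quick pre-flight check for the "every token must be used" rule: list the design tokens from the format description and report any that never appear in an architecture's preview_html. A sketch; missingTokens is a hypothetical helper, not part of the template.

```javascript
// Design tokens the template substitutes into preview_html (from the list above).
var DESIGN_TOKENS = [
  "bg", "fg", "ac", "mt", "hf", "hw", "bf", "bw", "cr", "pad", "gap", "shadow", "g",
  "fg3", "fg6", "fg8", "fg15", "ac3", "ac5", "ac25",
];

// Returns the tokens that never appear as {{token}} in the given preview_html.
// Matching the full "{{name}}" keeps {{g}} from matching inside {{fg}}.
function missingTokens(previewHtml) {
  return DESIGN_TOKENS.filter(function (name) {
    return previewHtml.indexOf("{{" + name + "}}") === -1;
  });
}
```

Running it over every architecture before writing __ARCHITECTURES_JSON__ surfaces options whose controls would have no visible effect.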

Density matters. Each architecture preview must include 15+ distinct elements to give the user a real sense of the layout. Include: headline, subhead, body paragraph, label/overline, stat with number, secondary stat, quote/testimonial, attribution, card with title+body, second card (different treatment), code/command block, primary button, secondary button, list or tags, accent divider/rule, and a data element (table row, progress bar, or chart).

Optionally include components (component styling rules) and dos (do's and don'ts) as strings — these appear in the generated design.md.

Layout constraint: All preview HTML must use percentage widths or max-width: 100%. Use flex-wrap: wrap on all flex rows. Absolute-positioned decoratives must stay within a parent with overflow: hidden.

Security: Architecture preview_html must not contain <script> tags, event handlers (onclick, onerror, etc.), or javascript: URLs. It is injected via innerHTML.

Image URLs: When using background images in preview_html, use url(path/to/image.jpg) WITHOUT quotes around the path. Single quotes like url('path.jpg') break because preview_html is inside a style='...' attribute — the inner single quotes terminate the outer attribute.
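The security and image-URL rules can be checked mechanically before the JSON goes into the template. A rough sketch using regular expressions: it flags the obvious violations named above but is not an HTML sanitizer, and lintPreviewHtml is a hypothetical name.

```javascript
// Rough pre-flight lint for an architecture's preview_html string.
// Regex checks only: they flag common violations, they do not sanitize HTML.
function lintPreviewHtml(html) {
  var problems = [];
  if (/<script\b/i.test(html)) problems.push("contains a <script> tag");
  if (/\son[a-z]+\s*=/i.test(html)) problems.push("contains an inline event handler (on...=)");
  if (/javascript\s*:/i.test(html)) problems.push("contains a javascript: URL");
  // Quoted url() paths break the surrounding style='...' attribute.
  if (/url\(\s*['"]/i.test(html)) problems.push("quotes inside url(); use url(path) unquoted");
  return problems;
}
```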

Palette variety: Always include a mix of light, dark, and tinted backgrounds across the 5-6 palettes, even for calm/wellness prompts.

Example architecture object

{
  "name": "Editorial Stack",
  "description": "Vertical rhythm with large type, pull quotes, and data callouts",
  "tag": "editorial / longform / narrative",
  "mood": "Confident, unhurried, typographically driven",
  "preview_html": "<div style='background:{{bg}};color:{{fg}};padding:{{pad}};min-height:100vh;font-family:\"{{bf}}\",sans-serif;font-weight:{{bw}};'><div style='max-width:100%;display:flex;flex-direction:column;gap:{{gap}};'><div style='font-size:10px;text-transform:uppercase;letter-spacing:0.12em;color:{{mt}};'>Overline Label</div><div style='font-family:\"{{hf}}\",serif;font-weight:{{hw}};font-size:48px;line-height:1.1;letter-spacing:-0.02em;'>{{prompt_headline}}</div><div style='font-size:20px;color:{{mt}};max-width:70%;line-height:1.5;'>{{prompt_sub}}</div><div style='font-size:15px;line-height:1.7;color:{{fg}};max-width:65%;'>Body paragraph with real sentences. The quick brown fox jumps over the lazy dog. This gives a sense of text density and reading rhythm at the chosen type size.</div><div style='display:flex;gap:{{gap}};flex-wrap:wrap;'><div style='background:{{fg6}};border-radius:{{cr}};padding:{{pad}};flex:1;min-width:200px;box-shadow:{{shadow}};'><div style='font-size:36px;font-family:\"{{hf}}\",serif;font-weight:{{hw}};color:{{ac}};'>2.4M</div><div style='font-size:12px;color:{{mt}};margin-top:4px;'>Primary Stat</div></div><div style='background:{{fg6}};border-radius:{{cr}};padding:{{pad}};flex:1;min-width:200px;box-shadow:{{shadow}};'><div style='font-size:36px;font-family:\"{{hf}}\",serif;font-weight:{{hw}};color:{{fg}};'>87%</div><div style='font-size:12px;color:{{mt}};margin-top:4px;'>Secondary Stat</div></div></div><div style='border-left:3px solid {{ac}};padding:12px {{pad}};background:{{ac3}};border-radius:0 {{cr}} {{cr}} 0;'><div style='font-size:18px;font-style:italic;color:{{fg}};line-height:1.5;'>\"A pull quote that captures the key insight of the piece.\"</div><div style='font-size:12px;color:{{mt}};margin-top:8px;'>— Attribution Name</div></div><div style='background:{{fg3}};border-radius:{{cr}};padding:{{pad}};box-shadow:{{shadow}};'><div style='font-size:14px;font-weight:{{hw}};margin-bottom:8px;'>Card Title</div><div style='font-size:13px;color:{{mt}};line-height:1.5;'>Card body text with a different treatment than the main content area.</div></div><div style='background:{{ac5}};border:1px solid {{ac25}};border-radius:{{cr}};padding:{{pad}};box-shadow:{{shadow}};'><div style='font-size:14px;font-weight:{{hw}};color:{{ac}};margin-bottom:8px;'>Accent Card</div><div style='font-size:13px;color:{{fg}};line-height:1.5;'>Second card with a tinted accent treatment for variety.</div></div><div style='font-family:monospace;font-size:13px;background:{{fg8}};border-radius:{{cr}};padding:{{pad}};color:{{fg15}};box-shadow:{{shadow}};'>$ hyperframes render --output video.mp4</div><div style='display:flex;gap:12px;flex-wrap:wrap;'><button style='background:{{ac}};color:{{bg}};border:none;padding:10px 24px;border-radius:{{cr}};font-size:14px;font-weight:600;box-shadow:{{shadow}};cursor:pointer;'>Primary Action</button><button style='background:transparent;color:{{fg}};border:1px solid {{fg15}};padding:10px 24px;border-radius:{{cr}};font-size:14px;cursor:pointer;'>Secondary</button></div><div style='display:flex;gap:8px;flex-wrap:wrap;'><span style='background:{{fg6}};border-radius:100px;padding:4px 12px;font-size:11px;color:{{mt}};'>Tag One</span><span style='background:{{fg6}};border-radius:100px;padding:4px 12px;font-size:11px;color:{{mt}};'>Tag Two</span><span style='background:{{ac5}};border-radius:100px;padding:4px 12px;font-size:11px;color:{{ac}};'>Accent Tag</span></div><div style='height:1px;background:linear-gradient(to right,{{ac25}},{{fg6}},{{ac25}});'></div><div style='display:flex;justify-content:space-between;font-size:12px;color:{{mt}};border-bottom:1px solid {{g}};padding:8px 0;'><span>Data row label</span><span style='color:{{fg}};font-weight:600;'>1,234</span></div></div></div>"
}

Mood board data format

Each mood board pre-selects one option from each category. The user picks a mood board in Phase 1, then fine-tunes in Phase 2 with those selections pre-filled.

{
  "name": "Terminal Precision",
  "description": "Code-forward, data-dense, CLI energy. Dark canvas, monospace body, sharp corners.",
  "theme": "dark",
  "arch_index": 0,
  "palette_index": 0,
  "type_index": 0,
  "corners_index": 0,
  "density_index": 0,
  "depth_index": 1,
  "easing_index": 0,
  "corners": "0px",
  "padding": "12px",
  "gap": "8px",
  "shadow": "0 2px 16px rgba(0,230,255,0.15)"
}

arch_index, palette_index, and type_index reference into the ARCHITECTURES, PALETTES, and TYPEPAIRS arrays. The template renders a mini preview of each mood board using its architecture's preview_html with the mood board's palette/type applied.
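Resolving a mood board then amounts to indexing into those three arrays. A sketch that fails loudly on a bad index, which would otherwise show up as a blank or wrong preview. resolveMoodBoard is a hypothetical helper, and the corners/density/depth/easing indices are left out because their option arrays are not described here.

```javascript
// Looks up a mood board's pre-selected options, throwing on out-of-range indices.
function resolveMoodBoard(board, architectures, palettes, typepairs) {
  function pick(list, index, label) {
    if (!Number.isInteger(index) || index < 0 || index >= list.length) {
      throw new RangeError(
        board.name + ": " + label + " " + index + " is out of range (0-" + (list.length - 1) + ")",
      );
    }
    return list[index];
  }
  return {
    architecture: pick(architectures, board.arch_index, "arch_index"),
    palette: pick(palettes, board.palette_index, "palette_index"),
    typepair: pick(typepairs, board.type_index, "type_index"),
  };
}
```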

Prompt context data format

{
  "title": "AI Coding Assistant",
  "headline": "Your Code, Understood.",
  "subline": "An AI coding assistant that reads your entire codebase.",
  "section_desc": "Layout options for your product launch"
}

title appears in the Phase 1 header. headline and subline replace {{prompt_headline}} and {{prompt_sub}} in architecture preview_html so previews show real content.

Content tokens in preview_html

In addition to the standard design tokens ({{bg}}, {{fg}}, {{ac}}, etc.), architecture preview_html can use:

{{prompt_headline}} — the user's actual headline text
{{prompt_sub}} — the user's actual subline text

This makes previews contextual — the user sees their own content styled, not generic placeholders.
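To check what a preview will look like before opening the picker, the substitution can be reproduced with one regular expression. This sketch is illustrative, not the template's own code: renderPreview is a hypothetical name, and HTML-escaping the values is a precaution added here because preview_html is injected via innerHTML and {{prompt_headline}} carries user text.

```javascript
// Escapes characters that would otherwise be parsed as markup or end an attribute.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Replaces every {{name}} with tokens[name]. Unknown tokens are left intact so
// they stand out in the rendered preview.
function renderPreview(previewHtml, tokens) {
  return previewHtml.replace(/\{\{(\w+)\}\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(tokens, name) ? escapeHtml(tokens[name]) : match;
  });
}
```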

Serving and user selection

4. Serve the file: cd <project-dir> && python3 -m http.server 8723 & (use port 8723 or any unused port above 8000; if the curl check fails, try the next port, and use that port in the curl check, the shared link, and the kill command below).

Verify: curl -s -o /dev/null -w "%{http_code}" http://localhost:8723/.hyperframes/pick-design.html — only share the link if it returns 200.

Do NOT use npx hyperframes preview for the picker — it blocks. Only start the HTTP server from the main conversation thread. If you are running as a dispatched task or subagent, return the file path and let the caller serve it.

5. Once the user picks, tell them: "Copy the design.md from the picker and paste it here." The user pastes the markdown back into the conversation. Save it verbatim to design.md in the project root — it's already in spec format (YAML frontmatter + prose sections).

After the user pastes, kill the background server: kill %1 or kill $(lsof -ti:8723). Then proceed with construction.

The picker outputs a google-labs-code/design.md spec-compliant file: YAML frontmatter with colors, typography, rounded, and spacing tokens, followed by ## Overview, ## Colors, ## Typography, ## Layout, ## Elevation, ## Components, and ## Do's and Don'ts prose sections.

Dynamic Caption Techniques

You are here because SKILL.md told you to read this file before writing animation code. Pick your technique combination from the table below based on the energy level you detected from the transcript, then implement using standard GSAP patterns.

Technique Selection by Energy

Energy level	Highlight	Exit	Cycle pattern
High	Karaoke with accent glow + scale pop	Scatter or drop	Alternate highlight styles every 2 groups
Medium-high	Karaoke with color pop	Scatter or collapse	Alternate every 3 groups
Medium	Karaoke (subtle, white only)	Fade + slide	Alternate every 3 groups
Medium-low	Karaoke (minimal scale change)	Fade	Single style, vary ease per group
Low	Karaoke (warm tones, slow transition)	Collapse	Alternate every 4 groups

All energy levels use karaoke highlight as the baseline. The difference is intensity — high energy gets accent color + glow + a 15% scale pop on active words, while low energy gets a gentle white shift with a 3% scale change.

Emphasis words always break the pattern. When a word is flagged as emphasis (emotional keyword, ALL CAPS, brand name), give it a stronger animation than surrounding words (larger scale, accent color, overshoot ease). This creates contrast.

Marker highlight modes add a visual layer on top of karaoke. For emphasis words that need more than color/scale, add a marker-style effect — highlight sweep, circle, burst, or scribble — using the /marker-highlight skill. Match mode to energy: burst for hype, circle for key terms, highlight for standard, scribble for subtle.

Audio-Reactive Captions (Mandatory for Music)

If the source audio is music (vocals over instrumentation, beats, any musical content), you MUST extract audio data and add audio-reactive animations. This is not optional — music without audio reactivity looks disconnected. Even low-energy ballads get subtle bass pulse and treble glow.

No special wiring is needed. The group loop already iterates over every caption group to build entrance, karaoke, and exit tweens. At that point, read the audio data for each group's time range and use it to modulate the group's animation intensity with regular GSAP tweens.

// Load audio data inline (same pattern as TRANSCRIPT)
var AUDIO = JSON.parse(audioDataJson); // { fps, totalFrames, frames: [{ bands: [...] }] }

GROUPS.forEach(function (group, gi) {
  var groupEl = document.getElementById("cg-" + gi);
  if (!groupEl) return;

  // Read peak energy for this group's time range
  var startFrame = Math.floor(group.start * AUDIO.fps);
  var endFrame = Math.min(Math.floor(group.end * AUDIO.fps), AUDIO.totalFrames - 1);
  var peakBass = 0;
  var peakTreble = 0;
  for (var f = startFrame; f <= endFrame; f++) {
    var frame = AUDIO.frames[f];
    if (!frame) continue;
    peakBass = Math.max(peakBass, frame.bands[0] || 0, frame.bands[1] || 0);
    peakTreble = Math.max(peakTreble, frame.bands[6] || 0, frame.bands[7] || 0);
  }

  // Modulate entrance — louder groups enter bigger and glowier
  tl.to(
    groupEl,
    {
      scale: 1 + peakBass * 0.06,
      textShadow:
        "0 0 " + Math.round(peakTreble * 12) + "px rgba(255,255,255," + peakTreble * 0.4 + ")",
      duration: 0.3,
      ease: "power2.out",
    },
    group.start,
  );

  // Reset at exit so audio-driven values don't persist
  tl.set(groupEl, { scale: 1, textShadow: "none" }, group.end - 0.15);
});

This shapes the animation at build time, not playback time — no per-frame callbacks, no tl.call() loops, no async fetch timing issues. Loud groups come in with more weight and glow; quiet groups come in soft. The audio data modulates _how much_, the content determines _what_.

Keep audio reactivity subtle — 3-6% scale variation and soft glow. Heavy pulsing makes text unreadable.

To generate the audio data file:

python3 skills/gsap-effects/scripts/extract-audio-data.py audio.mp3 --fps 30 --bands 8 -o audio-data.json

Combining Techniques

Don't use the same highlight animation on every group — cycle through styles using the group index. Don't combine multiple competing animations on the same word at the same timestamp. Vary techniques across groups to match the content's pace changes.

Marker highlight effects (from the /marker-highlight skill) layer well with karaoke — use karaoke for the word-by-word reveal, then add a marker effect on emphasis words only. For example: karaoke highlights each word in white, but brand names get a yellow highlight sweep and stats get a red circle. Cycle marker modes across groups for visual variety (see the mode-to-energy mapping in the marker-highlight skill).

Available Tools

These tools are available in the HyperFrames runtime. Use them when they solve a real problem — not every composition needs all of them.

Tool	What it does	Access	When it's useful
pretext	Pure-arithmetic text measurement without DOM reflow. 0.0002ms per call.	`window.__hyperframes.pretext.prepare(text, font)` / `.layout(prepared, maxWidth, lineHeight)`	Per-frame text reflow, shrinkwrap containers, computing layout before render
fitTextFontSize	Finds the largest font size that fits text on one line. Built on pretext.	`window.__hyperframes.fitTextFontSize(text, { maxWidth, fontFamily, fontWeight })`	Overflow prevention for long phrases, portrait mode, large base sizes
audio data	Pre-extracted per-frame RMS energy and frequency bands.	Extract with `extract-audio-data.py`, load inline or via `fetch("audio-data.json")`	Audio-reactive visuals — modulate intensity based on the music
GSAP	Animation timeline with tweens and callbacks.	`gsap.to()`, `gsap.set()`, `tl.to()`, `tl.set()`	All caption animation

Motion Principles

Guardrails

You know these rules but you violate them. Stop.

Don't use the same ease on every tween. You default to power2.out on everything. Vary eases like you vary font weights — no more than 2 independent tweens with the same ease in a scene.
Don't use the same speed on everything. You default to 0.4-0.5s for everything. The slowest scene should be 3× slower than the fastest. Vary duration deliberately.
Don't enter everything from the same direction. You default to y: 30, opacity: 0 on every element. Vary: from left, from right, from scale, opacity-only, letter-spacing.
Don't use the same stagger on every scene. Each scene needs its own rhythm.
Don't use ambient zoom on every scene. Pick different ambient motion per scene: slow pan, subtle rotation, scale push, color shift, or nothing. Stillness after motion is powerful.
Don't start at t=0. Offset the first animation 0.1-0.3s. Zero-delay feels like a jump cut.
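Two of these guardrails, repeated eases and a t=0 start, can be checked mechanically once a scene's tweens are listed. A minimal sketch, assuming a hypothetical list of { ease, start, duration } records per scene; the limits come from the rules above, and the remaining guardrails stay a judgment call.

```javascript
// Checks one scene's tweens against two guardrails:
//  - no more than 2 independent tweens share an ease
//  - the first animated tween starts at least 0.1s into the scene
function checkSceneGuardrails(tweens) {
  var warnings = [];
  var counts = {};
  // Zero-duration tweens are sets, not animations.
  var animated = tweens.filter(function (t) {
    return t.duration > 0;
  });
  animated.forEach(function (t) {
    var ease = t.ease || "power1.out"; // GSAP 3's default when no ease (or default) is set
    counts[ease] = (counts[ease] || 0) + 1;
  });
  Object.keys(counts).forEach(function (ease) {
    if (counts[ease] > 2) {
      warnings.push(ease + " is used on " + counts[ease] + " tweens (max 2)");
    }
  });
  if (animated.length > 0) {
    var firstStart = Math.min.apply(null, animated.map(function (t) { return t.start; }));
    if (firstStart < 0.1) {
      warnings.push("first animation starts at " + firstStart + "s; offset it by 0.1-0.3s");
    }
  }
  return warnings;
}
```

With GSAP 3, such records could be collected from a scene timeline via tl.getChildren(false, true, false), reading each tween's vars.ease, startTime(), and duration(). Start times are relative to the timeline that holds the tween, so nested scene timelines need their own pass.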

What You Don't Do Without Being Told

Easing is emotion, not technique

The transition is the verb. The easing is the adverb. A slide-in with expo.out = confident. With sine.inOut = dreamy. With elastic.out = playful. Same motion, different meaning. Choose the adverb deliberately.

Direction rules — these are not optional:

.out for elements entering. Starts fast, decelerates. Feels responsive. This is your default.
.in for elements leaving. Starts slow, accelerates away. Throws them off.
.inOut for elements moving between positions.

You get this backwards constantly. Ease-in for entrances feels sluggish. Ease-out for exits feels reluctant.

Speed communicates weight

Fast (0.15-0.3s) — energy, urgency, confidence
Medium (0.3-0.5s) — professional, most content
Slow (0.5-0.8s) — gravity, luxury, contemplation
Very slow (0.8-2.0s) — cinematic, emotional, atmospheric

Scene structure: build / breathe / resolve

Every scene has three phases. You dump everything in the build and leave nothing for breathe or resolve.

Build (0-30%) — elements enter, staggered. Don't dump everything at once.
Breathe (30-70%) — content visible, alive with ONE ambient motion.
Resolve (70-100%) — exit or decisive end. Exits are faster than entrances.
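The build / breathe / resolve split is simple arithmetic on a scene's start and duration. A sketch; scenePhases is a hypothetical helper using the 30% and 70% boundaries above.

```javascript
// Splits a scene into build (0-30%), breathe (30-70%), and resolve (70-100%)
// windows, in absolute timeline seconds.
function scenePhases(sceneStart, sceneDuration) {
  var buildEnd = sceneStart + sceneDuration * 0.3;
  var resolveStart = sceneStart + sceneDuration * 0.7;
  var sceneEnd = sceneStart + sceneDuration;
  return {
    build: [sceneStart, buildEnd],
    breathe: [buildEnd, resolveStart],
    resolve: [resolveStart, sceneEnd],
  };
}

// A 5s scene starting at 10s: build 10-11.5s, breathe 11.5-13.5s, resolve 13.5-15s.
var phases = scenePhases(10, 5);
```

Staggered entrances then go inside phases.build, a single ambient tween spans phases.breathe, and the (faster) exits sit in phases.resolve.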

Transitions are meaning

Crossfade = "this continues"
Hard cut = "wake up" / disruption
Slow dissolve = "drift with me"

You crossfade everything. Use hard cuts for disruption and register shifts.

Choreography is hierarchy

The element that moves first is perceived as most important. Stagger in order of importance, not DOM order. Don't wait for completion — overlap entries. Total stagger sequence under 500ms regardless of item count.
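Ordering by importance and capping the total can be computed up front. A sketch, assuming each item carries a hypothetical numeric importance (higher moves first); the gap between starts shrinks when the list is long enough to push the sequence past 500ms.

```javascript
// Returns [{ el, at }]: most important item first, consecutive starts
// `preferredGap` apart, shrunk so the last start lands at most `maxTotal`
// seconds (0.5 by default) after the first.
function staggerSchedule(items, startTime, preferredGap, maxTotal) {
  var cap = maxTotal === undefined ? 0.5 : maxTotal;
  var ordered = items.slice().sort(function (a, b) {
    return b.importance - a.importance; // stable sort keeps ties in given order
  });
  var gap = ordered.length > 1 ? Math.min(preferredGap, cap / (ordered.length - 1)) : 0;
  return ordered.map(function (item, i) {
    return { el: item.el, at: startTime + i * gap };
  });
}
```

GSAP can also apply the cap itself: stagger: { amount: 0.5 } splits a total of 0.5s across the targets in array order, so passing the targets already sorted by importance gives the same effect inside a single tween.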

Asymmetry

Entrances need longer than exits. A card takes 0.4s to appear but 0.25s to disappear.

Visual Composition

You build for the web. Video frames are not pages.

Two focal points minimum per scene. The eye needs somewhere to travel. Never a single text block floating in empty space.
Fill the frame. Hero text: 60-80% of width. You will try to use web-sized elements. Don't.
Three layers minimum per scene. Background treatment (glow, oversized faded type, color panel). Foreground content. Accent elements (dividers, labels, data bars).
Background is not empty. Radial glows, oversized faded type bleeding off-frame, subtle border panels, hairline rules. Pure solid #000 reads as "nothing loaded."
Anchor to edges. Pin content to left/top or right/bottom. Centered-and-floating is a web pattern.
Split frames. Data panel on the left, content on the right. Top bar with metadata, full-width below. Zone-based layouts, not centered stacks.
Use structural elements. Rules, dividers, border panels. They create paths for the eye and animate well (scaleX from 0).

Image Motion Treatment

Never embed a raw flat image. Every image must have motion treatment:

Perspective tilt: use gsap.set(el, { transformPerspective: 1200, rotationY: -8 }) + box-shadow — creates depth. Do NOT use CSS transform: perspective(...) as GSAP will overwrite it.
Slow zoom (Ken Burns): GSAP scale: 1 → 1.04 over beat duration — makes photos cinematic
Device frame: Wrap in a laptop/phone shape using CSS border-radius and box-shadow
Floating UI: Extract a key element and animate it at a different z-depth for parallax
Scroll reveal: Clip the image to a viewport window and animate y position

Load-Bearing GSAP Rules

Rules below came out of two independent website-to-hyperframes builds (2026-04-20) where compositions linted clean and still shipped broken — elements that never appear, ambient motion that doesn't scrub, entrance tweens that silently kill their target. The linter cannot catch these; the author must follow the rules.

No iframes for captured content. Iframes do not seek deterministically with the timeline — the capture engine cannot scrub inside them, so they appear frozen (or blank) in the rendered output. If the source you're stylizing is a live web app, use the screenshots from capture/ as stacked panels or layered images, not live embeds.

Never stack two transform tweens on the same element. A common failure: a y entrance plus a scale Ken Burns on the same <img>. The second tween's immediateRender: true writes the element's initial state at construction time, overwriting whatever the first tween set — leaving the element invisible or offscreen with no lint warning. A secondary mechanism: tl.from() resets to its declared "from" state when the playhead is seeked past the timeline's end, so an element that looked correct in linear playback vanishes in the capture engine's non-linear seek. Fix one of two ways:

  <!-- BAD: two transforms on one element -->
  <img class="hero" src="..." />
  <script>
    tl.from(".hero", { y: 50, opacity: 0, duration: 0.6 }, 0);
    tl.to(".hero", { scale: 1.04, duration: beat }, 0); // kills the entrance
  </script>

  <!-- GOOD option A: combine into one tween -->
  <script>
    tl.fromTo(
      ".hero",
      { y: 50, opacity: 0, scale: 1.0 },
      { y: 0, opacity: 1, scale: 1.04, duration: beat, ease: "none" },
      0,
    );
  </script>

  <!-- GOOD option B: split across parent + child -->
  <div class="hero-wrap"><img class="hero" src="..." /></div>
  <script>
    tl.from(".hero-wrap", { y: 50, opacity: 0, duration: 0.6 }, 0); // entrance on parent
    tl.to(".hero", { scale: 1.04, duration: beat }, 0); // Ken Burns on child
  </script>

Prefer `tl.fromTo()` over `tl.from()` inside `.clip` scenes. gsap.from() sets immediateRender: true by default, which writes the "from" state at timeline construction — before the .clip scene's data-start is active. Elements can flash visible, start from the wrong position, or skip their entrance entirely when the scene is seeked non-linearly (which the capture engine does). Explicit fromTo makes the state at every timeline position deterministic:

  // BRITTLE: immediateRender interacts badly with scene boundaries
  tl.from(el, { opacity: 0, y: 50, duration: 0.6 }, t);

  // DETERMINISTIC: state is defined at both ends, no immediateRender surprise
  tl.fromTo(el, { opacity: 0, y: 50 }, { opacity: 1, y: 0, duration: 0.6 }, t);

Ambient pulses must attach to the seekable `tl`, never bare `gsap.to()`. Auras, shimmers, gentle float loops, logo breathing — all of these must be added to the scene's timeline, not fired standalone. Standalone tweens run on wallclock time and do not scrub with the capture engine, so the effect is absent in the rendered video even though it looks correct in the studio preview:

  // BAD: lives outside the timeline, never renders in capture
  gsap.to(".aura", { scale: 1.08, yoyo: true, repeat: 5, duration: 1.2 });

  // GOOD: seekable, deterministic, renders
  tl.to(".aura", { scale: 1.08, yoyo: true, repeat: 5, duration: 1.2 }, 0);

Hard-kill every scene boundary, not just captions. The caption hard-kill rule above generalizes: any element whose visibility changes at a beat boundary needs a deterministic tl.set() kill after its fade, because later tweens on the same element (or immediateRender from a sibling tween) can resurrect it. Apply to every element with an exit animation:

  tl.to(el, { opacity: 0, duration: 0.3 }, beatEnd);
  tl.set(el, { opacity: 0, visibility: "hidden" }, beatEnd + 0.3); // deterministic kill

These are the exact rules with the exact code examples — don't summarize or shorten them. They exist because compositions that lint clean still ship broken without them.
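When many elements share this exit, a small wrapper keeps the fade and the kill in step. A sketch: exitAndKill is a hypothetical helper that emits the same two calls as the hard-kill example, with the fade length (0.3s there) as a parameter.

```javascript
// Fades `el` out at `beatEnd`, then hard-kills it once the fade has finished,
// so later tweens or a sibling's immediateRender cannot resurrect it.
function exitAndKill(tl, el, beatEnd, fadeDuration) {
  var fade = fadeDuration === undefined ? 0.3 : fadeDuration;
  tl.to(el, { opacity: 0, duration: fade }, beatEnd);
  tl.set(el, { opacity: 0, visibility: "hidden" }, beatEnd + fade); // deterministic kill
  return tl;
}
```

Usage: exitAndKill(tl, el, beatEnd) for every element with an exit animation.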

Narration & Script

How to write narration scripts for video compositions. Read when the composition includes voiceover or TTS.

Pacing

2.5 words per second is natural speaking pace
15s = ~37 words. 30s = ~75 words. 60s = ~150 words
Leave room for pauses. Silence between sentences is a feature, not dead air
The script should feel SHORTER than the video — visual breathing room matters
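The pacing figures are arithmetic at 2.5 words per second. A sketch for budgeting a script and checking a draft against its clip length; the word counter treats any run of letters or digits (including hyphenated or dotted tokens like G-Sap and Three.js) as one word, which is an approximation.

```javascript
var WORDS_PER_SECOND = 2.5; // natural speaking pace

// Maximum words for a clip of `seconds`, rounded down (15s gives 37 words).
function wordBudget(seconds) {
  return Math.floor(seconds * WORDS_PER_SECOND);
}

// Counts spoken words; punctuation and dashes are ignored.
function countWords(script) {
  var matches = script.match(/[A-Za-z0-9][A-Za-z0-9'\u2019.\-]*/g);
  return matches ? matches.length : 0;
}

// Words per second of a draft script over `seconds` of video.
function paceOf(script, seconds) {
  return countWords(script) / seconds;
}
```

A draft whose paceOf exceeds 2.5 needs cutting; landing well below it is what leaves room for pauses.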

Tone

Write like a person, not a brochure:

Use contractions: "it's", "you'll", "that's", "we've"
Vary sentence length — short punchy phrases mixed with longer flowing ones
Read it out loud. If it sounds robotic, rewrite it
Avoid jargon unless the audience expects it

Number Pronunciation

Write what you want the voice to say. TTS reads literally.

In the product	Write in script as
135+	more than one hundred thirty five
$1.9T	nearly two trillion dollars
99.999%	ninety nine point nine percent
200M+	over two hundred million
10x	ten times
API	A P I
stripe.com	stripe dot com

The visual can show the exact figure while the voice rounds it.
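Spelling out the digits is mechanical, even though the rounding ("nearly", "over") stays an editorial call. A sketch of an integer-to-words helper in the table's style (no hyphens, no "and"); numberToWords is a hypothetical name and handles only whole numbers below one trillion.

```javascript
var ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
  "eighteen", "nineteen"];
var TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];
var SCALES = [[1e9, "billion"], [1e6, "million"], [1e3, "thousand"]];

// Spells out 1-999, e.g. 135 -> "one hundred thirty five".
function underThousand(n) {
  var words = [];
  if (n >= 100) {
    words.push(ONES[Math.floor(n / 100)], "hundred");
    n = n % 100;
  }
  if (n >= 20) {
    words.push(TENS[Math.floor(n / 10)]);
    n = n % 10;
  }
  if (n > 0) words.push(ONES[n]);
  return words.join(" ");
}

// Spells out a whole number from 0 up to (not including) one trillion.
function numberToWords(n) {
  if (!Number.isInteger(n) || n < 0 || n >= 1e12) {
    throw new RangeError("expects a whole number from 0 to 999,999,999,999");
  }
  if (n === 0) return "zero";
  var words = [];
  for (var i = 0; i < SCALES.length; i++) {
    var value = SCALES[i][0];
    if (n >= value) {
      words.push(underThousand(Math.floor(n / value)), SCALES[i][1]);
      n = n % value;
    }
  }
  if (n > 0) words.push(underThousand(n));
  return words.join(" ");
}
```

For the 200M+ row, "over " + numberToWords(200000000) produces the table's "over two hundred million".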

Structure

For product videos:

1. Hook — what's surprising or impressive about this product? A bold claim, a provocative question, a contrast, or a striking number. This is the opening line. Vary the hook type — don't default to a stat every time.

2. Story — what does the product do? Who uses it? Keep it concrete.

3. Proof — stats, customer names, social proof. Real numbers from the product.

4. CTA — what should the viewer do? "Start building at stripe dot com."

Not every video needs all four. A 15-second social ad might be Hook + Proof + CTA. A 60-second product tour uses all four with more Story.

The Opening Line

The most important sentence in the video. It must create tension, curiosity, or surprise in the first 3 seconds.

Patterns that work:

A bold claim: "The financial infrastructure that powers the internet economy."
A question that provokes: "What if your database could think?"
A contrast: "Your AI agent already knows how to make videos. It just needs the right format."
A number that shocks: "Nearly two trillion dollars." (Use sparingly — not every video should open with a stat.)

If the opening is generic ("Welcome to Stripe" / "Introducing our product"), start over.

Example

From a 62-second product launch video (team reference):

Your AI agent already knows how to make videos.
It just needs the right format.

This is Hyperframes. An open source framework. HTML in, video out.

A div is a keyframe. Data attributes are your timeline.
CSS is your look. G-Sap is your animation engine.

Anything a browser can render can be a frame in your video.

CSS animations. G-Sap. Lottie. Shaders. Three.js.

Drop in music, sound effects, footage — it all composes together.

No new framework for the agent to learn.
Just HTML.

The agent writes it. The renderer captures every frame as MP4.
It's deterministic. Identical outputs, every time.

Give your agent the CLI. Tell it what to make.
Watch it build.

Hyperframes. Go make something.

Note: the script above is about 117 words for 62 seconds, roughly 1.9 words/sec, well under the 2.5 words/sec pace, which leaves room for pauses and visual breathing.

Prompt Expansion

Run on every composition. Expansion is not about lengthening a short prompt — it's about grounding the user's intent against design.md and house-style.md and producing a consistent intermediate that every downstream agent reads the same way.

Runs AFTER design direction is established (Step 1). The expansion consumes design.md (if present) and produces output that cites its exact values.

Prerequisites

Read before generating:

design.md (if it exists) — extract brand colors, fonts, mood, and constraints. The expansion cites these exact values (hex codes, font names); it does not invent new ones.
beat-direction.md — per-beat planning format (concept, mood, choreography verbs, transitions, depth layers, rhythm). The expansion outputs each scene using this format.
video-composition.md — video-medium rules for density, scale, and color presence. The expansion applies these automatically.
../house-style.md — its rules for Background Layer (2-5 decoratives), Color, Motion, Typography apply to every scene. The expansion writes output that conforms to them.

If design.md doesn't exist yet, run Step 1 (Design system) first. Expansion without a design context produces generic scene breakdowns that later agents ignore.

Why always run it

The expansion is never pass-through. Every user prompt — no matter how detailed — is a _seed_. The expansion's job is to enrich it into a fully-realized per-scene production spec that the scene subagents can build from directly.

Even a detailed 7-scene brief lacks things only the expansion adds:

Atmosphere layers per scene (required 2–5 from house-style: radial glows, ghost type, hairline rules, grain, thematic decoratives) — the user's prompt almost never lists these; expansion adds them.
Secondary motion for every decorative — breath, drift, pulse, orbit. A decorative without ambient motion feels dead.
Micro-details that make a scene feel real — registration marks, tick indicators, monospace coord labels, typographic accents, code snippets in the background, grid patterns. Things the user didn't think to request.
Transition choreography at the object level — not "crossfade" but "X expands outward and becomes Y". Specific duration, ease, and morph source/target.
Pacing beats within each scene — where tension builds, where a hold lets the viewer breathe, where the accent word lands.
Exact hex values, typography parameters, ease choices from design.md — no vagueness left for the scene subagent to guess.

Expansion's job on a detailed prompt is not to summarize or pass through — it's to take what the user wrote and make it richer. The user's content stays; the atmosphere, ambient motion, and micro-details are added on top. That's what makes the difference between a scene that matches the brief and a scene that feels alive.

The quality gap between a single-pass composition and a multi-scene-pipeline composition comes from this step. Expansion front-loads the richness so every scene subagent builds from a rich brief, not a terse one.

Do not skip. Do not pass through. Single-scene compositions and trivial edits are the only exceptions.

What to generate

Expand into a full production prompt with these sections:

1. Title + style block — cite design.md's exact hex values, font names, and mood. Do NOT invent a palette — quote what the design provides.

2. Rhythm declaration — name the scene rhythm before detailing any scene. Example: hook-PUNCH-breathe-CTA or slow-build-BUILD-PEAK-breathe-CTA. See beat-direction.md for rhythm templates by video type.

3. Global rules — parallax layers, micro-motion requirements, transition style, primary + accent transitions. Match energy to mood (calm → slow eases, high → snappy eases).

4. Per-scene beats — for each scene, use the beat-direction format:

Concept — the big idea in 2-3 sentences. What visual WORLD? What metaphor? What should the viewer FEEL?
Mood direction — cultural/design references, not hex codes. ("Bauhaus color studies", "cinematic title sequence", "editorial calm")
Depth layers — BG (2-5 decoratives with ambient motion), MG (content), FG (accents, structural elements, micro-details). 8-10 total elements per scene per video-composition.md.
Animation choreography — specific verbs per element. High: SLAMS, CRASHES. Medium: CASCADE, SLIDES. Low: floats, types on, counts up. Every element gets a verb. If you can't name the verb, the element is not yet designed.
Transition out — shader or CSS, with specific type and parameters. Not "crossfade" but "blur crossfade, 0.4s, power2.inOut."

5. Recurring motifs — visual threads across scenes from the brand palette.

6. Negative prompt — what to avoid, informed by design.md's constraints if present.

Output

Write the expanded prompt to .hyperframes/expanded-prompt.md in the project directory. Do NOT dump it into the chat — it will be hundreds of lines.

Tell the user:

"I've expanded your prompt into a full production breakdown. Review it here: .hyperframes/expanded-prompt.md

It has [N] scenes across [duration] seconds with specific visual elements, transitions, and pacing. Edit anything you want, then let me know when you're ready to proceed."

Only move to construction after the user approves or says to continue.

Visual Techniques Reference

11 proven techniques from production HyperFrames videos. Use these in your storyboard and compositions to create visually rich, professional output. Each technique includes a minimal code pattern you can adapt.

These are NOT advanced: they're standard motion design patterns, and every composition should use at least 2-3 of them.

---

1. SVG Path Drawing

A path draws itself in real-time, like someone tracing with a pen. Use for revealing diagrams, arrows, connector lines, or brand marks.

<svg viewBox="0 0 400 200">
  <path
    class="draw-path"
    d="M 50 100 L 200 50 L 350 100"
    stroke="#c84f1c"
    stroke-width="4"
    fill="none"
    stroke-linecap="round"
  />
</svg>
<style>
  .draw-path {
    stroke-dasharray: 317; /* must be >= the path length (~316.2 for this path) */
    stroke-dashoffset: 317;
  }
</style>
<script>
  tl.to(".draw-path", { strokeDashoffset: 0, duration: 0.7, ease: "power2.out" }, 0.5);
</script>

Use path.getTotalLength() to calculate the dasharray value dynamically.
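For the example path, each segment is √(150² + 50²) ≈ 158.1 units long, so the full stroke is about 316.2 units. Any hardcoded dash value therefore needs to be at least 317. Below is a minimal sketch of the dynamic approach. It assumes the path element exists before the timeline is built, and `primeDrawPath` is a hypothetical helper, not a HyperFrames API.

```javascript
// Hypothetical helper (not a HyperFrames API): size the dash pattern to the
// path's measured length so strokeDashoffset: length -> 0 reveals the whole
// stroke, with nothing visible beforehand.
function primeDrawPath(path) {
  // getTotalLength() is the standard SVGGeometryElement method. Round up so a
  // fractional remainder cannot leave a sliver of stroke showing at the start.
  var length = Math.ceil(path.getTotalLength());
  path.style.strokeDasharray = String(length);
  path.style.strokeDashoffset = String(length);
  return length;
}

// In a composition script, before adding the tween:
// primeDrawPath(document.querySelector(".draw-path"));
// tl.to(".draw-path", { strokeDashoffset: 0, duration: 0.7, ease: "power2.out" }, 0.5);
```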

---

2. Canvas 2D Procedural Art

Animated noise, particle fields, data visualizations — anything that evolves frame-by-frame. Drive it with a GSAP proxy.

<canvas id="proc-canvas" width="1920" height="1080"></canvas>
<script>
  var canvas = document.getElementById("proc-canvas");
  var ctx = canvas.getContext("2d");

  function hash(x, y) {
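    // Math.imul keeps the products in 32-bit integer space (no float rounding)
    var n = (Math.imul(x, 374761393) + Math.imul(y, 668265263)) | 0;
    n = Math.imul(n ^ (n >> 13), 1274126177);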
    return ((n ^ (n >> 16)) & 0x7fffffff) / 0x7fffffff;
  }

  function drawFrame(t) {
    ctx.fillStyle = "#0a0a0a";
    ctx.fillRect(0, 0, 1920, 1080);
    for (var i = 0; i < 200; i++) {
      var x = hash(i, 0) * 1920;
      var y = hash(i, 1) * 1080;
      var brightness = hash(i, Math.floor(t * 10)) * 255;
      ctx.fillStyle = "rgba(255, 255, 255, " + brightness / 255 + ")";
      ctx.beginPath();
      ctx.arc(x, y, 2, 0, Math.PI * 2);
      ctx.fill();
    }
  }

  var proxy = { time: 0 };
  tl.to(
    proxy,
    {
      time: 5,
      duration: 5,
      ease: "none",
      onUpdate: function () {
        drawFrame(proxy.time);
      },
    },
    0,
  );
</script>

The hash() function is deterministic — same frame renders identically every time.
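Separating the math from the drawing makes this property easy to check. The sketch below uses `particleField`, a hypothetical pure function that returns the data drawFrame paints instead of drawing it. Its hash uses Math.imul for exact 32-bit multiplication. It is an illustrative variant of the hash above and is not guaranteed to be bit-identical to it. Two calls for the same time return identical data, which is the property a frame-seeking renderer relies on.

```javascript
// Integer hash in [0, 1]. Math.imul keeps every product in 32-bit integer
// space, so the result never depends on float rounding.
function hash(x, y) {
  var n = (Math.imul(x, 374761393) + Math.imul(y, 668265263)) | 0;
  n = Math.imul(n ^ (n >> 13), 1274126177);
  return ((n ^ (n >> 16)) & 0x7fffffff) / 0x7fffffff;
}

// Hypothetical pure version of drawFrame's math: returns plain data instead of
// drawing. Positions are fixed per particle; brightness changes in 0.1s steps.
function particleField(t, count, width, height) {
  var step = Math.floor(t * 10);
  var particles = [];
  for (var i = 0; i < count; i++) {
    particles.push({
      x: hash(i, 0) * width,
      y: hash(i, 1) * height,
      alpha: hash(i, step),
    });
  }
  return particles;
}
```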

---

3. CSS 3D Transforms

Perspective rotations create depth. Use for product showcases, card flips, architectural reveals.

<div class="stage" style="perspective: 900px;">
  <div class="card-3d" style="transform-style: preserve-3d;">
    <div class="face front">Product</div>
    <div class="face back" style="transform: rotateY(180deg);">Details</div>
  </div>
</div>
<script>
  tl.to(".card-3d", { rotationY: 360, rotationX: 15, duration: 1.2, ease: "sine.inOut" }, 0);
</script>

Always set perspective on the parent, transform-style: preserve-3d on the animated element.

---

4. Per-Word Kinetic Typography

Words appear one-by-one, synced to transcript.json timestamps. The core technique for narration-driven videos.

<div class="headline">
  <span class="word w-0">Anything</span>
  <span class="word w-1">a</span>
  <span class="word w-2">browser</span>
  <span class="word w-3">can</span>
  <span class="word w-4">render</span>
</div>
<style>
  .word {
    display: inline-block;
    opacity: 0;
    margin: 0 0.12em;
  }
</style>
<script>
  // Word onset times from transcript.json (seconds relative to beat start)
  var timings = [0.0, 0.23, 0.28, 0.63, 0.78];
  var slides = [80, 60, 50, 25, 12]; // horizontal slide decay (px)

  document.querySelectorAll(".word").forEach(function (word, i) {
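    // fromTo, not from: the CSS already sets opacity: 0, so a from() tween
    // would animate opacity from 0 to 0 and the words would never appear.
    tl.fromTo(
      word,
      { x: slides[i], y: 14, opacity: 0 },
      { x: 0, y: 0, opacity: 1, duration: 0.35, ease: "power2.out" },
      timings[i],
    );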
  });
</script>

The slide distance DECAYS per word (80→12px) — mimics a camera settling.
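The timings and slide distances can be derived from the transcript instead of copied by hand. This is a minimal sketch under two assumptions. First, transcript words use the normalized { text, start, end } shape (in seconds) that the cleaning code later in this guide filters. Second, the slide distance decays linearly from first to last word. The example's hand-tuned values (80, 60, 50, 25, 12) are not exactly linear. `buildWordSchedule` is a hypothetical name.

```javascript
// Hypothetical helper: per-word onsets relative to the beat start, plus a
// slide distance that decays linearly from slideFrom to slideTo (px).
function buildWordSchedule(words, beatStart, slideFrom, slideTo) {
  var last = Math.max(words.length - 1, 1);
  return words.map(function (w, i) {
    return {
      text: w.text,
      at: Math.max(0, w.start - beatStart), // position on the beat's timeline
      slide: Math.round(slideFrom + (slideTo - slideFrom) * (i / last)),
    };
  });
}

// Usage inside a composition, one tween per word element:
// buildWordSchedule(words, beatStart, 80, 12).forEach(function (s, i) {
//   tl.fromTo(wordEls[i], { x: s.slide, y: 14, opacity: 0 },
//     { x: 0, y: 0, opacity: 1, duration: 0.35, ease: "power2.out" }, s.at);
// });
```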

---

5. Lottie Animation

Vector animations that play inside a composition. Use for logos, character animations, icons.

<script src="https://cdn.jsdelivr.net/npm/@dotlottie/player-component@2.7.12/dist/dotlottie-player.js"></script>
<dotlottie-player
  class="lottie"
  src="../capture/assets/lottie/animation-0.json"
  autoplay
  loop
  speed="1.5"
  style="width:500px;height:500px;"
>
</dotlottie-player>
<script>
  gsap.set(".lottie", { scale: 0.3, opacity: 0 });
  tl.to(".lottie", { scale: 1, opacity: 1, duration: 0.35, ease: "back.out(1.6)" }, 0.2);
</script>

Or use lottie-web for more control:

var anim = lottie.loadAnimation({
  container: document.getElementById("anim"),
  renderer: "svg",
  loop: false,
  autoplay: false,
  path: "../capture/assets/lottie/animation-0.json",
});
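With autoplay: false, something has to advance the animation. One option is to drive it from the timeline with the same proxy pattern as the canvas example. This sketch assumes lottie-web's goToAndStop(frame, true) and its frameRate/totalFrames properties are available once the animation data has loaded. `lottieFrameAt` is a hypothetical helper.

```javascript
// Hypothetical helper: map a time on the composition timeline to a Lottie
// frame, so playback follows the timeline instead of the wall clock.
function lottieFrameAt(time, startTime, frameRate, totalFrames, loop) {
  var frame = Math.max(0, (time - startTime) * frameRate);
  if (loop) return frame % totalFrames;
  return Math.min(frame, totalFrames - 1); // hold the last frame
}

// Usage with the lottie-web instance above (autoplay: false):
// var proxy = { t: 0 };
// tl.to(proxy, {
//   t: 2, duration: 2, ease: "none",
//   onUpdate: function () {
//     var f = lottieFrameAt(proxy.t, 0, anim.frameRate, anim.totalFrames, false);
//     anim.goToAndStop(f, true); // true = value is a frame number
//   },
// }, 0);
```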

---

6. Video Compositing

Embed real video footage inside compositions. Videos must be muted with playsinline.

<div class="video-frame" style="width:680px;height:840px;border-radius:16px;overflow:hidden;">
  <video
    id="footage"
    src="../capture/assets/videos/clip.mp4"
    muted
    playsinline
    style="width:100%;height:100%;object-fit:cover;"
  ></video>
</div>
<script>
  // Video playback is controlled by the framework — don't call play() manually
  tl.from(".video-frame", { scale: 0.9, opacity: 0, duration: 0.3, ease: "power2.out" }, 0);
</script>

The HyperFrames runtime handles video seeking and playback.

---

7. Character-by-Character Typing

Terminal typing effect using tl.call() to update text content character by character.

<div class="terminal-line">
  <span class="prompt">❯</span>
  <span class="typed" id="typed-text"></span>
  <span class="cursor" style="width:11px;height:22px;background:#333;display:inline-block;"></span>
</div>
<script>
  var CMD = "npx hyperframes init";
  var typed = document.getElementById("typed-text");

  // Cursor blinks
  tl.to(".cursor", { opacity: 0, duration: 0.12, yoyo: true, repeat: 20, ease: "steps(1)" }, 0);

  // Type each character
  for (var i = 0; i < CMD.length; i++) {
    (function (idx) {
      tl.call(
        function () {
          typed.textContent = CMD.substring(0, idx + 1);
        },
        null,
        (idx / CMD.length) * 0.9,
      );
    })(i);
  }
</script>

Use ease: "steps(1)" for cursor blink — creates discrete on/off.
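The timing logic can be separated from the DOM update. This sketch uses a hypothetical `typingSchedule` helper that reproduces the example's even spacing. It returns one step per character.

```javascript
// Hypothetical helper: the reveal time for each prefix of `text`, spread
// evenly over `typeDuration` seconds (the example above uses 0.9s).
function typingSchedule(text, typeDuration) {
  var steps = [];
  for (var i = 0; i < text.length; i++) {
    steps.push({ time: (i / text.length) * typeDuration, text: text.slice(0, i + 1) });
  }
  return steps;
}

// Each step becomes one tl.call. forEach gives every callback its own `step`,
// so the IIFE in the example is not needed:
// typingSchedule(CMD, 0.9).forEach(function (step) {
//   tl.call(function () { typed.textContent = step.text; }, null, step.time);
// });
```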

---

8. Variable Font Axis Animation

Animate font-variation-settings to reshape glyphs in real-time. Works with variable fonts that have axes like optical size (opsz), weight (wght), softness (SOFT).

<style>
  /* Load the captured local variable font; do NOT use a Google Fonts @import.
     Point src at the actual font file in ../capture/assets/fonts/. */
  @font-face {
    font-family: "Fraunces";
    src: url("../capture/assets/fonts/Fraunces-Variable.woff2") format("woff2");
    font-weight: 100 900;
    font-style: normal;
    font-display: block;
  }
  .wordmark {
    --opsz: 144;
    --wght: 440;
    font-family: "Fraunces", serif;
    font-variation-settings:
      "opsz" var(--opsz),
      "wght" var(--wght);
    font-size: 200px;
  }
</style>
<script>
  tl.to(".wordmark", { "--opsz": 72, "--wght": 300, duration: 0.45, ease: "power2.out" }, 0);
</script>

The glyph subtly reshapes as axes animate — optical size adjusts detail, weight changes thickness.

---

9. GSAP MotionPathPlugin

Animate an element along an arbitrary SVG path. Use for sliders following curves, particles along trajectories, guided reveals.

<script src="https://cdn.jsdelivr.net/npm/gsap@3.14.2/dist/MotionPathPlugin.min.js"></script>
<div class="dot" style="width:20px;height:20px;background:#2a8a7c;border-radius:50%;"></div>
<script>
  gsap.registerPlugin(MotionPathPlugin);
  tl.to(
    ".dot",
    {
      motionPath: { path: "M 12 300 C 280 280 520 80 820 50 S 1200 48 1308 38" },
      duration: 1.5,
      ease: "power2.out",
    },
    0,
  );
</script>

---

10. Velocity-Matched Transitions

Exit one beat and enter the next with matched velocities — creates perceived continuous motion.

// EXIT (in outgoing composition): accelerating with blur
tl.to(
  ".content",
  {
    y: -150,
    filter: "blur(30px)",
    opacity: 0,
    duration: 0.33,
    ease: "power2.in", // accelerates
  },
  beatDuration - 0.33,
);

// ENTRY (in incoming composition): decelerating from blur
gsap.set(".content", { y: 150, filter: "blur(30px)" });
tl.to(
  ".content",
  {
    y: 0,
    filter: "blur(0px)",
    duration: 1.0,
    ease: "power2.out", // decelerates
  },
  0,
);

The fastest point of both curves meets at the cut — the viewer perceives smooth camera motion. Match ease families: .in for exits, .out for entries.
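Whether the speeds actually meet can be checked numerically. This sketch assumes GSAP's power2 ease is cubic: GSAP numbers its power eases one below the polynomial degree, so power2.in is t³. Under that assumption, both curves reach a peak speed of 3·d/T at the cut. With the values above, the exit leaves at about 3 × 150 / 0.33 ≈ 1364 px/s, while the 1.0s entry starts at 450 px/s. For a truly matched handoff, the entry's distance-to-duration ratio has to equal the exit's. `peakSpeed` and `matchedEntryDuration` are hypothetical helpers.

```javascript
// Assumption: power2.in is p(t) = t^3 and power2.out is p(t) = 1 - (1 - t)^3,
// so both have slope 3 at their fast end (t = 1 for .in, t = 0 for .out).
function peakSpeed(distance, duration, degree) {
  // Peak speed in px/s = slope at the fast end * distance / duration.
  return (degree * Math.abs(distance)) / duration;
}

// Entry duration whose starting speed equals the exit's final speed, for
// exit and entry eases of the same family (same degree).
function matchedEntryDuration(exitDistance, exitDuration, entryDistance) {
  return (exitDuration * Math.abs(entryDistance)) / Math.abs(exitDistance);
}
```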

---

11. Audio-Reactive Animation

Drive any GSAP-tweenable property from the playing audio. Bass pulses a logo on kick drums. Treble glows a CTA on cymbals. Amplitude breathes a background during quiet phrases. The result: motion that feels locked to the track in a way pre-authored tweens never can.

When to use: Any video with music or dramatic narration — brand reels, product launches, hype edits. Skip for calm/tutorial pacing.

How it works: Pre-extract audio frequency bands into a JSON file, then sample per-frame via tl.call():

// audio-data.json: { fps: 30, totalFrames: 900, frames: [{ bands: [0.82, 0.45, 0.31, ...] }, ...] }
for (var f = 0; f < AUDIO_DATA.totalFrames; f++) {
  tl.call(
    (function (frame) {
      return function () {
        var bass = frame.bands[0]; // 0–1
        var treble = frame.bands[13];
        gsap.set(".logo", { scale: 1 + bass * 0.04 }); // 3–4% pulse on bass
        gsap.set(".cta", { filter: `drop-shadow(0 0 ${treble * 24}px #00C3FF)` });
      };
    })(AUDIO_DATA.frames[f]),
    [],
    f / AUDIO_DATA.fps,
  );
}

Per-frame sampling is required — a single tween will not react. Use the extract script:

python3 skills/gsap/scripts/extract-audio-data.py narration.wav --fps 30 --bands 16 -o audio-data.json

Keep text/logo intensity subtle (≤5% scale, ≤30% glow) — audio-reactive motion on tiny elements reads as jitter. Bigger backgrounds can push to 10–30%.
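Pulling the mapping out of the per-frame closure keeps the intensity caps in one place. This is a minimal sketch. `mapAudioFrame`, `MAX_LOGO_SCALE`, and `MAX_GLOW_PX` are hypothetical names. Band 0 as bass and band 13 as treble simply mirror the 16-band example above. The clamp guards against values outside the documented 0 to 1 range.

```javascript
var MAX_LOGO_SCALE = 0.04; // 4% ceiling, inside the <=5% guidance for text/logos
var MAX_GLOW_PX = 24; // drop-shadow radius at full treble, as in the example

function clamp01(v) {
  return Math.min(1, Math.max(0, v || 0)); // missing band -> 0
}

// Map one frame of band data to the two properties the example drives.
function mapAudioFrame(bands) {
  var bass = clamp01(bands[0]);
  var treble = clamp01(bands[13]);
  return {
    logoScale: 1 + bass * MAX_LOGO_SCALE,
    ctaGlow: "drop-shadow(0 0 " + (treble * MAX_GLOW_PX).toFixed(1) + "px #00C3FF)",
  };
}

// Inside the tl.call closure:
// var v = mapAudioFrame(frame.bands);
// gsap.set(".logo", { scale: v.logoScale });
// gsap.set(".cta", { filter: v.ctaGlow });
```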

Never do: equalizer bars, spectrum analyzers, waveform displays, strobing, rainbow color cycling. The audio provides _timing and intensity_; the visual vocabulary still comes from the brand. See skills/hyperframes/references/audio-reactive.md for the full API and anti-patterns.

---

When to Use What

Video energy	Techniques to combine
High impact (launches, promos)	Per-word typography + velocity transitions + counter animations
Cinematic (tours, stories)	SVG path drawing + video compositing + 3D transforms
Technical (dev tools, APIs)	Character typing + Canvas 2D procedural + MotionPath
Premium (luxury, enterprise)	Variable font animation + Lottie + slow velocity transitions
Data-driven (stats, metrics)	Canvas 2D procedural + counter animations + SVG path drawing

Transcript Guide

For the transcribe CLI invocation, the .en-translates-non-English rule, and whisper model selection, see the hyperframes-media skill. This file covers what to do with the resulting transcript when authoring captions: input formats, mandatory quality checks, cleaning code, external-API fallbacks.

Supported Input Formats

The CLI auto-detects and normalizes these formats:

Format	Extension	Source	Word-level?
whisper.cpp JSON	`.json`	`hyperframes init --video`, `hyperframes transcribe`	Yes
OpenAI Whisper API	`.json`	`openai.audio.transcriptions.create({ timestamp_granularities: ["word"] })`	Yes
SRT subtitles	`.srt`	Video editors, subtitle tools, YouTube	No (phrase-level)
VTT subtitles	`.vtt`	Web players, YouTube, transcription services	No (phrase-level)
Normalized word array	`.json`	Pre-processed by any tool	Yes

Word-level timestamps produce better captions. SRT/VTT give phrase-level timing, which works but can't do per-word animation effects.

Transcript Quality Check (Mandatory)

After every transcription, read the transcript and check for quality issues before proceeding. Bad transcripts produce nonsensical captions. Never skip this step.

What to look for

Signal	Example	Cause
Music note tokens (`♪`, `�`)	`{ "text": "♪" }` or `{ "text": "�" }`	Whisper detected music, not speech
Garbled / nonsense words	"Do a chin", "Get so gay", "huh"	Model misheard lyrics or background noise
Long gaps with no words	20+ seconds of only `♪` tokens	Instrumental section — expected, but high ratio means speech is being missed
Repeated filler	Many "huh", "uh", "oh" entries	Model is hallucinating on music
Very short word spans	Words with `end - start < 0.05`	Unreliable timestamp alignment

Automatic retry rules

If more than 20% of entries are `♪`/`�` tokens, or the transcript contains obvious nonsense words, the transcription failed. Do not proceed with the bad transcript. Instead:

1. Retry with `medium.en` if the original used small.en or smaller:

   npx hyperframes transcribe audio.mp3 --model medium.en

2. If `medium.en` also fails (still >20% music tokens or garbled), tell the user the audio is too noisy for local transcription and suggest:

Providing lyrics manually as an SRT/VTT file
Using an external API (OpenAI or Groq Whisper — see below)

3. Always clean the transcript before building captions — filter out ♪/� tokens and entries where text is a single non-word character. Only real words should reach the caption composition.
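The numeric half of the retry rule is easy to automate. Judging nonsense words still needs a read-through. This sketch assumes the normalized word-array shape ({ text, start, end }, in seconds) that the cleaning code below also uses. `assessTranscript` is a hypothetical helper, not a CLI feature.

```javascript
// Music-note characters (U+266A-U+266F) and the replacement character U+FFFD.
var MUSIC_TOKEN = /^[\u266a-\u266f\ufffd]+$/;

// Hypothetical check: applies the >20% music-token rule and counts the
// sub-50ms spans the quality table flags as unreliable alignment.
function assessTranscript(words) {
  var music = 0;
  var shortSpans = 0;
  words.forEach(function (w) {
    var text = (w.text || "").trim();
    if (MUSIC_TOKEN.test(text)) music++;
    else if (w.end - w.start < 0.05) shortSpans++;
  });
  var musicRatio = words.length ? music / words.length : 1; // empty = failed
  return { musicRatio: musicRatio, shortSpans: shortSpans, retry: musicRatio > 0.2 };
}
```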

Cleaning a transcript

After transcription (even with a good model), strip non-word entries:

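var raw = JSON.parse(transcriptJson);
var words = raw.filter(function (w) {
  // Whisper tokens often carry leading spaces, so test the trimmed text
  var text = (w.text || "").trim();
  if (text.length === 0) return false;
  // Music-note characters (U+266A-U+266F) and the replacement character
  if (/^[\u266a-\u266f\ufffd]+$/.test(text)) return false;
  // Very short filler entries are usually hallucinations on music
  if (/^(huh|uh|um|ah|oh)$/i.test(text) && w.end - w.start < 0.1) return false;
  return true;
});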

When to use which model (decision tree)

1. Is this speech over silence/light background? → small.en is fine
2. Is this speech over music, or music with vocals? → Start with medium.en
3. Is this a produced music track (vocals + full instrumentation)? → Start with medium.en, and expect to need manual lyrics or an external API
4. Is this multilingual? → Use medium or large-v3 (no .en suffix)
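Encoded as a function, the decision tree becomes a short chain of checks. This sketch uses hypothetical flag names to describe the audio. The multilingual check comes first because of the .en-translates-non-English rule. large-v3 is an equally valid choice in that case.

```javascript
// Hypothetical encoding of the decision tree: returns the model to start with
// and whether manual lyrics or an external API will probably be needed.
function pickStartingModel(audio) {
  if (audio.multilingual) {
    return { model: "medium", expectFallback: false }; // or "large-v3"; never .en
  }
  if (audio.producedMusicTrack) {
    return { model: "medium.en", expectFallback: true };
  }
  if (audio.speechOverMusic) {
    return { model: "medium.en", expectFallback: false };
  }
  return { model: "small.en", expectFallback: false };
}

// e.g. npx hyperframes transcribe audio.mp3 --model <pickStartingModel(...).model>
```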

Using External Transcription APIs

For the best accuracy, use an external API and import the result:

OpenAI Whisper API (recommended for quality):

# Generate with word timestamps, then import
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@audio.mp3 -F model=whisper-1 \
  -F response_format=verbose_json \
  -F "timestamp_granularities[]=word" \
  -o transcript-openai.json

npx hyperframes transcribe transcript-openai.json

Groq Whisper API (fast, free tier available):

curl https://api.groq.com/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F file=@audio.mp3 -F model=whisper-large-v3 \
  -F response_format=verbose_json \
  -F "timestamp_granularities[]=word" \
  -o transcript-groq.json

npx hyperframes transcribe transcript-groq.json

If No Transcript Exists

1. Check the project root for transcript.json, .srt, or .vtt files.
2. If none is found, run transcription. Pick the starting model based on the content type:

Speech/voiceover → small.en
Music with vocals → medium.en

   npx hyperframes transcribe <audio-or-video-file> --model medium.en

3. Read the transcript and run the quality check (see above). If it fails, retry with a larger model or suggest manual lyrics.

3D

3D Card Flip

180° Y-axis rotation. Requires CSS: backface-visibility: hidden; transform-style: preserve-3d; on both scene-inners. Parent needs perspective: 1200px.

// oldScene / newScene: outgoing and incoming scene-inner elements; T: transition start (s)
tl.set(newScene, { rotationY: -180, opacity: 1 }, T);
tl.to(oldScene, { rotationY: 180, duration: 0.6, ease: "power2.inOut" }, T);
tl.to(newScene, { rotationY: 0, duration: 0.6, ease: "power2.inOut" }, T);
tl.set(oldScene, { opacity: 0 }, T + 0.6);

Blur

All blur transitions scale with energy. See SKILL.md "Blur Intensity by Energy" for the full table.

Blur Through

Content becomes fully abstract before resolving. The heaviest blur transition.

Calm (default for this type — it's inherently heavy):

tl.to(oldScene, { filter: "blur(30px)", scale: 1.08, duration: 0.5, ease: "power1.in" }, T);
tl.to(oldScene, { opacity: 0, duration: 0.3, ease: "power1.in" }, T + 0.3);
// Hold: both scenes in abstract blur state
tl.fromTo(newScene,
  { filter: "blur(30px)", scale: 0.92, opacity: 0 },
  { filter: "blur(30px)", scale: 0.92, opacity: 1, duration: 0.2, ease: "none" }, T + 0.5);
// Slow resolve
tl.to(newScene, { filter: "blur(0px)", scale: 1, duration: 0.7, ease: "power1.out" }, T + 0.7);

Medium:

tl.to(oldScene, { filter: "blur(15px)", scale: 1.05, opacity: 0, duration: 0.4, ease: "power2.in" }, T);
tl.fromTo(newScene,
  { filter: "blur(15px)", scale: 0.95, opacity: 0 },
  { filter: "blur(0px)", scale: 1, opacity: 1, duration: 0.4, ease: "power2.out" }, T + 0.2);

Directional Blur

Blur + skew simulating motion in one direction. Scale blur and skew with energy.

Medium (default):

tl.to(oldScene, { filter: "blur(12px)", skewX: -8, x: -200, opacity: 0, duration: 0.4, ease: "power3.in" }, T);
tl.fromTo(newScene,
  { filter: "blur(12px)", skewX: 8, x: 200, opacity: 0 },
  { filter: "blur(0px)", skewX: 0, x: 0, opacity: 1, duration: 0.4, ease: "power3.out" }, T + 0.15);

Calm (heavier blur, gentler motion):

tl.to(oldScene, { filter: "blur(20px)", skewX: -4, x: -100, opacity: 0, duration: 0.6, ease: "power1.in" }, T);
tl.fromTo(newScene,
  { filter: "blur(20px)", skewX: 4, x: 100, opacity: 0 },
  { filter: "blur(0px)", skewX: 0, x: 0, opacity: 1, duration: 0.6, ease: "power1.out" }, T + 0.3);

Cover

Staggered Color Blocks

Full-screen (1920x1080) colored divs slide across staggered. Scene swaps while covered.

2-block (standard):

tl.set("#wipe-a", { x: -1920 }, T - 0.01);
tl.set("#wipe-b", { x: -1920 }, T - 0.01);
tl.to("#wipe-a", { x: 0, duration: 0.25, ease: "power3.inOut" }, T);
tl.to("#wipe-b", { x: 0, duration: 0.25, ease: "power3.inOut" }, T + 0.06);
tl.set(oldScene, { opacity: 0 }, T + 0.2);
tl.set(newScene, { opacity: 1 }, T + 0.2);
tl.to("#wipe-a", { x: 1920, duration: 0.25, ease: "power3.inOut" }, T + 0.28);
tl.to("#wipe-b", { x: 1920, duration: 0.25, ease: "power3.inOut" }, T + 0.34);

5-block (dense variant): same pattern with 5 blocks at 0.04s stagger. Use composition palette colors.

Horizontal Blinds

Full-width strips slide across staggered. Each strip: width: 1920px; height: Xpx.

6 strips (180px each): 0.03s stagger
12 strips (90px each): 0.018s stagger

for (var i = 0; i < N; i++) {
  tl.set("#blind-h-" + i, { x: -1920 }, T - 0.01);
  tl.fromTo("#blind-h-" + i, { x: -1920 }, { x: 0, duration: 0.2, ease: "power3.inOut" }, T + i * stagger);
}
tl.set(oldScene, { opacity: 0 }, T + coverTime);
tl.set(newScene, { opacity: 1 }, T + coverTime);
for (var i = 0; i < N; i++) {
  tl.to("#blind-h-" + i, { x: 1920, duration: 0.2, ease: "power3.inOut" }, T + exitStart + i * stagger);
}
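The loop leaves `N`, `stagger`, `coverTime`, and `exitStart` to the composition. This sketch shows one way to derive them. It assumes the swap should wait until the last strip has fully landed, so the scene change never shows between strips. `blindsTiming` and its optional `hold` are hypothetical. With 6 strips at 0.03s, the last strip lands at 5 × 0.03 + 0.2 = 0.35s.

```javascript
// Hypothetical helper: timing for N full-width strips on a 1080px-tall frame.
// slideDuration matches the 0.2s tweens in the loop above.
function blindsTiming(n, stagger, slideDuration, hold) {
  // The last strip starts at (n - 1) * stagger and lands slideDuration later.
  var coverTime = (n - 1) * stagger + slideDuration;
  return {
    stripHeight: 1080 / n,
    coverTime: coverTime,
    // Exit no earlier than full coverage, plus an optional hold.
    exitStart: coverTime + (hold || 0),
  };
}
```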

Vertical Blinds

Same as horizontal but strips are tall and narrow, moving on Y axis.

Distortion

Glitch

RGB-tinted overlays (NOT multiply blend — use normal blending at 35% opacity) jitter with large offsets. Scene itself also jitters.

tl.set("#glitch-r", { opacity: 1, x: 40, y: -8 }, T);
tl.set("#glitch-g", { opacity: 1, x: -30, y: 12 }, T);
tl.set("#glitch-b", { opacity: 1, x: 15, y: -20 }, T);
tl.set(oldScene, { x: -15 }, T);
// 6 jitter frames at 0.03s intervals with big offsets (±30-60px)
// ... swap and clear at T + 0.2

Chromatic Aberration

RGB overlays start aligned then spread apart (±80px), scene fades, converge on new scene.

tl.set("#glitch-r", { opacity: 0.6, x: 0 }, T);
tl.set("#glitch-g", { opacity: 0.6, x: 0 }, T);
tl.set("#glitch-b", { opacity: 0.6, x: 0 }, T);
tl.to("#glitch-r", { x: -80, opacity: 0.8, duration: 0.3, ease: "power2.in" }, T);
tl.to("#glitch-b", { x: 80, opacity: 0.8, duration: 0.3, ease: "power2.in" }, T);
tl.to("#glitch-g", { y: 30, duration: 0.3, ease: "power2.in" }, T);
// At T + 0.3: swap scenes and start converging the overlays back to x: 0, y: 0

Ripple

Rapid oscillation (±30px) + scale distortion (0.97-1.03) + increasing blur. Swap at peak distortion.

tl.to(oldScene, { x: 30, scale: 1.02, duration: 0.04, ease: "none" }, T);
tl.to(oldScene, { x: -25, scale: 0.98, filter: "blur(4px)", duration: 0.04, ease: "none" }, T + 0.04);
// ... more oscillations with increasing blur
// Swap at peak, incoming stabilizes with decreasing wobble

VHS Tape

Clone scene into 20 horizontal strips (each 54px, clip-path'd). Each strip shifts x independently with seeded pseudo-random offsets at per-bar random intervals. Add red+blue chromatic offset copies on each strip (z-index above main, 35% opacity). Make strips wider than frame (2020px at left:-50px) so edges never show.

See SKILL.md for clone-based implementation pattern.
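The seeded offsets keep the glitch identical on every render, and they can be precomputed. This sketch covers only the schedule side. It uses mulberry32 as the seeded generator, which is swapped in for illustration because the guide does not name one. The 30 to 120ms jump intervals are also an illustrative choice. `vhsStripSchedule` and the `#vhs-strip-*` ids are hypothetical.

```javascript
// mulberry32: a small, widely used 32-bit seeded PRNG returning [0, 1).
function mulberry32(seed) {
  var a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    var t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical schedule: each strip gets its own list of { time, x } jumps at
// random intervals. The same seed always yields the same glitch.
function vhsStripSchedule(seed, strips, duration, maxOffset) {
  var rand = mulberry32(seed);
  var stripHeight = 1080 / strips; // 20 strips -> 54px each
  var schedule = [];
  for (var s = 0; s < strips; s++) {
    var keys = [];
    for (var t = 0; t < duration; t += 0.03 + rand() * 0.09) {
      keys.push({ time: t, x: Math.round((rand() * 2 - 1) * maxOffset) });
    }
    schedule.push({ top: s * stripHeight, keys: keys });
  }
  return schedule;
}

// Applying it to the strip clones:
// vhsStripSchedule(7, 20, 0.6, 60).forEach(function (strip, s) {
//   strip.keys.forEach(function (k) { tl.set("#vhs-strip-" + s, { x: k.x }, T + k.time); });
// });
```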


Forks & variants (1)

Hyperframes has 1 known copy in the catalog totaling 8 installs. They canonicalize to this original listing.

dirnbauer - 8 installs

How it compares

Choose hyperframes for HTML composition authoring; use hyperframes-cli for terminal scaffold, lint, and render workflows.

FAQ

What can hyperframes compose in HTML?

hyperframes builds video compositions with animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions—all authored as HyperFrames HTML with timing and media sync.

When should hyperframes-cli be used instead?

hyperframes covers composition authoring in HTML. Use hyperframes-cli for terminal dev-loop commands like init, lint, preview, and render, and hyperframes-media for tts and transcribe preprocessing.

Is Hyperframes safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
