
Songsee
Turn local audio files into spectrogram and multi-panel feature visualizations from the terminal for demos, content, or ML inspection.
Overview
songsee is an agent skill for the Build phase that generates spectrograms and multi-panel audio feature visualizations via the songsee CLI.
Install
npx skills add https://github.com/steipete/clawdis --skill songsee
What is this skill?
- Single-command spectrogram: `songsee track.mp3` with optional time slices via `--start` / `--duration`
- Multi-panel grids via repeatable or comma-separated `--viz` (spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux)
- Palette and sizing controls: `--style` (classic, magma, inferno, viridis, gray), `--width` / `--height`, FFT `--window` / `--hop`
- Stdin pipeline support: `cat track.mp3 | songsee - --format png -o out.png`
- WAV/MP3 native decode; other formats use ffmpeg when available
- 9 named viz types in the multi-panel example (spectrogram, mel, chroma, hpss, selfsim, loudness, tempogram, mfcc, flux)
- 5 palette styles (classic, magma, inferno, viridis, gray)
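The palette, size, and viz flags above combine in a single call. A minimal sketch of a reusable wrapper, assuming `--width` / `--height` take pixel values; `render_panels` and the 1600 by 900 size are illustrative choices, not songsee defaults.

```shell
# Sketch: render a four-panel feature grid with a chosen palette.
# render_panels is a hypothetical helper; the pixel size is an illustrative choice.
render_panels() {
  local input="$1" output="$2"
  local style="${3:-magma}"   # classic | magma | inferno | viridis | gray
  songsee "$input" \
    --viz spectrogram,mel,chroma,loudness \
    --style "$style" \
    --width 1600 --height 900 \
    --format png \
    -o "$output"
}
```

For example, `render_panels track.mp3 panels.png inferno`. Passing several viz types in one `--viz` list is what triggers the grid layout.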
Adoption & trust: 1.8k installs on skills.sh; 378k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have audio files or streams but no quick, repeatable way to produce spectrogram and feature-panel images for docs, content, or debugging.
Who is it for?
Solo builders who need terminal-driven spectrograms and feature panels from MP3/WAV (or ffmpeg-backed formats) for content, demos, or ML sanity checks.
Skip if: you are a team that needs a GUI DAW, a real-time playback UI, or cloud batch rendering without installing the songsee binary locally.
When should I use this skill?
You need spectrograms or stacked audio feature panels from files or stdin and want the agent to run songsee with the right flags.
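songsee reads audio from stdin when the input path is `-`, so any command that writes a track to stdout can feed it. A minimal sketch that renders a track stored inside a zip archive without extracting it to disk; `render_from_zip` is a hypothetical helper, and it assumes the archive member is MP3 or WAV, since stdin decoding of other formats is not documented.

```shell
# Sketch: stream one MP3/WAV member of a zip archive into songsee's stdin input ("-").
# render_from_zip is a hypothetical helper; `unzip -p` writes the member to stdout.
# Without `set -o pipefail`, the return status is songsee's, not unzip's.
render_from_zip() {
  local archive="$1" member="$2" output="$3"
  unzip -p "$archive" "$member" | songsee - --format png -o "$output"
}
```

For example, `render_from_zip samples.zip kick.mp3 kick.png`.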
What do I get? / Deliverables
After the skill runs, your agent emits correct songsee commands (and install steps) that write JPG/PNG spectrograms or viz grids to your chosen paths.
- Spectrogram or multi-viz grid image (JPG/PNG)
- Documented shell commands for reproducible renders
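To keep renders reproducible, a batch wrapper can print each command before running it, so the output doubles as documentation. A minimal sketch; `batch_render` and its directory arguments are hypothetical, and it only picks up `.mp3` and `.wav` files, the two formats songsee decodes without ffmpeg.

```shell
# Sketch: render one PNG spectrogram per MP3/WAV file in a directory,
# echoing each songsee command first so the run can be reproduced.
# batch_render is a hypothetical helper. Note: a.mp3 and a.wav both map to a.png.
batch_render() {
  local src_dir="$1" out_dir="$2" f name out
  mkdir -p "$out_dir" || return 1
  for f in "$src_dir"/*.mp3 "$src_dir"/*.wav; do
    [ -e "$f" ] || continue            # glob matched nothing
    name="$(basename "$f")"
    out="$out_dir/${name%.*}.png"      # strip the extension, add .png
    echo "songsee $f --format png -o $out"
    songsee "$f" --format png -o "$out" || return 1
  done
}
```

For example, `batch_render ./audio ./renders > renders.log` leaves both the images and a log of the exact commands.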
Journey fit
The canonical shelf is Build because the skill wires a host CLI (songsee) into agent workflows for producing media artifacts, not for validating ideas or shipping production services. Within Build, Integrations fits best: the skill documents the install (Homebrew), the binary dependency, flags, and stdin piping, a classic third-party CLI hookup for solo builders.
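Because the skill depends on a locally installed binary, a quick preflight check can fail fast before the agent issues render commands. A minimal sketch; `check_songsee_deps` is a hypothetical helper, and it treats ffmpeg as optional, matching the note that only formats other than WAV/MP3 need it.

```shell
# Sketch: fail if songsee is missing; only warn if ffmpeg is missing.
# check_songsee_deps is a hypothetical helper.
check_songsee_deps() {
  if ! command -v songsee >/dev/null 2>&1; then
    echo "songsee not found on PATH; install it before running the skill" >&2
    return 1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found: only WAV/MP3 inputs will decode" >&2
  fi
  return 0
}
```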
How it compares
Use instead of hand-rolling matplotlib/librosa scripts when you want a single maintained CLI with named viz presets.
Common Questions / FAQ
Who is songsee for?
Indie developers and creators who work in Claude Code, Cursor, or similar agents and want fast spectrogram images from local audio without writing visualization code each time.
When should I use songsee?
Use it in the Build phase when integrating tooling—e.g., generating slice images for a landing demo, documenting an audio ML pipeline, or batch-exporting panels during backend or agent-tooling work.
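For a landing demo you often want several short clips rather than one long image. A minimal sketch that steps `--start` forward by a fixed `--duration`; `slice_track` is a hypothetical helper, it uses `awk` for fractional offsets, and it does not know the track length, so choose `count` so the slices stay inside the audio.

```shell
# Sketch: cut a track into consecutive fixed-length slices, one JPG per slice,
# using songsee's --start / --duration flags. slice_track is a hypothetical helper.
slice_track() {
  local input="$1" seg="$2" count="$3" out_prefix="${4:-slice}"
  local i=0 start
  while [ "$i" -lt "$count" ]; do
    start="$(awk -v i="$i" -v s="$seg" 'BEGIN { print i * s }')"   # offset in seconds
    songsee "$input" --start "$start" --duration "$seg" \
      -o "$(printf '%s_%02d.jpg' "$out_prefix" "$((i + 1))")" || return 1
    i=$((i + 1))
  done
}
```

For example, `slice_track track.mp3 8 3 demo` writes `demo_01.jpg` through `demo_03.jpg`, covering seconds 0 to 24.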
Is songsee safe to install?
The skill only documents invoking the external songsee CLI and optional ffmpeg; review the Security Audits panel on this Prism page and verify the Homebrew tap and binary source before installing on your machine.
SKILL.md
# songsee

Generate spectrograms + feature panels from audio.

Quick start
- Spectrogram: `songsee track.mp3`
- Multi-panel: `songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux`
- Time slice: `songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg`
- Stdin: `cat track.mp3 | songsee - --format png -o out.png`

Common flags
- `--viz` list (repeatable or comma-separated)
- `--style` palette (classic, magma, inferno, viridis, gray)
- `--width` / `--height` output size
- `--window` / `--hop` FFT settings
- `--min-freq` / `--max-freq` frequency range
- `--start` / `--duration` time slice
- `--format` jpg|png

Notes
- WAV/MP3 decode native; other formats use ffmpeg if available.
- Multiple `--viz` renders a grid.
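The frequency-range and FFT flags are easiest to see together: narrowing the band and enlarging the window trades time detail for frequency detail. A minimal sketch, assuming `--min-freq` / `--max-freq` are in hertz and `--window` / `--hop` are in samples; the 4096/1024 values and the `render_band` name are illustrative, not documented defaults.

```shell
# Sketch: zoom a spectrogram into one frequency band with a larger FFT window.
# render_band is a hypothetical helper; units (Hz, samples) and the
# 4096/1024 window/hop values are assumptions, not songsee defaults.
render_band() {
  local input="$1" min_hz="$2" max_hz="$3" output="$4"
  songsee "$input" \
    --viz spectrogram \
    --min-freq "$min_hz" --max-freq "$max_hz" \
    --window 4096 --hop 1024 \
    --format png -o "$output"
}
```

For example, `render_band voice.wav 80 4000 band.png` focuses on a typical speech range.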