
Speak TTS
Turn written material into spoken audio—audiobooks, podcasts, or cloned-voice reads—for demos, content, and accessibility without hand-recording every take.
Overview
Speak TTS is an agent skill, most often used in the Grow phase (also Validate and Launch), that converts text sources into spoken audio such as audiobooks, podcasts, and voice-guided reads.
Install
npx skills add https://github.com/emzod/speak --skill speak-tts
What is this skill?
- Documented focus-group matrix from round 1 through round 5, covering skill versions v0 through v5 (PDF audiobook, podcast debate, news read, voice clone)
- Targets long-form conversion, with regression tasks for 50-page and 100-page PDF audiobooks
- Multi-voice podcast and presentation-read scenarios in test matrices
- Test runs recorded across haiku, sonnet, opus, and qwen model tiers to compare quality against cost
- Skill packaging optimized via structured test runs rather than one-off prompts
Adoption & trust: 725 installs on skills.sh; 7 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have long-form text or scripts but no fast path to polished spoken audio for users, leads, or async updates.
Who is it for?
Solo builders repurposing docs, debates, or presentations into audio for content marketing, courses, or accessibility experiments.
Skip if: you need broadcast-licensed voice talent contracts, only real-time low-latency voice APIs, or a video-first pipeline with no audio deliverable.
When should I use this skill?
You need to convert text (PDF, article, script, or presentation) into spoken audio, multi-voice podcast segments, or voice-clone style reads via an agent-driven TTS workflow.
What do I get? / Deliverables
The agent follows a TTS-oriented workflow to produce listenable audio from PDFs, articles, or presentations, with repo evidence of iterative quality tuning across model tiers.
- Audiobook or narrated audio from long-form text
- Multi-voice podcast segments
- Voice-guided presentation or article reads
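For the audiobook deliverable, a minimal sketch of the underlying steps might look like the following. It assumes the `pdftotext` conversion and the `speak ... --output` form shown in the skill's SKILL.md, and that both tools are on the PATH. The `pdf_to_audiobook` helper name is hypothetical and not part of the skill.

```bash
# Sketch only: pdf_to_audiobook is a hypothetical helper, not part of speak.
# Assumes pdftotext (from poppler) and the speak CLI are installed.
pdf_to_audiobook() {
  pdf="$1"
  base="${pdf%.pdf}"                          # strip the .pdf extension
  pdftotext "$pdf" "$base.txt" || return 1    # extract plain text from the PDF
  speak "$base.txt" --output "$base.wav"      # save the narration as a WAV file
}
```

Calling `pdf_to_audiobook guide.pdf` would leave `guide.txt` and `guide.wav` next to the source file, provided both commands succeed.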
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Grow/content is the canonical shelf because the skill’s evidenced outcomes (PDF audiobooks, multi-voice podcasts, voice-cloned reads) are distribution and audience-facing audio products. The Content subphase covers repurposing docs and articles into listenable formats that solo builders use for marketing, courses, and async updates.
Where it fits
Generate a short spoken walkthrough of your MVP story for user interviews.
Republish a long guide as an audiobook for subscribers who prefer listen mode.
Produce debate-style podcast clips to promote a launch angle on social channels.
Narrate internal README or training docs for async onboarding.
How it compares
A procedural TTS content skill for agents—not a hosted podcast hosting platform or music-generation tool.
Common Questions / FAQ
Who is speak-tts for?
Indie builders and agent users who want PDFs, articles, or scripts turned into audiobook-style or multi-voice audio without manual recording marathons.
When should I use speak-tts?
In Grow/content when publishing audio derivatives, in Validate/prototype when testing a spoken product narrative, and in Launch/distribution when you need shareable voice clips—after you confirm which TTS backends the skill version wires to.
Is speak-tts safe to install?
Audio skills may invoke external TTS APIs or local tools depending on version; review the Security Audits panel on this Prism page and audit any voice-clone or third-party speech endpoints before processing user data.
SKILL.md
# Focus Group Test Runs

Raw output from each Agent Focus Group test run during the SKILL.md optimization process.

## Run Index

| File | Round | Skill Version | Task | Models |
|------|-------|---------------|------|--------|
| `round1a-v0-pdf-audiobook.md` | 1 | v0 (original) | Convert 50-page PDF to audiobook | haiku, sonnet, opus, qwen |
| `round1b-v0-podcast.md` | 1 | v0 (original) | Create 3-voice podcast debate | haiku, sonnet, qwen |
| `round2a-v1-pdf-audiobook.md` | 2 | v1 | Convert 50-page PDF to audiobook | haiku, sonnet |
| `round2b-v1-podcast.md` | 2 | v1 | Create 3-voice podcast debate | haiku, sonnet |
| `round3-v2-news-article.md` | 3 | v2 | Read news article and save | haiku, sonnet, opus |
| `round4-v3-voice-clone.md` | 4 | v3 | Clone voice and read presentation | haiku, sonnet, opus |
| `round5a-v4-voice-clone.md` | 5 | v4 | Clone voice and read presentation | haiku, sonnet, opus |
| `round5b-v4-pdf-audiobook.md` | 5 | v4 | Convert 100-page PDF to audiobook | haiku, sonnet |
| `round5c-v5-combined.md` | 5 | v5 (final) | Clone voice + PDF audiobook | sonnet, opus |

## Models Used

- `anthropic/claude-haiku-4.5`: Fast, cost-effective
- `anthropic/claude-sonnet-4.5`: Balanced quality/cost
- `anthropic/claude-opus-4.5`: Highest quality
- `qwen/qwen3-coder:free`: Free tier (used in early rounds)

## How to Read These Files

Each file contains:

1. **Run metadata**: Date, skill file, task, status
2. **Per-model responses** including:
   - Understanding: What the model understood
   - Approach: How it would complete the task
   - Confusions: What was unclear in the docs
   - Potential Failures: What could go wrong
   - Suggested Improvements: Specific recommendations

## Key Insights by Round

### Round 1 (v0 → v1)

- PDF support completely missing
- Voice path confusion (relative vs absolute)
- `--out` vs `--output` inconsistency
- Prerequisites buried at bottom

### Round 2 (v1 → v2)

- Batch + auto-chunk interaction unclear
- Voice availability/default not explained
- Emotion tag behavior ambiguous

### Round 3 (v2 → v3)

- Clipboard/URL input not documented
- Default output behavior unclear
- Output mode decision tree needed

### Round 4 (v3 → v4)

- Voice cloning workflow incomplete
- Audio format conversion missing
- Sox command unexplained

### Round 5 (v4 → v5)

- Minor polish items only
- Directory creation order clarified
- Complete workflows validated

---
name: speak-tts
description: Local text-to-speech generation using Chatterbox TTS on Apple Silicon. Use this when users request converting text to audio, reading articles/documents aloud, generating speech from clipboard content, voice cloning, or creating audiobook-style narration. Runs entirely on-device via MLX for private TTS. Supports auto-chunking for long documents, batch processing, and resume capability.
---

# speak - Text to Speech for Agents

Convert text to natural speech audio using Chatterbox TTS on Apple Silicon.
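Since the skill runs only on Apple Silicon and relies on a few command-line tools (see the Prerequisites table that follows), a pre-flight check could be sketched roughly as below. The helper names `missing_tools` and `check_speak_prereqs` are hypothetical and not part of the speak CLI, and the macOS version requirement is not checked here.

```bash
# Sketch only: both functions are hypothetical helpers, not part of speak.

# Print every named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 when the architecture and tools look right, 1 otherwise.
check_speak_prereqs() {
  if [ "$(uname -m)" != "arm64" ]; then
    echo "Apple Silicon required (uname -m reported $(uname -m))" >&2
    return 1
  fi
  missing="$(missing_tools sox ffmpeg pdftotext)"
  if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
    return 1
  fi
  return 0
}
```

Running `check_speak_prereqs && speak "Hello"` would then attempt synthesis only when the architecture and tool checks pass.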
## Prerequisites

| Requirement | Check | Install |
|-------------|-------|---------|
| Apple Silicon Mac | `uname -m` → arm64 | Intel not supported |
| macOS 12.0+ | `sw_vers` | - |
| sox | `which sox` | `brew install sox` |
| ffmpeg | `which ffmpeg` | `brew install ffmpeg` |
| poppler (PDF) | `which pdftotext` | `brew install poppler` |

## Input Sources

| Source | Example |
|--------|---------|
| Text file | `speak article.txt` |
| Markdown | `speak doc.md` |
| Direct string | `speak "Hello"` |
| Clipboard | `pbpaste \| speak` |
| Stdin | `cat file.txt \| speak` |

### Web Articles

```bash
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
```

### Converting Formats

| Format | Convert Command |
|--------|-----------------|
| PDF | `pdftotext doc.pdf doc.txt` |
| DOCX | `textutil -convert txt doc.docx` |
| HTML | `pandoc -f html -t plain doc.html > doc.txt` |

## Output Modes

| Goal | Command |
|------|---------|
| Save for later | `speak text.txt --output file.wav` |
| Listen now (streami