Speak Tts

Grow/content is the canonical shelf because the skill’s evidenced outcomes (PDF audiobooks, multi-voice podcasts, voice-cloned reads) are audience-facing audio products meant for distribution. The content subphase covers repurposing docs and articles into listenable formats that solo builders use for marketing, courses, and async updates.

Where it fits

Example use

Generate a short spoken walkthrough of your MVP story for user interviews.

Example use

Republish a long guide as an audiobook for subscribers who prefer listen mode.

Example use

Produce debate-style podcast clips to promote a launch angle on social channels.

Example use

Build / Docs & content

Narrate internal README or training docs for async onboarding.

How it compares

A procedural TTS content skill for agents, not a podcast hosting platform or music-generation tool.

Common Questions / FAQ

Who is speak-tts for?

Indie builders and agent users who want PDFs, articles, or scripts turned into audiobook-style or multi-voice audio without manual recording marathons.

When should I use speak-tts?

Use it in Grow/content when publishing audio derivatives, in Validate/prototype when testing a spoken product narrative, and in Launch/distribution when you need shareable voice clips. In each case, first confirm which TTS backends the skill version wires to.

Is speak-tts safe to install?

Audio skills may invoke external TTS APIs or local tools depending on version; review the Security Audits panel on this Prism page and audit any voice-clone or third-party speech endpoints before processing user data.

SKILL.md


# Focus Group Test Runs

Raw output from each Agent Focus Group test run during the SKILL.md optimization process.

## Run Index

| File | Round | Skill Version | Task | Models |
|------|-------|---------------|------|--------|
| `round1a-v0-pdf-audiobook.md` | 1 | v0 (original) | Convert 50-page PDF to audiobook | haiku, sonnet, opus, qwen |
| `round1b-v0-podcast.md` | 1 | v0 (original) | Create 3-voice podcast debate | haiku, sonnet, qwen |
| `round2a-v1-pdf-audiobook.md` | 2 | v1 | Convert 50-page PDF to audiobook | haiku, sonnet |
| `round2b-v1-podcast.md` | 2 | v1 | Create 3-voice podcast debate | haiku, sonnet |
| `round3-v2-news-article.md` | 3 | v2 | Read news article and save | haiku, sonnet, opus |
| `round4-v3-voice-clone.md` | 4 | v3 | Clone voice and read presentation | haiku, sonnet, opus |
| `round5a-v4-voice-clone.md` | 5 | v4 | Clone voice and read presentation | haiku, sonnet, opus |
| `round5b-v4-pdf-audiobook.md` | 5 | v4 | Convert 100-page PDF to audiobook | haiku, sonnet |
| `round5c-v5-combined.md` | 5 | v5 (final) | Clone voice + PDF audiobook | sonnet, opus |

## Models Used

- `anthropic/claude-haiku-4.5` — Fast, cost-effective
- `anthropic/claude-sonnet-4.5` — Balanced quality/cost
- `anthropic/claude-opus-4.5` — Highest quality
- `qwen/qwen3-coder:free` — Free tier (used in early rounds)

## How to Read These Files

Each file contains:
1. **Run metadata** — Date, skill file, task, status
2. **Per-model responses** including:
   - Understanding — What the model understood
   - Approach — How it would complete the task
   - Confusions — What was unclear in the docs
   - Potential Failures — What could go wrong
   - Suggested Improvements — Specific recommendations

## Key Insights by Round

### Round 1 (v0 → v1)
- PDF support completely missing
- Voice path confusion (relative vs absolute)
- `--out` vs `--output` inconsistency
- Prerequisites buried at bottom

### Round 2 (v1 → v2)
- Batch + auto-chunk interaction unclear
- Voice availability/default not explained
- Emotion tag behavior ambiguous

### Round 3 (v2 → v3)  
- Clipboard/URL input not documented
- Default output behavior unclear
- Output mode decision tree needed

### Round 4 (v3 → v4)
- Voice cloning workflow incomplete
- Audio format conversion missing
- `sox` command unexplained

### Round 5 (v4 → v5)
- Minor polish items only
- Directory creation order clarified
- Complete workflows validated


---
name: speak-tts
description: Local text-to-speech generation using Chatterbox TTS on Apple Silicon. Use this when users request converting text to audio, reading articles/documents aloud, generating speech from clipboard content, voice cloning, or creating audiobook-style narration. Runs entirely on-device via MLX for private TTS. Supports auto-chunking for long documents, batch processing, and resume capability.
---

# speak - Text to Speech for Agents

Convert text to natural speech audio using Chatterbox TTS on Apple Silicon.

## Prerequisites

| Requirement | Check | Install |
|-------------|-------|---------|
| Apple Silicon Mac | `uname -m` → arm64 | Intel not supported |
| macOS 12.0+ | `sw_vers` | - |
| sox | `which sox` | `brew install sox` |
| ffmpeg | `which ffmpeg` | `brew install ffmpeg` |
| poppler (PDF) | `which pdftotext` | `brew install poppler` |
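
The checks in the table can be combined into one preflight script. The following is a minimal sketch, not part of the skill itself: `missing_tools` is a hypothetical helper name, and the script only reports problems rather than installing anything.

```bash
#!/usr/bin/env bash
# Sketch: report which prerequisite commands are missing from PATH.
# missing_tools is a hypothetical helper, not something speak provides.
missing_tools() {
  local tool
  for tool in "$@"; do
    # command -v exits non-zero when the command cannot be found
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Apple Silicon reports arm64; Intel Macs are not supported.
if [ "$(uname -m)" != "arm64" ]; then
  echo "warning: expected arm64, found $(uname -m)" >&2
fi

# pdftotext is provided by the poppler package.
for tool in $(missing_tools sox ffmpeg pdftotext); do
  echo "missing: $tool" >&2
done
```

Anything reported as missing can be installed with the matching `brew install` command from the table.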

## Input Sources

| Source | Example |
|--------|---------|
| Text file | `speak article.txt` |
| Markdown | `speak doc.md` |
| Direct string | `speak "Hello"` |
| Clipboard | `pbpaste \| speak` |
| Stdin | `cat file.txt \| speak` |

### Web Articles
```bash
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
```

### Converting Formats

| Format | Convert Command |
|--------|-----------------|
| PDF | `pdftotext doc.pdf doc.txt` |
| DOCX | `textutil -convert txt doc.docx` |
| HTML | `pandoc -f html -t plain doc.html > doc.txt` |
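
The conversions above can be wrapped in a small dispatcher that picks a command by file extension. This is a sketch under assumptions: `to_text` is a hypothetical helper, it reuses only the commands from the table, and it assumes `textutil` writes its `.txt` output next to the source file (its default behavior).

```bash
#!/usr/bin/env bash
# Sketch: turn a document into a plain-text file that speak can read.
# to_text is a hypothetical helper; it prints the path of the text file.
to_text() {
  local src="$1"
  local txt="${src%.*}.txt"   # same name, .txt extension
  case "$src" in
    *.txt|*.md) txt="$src" ;;                            # already readable by speak
    *.pdf)  pdftotext "$src" "$txt" || return 1 ;;        # requires poppler
    *.docx) textutil -convert txt "$src" || return 1 ;;   # output lands beside the source
    *.html) pandoc -f html -t plain "$src" > "$txt" || return 1 ;;
    *) echo "unsupported format: $src" >&2; return 1 ;;
  esac
  printf '%s\n' "$txt"
}
```

For example, `speak "$(to_text report.pdf)" --output report.wav` would convert the PDF and then narrate the resulting text file, where `report.pdf` is a placeholder name.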

## Output Modes

| Goal | Command |
|------|---------|
| Save for later | `speak text.txt --output file.wav` |
| Listen now (streami

What is this skill?

Documented focus-group iterations from v0 through v5 (PDF audiobook, podcast debate, news read, voice clone)

Targets long-form conversion (e.g., 50–100-page PDF audiobook workflows)

Multi-voice podcast and presentation-read scenarios in test matrices

Evaluated across haiku, sonnet, and opus-class models for quality vs cost

Skill packaging optimized via structured test runs rather than one-off prompts

Documented focus-group matrix from round 1 through round 5 with v0–v5 skill versions

Regression tasks include 50-page and 100-page PDF audiobook scenarios

Test runs recorded across haiku, sonnet, opus, and qwen model tiers

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 725 installs on skills.sh; 7 GitHub stars; 1/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases: a primary shelf plus alternate fits.
