
Voicebox Voice Synthesis
Add self-hosted voice cloning and text-to-speech to an app or workflow without sending audio to a cloud TTS vendor.
Overview
Voicebox Voice Synthesis is an agent skill for the Build phase that helps solo builders run Voicebox locally and integrate open-source voice cloning and TTS via its REST API.
Install
npx skills add https://github.com/aradotso/trending-skills --skill voicebox-voice-synthesis
What is this skill?
- Local-first studio with 5 TTS engines and 23 languages—MLX/Metal on Apple Silicon, CUDA on Windows/Linux, CPU fallback
- Self-hosted ElevenLabs-style workflow: clone voices, generate speech, apply post-processing effects
- REST API on localhost:17493 for app integration plus desktop UI (Tauri + React + FastAPI)
- Multi-track Stories editor for multi-voice narrative content
- Pre-built binaries for macOS/Windows and Docker Compose for quick deployment
Adoption & trust: 1.3k installs on skills.sh; 31 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need branded or cloned voices and reliable TTS in your app but want to avoid cloud-only vendors, per-character billing, and sending raw audio off-device.
Who is it for?
Indie builders shipping agents, SaaS, or tools that need on-device or self-hosted speech with API-level control during integration work.
Skip if: you only need a one-off cloud TTS API key with no local install, or your product cannot ship desktop binaries, Docker, or Python/Rust build prerequisites.
When should I use this skill?
Use it when you want to clone a voice with Voicebox, generate speech locally, set up Voicebox synthesis, call the Voicebox API, add TTS to an app, configure engines, apply voice effects, or use the Stories multi-voice editor.
What do I get? / Deliverables
You get Voicebox running locally with chosen engines and languages, callable from your code on localhost:17493, ready for cloned voices, effects, or multi-voice Stories output.
- Running Voicebox instance with API reachable on localhost:17493
- Configured engines/languages and optional voice profiles for synthesis
- Integration pattern for your app to request TTS or cloned-voice audio
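As one possible shape for that integration pattern, the sketch below builds a `POST /generate` request with Python's standard library. The endpoint and the `text`, `profile_id`, `language`, and `engine` fields are taken from the curl examples in the SKILL.md further down; the helper names are hypothetical, and the response format is not documented on this page, so the sketch simply returns the raw response body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:17493"  # default Voicebox API address per the SKILL.md


def build_generate_request(text, profile_id, language="en", engine=None, base_url=BASE_URL):
    """Build (but do not send) a POST /generate request. Hypothetical helper."""
    payload = {"text": text, "profile_id": profile_id, "language": language}
    if engine is not None:
        # Only include the engine field when the caller picks one explicitly.
        payload["engine"] = engine
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text, profile_id, timeout=120.0, **options):
    """Send the request and return the raw response body as bytes.

    Assumption: the response shape (audio bytes vs. JSON) is not stated here,
    so the caller decides how to interpret the returned bytes.
    """
    request = build_generate_request(text, profile_id, **options)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running Voicebox instance. The interactive docs at `http://localhost:17493/docs` are the place to confirm the actual response schema.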
Journey fit
The canonical shelf is Build because the skill centers on installing Voicebox, wiring up its localhost REST API, and embedding synthesis into a product you are building. Integrations fits because the work involves configuring local TTS engines, calling the API on port 17493, and hooking speech output into your stack alongside other services.
How it compares
Use it as a self-hosted TTS studio and API, not as a lightweight markdown-only prompt skill or an MCP directory entry.
Common Questions / FAQ
Who is voicebox-voice-synthesis for?
Solo and indie developers using Claude Code, Cursor, or Codex who are adding local voice cloning and speech synthesis to apps or agent workflows with Voicebox.
When should I use voicebox-voice-synthesis?
During Build integrations when you clone a voice, generate speech locally, configure engines, call the Voicebox API from your app, or set up multi-voice Stories—before you rely on TTS in production features.
Is voicebox-voice-synthesis safe to install?
Voicebox runs locally and pulls model/runtime dependencies like any heavy desktop stack; review the Security Audits panel on this Prism page and vet downloads from voicebox.sh or the official GitHub repo before production use.
SKILL.md
# Voicebox Voice Synthesis Studio

> Skill by [ara.so](https://ara.so), from the Daily 2026 Skills collection.

Voicebox is a local-first, open-source voice cloning and TTS studio, a self-hosted alternative to ElevenLabs. It runs entirely on your machine (macOS MLX/Metal, Windows/Linux CUDA, CPU fallback), exposes a REST API on `localhost:17493`, and ships with 5 TTS engines, 23 languages, post-processing effects, and a multi-track Stories editor.

---

## Installation

### Pre-built Binaries (Recommended)

| Platform | Link |
|---|---|
| macOS Apple Silicon | https://voicebox.sh/download/mac-arm |
| macOS Intel | https://voicebox.sh/download/mac-intel |
| Windows | https://voicebox.sh/download/windows |
| Docker | `docker compose up` |

Linux requires building from source: https://voicebox.sh/linux-install

### Build from Source

**Prerequisites:** [Bun](https://bun.sh), [Rust](https://rustup.rs), [Python 3.11+](https://python.org), Tauri prerequisites

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox

# Install just task runner
brew install just    # macOS
cargo install just   # any platform

# Set up Python venv + all dependencies
just setup

# Start backend + desktop app in dev mode
just dev
```

```bash
# List all available commands
just --list
```

---

## Architecture

| Layer | Technology |
|---|---|
| Desktop App | Tauri (Rust) |
| Frontend | React + TypeScript + Tailwind CSS |
| State | Zustand + React Query |
| Backend | FastAPI (Python) on port 17493 |
| TTS Engines | Qwen3-TTS, LuxTTS, Chatterbox, Chatterbox Turbo, TADA |
| Effects | Pedalboard (Spotify) |
| Transcription | Whisper / Whisper Turbo |
| Inference | MLX (Apple Silicon) / PyTorch (CUDA/ROCm/XPU/CPU) |
| Database | SQLite |

The Python FastAPI backend handles all ML inference. The Tauri Rust shell wraps the frontend and manages the backend process lifecycle. The API is accessible directly at `http://localhost:17493` even when using the desktop app.
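
Since the API is served directly on `http://localhost:17493`, an app can probe the backend before sending it work. The following is a minimal sketch, assuming the interactive docs page at `/docs` answers with HTTP 200 when the backend is up; `is_voicebox_up` is a hypothetical helper name, not part of Voicebox.

```python
import urllib.request

BASE_URL = "http://localhost:17493"  # default Voicebox API address


def is_voicebox_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/docs answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all derive from OSError.
        return False
```

A caller might poll this a few times after launching the desktop app or `docker compose up`, since loading the backend can take a moment; how long is not something this document specifies.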

---

## REST API Reference

Base URL: `http://localhost:17493`

Interactive docs: `http://localhost:17493/docs`

### Generate Speech

```bash
# Basic generation
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world, this is a voice clone.",
    "profile_id": "abc123",
    "language": "en"
  }'

# With engine selection
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Speak slowly and with gravitas.",
    "profile_id": "abc123",
    "language": "en",
    "engine": "qwen3-tts"
  }'

# With paralinguistic tags (Chatterbox Turbo only)
curl -X POST http://localhost:17493/generate \
  -H "Content-Type: application/json" \
  -d '{
    "text": "That is absolutely hilarious! [laugh] I cannot believe it.",
    "profile_id": "abc123",
    "engine": "chatterbox-turbo",
    "language": "en"
  }'
```

### Voice Profiles

```bash
# List all profiles
curl http://localhost:17493/profiles

# Create a new profile
curl -X POST http://localhost:17493/profiles \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Narrator",
    "language": "en",
    "description": "Deep narrative voice"
  }'

# Upload audio sample to a profile
curl -X POST http://localhost:17493/profiles/{profile_id}/samples \
  -F "file=@/path/to/voice-sample.wav"

# Export a profile
curl http://localhost:17493/profiles/{profile_id}/export \
  --output narrator-profile.zip

# Import a profile
curl -X POST http://localhost: