
Vox
Add native macOS voice input and spoken replies to your coding agent over MCP without building custom audio plumbing.
Overview
Vox is an MCP server for the Build phase that provides native macOS voice input and text-to-speech output for agent workflows over stdio.
What is this MCP server?
- Native Swift macOS binary distributed as MCPB v1.0.0 with stdio transport
- On-device speech-to-text via Apple SFSpeechRecognizer
- Text-to-speech via ElevenLabs when ELEVENLABS_API_KEY is set, otherwise macOS system voice
- ELEVENLABS_API_KEY is an optional secret environment variable; no key is required for the basic TTS fallback
- Purpose-built MCP server for voice I/O, not a general LLM or browser tool
- Registry version 1.0.0
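The fallback rule in the list above (ElevenLabs when ELEVENLABS_API_KEY is set, otherwise the macOS system voice) can be pictured with a minimal sketch. This is an illustration in Python, not Vox's Swift source: the function name and the returned labels are hypothetical, and treating an empty or whitespace-only value as "not set" is an assumption.

```python
import os
from typing import Mapping


def choose_tts_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick a TTS backend label from the environment (illustrative only)."""
    # Assumption: a blank value counts as "no key configured".
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    # With a key, prefer ElevenLabs; without one, fall back to the system voice.
    return "elevenlabs" if key else "macos-system-voice"
```

Passing a plain dict instead of the real environment makes the rule easy to check without touching your shell configuration.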
Community signal: 1 GitHub star.
What problem does it solve?
Talking to your coding agent still means typing and reading walls of text, which breaks flow when you are debugging, walking, or pair-programming alone.
Who is it for?
Solo Mac developers who want low-friction voice loops with Claude Code or Cursor and already accept macOS-only tooling.
Skip if: you build on Windows or Linux, your team needs hosted multi-user voice, or you only need text chat without MCP wiring.
What do I get? / Deliverables
After you register Vox in your agent, you can dictate prompts through on-device macOS speech recognition and hear responses through the macOS system voice or, with an API key, ElevenLabs.
- stdio MCP connection for agent-invoked listen and speak flows
- On-device transcription via SFSpeechRecognizer
- Spoken agent responses via ElevenLabs or macOS system voice
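For readers curious what an agent-invoked call looks like on the wire, here is a rough sketch of one request frame. MCP's stdio transport carries JSON-RPC 2.0 messages, one per line, and tools are invoked with the `tools/call` method. The tool name `speak` and its `text` argument are hypothetical placeholders; the names Vox actually exposes are not given here and should be read from the server's tool listing.

```python
import json


def tool_call_frame(request_id: int, tool_name: str, arguments: dict) -> bytes:
    """Encode one JSON-RPC 2.0 tools/call request as a single stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the only raw newline
    # in the frame is the trailing message delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# Hypothetical tool name and argument, for illustration only.
frame = tool_call_frame(1, "speak", {"text": "Build\nfinished"})
```

In practice your MCP client builds and sends these frames for you; the sketch only shows why a stdio server needs no network setup.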
Journey fit
Voice I/O sits in the Build phase because it changes how you interact with agents while you ship product code, not how you market the product or monitor it in production. Agent-tooling is the right shelf: Vox is an MCP stdio server wired into Claude Code, Cursor, and similar clients, not a standalone app feature.
How it compares
Vox is a native voice I/O MCP server, not a general chat skill or cloud telephony integration.
Common Questions / FAQ
Who is io.github.boska/vox for?
It is for solo and indie builders on macOS who use MCP-enabled coding agents and want local speech recognition plus optional ElevenLabs TTS.
When should I use io.github.boska/vox?
Use it during active build sessions when hands-free capture or spoken agent replies help you stay in flow without leaving your dev environment.
How do I add io.github.boska/vox to my agent?
Add the catalog package (stdio, vox-mcp from the GitHub release) to your MCP client config and optionally set ELEVENLABS_API_KEY for premium TTS.
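As a rough sketch of that step, the snippet below builds a config entry in the `mcpServers` shape (`command`, `args`, `env`) that several MCP clients read for stdio servers. The server label `vox`, the placeholder binary path, and the helper name are assumptions for illustration; check your client's documentation for the exact config file and schema, and point `command` at wherever you placed the vox-mcp binary from the GitHub release.

```python
import json
from typing import Optional


def vox_server_entry(binary_path: str,
                     elevenlabs_api_key: Optional[str] = None) -> dict:
    """Build an mcpServers-style config fragment for a stdio server."""
    entry: dict = {"command": binary_path, "args": []}
    if elevenlabs_api_key:
        # Optional: only include the secret when a key is supplied.
        entry["env"] = {"ELEVENLABS_API_KEY": elevenlabs_api_key}
    # "vox" is a hypothetical label; clients let you choose the name.
    return {"mcpServers": {"vox": entry}}


if __name__ == "__main__":
    # Placeholder path; replace with the real location of the binary.
    print(json.dumps(vox_server_entry("/path/to/vox-mcp"), indent=2))
```

Leaving out the key produces an entry with no `env` block, which matches the no-key fallback to the macOS system voice.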