
Ffvoice
Transcribe meetings and voice memos locally with speaker diarization so your coding agent can search and summarize audio without sending files to a cloud STT API.
Overview
ffvoice is an MCP server for the Build phase that provides offline speech-to-text and speaker diarization tools to coding agents. It ships as a PyPI package and runs locally over stdio.
What is this MCP server?
- On-device speech-to-text with speaker diarization, no cloud upload required
- PyPI package ffvoice, installed with its MCP extra via uvx --from ffvoice[mcp]
- Entry point ffvoice-mcp when the MCP extra is installed
- Transport: stdio (PyPI registry), MCP server version 0.8.3
- Privacy-friendly transcription for founders handling customer calls or raw podcast audio
Community signal: 3 GitHub stars.
What problem does it solve?
Builders who want transcripts inside their agent loop either leak audio to cloud STT or manually copy text from desktop apps.
Who is it for?
Privacy-minded solo developers building agent features on voice memos, interviews, or support recordings without a cloud transcription bill.
Skip if: your team needs managed real-time phone streaming, guaranteed enterprise accuracy SLAs, or zero local compute footprint.
What do I get? / Deliverables
Once uvx has installed ffvoice[mcp] and you have registered the stdio MCP server with your client, your agent can transcribe and diarize audio locally for downstream coding and summarization tasks.
- Local MCP tools for transcription and speaker-separated text
- Agent-ready transcripts without cloud STT API keys
- Reproducible stdio config from ffvoice 0.8.3 manifest
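To make the stdio transport concrete: an MCP client talks to a stdio server by writing newline-delimited JSON-RPC 2.0 messages to the server's stdin and reading replies from its stdout. The sketch below only builds the opening messages a client would typically send. The client name and version are hypothetical, and because ffvoice's tool names are not listed here, it uses the generic tools/list request to discover them rather than assuming any.

```python
import json


def frame(message: dict) -> str:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# Opening request of the MCP lifecycle. clientInfo values are placeholders.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},
    },
}

# Notification (no id) sent after the server answers the initialize request.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

# Ask the server which tools it exposes instead of guessing their names.
list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}

handshake = "".join(frame(m) for m in (initialize, initialized, list_tools))
print(handshake, end="")
```

In practice an MCP client such as Claude Code or Cursor performs this exchange for you; the sketch is only meant to show what "stdio transport" means on the wire.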
Journey fit
Audio pipelines belong in Build when you integrate agent tooling around notes, support clips, or voice-driven features. Agent-tooling is the shelf for MCP servers that extend what Claude/Cursor can do on media files during implementation.
How it compares
ffvoice is a local AI media MCP server, not a hosted Zoom or Otter integration skill.
Common Questions / FAQ
Who is ffvoice for?
Indie builders and agent developers who need on-device transcription and speaker labels inside Claude Code, Cursor, or similar MCP clients.
When should I use ffvoice?
Use it during Build when you integrate voice ingestion, meeting notes, or diarized transcripts into an app or agent workflow.
How do I add ffvoice to my agent?
Configure a stdio MCP server with the runtime hint uvx and the package arguments --from ffvoice[mcp], as given in the published server.json, then restart your MCP client.
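As a rough illustration of that configuration, the following sketch generates a client config entry. It assumes your client reads the common "mcpServers" JSON layout (used by clients such as Claude Desktop and Cursor) and that the command to run after --from is the ffvoice-mcp entry point mentioned above. The "ffvoice" server key is an arbitrary label, and the helper function is hypothetical. Check the published server.json for the authoritative arguments.

```python
import json


def stdio_server_entry(runtime: str, package_spec: str, entry_point: str) -> dict:
    """Build one stdio server entry: a command plus its argument list.

    Hypothetical helper for illustration; not part of ffvoice.
    """
    return {"command": runtime, "args": ["--from", package_spec, entry_point]}


# Assumed layout: a top-level "mcpServers" object keyed by a label you choose.
config = {
    "mcpServers": {
        "ffvoice": stdio_server_entry("uvx", "ffvoice[mcp]", "ffvoice-mcp"),
    }
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

Paste the printed JSON into your client's MCP configuration file (the file location varies by client), then restart the client so it launches the server.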