
Gemini Audio MCP
Generate voiceovers, music beds, and sound effects from your agent using Google Gemini 2.5 and Lyria 3, without hand-rolling audio API clients.
Overview
io.github.jxoesneon/gemini-audio-mcp is an MCP server for the Build phase that generates audio, music, and voice via Google Gemini 2.5 and Lyria 3 for AI assistants.
What is this MCP server?
- MCP tools for audio, music, and voice generation on Gemini 2.5 and Lyria 3
- Distributed as the OCI image ghcr.io/jxoesneon/gemini-audio-mcp:0.1.0 with stdio transport
- Requires one secret environment variable, GEMINI_API_KEY, from Google AI Studio
- Positioned as a high-performance generation server for agent-driven creative pipelines
- Registry and MCP server version 0.1.0, an early release; expect the API surface to evolve
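Because the server uses stdio transport, an MCP host talks to it by writing newline-delimited JSON-RPC 2.0 messages to the container's stdin and reading replies from its stdout. The following is a minimal sketch of the opening messages a client would send, not the server's documented interface: the protocol revision string "2024-11-05" and the client name "example-host" are assumptions, and no tool names are assumed because this listing does not document them (tools/list is how a host would discover them).

```python
import json


def frame(message: dict) -> str:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


def handshake_messages(client_name: str = "example-host",
                       client_version: str = "0.0.1") -> list[str]:
    """Build the opening MCP messages in the order a client sends them.

    A real client waits for the server's reply to `initialize` before
    sending the `notifications/initialized` notification.
    """
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed protocol revision
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # Notifications carry no "id" because no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    # Ask the server which tools it exposes instead of guessing their names.
    list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list"}
    return [frame(m) for m in (initialize, initialized, list_tools)]


if __name__ == "__main__":
    for line in handshake_messages():
        print(line, end="")
```

In practice an MCP host such as Claude Code or Cursor performs this exchange for you; the sketch only shows what travels over the pipe.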
What problem does it solve?
You need custom audio for demos and product UI, but bouncing between Google AI Studio, download folders, and your repo breaks an agent-centric flow.
Who is it for?
Indie builders already on Google AI Studio who want Cursor or Claude Code to produce Lyria music and Gemini speech in one integration step.
Skip if: your team is forbidden from using cloud generative audio, you need pro DAW mastering workflows, or your product only needs speech-to-text without generation.
What do I get? / Deliverables
Your agent can request generated voice, music, and sound assets through MCP so files land ready for app integration or marketing drafts.
- Generated voice, music, or sound clips produced through MCP tool calls
- Agent-ready audio assets for prototypes, notifications, and marketing drafts
Journey fit
Audio generation gets wired in while building demos, onboarding flows, and marketing assets, which makes it an integration task rather than distribution analytics. The server fronts the Gemini and Lyria APIs via MCP (OCI image plus GEMINI_API_KEY), so it fits agent tooling for multimedia product surfaces.
How it compares
This is a Gemini/Lyria generative audio MCP server, not a local Whisper transcription tool or a stock-music marketplace.
Common Questions / FAQ
Who is io.github.jxoesneon/gemini-audio-mcp for?
Solo builders shipping agent-driven apps or content who want Gemini 2.5 and Lyria 3 audio generation inside MCP-compatible coding tools.
When should I use io.github.jxoesneon/gemini-audio-mcp?
Use it while building prototypes, app sound design, or launch creatives when cloud-generated voice and music speed iteration before final production assets.
How do I add io.github.jxoesneon/gemini-audio-mcp to my agent?
Run the OCI image ghcr.io/jxoesneon/gemini-audio-mcp:0.1.0 as a stdio MCP server, set GEMINI_API_KEY from Google AI Studio, and register the server in Claude Code, Cursor, or your MCP host.
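The registration step above can be sketched as a small script that emits an `mcpServers` entry of the kind Claude Code, Cursor, and similar hosts read from their JSON config. This is a sketch under stated assumptions, not the project's documented setup: it assumes Docker is the container runtime, and the server name "gemini-audio" is a hypothetical label. Only the image reference and the GEMINI_API_KEY variable come from this listing.

```python
import json
import os

# Image reference taken from the listing.
IMAGE = "ghcr.io/jxoesneon/gemini-audio-mcp:0.1.0"


def server_entry(api_key: str) -> dict:
    """Describe how a host should launch the server over stdio.

    Assumes Docker as the runtime: `-i` keeps stdin open for the stdio
    transport, `--rm` removes the container on exit, and `-e NAME` forwards
    the variable from the launching process into the container.
    """
    if not api_key:
        raise ValueError("GEMINI_API_KEY must be set")
    return {
        "command": "docker",
        "args": ["run", "-i", "--rm", "-e", "GEMINI_API_KEY", IMAGE],
        "env": {"GEMINI_API_KEY": api_key},
    }


def host_config(api_key: str, name: str = "gemini-audio") -> dict:
    """Wrap the entry under `mcpServers`; `name` is a hypothetical label."""
    return {"mcpServers": {name: server_entry(api_key)}}


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY", "<your-key-from-google-ai-studio>")
    print(json.dumps(host_config(key), indent=2))
```

Paste the printed JSON into your host's MCP configuration file. Where that file lives differs per host, so check your host's documentation for the exact path, and avoid committing the key to your repo.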