
Gemini Live API Dev
Wire low-latency voice, video, and text streaming into your product with Gemini over WebSockets and the official Python or JS SDKs.
Overview
Gemini Live API Dev is an agent skill for the Build phase that guides real-time WebSocket streaming with Gemini for audio, video, text, tools, and secure client auth.
Install
npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev
What is this skill?
- WebSocket-only Live API patterns for real-time audio, video, and text in one session
- Voice Activity Detection, native audio, configurable thinkingLevel, and synchronous function calling
- Session management: context compression, resumption, GoAway signals, and Google Search grounding
- Ephemeral tokens for secure client-side auth without exposing long-lived API keys
- SDK coverage for google-genai (Python) and @google/genai (JavaScript/TypeScript)
Adoption & trust: 3.7k installs on skills.sh; 3.6k GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You need human-paced voice or video interaction with Gemini but only know REST chat patterns, not Live API sessions, VAD, or client-side token safety.
Who is it for?
Indie builders adding live voice or video agents to a web or mobile app using Google's official GenAI SDKs.
Skip if: your workload is batch document Q&A, cron jobs, or a deployment that only needs the standard generateContent HTTP API.
When should I use this skill?
Use it when building real-time, bidirectional streaming applications with the Gemini Live API (audio/video/text, VAD, tools, session management).
What do I get? / Deliverables
You leave with a coherent Live API integration plan—streaming modalities, session lifecycle, tools, grounding, and SDK-shaped code paths ready to implement and then harden in Ship.
- Live API session architecture (modalities, auth, tool wiring)
- SDK-aligned implementation notes for streaming send/receive loops
Journey fit
Live API work is integration-heavy backend and agent plumbing during the Build phase, before you harden for Ship. Bidirectional WebSocket sessions, ephemeral client tokens, and tool calling belong on the integrations shelf—not generic frontend polish.
How it compares
Integration skill for WebSocket Live sessions—not a generic LLM prompt pack or an MCP server catalog entry.
Common Questions / FAQ
Who is gemini-live-api-dev for?
Solo and indie developers building real-time multimodal products with Gemini who already ship with Python or TypeScript and need correct WebSocket session design.
When should I use gemini-live-api-dev?
During Build when you integrate bidirectional audio/video chat, client ephemeral tokens, live function calling, or session resume/compression—before you load-test and secure the path in Ship.
Is gemini-live-api-dev safe to install?
Treat it as documentation-heavy guidance; review the Security Audits panel on this Prism page and never paste production API keys into agent chats when experimenting with client tokens.
SKILL.md
# Gemini Live API Development Skill

## Overview

The Live API enables **low-latency, real-time voice and video interactions** with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.

Key capabilities:

- **Bidirectional audio streaming** — real-time mic-to-speaker conversations
- **Video streaming** — send camera/screen frames alongside audio
- **Text input/output** — send and receive text within a live session
- **Audio transcriptions** — get text transcripts of both input and output audio
- **Voice Activity Detection (VAD)** — automatic interruption handling
- **Native audio** — thinking (with configurable `thinkingLevel`)
- **Function calling** — synchronous tool use
- **Google Search grounding** — ground responses in real-time search results
- **Session management** — context compression, session resumption, GoAway signals
- **Ephemeral tokens** — secure client-side authentication

> [!NOTE]
> The Live API currently **only supports WebSockets**. For WebRTC support or simplified integration, use a [partner integration](#partner-integrations).

## Models

- `gemini-3.1-flash-live-preview` — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via `thinkingLevel`). 128k context window. **This is the recommended model for all Live API use cases.**

> [!WARNING]
> The following Live API models are **deprecated** and will be shut down. Migrate to `gemini-3.1-flash-live-preview`.
>
> - `gemini-2.5-flash-native-audio-preview-12-2025` — Migrate to `gemini-3.1-flash-live-preview`.
> - `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
> - `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.

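
As a small illustration of the migration guidance above, the sketch below maps the deprecated model names from the warning to the recommended model before a session is opened. The helper `resolve_live_model` and both constants are hypothetical names introduced here. They are not part of `google-genai` or `@google/genai`, and the code uses only the Python standard library.

```python
import warnings

# Model names copied from the Models section above.
RECOMMENDED_LIVE_MODEL = "gemini-3.1-flash-live-preview"
DEPRECATED_LIVE_MODELS = frozenset({
    "gemini-2.5-flash-native-audio-preview-12-2025",
    "gemini-live-2.5-flash-preview",
    "gemini-2.0-flash-live-001",
})


def resolve_live_model(requested: str) -> str:
    """Hypothetical helper: pick the model name to use for a Live API session.

    Deprecated names are swapped for the recommended model and a
    DeprecationWarning is emitted; any other name is returned unchanged.
    """
    if requested in DEPRECATED_LIVE_MODELS:
        warnings.warn(
            f"{requested} is deprecated; using {RECOMMENDED_LIVE_MODEL} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return RECOMMENDED_LIVE_MODEL
    return requested


print(resolve_live_model("gemini-2.0-flash-live-001"))
```

Rewriting the name silently is one possible design choice. A stricter variant could raise an error instead, so that stale configuration gets fixed at its source.
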
## SDKs

- **Python**: `google-genai` — `pip install google-genai`
- **JavaScript/TypeScript**: `@google/genai` — `npm install @google/genai`

> [!WARNING]
> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.

## Partner Integrations

To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over **WebRTC** or **WebSockets**:

- [LiveKit](https://docs.livekit.io/agents/models/realtime/plugins/gemini/) — Use the Gemini Live API with LiveKit Agents.
- [Pipecat by Daily](https://docs.pipecat.ai/guides/features/gemini-live) — Create a real-time AI chatbot using Gemini Live and Pipecat.
- [Fishjam by Software Mansion](https://docs.fishjam.io/tutorials/gemini-live-integration) — Create live video and audio streaming applications with Fishjam.
- [Vision Agents by Stream](https://visionagents.ai/integrations/gemini) — Build real-time voice and video AI applications with Vision Agents.
- [Voximplant](https://voximplant.com/products/gemini-client) — Connect inbound and outbound calls to Live API with Voximplant.
- [Firebase AI SDK](https://firebase.google.com/docs/ai-logic/live-api?api=dev) — Get started with the Gemini Live API using Firebase AI Logic.

## Audio Formats

- **Input**: Raw PCM, little-endian, 16-bit, mono. 16kHz native (will resample others). MIME type: `audio/pcm;rate=16000`
- **Output**: Raw PCM, little-endian, 16-bit, mono. 24kHz sample rate.

> [!IMPORTANT]
> Use `send_realtime_input` / `sendRealtimeInput` for all real-time user input (audio, video, **and text**). `send_client_content` / `sendClientContent` is **only** supported for seeding initial context history (requires setting `initial_history_in_client_c