Gemini Live Api Dev

Name: Gemini Live Api Dev
Author: google-gemini

google-gemini/gemini-skills

Wire low-latency voice, video, and text streaming into your product with Gemini over WebSockets and the official Python or JS SDKs.

Overview

Gemini Live API Dev is an agent skill for the Build phase that guides real-time WebSocket streaming with Gemini for audio, video, text, tools, and secure client auth.

Install

npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev

What is this skill?

WebSocket-only Live API patterns for real-time audio, video, and text in one session
Voice Activity Detection, native audio, configurable thinkingLevel, and synchronous function calling
Session management: context compression, resumption, GoAway signals, and Google Search grounding
Ephemeral tokens for secure client-side auth without exposing long-lived API keys
SDK coverage for google-genai (Python) and @google/genai (JavaScript/TypeScript)

Compatible agents: Claude Code, Cursor, Codex, and any other compatible agent

Adoption & trust: 3.7k installs on skills.sh; 3.6k GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You need human-paced voice or video interaction with Gemini but only know REST chat patterns, not Live API sessions, VAD, or client-side token safety.

Who is it for?

Indie builders adding live voice or video agents to a web or mobile app using Google's official GenAI SDKs.

Skip if: you are building batch document Q&A, cron jobs, or deployments that only need the standard generateContent HTTP API.

When should I use this skill?

Building real-time, bidirectional streaming applications with the Gemini Live API (audio/video/text, VAD, tools, session management).

What do I get? / Deliverables

You leave with a coherent Live API integration plan—streaming modalities, session lifecycle, tools, grounding, and SDK-shaped code paths ready to implement and then harden in Ship.

Live API session architecture (modalities, auth, tool wiring)
SDK-aligned implementation notes for streaming send/receive loops

Journey fit

Primary fit

Build: Integrations & version control

Live API work is integration-heavy backend and agent plumbing during the Build phase, before you harden for Ship. Bidirectional WebSocket sessions, ephemeral client tokens, and tool calling belong on the integrations shelf—not generic frontend polish.

Also useful

Build: Agent skills & templates

Ship: Security

How it compares

Integration skill for WebSocket Live sessions—not a generic LLM prompt pack or an MCP server catalog entry.

Common Questions / FAQ

Who is gemini-live-api-dev for?

Solo and indie developers building real-time multimodal products with Gemini who already ship with Python or TypeScript and need correct WebSocket session design.

When should I use gemini-live-api-dev?

During Build when you integrate bidirectional audio/video chat, client ephemeral tokens, live function calling, or session resume/compression—before you load-test and secure the path in Ship.

Is gemini-live-api-dev safe to install?

Treat it as documentation-heavy guidance; review the Security Audits panel on this Prism page and never paste production API keys into agent chats when experimenting with client tokens.

SKILL.md

# Gemini Live API Development Skill

## Overview

The Live API enables **low-latency, real-time voice and video interactions** with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.

Key capabilities:
- **Bidirectional audio streaming** — real-time mic-to-speaker conversations
- **Video streaming** — send camera/screen frames alongside audio
- **Text input/output** — send and receive text within a live session
- **Audio transcriptions** — get text transcripts of both input and output audio
- **Voice Activity Detection (VAD)** — automatic interruption handling
- **Native audio** — native audio output with thinking (configurable via `thinkingLevel`)
- **Function calling** — synchronous tool use
- **Google Search grounding** — ground responses in real-time search results
- **Session management** — context compression, session resumption, GoAway signals
- **Ephemeral tokens** — secure client-side authentication

> [!NOTE]
> The Live API currently **only supports WebSockets**. For WebRTC support or simplified integration, use a [partner integration](#partner-integrations).

## Models

- `gemini-3.1-flash-live-preview` — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via `thinkingLevel`). 128k context window. **This is the recommended model for all Live API use cases.**

> [!WARNING]
> The following Live API models are **deprecated** and will be shut down. Migrate to `gemini-3.1-flash-live-preview`.
> - `gemini-2.5-flash-native-audio-preview-12-2025` — Migrate to `gemini-3.1-flash-live-preview`.
> - `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
> - `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.
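
One way to act on the deprecation list above is to normalize model IDs in a single place before opening a session. The sketch below is illustrative only: `resolve_live_model` and the two constants are hypothetical names, not part of any SDK, and the model IDs are copied from the warning above.

```python
import warnings

# Recommended model named in the Models section above.
RECOMMENDED_LIVE_MODEL = "gemini-3.1-flash-live-preview"

# Deprecated Live API model IDs, copied from the warning above.
DEPRECATED_LIVE_MODELS = frozenset({
    "gemini-2.5-flash-native-audio-preview-12-2025",
    "gemini-live-2.5-flash-preview",
    "gemini-2.0-flash-live-001",
})


def resolve_live_model(requested: str) -> str:
    """Return the recommended model when `requested` is on the deprecated list.

    Hypothetical helper: other model IDs are passed through unchanged.
    """
    if requested in DEPRECATED_LIVE_MODELS:
        warnings.warn(
            f"{requested} is deprecated; using {RECOMMENDED_LIVE_MODEL} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return RECOMMENDED_LIVE_MODEL
    return requested
```

Calling `resolve_live_model("gemini-2.0-flash-live-001")` returns the recommended ID and emits a `DeprecationWarning`, so stale configuration values surface in logs rather than failing at connect time.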

## SDKs

- **Python**: `google-genai` — `pip install google-genai`
- **JavaScript/TypeScript**: `@google/genai` — `npm install @google/genai`

> [!WARNING]
> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.
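
With `google-genai` installed, a send/receive loop might look roughly like the sketch below. Treat it as a sketch under assumptions rather than the skill's reference implementation: `send_pcm`, `collect_audio`, `main`, and `CHUNK_BYTES` are hypothetical names, the `GEMINI_API_KEY` environment variable is an assumed way to supply the key, and passing a plain dict for `audio` (instead of a `types.Blob`) and the `audio_stream_end` flag are assumptions about the Python SDK's `send_realtime_input` signature that should be checked against the installed version. The input MIME type is the one this document gives under Audio Formats.

```python
import asyncio
import os

MODEL = "gemini-3.1-flash-live-preview"
INPUT_MIME = "audio/pcm;rate=16000"
CHUNK_BYTES = 3200  # 100 ms of 16 kHz, 16-bit mono PCM


async def send_pcm(session, pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Stream raw PCM to a live session in fixed-size chunks; return the chunk count."""
    sent = 0
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        # Assumption: the SDK accepts a dict here in place of types.Blob.
        await session.send_realtime_input(
            audio={"data": chunk, "mime_type": INPUT_MIME}
        )
        sent += 1
    return sent


async def collect_audio(session) -> bytes:
    """Concatenate audio bytes from server messages until the iterator ends."""
    out = bytearray()
    async for message in session.receive():
        # Messages without audio (for example, transcripts) carry no data.
        if message.data:
            out.extend(message.data)
    return bytes(out)


async def main(pcm: bytes) -> bytes:
    # Imported lazily so the helpers above stay usable without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    config = {"response_modalities": ["AUDIO"]}
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await send_pcm(session, pcm)
        # Assumption: signals that the prerecorded clip has ended.
        await session.send_realtime_input(audio_stream_end=True)
        return await collect_audio(session)
```

Keeping the helpers independent of the SDK import makes them easy to exercise with a stub session in unit tests; only `main` needs network access and a real key.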

## Partner Integrations

To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over **WebRTC** or **WebSockets**:

- [LiveKit](https://docs.livekit.io/agents/models/realtime/plugins/gemini/) — Use the Gemini Live API with LiveKit Agents.
- [Pipecat by Daily](https://docs.pipecat.ai/guides/features/gemini-live) — Create a real-time AI chatbot using Gemini Live and Pipecat.
- [Fishjam by Software Mansion](https://docs.fishjam.io/tutorials/gemini-live-integration) — Create live video and audio streaming applications with Fishjam.
- [Vision Agents by Stream](https://visionagents.ai/integrations/gemini) — Build real-time voice and video AI applications with Vision Agents.
- [Voximplant](https://voximplant.com/products/gemini-client) — Connect inbound and outbound calls to Live API with Voximplant.
- [Firebase AI SDK](https://firebase.google.com/docs/ai-logic/live-api?api=dev) — Get started with the Gemini Live API using Firebase AI Logic.

## Audio Formats

- **Input**: Raw PCM, little-endian, 16-bit, mono. Native sample rate is 16 kHz; input at other rates is resampled. MIME type: `audio/pcm;rate=16000`
- **Output**: Raw PCM, little-endian, 16-bit, mono. 24kHz sample rate.
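
The two formats above translate directly into byte arithmetic: 16-bit mono means 2 bytes per sample, so 100 ms of input at 16 kHz is 3,200 bytes. The following standard-library sketch shows one way to prepare input and save output. The function and constant names are hypothetical and not part of either SDK.

```python
import io
import struct
import wave

INPUT_RATE_HZ = 16_000   # native input rate stated above
OUTPUT_RATE_HZ = 24_000  # output rate stated above
BYTES_PER_SAMPLE = 2     # 16-bit samples


def floats_to_pcm16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to raw 16-bit little-endian mono PCM."""
    # Clip out-of-range values so they cannot overflow a signed 16-bit integer.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_size_bytes(duration_ms: int, rate_hz: int = INPUT_RATE_HZ) -> int:
    """Bytes needed for `duration_ms` of 16-bit mono PCM at `rate_hz`."""
    return rate_hz * duration_ms // 1000 * BYTES_PER_SAMPLE


def pcm_to_wav_bytes(pcm: bytes, rate_hz: int = OUTPUT_RATE_HZ) -> bytes:
    """Wrap raw 16-bit mono PCM (such as model output) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Wrapping the 24 kHz output in a WAV header is only a convenience for playback and debugging; the API itself exchanges headerless PCM in both directions.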

> [!IMPORTANT]
> Use `send_realtime_input` / `sendRealtimeInput` for all real-time user input (audio, video, **and text**). `send_client_content` / `sendClientContent` is **only** supported for seeding initial context history (requires setting `initial_history_in_client_c
