Voice AI Development

Name: Voice AI Development
Author: sickn33

sickn33/antigravity-awesome-skills

Architect low-latency voice agents and voice-enabled apps using the OpenAI Realtime API, Vapi, Deepgram, ElevenLabs, LiveKit, and WebRTC.

Overview

Voice AI Development is an agent skill, most often used in the Build phase (and also in Ship), for designing production voice agents with the OpenAI Realtime API, Vapi, Deepgram, ElevenLabs, LiveKit, and WebRTC.

Install

npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill voice-ai-development

What is this skill?

Covers OpenAI Realtime API, Vapi agents, Deepgram STT/TTS, and ElevenLabs synthesis
Includes LiveKit real-time infrastructure and WebRTC audio handling guidance
Frames design around latency budgets, audio quality, and perceived responsiveness
Supports voice agent architecture and provider selection for production readiness
Lists async programming as a foundational prerequisite for streaming pipelines

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 690 installs on skills.sh; 40.1k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You are stitching STT, TTS, and realtime transport together but lack a cohesive architecture that keeps latency and audio quality under control.

Who is it for?

Builders creating real-time voice agents or voice-enriched web apps who must pick and combine commercial audio AI APIs.

Skip if: Text-only chatbots, offline batch transcription scripts with no realtime constraint, or projects with zero network audio requirements.

When should I use this skill?

Building real-time voice agents or voice-enabled applications requiring provider selection, streaming, and latency optimization.

What do I get? / Deliverables

You get provider-aware voice agent guidance with streaming, latency optimization, and WebRTC-oriented implementation choices for shippable voice UX.

Voice agent architecture sketch
Provider stack recommendation
Latency and audio quality tuning checklist

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Build: Integrations & version control

Voice stacks are integrated during Build when wiring providers and real-time audio paths, with Ship perf tuning as a common follow-on. Integrations captures multi-vendor STT, TTS, realtime, and WebRTC plumbing rather than static frontend styling alone.

Also useful

Ship: Performance

Where it fits

Example use

Build: Integrations & version control

Choose Deepgram vs bundled Realtime STT and wire ElevenLabs TTS into a Vapi or custom agent loop.

Example use

Build: Agent skills & templates

Design tool-calling voice agent flows with async streaming handlers.

Example use

Ship: Performance

Profile end-to-end mouth-to-ear latency and tune buffers before public beta.
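
To make the profiling step concrete, here is a minimal sketch that summarizes a batch of response-latency measurements into percentiles. The function name `summarize_latency` and the choice of p50/p95/max are assumptions for illustration, not part of the skill. It presumes you have already collected per-turn timings in milliseconds, for example the gap between the end of user speech and the first audio byte played back.

```python
import statistics


def summarize_latency(samples_ms: list[float]) -> dict:
    """Summarize response latencies in milliseconds (needs at least 2 samples)."""
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "p50": statistics.median(ordered),
        "p95": cuts[94],
        "max": ordered[-1],
    }
```

Tail percentiles tend to matter more than the average here, since a handful of slow turns is often what makes a voice app feel broken.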

How it compares

Use for full voice pipeline architecture instead of a single-vendor STT snippet with no WebRTC or agent orchestration context.

Common Questions / FAQ

Who is voice-ai-development for?

Solo and indie developers building voice agents or voice-enabled products who need multi-provider realtime audio expertise in the agent.

When should I use voice-ai-development?

During Build, when wiring the Realtime API or Vapi integrations, and during Ship, when tuning latency and audio quality before users talk to your app.

Is voice-ai-development safe to install?

It implies network and API usage for third-party voice services; review the Security Audits panel on this Prism page and your API key handling policies.

SKILL.md

# Voice AI Development

Expert in building voice AI applications - from real-time voice agents to voice-enabled apps.
Covers OpenAI Realtime API, Vapi for voice agents, Deepgram for transcription, ElevenLabs
for synthesis, LiveKit for real-time infrastructure, and WebRTC fundamentals. Knows how to
build low-latency, production-ready voice experiences.

**Role**: Voice AI Architect

You are an expert in building real-time voice applications. You think in terms of
latency budgets, audio quality, and user experience. You know that voice apps feel
magical when fast and broken when slow. You choose the right combination of providers
for each use case and optimize relentlessly for perceived responsiveness.
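
One way to work with a latency budget is to give each pipeline stage a target and compare it with measured values. The sketch below is a hypothetical illustration: the `Stage` class, the `check_budget` helper, the stage names, and every number are placeholders, not provider benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    budget_ms: int    # target for this stage
    measured_ms: int  # observed value from your own profiling


def check_budget(stages: list[Stage], total_budget_ms: int) -> dict:
    """Compare measured latency with per-stage and end-to-end targets."""
    total = sum(s.measured_ms for s in stages)
    return {
        "total_ms": total,
        "within_budget": total <= total_budget_ms,
        "over": [s.name for s in stages if s.measured_ms > s.budget_ms],
    }


# Placeholder numbers for illustration only.
pipeline = [
    Stage("capture + network", 100, 80),
    Stage("speech-to-text", 200, 260),
    Stage("LLM first token", 300, 280),
    Stage("text-to-speech first byte", 200, 150),
]
report = check_budget(pipeline, total_budget_ms=800)
```

A pipeline can sit inside its end-to-end budget while one stage still overruns its own target, which is usually the first place to look when tuning.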

### Expertise

- Real-time audio streaming
- Voice agent architecture
- Provider selection
- Latency optimization
- Audio quality tuning

## Capabilities

- OpenAI Realtime API
- Vapi voice agents
- Deepgram STT/TTS
- ElevenLabs voice synthesis
- LiveKit real-time infrastructure
- WebRTC audio handling
- Voice agent design
- Latency optimization

## Prerequisites

- Async programming
- WebSocket basics
- Audio concepts (sample rate, codec)
- Required skills: Python or Node.js, API keys for providers, audio handling knowledge
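
As a small worked example of the audio concepts above, the sketch below computes how many bytes one frame of raw PCM audio occupies and splits a buffer into frames for streaming. The helper names and the 20 ms frame length are assumptions for illustration. The 24 kHz, 16-bit, mono format matches the PCM16 input used in the Realtime API pattern later in this document.

```python
def pcm_frame_bytes(
    sample_rate_hz: int,
    frame_ms: int,
    channels: int = 1,
    bytes_per_sample: int = 2,  # 16-bit PCM
) -> int:
    """Size in bytes of one frame of raw PCM audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * channels * bytes_per_sample


def split_frames(audio: bytes, frame_bytes: int) -> list[bytes]:
    """Split a raw audio buffer into fixed-size frames (last one may be short)."""
    return [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]


frame_size = pcm_frame_bytes(24_000, 20)       # 480 samples * 2 bytes
one_second = bytes(24_000 * 2)                 # one second of silence
frames = split_frames(one_second, frame_size)
```

Smaller frames lower the delay before audio leaves the client but increase per-message overhead, so the frame length is a tuning knob rather than a fixed rule.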

## Scope

- Latency varies by provider
- Cost per minute adds up
- Quality depends on network
- Complex debugging

## Ecosystem

### Primary

- OpenAI Realtime API
- Vapi
- Deepgram
- ElevenLabs

### Infrastructure

- LiveKit
- Daily.co
- Twilio

### Common integrations

- WebRTC
- WebSockets
- Telephony (SIP/PSTN)

### Platforms

- Web applications
- Mobile apps
- Call centers
- Voice assistants

## Patterns

### OpenAI Realtime API

Native voice-to-voice with GPT-4o

**When to use**: When you want integrated voice AI without separate STT/TTS

```python
import asyncio
import base64
import json
import os

import websockets

# Read the key from the environment rather than hard-coding it.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

async def voice_session():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "OpenAI-Beta": "realtime=v1"
    }

    # Newer websockets releases name this argument additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Configure session
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "voice": "alloy",  # supported voices vary; check the Realtime docs
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",  # Voice activity detection
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500
                },
                "tools": [
                    {
                        "type": "function",
                        "name": "get_weather",
                        "description": "Get weather for a location",
                        "parameters": {
                            "type": "object",
                            "properties": {
                                "location": {"type": "string"}
                            }
                        }
                    }
                ]
            }
        }))

        # Send audio (PCM16, 24kHz, mono)
        async def send_audio(audio_bytes):
            # The audio payload is sent as base64-encoded text.
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(audio_bytes).decode("ascii")
            }))
```

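The session above registers a `get_weather` tool and requests PCM16 output, so the receiving side has to decode audio and answer tool calls. The following is a minimal sketch of that step, not the skill's own implementation: `handle_event`, the `TOOLS` registry, and the stubbed `get_weather` are hypothetical names. It assumes the beta Realtime event shapes: `response.audio.delta` with a base64 `delta` field, and `response.output_item.done` carrying a `function_call` item. Check them against the current API reference before relying on them.

```python
import base64
import json


def get_weather(location: str) -> dict:
    # Hypothetical stub; a real tool would call a weather service.
    return {"location": location, "forecast": "unknown"}


TOOLS = {"get_weather": get_weather}


def handle_event(event: dict, audio_out: bytearray) -> list[dict]:
    """Process one server event and return client events to send back."""
    kind = event.get("type")

    if kind == "response.audio.delta":
        # Audio arrives as base64-encoded PCM16 chunks.
        audio_out.extend(base64.b64decode(event["delta"]))
        return []

    if kind == "response.output_item.done":
        item = event.get("item", {})
        if item.get("type") != "function_call":
            return []
        tool = TOOLS.get(item.get("name"))
        if tool is None:
            result = {"error": f"unknown tool: {item.get('name')}"}
        else:
            result = tool(**json.loads(item.get("arguments") or "{}"))
        return [
            {
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": item["call_id"],
                    "output": json.dumps(result),
                },
            },
            # Ask the model to continue now that the tool result is available.
            {"type": "response.create"},
        ]

    return []
```

Inside `voice_session`, a receive loop could iterate over incoming messages, pass `json.loads(message)` to `handle_event`, and send each returned event with `ws.send(json.dumps(...))`. Keeping the handler free of network calls makes it easy to unit test.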