Zoom Rtms

Name: Zoom Rtms
Author: anthropics

anthropics/knowledge-work-plugins

1.4k installs
23.1k repo stars
Updated July 28, 2026

zoom-rtms is a reference agent skill for Zoom RTMS. Use it after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.

About

The zoom-rtms skill is a background reference for Zoom Realtime Media Streams (RTMS) and live Zoom media pipelines. Use it after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams. Prefer build-zoom-bot first, then use this skill for stream types, capabilities, and RTMS-specific implementation constraints. Invoke it when the user asks about Zoom RTMS or related SKILL.md workflows.

Your backend receives and processes live media: audio, video, screen share, chat, transcript.
RTMS is not a frontend UI SDK by itself.
Processing is event-triggered: backend waits for RTMS start webhook events before stream handling begins.
Add a Zoom App SDK frontend for in-client UI/controls.
Stream backend RTMS outputs to frontend via WebSocket (or SSE, gRPC, queue workers, etc.).

Zoom Rtms by the numbers

1,406 all-time installs (skills.sh)
+82 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #266 of 1,896 Design & UI/UX skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

From the docs

What zoom-rtms says it does

Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.

npx skills add https://github.com/anthropics/knowledge-work-plugins --skill zoom-rtms

Installs	1.4k
repo stars	★ 23.1k
Security audit	2 / 3 scanners passed
Last updated	July 28, 2026
Repository	anthropics/knowledge-work-plugins ↗

How do I use the zoom-rtms reference skill?

Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.

Who is it for?

Developers using Zoom RTMS workflows documented in SKILL.md.

Skip if: the task falls outside zoom-rtms scope or needs a different stack.

When should I use this skill?

When the user asks about Zoom RTMS or related SKILL.md workflows.

What you get

Completed zoom-rtms workflow with documented commands, files, and expected deliverables.

RTMS architecture validation
Lifecycle event trigger confirmation
Webhook and connection correction list

By the numbers

5-minute preflight runbook for RTMS architecture and event validation

Files

SKILL.md (Markdown)

Zoom Realtime Media Streams (RTMS)

Background reference for live Zoom media pipelines. Prefer build-zoom-bot first, then use this skill for stream types, capabilities, and RTMS-specific implementation constraints.

Expert guidance for accessing live audio, video, transcript, chat, and screen share data from Zoom meetings, webinars, Video SDK sessions, and Zoom Contact Center Voice in real-time. RTMS uses a WebSocket-based protocol with open standards and does not require a meeting bot to capture the media plane.

Read This First (Critical)

RTMS is primarily a backend media ingestion service.

Your backend receives and processes live media: audio, video, screen share, chat, transcript.
RTMS is not a frontend UI SDK by itself.
Processing is event-triggered: backend waits for RTMS start webhook events before stream handling begins.

Optional architecture (common):

Add a Zoom App SDK frontend for in-client UI/controls.
Stream backend RTMS outputs to frontend via WebSocket (or SSE, gRPC, queue workers, etc.).

Use RTMS for media/data plane, and use frontend frameworks/Zoom Apps for presentation + user interactions.

Official Documentation: https://developers.zoom.us/docs/rtms/
SDK Reference (JS): https://zoom.github.io/rtms/js/
SDK Reference (Python): https://zoom.github.io/rtms/py/
Sample Repository: https://github.com/zoom/rtms-samples

Quick Links

New to RTMS? Follow this path:

1. [Connection Architecture](concepts/connection-architecture.md) - Two-phase WebSocket design
2. [SDK Quickstart](examples/sdk-quickstart.md) - Fastest way to receive media (recommended)
3. [Manual WebSocket](examples/manual-websocket.md) - Full protocol control without SDK
4. [Media Types](references/media-types.md) - Audio, video, transcript, chat, screen share

Complete Implementation:

[RTMS Bot](examples/rtms-bot.md) - End-to-end bot implementation guide

Reference:

[Lifecycle Flow](concepts/lifecycle-flow.md) - Complete webhook-to-streaming flow
[Data Types](references/data-types.md) - All enums and constants
[Webhooks](references/webhooks.md) - Event subscription details
[Environment Variables](references/environment-variables.md) - credential modes and runtime knobs
[Quickstart Notes](references/quickstart.md) - Secondary quickstart guide
Integrated Index - see the section below in this file

Having issues?

Connection fails -> Common Issues
Duplicate connections -> Webhook Gotchas
No audio/video -> Media Configuration
Start with preflight checks -> 5-Minute Runbook

Supported Products

| Product | Webhook Event | Payload ID | App Type |
| --- | --- | --- | --- |
| Meetings | `meeting.rtms_started` / `meeting.rtms_stopped` | `meeting_uuid` | General App |
| Webinars | `webinar.rtms_started` / `webinar.rtms_stopped` | `meeting_uuid` (same!) | General App |
| Video SDK | `session.rtms_started` / `session.rtms_stopped` | `session_id` | Video SDK App |
| Zoom Contact Center Voice | Product-specific RTMS/ZCC Voice events | Product-specific stream/session identifiers | Contact Center / approved RTMS integration |

Once connected, the core signaling/media socket model is shared across products. Meetings, webinars, and Video SDK sessions use the familiar start/stop webhooks. Zoom Contact Center Voice adds its own RTMS/ZCC Voice event family and should be treated as the same transport model with product-specific event payloads.

RTMS Overview

RTMS is a data pipeline that gives your app access to live media from Zoom meetings, webinars, and Video SDK sessions without participant bots. Instead of having automated clients join meetings, use RTMS to collect media data directly from Zoom's infrastructure.

What RTMS Provides

| Media Type | Format | Use Cases |
| --- | --- | --- |
| Audio | PCM (L16), G.711, G.722, Opus | Transcription, voice analysis, recording |
| Video | H.264, JPG, PNG | Recording, AI vision, thumbnails, active participant selection |
| Screen Share | H.264, JPG, PNG | Content capture, slide extraction |
| Transcript | JSON text | Meeting notes, search, compliance |
| Chat | JSON text | Archive, sentiment analysis |

March 2026 Protocol Changes

Zoom Contact Center Voice support: RTMS now covers Contact Center Voice audio and transcript scenarios.
Transcript Language Identification control: transcript media handshakes now support src_language and enable_lid. Default behavior is LID enabled. Set enable_lid: false to force a fixed language.
Single individual video stream subscription: RTMS can now stream one participant's camera feed at a time when data_opt is set to VIDEO_SINGLE_INDIVIDUAL_STREAM.
Graceful client-initiated shutdown: backends can send STREAM_CLOSE_REQ over the signaling socket and wait for STREAM_CLOSE_RESP.
Media keep-alive tolerance increased: media socket keep-alive timeout is now 65 seconds, not 35.

Two Approaches

| Approach | Best For | Complexity |
| --- | --- | --- |
| SDK (`@zoom/rtms`) | Most use cases | Low - handles WebSocket complexity |
| Manual WebSocket | Custom protocols, other languages | High - full protocol implementation |

Prerequisites

Node.js 20.3.0+ (24 LTS recommended) for JavaScript SDK
Python 3.10+ for Python SDK
Zoom General App (for meetings/webinars) or Video SDK App (for Video SDK) with RTMS feature enabled
Webhook endpoint for RTMS events
Server to receive WebSocket streams

Need RTMS access? Post in Zoom Developer Forum requesting RTMS access with your use case.

Quick Start (SDK - Recommended)

import rtms from "@zoom/rtms";

// RTMS start events across products (stop events are not handled in this example)
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onAudioData((data, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${data.length} bytes`);
  });

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  client.onJoinConfirm((reason) => {
    console.log(`Joined session: ${reason}`);
  });

  // SDK handles all WebSocket connections automatically
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});

Quick Start (Manual WebSocket)

For full control or non-SDK languages, implement the two-phase WebSocket protocol:

const express = require('express');
const WebSocket = require('ws');
const crypto = require('crypto');

// Credentials come from the manual-implementation environment variables
const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

const app = express();
app.use(express.json());  // Parse webhook JSON bodies into req.body

// 1. Generate signature
// For meetings/webinars: uses meeting_uuid. For Video SDK: uses session_id.
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// 2. Handle webhook
app.post('/webhook', (req, res) => {
  res.status(200).send();  // CRITICAL: Respond immediately!

  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    connectToRTMS(payload);
  }
});

// 3. Connect to signaling WebSocket
function connectToRTMS(payload) {
  // The sample payloads in this document nest fields under payload.object; accept either shape
  const fields = payload.object ?? payload;
  const { server_urls, rtms_stream_id } = fields;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = fields.meeting_uuid || fields.session_id;
  const signature = generateSignature(CLIENT_ID, idValue, rtms_stream_id, CLIENT_SECRET);

  const signalingWs = new WebSocket(server_urls);

  signalingWs.on('open', () => {
    signalingWs.send(JSON.stringify({
      msg_type: 1,  // Handshake request
      protocol_version: 1,
      meeting_uuid: idValue,
      rtms_stream_id,
      signature,
      media_type: 9  // AUDIO(1) | TRANSCRIPT(8)
    }));
  });

  // ... handle responses, connect to media WebSocket
}

app.listen(8080);  // Port is an example value; use whatever your webhook URL points at

See: Manual WebSocket Guide for complete implementation.

Media Type Bitmask

Combine types with bitwise OR:

| Type | Value | Description |
| --- | --- | --- |
| Audio | 1 | PCM audio samples |
| Video | 2 | H.264/JPG video frames |
| Screen Share | 4 | Separate from video! |
| Transcript | 8 | Real-time speech-to-text |
| Chat | 16 | In-meeting chat messages |
| All | 32 | All media types |

Example: Audio + Transcript = 1 | 8 = 9
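
The bitmask arithmetic can be sketched in a few lines. The flag values come from the table above; the helper names `combineMedia` and `describeMedia` are illustrative, not part of any SDK. Note that the table lists All as its own value (32) rather than the OR of the five flags (which would be 31), so this sketch leaves it out and only combines individual types.

```javascript
// Flag values copied from the Media Type Bitmask table.
const MEDIA = Object.freeze({
  AUDIO: 1,
  VIDEO: 2,
  SCREEN_SHARE: 4,
  TRANSCRIPT: 8,
  CHAT: 16,
});

// Combine any number of flags into a single media_type value.
function combineMedia(...flags) {
  return flags.reduce((mask, flag) => mask | flag, 0);
}

// List the names of the individual flags that are set in a mask.
function describeMedia(mask) {
  return Object.entries(MEDIA)
    .filter(([, flag]) => (mask & flag) !== 0)
    .map(([name]) => name);
}

const mask = combineMedia(MEDIA.AUDIO, MEDIA.TRANSCRIPT);
console.log(mask, describeMedia(mask)); // 9 [ 'AUDIO', 'TRANSCRIPT' ]
```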

Critical Gotchas

| Issue | Solution |
| --- | --- |
| Only 1 connection allowed | New connections kick out existing ones. Track active sessions! |
| Respond 200 immediately | If the webhook response is delayed, Zoom retries, which creates duplicate connections |
| Heartbeat mandatory | Respond to msg_type 12 with msg_type 13, or connection dies |
| Reconnection is YOUR job | RTMS doesn't auto-reconnect. Media keep-alive tolerance is now about 65s; signaling remains around 60s |
| Transcript language drift | Use `src_language` plus `enable_lid: false` when you want fixed-language transcription instead of automatic language switching |
| Single participant video only | `VIDEO_SINGLE_INDIVIDUAL_STREAM` supports one participant at a time. A new `VIDEO_SUBSCRIPTION_REQ` overrides the previous selection |
| Graceful close is explicit now | Use `STREAM_CLOSE_REQ` / `STREAM_CLOSE_RESP` when your backend wants to terminate the stream cleanly |
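
One way to act on the "track active sessions" advice is a small in-memory registry keyed by `rtms_stream_id`. This is a minimal sketch under the assumption that one process handles all webhooks; `StreamRegistry` and its methods are hypothetical names, and a multi-instance deployment would need a shared store instead.

```javascript
// Registry of streams this process is already connected to, keyed by rtms_stream_id.
class StreamRegistry {
  constructor() {
    this.active = new Map();
  }

  // Returns true if the caller should open a connection,
  // false if this stream is already claimed (for example, a webhook retry).
  claim(streamId, info = {}) {
    if (this.active.has(streamId)) return false;
    this.active.set(streamId, { ...info, claimedAt: Date.now() });
    return true;
  }

  // Call on an rtms_stopped event or after the sockets have closed.
  release(streamId) {
    return this.active.delete(streamId);
  }
}

const registry = new StreamRegistry();
console.log(registry.claim('stream123==')); // true: first delivery
console.log(registry.claim('stream123==')); // false: duplicate delivery
registry.release('stream123==');
```

Claiming before connecting keeps a retried webhook from opening a second connection that would kick out the first.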

Environment Variables

SDK Environment Variables

# Required - Authentication
ZM_RTMS_CLIENT=your_client_id          # Zoom OAuth Client ID
ZM_RTMS_SECRET=your_client_secret      # Zoom OAuth Client Secret

# Optional - Webhook server
ZM_RTMS_PORT=8080                      # Default: 8080
ZM_RTMS_PATH=/webhook                  # Default: /

# Optional - Logging
ZM_RTMS_LOG_LEVEL=info                 # error, warn, info, debug, trace
ZM_RTMS_LOG_FORMAT=progressive         # progressive or json
ZM_RTMS_LOG_ENABLED=true

Manual Implementation Variables

ZOOM_CLIENT_ID=your_client_id
ZOOM_CLIENT_SECRET=your_client_secret
ZOOM_SECRET_TOKEN=your_webhook_token   # For webhook validation
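
`ZOOM_SECRET_TOKEN` is described above only as "for webhook validation". The sketch below follows Zoom's general webhook verification scheme (an `x-zm-request-timestamp` header plus an `x-zm-signature` header of the form `v0=<hex HMAC>`), which this document does not spell out; treat the header names and string layout as assumptions to confirm against the Webhooks reference. The function names are illustrative.

```javascript
const crypto = require('node:crypto');

// Compute the signature value expected for a given request.
// rawBody is the request body exactly as received, before JSON parsing.
function expectedZoomSignature(secretToken, timestamp, rawBody) {
  const message = `v0:${timestamp}:${rawBody}`;
  const hash = crypto.createHmac('sha256', secretToken).update(message).digest('hex');
  return `v0=${hash}`;
}

// Compare against the received header in constant time.
function isValidZoomWebhook(secretToken, timestamp, rawBody, signatureHeader) {
  const expected = Buffer.from(expectedZoomSignature(secretToken, timestamp, rawBody));
  const received = Buffer.from(String(signatureHeader || ''));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

In a handler this would be called with `process.env.ZOOM_SECRET_TOKEN`, the timestamp header, the raw body, and the signature header, rejecting the request when it returns false.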

Zoom App Setup

For Meetings and Webinars (General App)

1. Go to marketplace.zoom.us -> Develop -> Build App
2. Choose General App -> User-Managed
3. Features -> Access -> Enable Event Subscription
4. Add Events -> Search "rtms" -> Select:

meeting.rtms_started
meeting.rtms_stopped
webinar.rtms_started (if using webinars)
webinar.rtms_stopped (if using webinars)

5. Scopes -> Add Scopes -> Search "rtms" -> Add:

meeting:read:meeting_audio
meeting:read:meeting_video
meeting:read:meeting_transcript
meeting:read:meeting_chat
webinar:read:webinar_audio (if using webinars)
webinar:read:webinar_video (if using webinars)
webinar:read:webinar_transcript (if using webinars)
webinar:read:webinar_chat (if using webinars)

For Video SDK (Video SDK App)

1. Go to marketplace.zoom.us -> Develop -> Build App
2. Choose Video SDK App
3. Use your SDK Key and SDK Secret (not OAuth Client ID/Secret)
4. Add Events:

session.rtms_started
session.rtms_stopped

Sample Repositories

Official Samples

| Repository | Description |
| --- | --- |
| rtms-samples | RTMSManager, boilerplates, AI samples |
| rtms-quickstart-js | JavaScript SDK quickstart |
| rtms-quickstart-py | Python SDK quickstart |
| rtms-sdk-cpp | C++ SDK |
| zoom-rtms | Main SDK repository |

AI Integration Samples

| Sample | Description |
| --- | --- |
| rtms-meeting-assistant-starter-kit | AI meeting assistant with summaries |
| arlo-meeting-assistant | Production meeting assistant with DB |
| videosdk-rtms-transcribe-audio | Whisper transcription |

Complete Documentation

Concepts

[Connection Architecture](concepts/connection-architecture.md) - Two-phase WebSocket design
[Lifecycle Flow](concepts/lifecycle-flow.md) - Webhook to streaming flow

Examples

[SDK Quickstart](examples/sdk-quickstart.md) - Using @zoom/rtms SDK
[Manual WebSocket](examples/manual-websocket.md) - Raw protocol implementation
[RTMS Bot](examples/rtms-bot.md) - Complete bot implementation guide
[AI Integration](examples/ai-integration.md) - Transcription and analysis patterns

References

[Media Types](references/media-types.md) - Audio, video, transcript, chat, screen share
[Data Types](references/data-types.md) - All enums and constants
[Connection](references/connection.md) - WebSocket protocol details
[Webhooks](references/webhooks.md) - Event subscription

Troubleshooting

[Common Issues](troubleshooting/common-issues.md) - FAQ and solutions

Resources

Official docs: https://developers.zoom.us/docs/rtms/
Data types: https://developers.zoom.us/docs/rtms/data-types/
Media params: https://developers.zoom.us/docs/rtms/media-parameter-definition/
Developer forum: https://devforum.zoom.us/

---

Need help? Start with the Integrated Index section below for complete navigation.

---

Integrated Index

_This section was migrated from SKILL.md._

RTMS provides real-time access to live audio, video, transcript, chat, and screen share from Zoom meetings, webinars, and Video SDK sessions.

Critical Positioning

Treat RTMS as a backend service for receiving and processing media streams.

Backend role: ingest audio/video/share/chat/transcript, run AI/analytics, persist/forward data.
Optional frontend role: Zoom App SDK or web dashboard that consumes processed stream data from backend transport (WebSocket/SSE/other).
Kickoff model: backend waits for RTMS start webhook events, then starts stream processing.

Do not model RTMS as a frontend-only SDK.

Quick Start Path

If you're new to RTMS, follow this order:

1. Run preflight checks first -> RUNBOOK.md
2. Understand the architecture -> concepts/connection-architecture.md

Two-phase WebSocket: Signaling + Media
Why RTMS doesn't use bots

3. Choose your approach -> SDK or Manual

SDK (recommended): examples/sdk-quickstart.md
Manual WebSocket: examples/manual-websocket.md

4. Understand the lifecycle -> concepts/lifecycle-flow.md

Webhook -> Signaling -> Media -> Streaming

5. Configure media types -> references/media-types.md

Audio, video, transcript, chat, screen share

6. Troubleshoot issues -> troubleshooting/common-issues.md

Connection problems, duplicate webhooks, missing data

---

Documentation Structure

rtms/
├── SKILL.md                           # Main skill overview and navigation guide (this file)
│
├── concepts/                          # Core architectural patterns
│   ├── connection-architecture.md     # Two-phase WebSocket design
│   └── lifecycle-flow.md              # Webhook to streaming flow
│
├── examples/                          # Complete working code
│   ├── sdk-quickstart.md              # Using @zoom/rtms SDK
│   ├── manual-websocket.md            # Raw protocol implementation
│   ├── rtms-bot.md                    # Complete RTMS bot implementation
│   └── ai-integration.md              # Transcription and analysis
│
├── references/                        # Reference documentation
│   ├── media-types.md                 # Audio, video, transcript, chat, share
│   ├── data-types.md                  # All enums and constants
│   ├── connection.md                  # WebSocket protocol details
│   └── webhooks.md                    # Event subscription
│
└── troubleshooting/                   # Problem solving guides
    └── common-issues.md               # FAQ and solutions

---

By Use Case

I want to get meeting transcripts

1. SDK Quickstart - Fastest approach
2. Media Types - Transcript configuration
3. AI Integration - Whisper, Deepgram, AssemblyAI

I want to record meetings

1. Media Types - Audio + Video configuration
2. SDK Quickstart - Receiving media
3. AI Integration - Gap-filled recording

I want to build an AI meeting assistant

1. AI Integration - Complete patterns
2. SDK Quickstart - Media ingestion
3. Lifecycle Flow - Event handling

I want to build a complete RTMS bot

1. RTMS Bot - Complete implementation guide
2. Lifecycle Flow - Webhook to streaming flow
3. Connection Architecture - Two-phase design

I need full protocol control

1. Manual WebSocket - START HERE
2. Connection Architecture - Two-phase design
3. Data Types - All message types and enums
4. Connection - Protocol details

I'm getting connection errors

1. Common Issues - Diagnostic checklist
2. Connection Architecture - Verify flow
3. Webhooks - Validation and timing

I want to understand the architecture

1. Connection Architecture - Two-phase WebSocket
2. Lifecycle Flow - Complete flow diagram
3. Data Types - Protocol constants

---

By Product

I'm building for Zoom Meetings

Standard RTMS setup. Webhook event: meeting.rtms_started. Uses General App with OAuth.
Start with SDK Quickstart or Manual WebSocket.

I'm building for Zoom Webinars

Same as meetings, but webhook event is webinar.rtms_started. Payload still uses meeting_uuid (NOT webinar_uuid).
Add webinar scopes and event subscriptions. See Webhooks.
Only panelist streams are confirmed available. Attendee streams may not be individual.

I'm building for Zoom Video SDK

Webhook event: session.rtms_started. Payload uses session_id (NOT meeting_uuid).
Requires a Video SDK App with SDK Key/Secret (not OAuth Client ID/Secret).
Once connected, the protocol is identical to meetings.
See Webhooks for payload details.

---

Key Documents

1. Connection Architecture (CRITICAL)

[concepts/connection-architecture.md](concepts/connection-architecture.md)

RTMS uses two separate WebSocket connections:

Signaling WebSocket: Authentication, control, heartbeats
Media WebSocket: Actual audio/video/transcript data

2. SDK vs Manual (DECISION POINT)

[examples/sdk-quickstart.md](examples/sdk-quickstart.md) vs [examples/manual-websocket.md](examples/manual-websocket.md)

| SDK | Manual |
| --- | --- |
| Handles WebSocket complexity | Full protocol control |
| Automatic reconnection | DIY reconnection |
| Less code | More code |
| Best for most use cases | Best for custom requirements |

3. Critical Gotchas (MOST COMMON ISSUES)

[troubleshooting/common-issues.md](troubleshooting/common-issues.md)

1. Respond 200 immediately - Delayed webhook responses cause duplicates
2. Only 1 connection per stream - New connections kick out existing
3. Heartbeat required - Must respond to keep-alive or connection dies
4. Track active sessions - Prevent duplicate join attempts

---

Key Learnings

Critical Discoveries:

1. Two-Phase WebSocket Design

Signaling: Control plane (handshake, heartbeat, start/stop)
Media: Data plane (audio, video, transcript, chat, share)
See: Connection Architecture

2. Webhook Response Timing

MUST respond 200 BEFORE any processing
Delayed response -> Zoom retries -> duplicate connections
See: Common Issues

3. Heartbeat is Mandatory

Signaling: Receive msg_type 12, respond with msg_type 13
Media: Same pattern
Failure to respond = connection closed
See: Connection

4. Signature Generation

Format: HMAC-SHA256(clientSecret, "clientId,meetingUuid,streamId")
For Video SDK, use session_id in place of meetingUuid
Webinars still use meeting_uuid (not webinar_uuid)
Required for both signaling and media handshakes
See: Manual WebSocket

5. Media Types are Bitmasks

Audio=1, Video=2, Share=4, Transcript=8, Chat=16, All=32
Combine with OR: Audio+Transcript = 1|8 = 9
See: Media Types

6. Screen Share is SEPARATE from Video

Different msg_type (16 vs 15)
Different media flag (4 vs 2)
Must subscribe separately
See: Media Types

---

Quick Reference

"Connection fails"

-> Common Issues

"Duplicate connections"

-> Webhook timing

"No audio/video data"

-> Media Types - Check configuration

"How do I implement manually?"

-> Manual WebSocket

"What message types exist?"

-> Data Types

"How do I integrate AI?"

-> AI Integration

---

Document Version

Based on Zoom RTMS SDK v1.x and official documentation as of 2026.

---

Happy coding!

Remember: Start with SDK Quickstart for the fastest path, or Manual WebSocket if you need full control.

RTMS Connection Architecture

RTMS uses a two-phase WebSocket design to separate control plane from data plane.

Overview

Multi-Product Note: The two-phase WebSocket design described here is identical for all RTMS products (meetings, webinars, and Video SDK sessions). The only difference is the initial webhook event name and payload ID field. Once connected, the signaling and media protocols are the same.

┌─────────────────────────────────────────────────────────────┐
│                    Zoom Meeting                              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Zoom RTMS Backend                         │
│  ┌─────────────────────┐    ┌─────────────────────────────┐ │
│  │  Signaling Server   │    │     Media Server            │ │
│  │  (Control Plane)    │    │     (Data Plane)            │ │
│  └──────────┬──────────┘    └──────────────┬──────────────┘ │
└─────────────┼───────────────────────────────┼───────────────┘
              │                               │
              ▼                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      Your Server                             │
│  ┌─────────────────────┐    ┌─────────────────────────────┐ │
│  │  Signaling Socket   │    │     Media Socket            │ │
│  │  - Handshake        │    │     - Audio data            │ │
│  │  - Start/Stop       │    │     - Video data            │ │
│  │  - Heartbeat        │    │     - Transcript            │ │
│  └─────────────────────┘    └─────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Two-Phase Design

Phase 1: Signaling WebSocket (Control Plane)

Purpose: Authentication, session control, heartbeats

| Responsibility | Description |
| --- | --- |
| Authentication | Validate signature, establish session |
| Media Server Discovery | Returns media server URL in handshake response |
| Stream Control | Start/stop streaming commands |
| Heartbeat | Keep connection alive (msg_type 12/13) |
| Event Notifications | Participant join/leave, sharing start/stop |

URL Source: From server_urls in webhook payload

Message Flow:

Client                          Signaling Server
  │                                    │
  │──── Handshake Request (1) ────────>│
  │<─── Handshake Response (2) ────────│  <- Contains media_server.server_urls
  │                                    │
  │──── Client Ready (7) ─────────────>│  <- After media handshake complete
  │                                    │
  │<─── Keep Alive Request (12) ───────│
  │──── Keep Alive Response (13) ─────>│
  │                                    │

Phase 2: Media WebSocket (Data Plane)

Purpose: Actual audio, video, transcript, chat, screen share data

| Responsibility | Description |
| --- | --- |
| Media Configuration | Set audio/video parameters (codec, resolution, fps) |
| Media Streaming | Receive binary media data |
| Heartbeat | Keep connection alive (msg_type 12/13) |

URL Source: From signaling handshake response (media_server.server_urls.all)
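
Extracting the media URL from the signaling handshake response can be sketched as below. Only the `media_server.server_urls.all` path is named in this document; the per-type keys (such as `audio`) used for split mode are an assumption, and `pickMediaUrl` is a hypothetical helper.

```javascript
// Choose a media server URL from a signaling handshake response (msg_type 2).
// Falls back to the `all` URL when no per-type URL is present.
function pickMediaUrl(handshakeResponse, kind = 'all') {
  const urls = handshakeResponse?.media_server?.server_urls;
  if (!urls) {
    throw new Error('Handshake response has no media_server.server_urls');
  }
  const url = urls[kind] ?? urls.all;
  if (!url) {
    throw new Error(`No media URL available for "${kind}"`);
  }
  return url;
}

// Example response shape (URLs are placeholders).
const response = {
  msg_type: 2,
  media_server: { server_urls: { all: 'wss://media.example/all' } },
};
console.log(pickMediaUrl(response));          // wss://media.example/all
console.log(pickMediaUrl(response, 'audio')); // wss://media.example/all (fallback)
```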

Message Flow:

Client                          Media Server
  │                                    │
  │──── Media Handshake Request (3) ──>│  <- With media_params
  │<─── Media Handshake Response (4) ──│
  │                                    │
  │<─── Audio Data (14) ───────────────│
  │<─── Video Data (15) ───────────────│
  │<─── Screen Share Data (16) ────────│
  │<─── Transcript Data (17) ──────────│
  │<─── Chat Data (18) ────────────────│
  │                                    │
  │<─── Keep Alive Request (12) ───────│
  │──── Keep Alive Response (13) ─────>│
  │                                    │
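
The media-socket flow above maps naturally onto a small dispatcher. The message type numbers are the ones shown in the diagram; the sketch assumes each frame arrives as JSON text with a `msg_type` field, and `routeMediaMessage` and the handler names are illustrative.

```javascript
// Message types from the media message flow.
const MSG = Object.freeze({
  KEEP_ALIVE_REQ: 12,
  KEEP_ALIVE_RESP: 13,
  AUDIO: 14,
  VIDEO: 15,
  SCREEN_SHARE: 16,
  TRANSCRIPT: 17,
  CHAT: 18,
});

// Parse one frame, answer keep-alives, and pass media to the matching handler.
// `handlers` is keyed by MSG name; `send` writes a string back to the socket.
function routeMediaMessage(raw, handlers, send) {
  const msg = JSON.parse(raw);
  if (msg.msg_type === MSG.KEEP_ALIVE_REQ) {
    send(JSON.stringify({ msg_type: MSG.KEEP_ALIVE_RESP, timestamp: msg.timestamp }));
    return 'keep_alive';
  }
  const name = Object.keys(MSG).find((key) => MSG[key] === msg.msg_type);
  const handler = name && handlers[name];
  if (!handler) return 'ignored'; // no handler registered for this type
  handler(msg);
  return name;
}
```

With the `ws` package this would be wired as `ws.on('message', (data) => routeMediaMessage(data.toString(), handlers, (s) => ws.send(s)))`. Keeping screen share (16) and video (15) as separate handler keys mirrors the fact that they are separate subscriptions.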

Why Two Connections?

| Benefit | Explanation |
| --- | --- |
| Separation of Concerns | Control logic doesn't interfere with media streaming |
| Independent Scaling | Signaling and media servers scale differently |
| Fault Isolation | Media reconnection doesn't require re-auth |
| Split Mode Support | Each media type can have its own connection |

Connection Modes

Split Mode (Recommended)

Each media type gets its own dedicated WebSocket connection:

Signaling WS ─────┬───> Audio WS
                  ├───> Video WS
                  ├───> Transcript WS
                  └───> Screen Share WS

Advantages:

Independent reconnection per media type
Better reliability
Fault isolation

Unified Mode

One media WebSocket for all media types:

Signaling WS ─────> Media WS (all types)

When to use:

Real-time audio+video muxing where sync matters
Simpler implementation for small projects
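
The difference between the two modes can be sketched as a planning step that turns a requested media mask into the list of media sockets to open. Flag values follow the media type bitmask used elsewhere in this document; `planMediaSockets` and the socket names are hypothetical.

```javascript
// Individual media flags (audio=1, video=2, screen share=4, transcript=8, chat=16).
const FLAGS = { audio: 1, video: 2, screen_share: 4, transcript: 8, chat: 16 };

// Return one entry per media WebSocket to open for the requested mask.
// Split mode: one socket per media type. Unified mode: one socket for everything.
function planMediaSockets(mask, mode = 'split') {
  const wanted = Object.entries(FLAGS).filter(([, flag]) => (mask & flag) !== 0);
  if (wanted.length === 0) return [];
  if (mode === 'unified') return [{ name: 'media', media_type: mask }];
  return wanted.map(([name, flag]) => ({ name, media_type: flag }));
}

console.log(planMediaSockets(9));            // audio and transcript sockets
console.log(planMediaSockets(9, 'unified')); // a single socket with media_type 9
```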

Signature Generation

Both signaling and media handshakes require HMAC-SHA256 signature:

// For meetings and webinars, the signed string uses meeting_uuid:
//   `${clientId},${meetingUuid},${streamId}`
// For Video SDK, it uses session_id:
//   `${clientId},${sessionId},${streamId}`

const crypto = require('crypto');

// Generic approach: use whichever ID is present
const idValue = payload.meeting_uuid || payload.session_id;
const message = `${clientId},${idValue},${streamId}`;
const signature = crypto.createHmac('sha256', clientSecret)
  .update(message)
  .digest('hex');

Important: Webinars use meeting_uuid (not webinar_uuid). Video SDK uses session_id.

Components:

clientId: OAuth Client ID (General App) or SDK Key (Video SDK App)
meetingUuid / sessionId: From webhook payload (meeting_uuid for meetings/webinars, session_id for Video SDK)
streamId: From webhook payload (rtms_stream_id)
clientSecret: OAuth Client Secret (General App) or SDK Secret (Video SDK App)
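
Because the credential pair depends on the app type, it can help to choose it from the webhook event name before signing. A minimal sketch: `ZOOM_CLIENT_ID` and `ZOOM_CLIENT_SECRET` are the manual-implementation variables from this document, while `ZOOM_SDK_KEY` and `ZOOM_SDK_SECRET` are hypothetical names for the Video SDK credentials.

```javascript
const crypto = require('node:crypto');

// HMAC-SHA256 over "clientId,idValue,streamId", hex encoded.
function rtmsSignature({ clientId, clientSecret, idValue, streamId }) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// Video SDK sessions sign with the SDK Key/Secret; meetings and webinars
// sign with the OAuth Client ID/Secret.
function credentialsFor(event, env = process.env) {
  return event.startsWith('session.')
    ? { clientId: env.ZOOM_SDK_KEY, clientSecret: env.ZOOM_SDK_SECRET }
    : { clientId: env.ZOOM_CLIENT_ID, clientSecret: env.ZOOM_CLIENT_SECRET };
}
```

A caller would combine the two, for example `rtmsSignature({ ...credentialsFor(event), idValue, streamId })`, and reuse the result for both the signaling and media handshakes.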

Heartbeat Protocol

CRITICAL: Both connections require heartbeat responses.

When you receive msg_type: 12 (Keep Alive Request):

// Immediately respond with msg_type: 13
ws.send(JSON.stringify({
  msg_type: 13,
  timestamp: receivedMessage.timestamp
}));

Timeout:

Signaling: ~60 seconds without heartbeat response
Media: ~65 seconds without heartbeat response

Failure to respond = connection closed!
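
To notice a socket that has stopped answering keep-alives before Zoom closes it, a backend can record when each socket last responded and compare against the tolerances above. This is a sketch only: `KeepAliveMonitor` is a hypothetical helper, the thresholds are the approximate values quoted in this section, and the clock is injectable so the logic can be checked without waiting.

```javascript
// Approximate tolerances from this section, in milliseconds.
const KEEP_ALIVE_TOLERANCE_MS = { signaling: 60_000, media: 65_000 };

// Tracks when each socket last answered a keep-alive request.
class KeepAliveMonitor {
  constructor(now = Date.now) {
    this.now = now;
    this.lastAnswered = new Map();
  }

  // Call right after sending msg_type 13 on the given socket.
  markAnswered(kind) {
    this.lastAnswered.set(kind, this.now());
  }

  // True once a full tolerance window has passed without an answered keep-alive.
  isOverdue(kind) {
    const last = this.lastAnswered.get(kind);
    if (last === undefined) return false; // nothing answered yet, nothing to judge
    return this.now() - last > KEEP_ALIVE_TOLERANCE_MS[kind];
  }
}
```

An overdue socket is a signal to start the reconnection logic rather than wait for the close event.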

Reconnection

RTMS does NOT auto-reconnect. You must implement:

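let retryDelay = 1000;  // Initial backoff in ms

ws.on('close', (code, reason) => {
  console.log(`Connection closed: ${code} ${reason}`);

  // Exponential backoff: wait, retry, then double the delay (capped at 30s).
  // reconnect() is your own function that repeats the handshake sequence.
  setTimeout(() => {
    reconnect();
  }, retryDelay);

  retryDelay = Math.min(retryDelay * 2, 30000);
});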

Timeouts:

| Connection | Reconnection Window |
| --- | --- |
| Signaling | 60 seconds |
| Media | 65 seconds |
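
Since the reconnection window is finite, it can be useful to see how many backoff attempts actually fit inside it. The sketch below assumes delays double from a base value up to a cap and that waiting past the window is pointless; `backoffSchedule` and its defaults are illustrative, not protocol values.

```javascript
// Delays double from baseMs up to capMs. The schedule stops once the
// cumulative wait would exceed windowMs.
function backoffSchedule({ baseMs = 1000, capMs = 30000, windowMs = 60000 } = {}) {
  if (baseMs <= 0) throw new RangeError('baseMs must be positive');
  const delays = [];
  let delay = baseMs;
  let elapsed = 0;
  while (elapsed + delay <= windowMs) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, capMs);
  }
  return delays;
}

console.log(backoffSchedule());                    // [ 1000, 2000, 4000, 8000, 16000 ]
console.log(backoffSchedule({ windowMs: 65000 })); // [ 1000, 2000, 4000, 8000, 16000, 30000 ]
```

With these defaults, five attempts fit in the 60-second signaling window and six in the 65-second media window.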

Server URL Geo-Routing

Server URLs contain region codes:

| Code | Location |
| --- | --- |
| `sjc` | San Jose, California |
| `iad` | Washington DC |
| `sin` | Singapore |
| `fra` | Frankfurt, Germany |
| `syd` | Sydney, Australia |

Example: wss://rtms-sjc1.zoom.us/...

For production, route to workers in the same region as the Zoom server for lower latency.
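
Routing by region first requires pulling the code out of the server URL. The sketch below assumes hostnames follow the `rtms-<code><digits>.` pattern seen in the single example above, which may not hold for every server; `regionOf` is a hypothetical helper and returns `null` for anything it does not recognize.

```javascript
// Region codes from the table above.
const REGIONS = {
  sjc: 'San Jose, California',
  iad: 'Washington DC',
  sin: 'Singapore',
  fra: 'Frankfurt, Germany',
  syd: 'Sydney, Australia',
};

// Extract the region code from a server URL such as wss://rtms-sjc1.zoom.us/...
function regionOf(serverUrl) {
  const host = new URL(serverUrl).hostname;
  const match = /^rtms-([a-z]{3})\d*\./.exec(host);
  const code = match ? match[1] : null;
  return code && REGIONS[code] ? { code, location: REGIONS[code] } : null;
}

console.log(regionOf('wss://rtms-sjc1.zoom.us/stream'));
// { code: 'sjc', location: 'San Jose, California' }
```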

Next Steps

[Lifecycle Flow](lifecycle-flow.md) - Complete webhook-to-streaming sequence
[SDK Quickstart](../examples/sdk-quickstart.md) - SDK handles all this for you
[Manual WebSocket](../examples/manual-websocket.md) - Full protocol implementation

RTMS Lifecycle Flow

Complete flow from meeting/webinar/session start to media streaming.

High-Level Flow

┌─────────────────────────────┐
│  Meeting/Webinar/Session    │
│  Starts                     │
└────────────┬────────────────┘
             │
             ▼
┌─────────────────────────────┐
│  Zoom sends webhook event   │
│  meeting.rtms_started  OR   │
│  webinar.rtms_started  OR   │
│  session.rtms_started       │
└────────────┬────────────────┘
         │
         ▼
┌──────────────────┐
│  Your server     │
│  receives        │
│  webhook         │
│                  │
│  RESPOND 200     │
│  IMMEDIATELY!    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Connect to      │
│  Signaling WS    │
│                  │
│  Send handshake  │
│  (msg_type: 1)   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Receive         │
│  handshake resp  │
│  (msg_type: 2)   │
│                  │
│  Extract media   │
│  server URL      │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Connect to      │
│  Media WS        │
│                  │
│  Send handshake  │
│  (msg_type: 3)   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Receive media   │
│  handshake resp  │
│  (msg_type: 4)   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Send Client     │
│  Ready to        │
│  Signaling       │
│  (msg_type: 7)   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Receive media   │
│  data:           │
│  - Audio (14)    │
│  - Video (15)    │
│  - Share (16)    │
│  - Transcript(17)│
│  - Chat (18)     │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Respond to      │
│  heartbeats      │
│  (12 -> 13)      │
└────────┬─────────┘
         │
         ▼
┌─────────────────────────────┐
│  Optional control-plane     │
│  actions during stream      │
│  - EVENT_SUBSCRIPTION       │
│  - VIDEO_SUBSCRIPTION_REQ   │
│  - STREAM_CLOSE_REQ         │
└────────────┬────────────────┘
             │
             ▼
┌─────────────────────────────┐
│  meeting/webinar/session    │
│  .rtms_stopped              │
│                             │
│  Close sockets              │
│  Cleanup                    │
└─────────────────────────────┘

Detailed Steps

Step 1: Receive Webhook

When RTMS starts, Zoom sends a webhook. The event name and payload differ by product:

Meeting RTMS:

{
  "event": "meeting.rtms_started",
  "payload": {
    "account_id": "abc123",
    "object": {
      "meeting_id": "123456789",
      "meeting_uuid": "AbC123...",
      "host_id": "user123",
      "rtms_stream_id": "stream123==",
      "server_urls": "wss://rtms-sjc1.zoom.us/...",
      "signature": "pre_computed_signature"
    }
  }
}

Webinar RTMS:

{
  "event": "webinar.rtms_started",
  "payload": {
    "account_id": "abc123",
    "object": {
      "meeting_id": "123456789",
      "meeting_uuid": "AbC123...",
      "host_id": "user123",
      "rtms_stream_id": "stream123==",
      "server_urls": "wss://rtms-sjc1.zoom.us/...",
      "signature": "pre_computed_signature"
    }
  }
}

Note: Webinar payloads use meeting_uuid, NOT webinar_uuid.

Video SDK RTMS:

{
  "event": "session.rtms_started",
  "payload": {
    "account_id": "abc123",
    "object": {
      "session_id": "SessionABC...",
      "rtms_stream_id": "stream123==",
      "server_urls": "wss://rtms-sjc1.zoom.us/...",
      "signature": "pre_computed_signature"
    }
  }
}

Note: Video SDK payloads use session_id instead of meeting_uuid.

Product Differences

Aspect	Meetings	Webinars	Video SDK
Webhook event	`meeting.rtms_started`	`webinar.rtms_started`	`session.rtms_started`
Payload ID field	`meeting_uuid`	`meeting_uuid` (same!)	`session_id`
App type	General App (OAuth)	General App (OAuth)	Video SDK App (SDK Key/Secret)
Participants	All participants	Panelists have full streams; attendees may not	All participants
Protocol after connect	Identical	Identical	Identical

CRITICAL: Respond with HTTP 200 IMMEDIATELY before any processing!

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];

app.post('/webhook', (req, res) => {
  res.status(200).send();  // FIRST!
  
  const { event, payload } = req.body;
  if (RTMS_EVENTS.includes(event)) {
    handleRTMSStarted(payload.object);  // RTMS fields live under payload.object
  }
});

Why? If you delay, Zoom retries the webhook. The retry creates a second connection, which kicks out your first connection.
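
One way to keep the acknowledgement off the critical path is to send the 200 and push the real work to a later turn of the event loop. The sketch below assumes an Express-style `res` object; `ackThenProcess` is a hypothetical helper name, not part of Zoom's or Express's API.

```javascript
// Hypothetical helper: acknowledge the webhook first, then run the
// (possibly slow) handler on a later turn of the event loop.
function ackThenProcess(res, work) {
  res.status(200).send();            // Zoom sees the 200 before any processing

  setImmediate(() => {
    // Promise.resolve().then(work) captures both sync throws and rejections
    Promise.resolve()
      .then(work)
      .catch((err) => console.error('RTMS handler failed:', err));
  });
}

// Usage inside the route (handleRTMSStarted is the handler from this guide):
// app.post('/webhook', (req, res) => {
//   const { event, payload } = req.body;
//   ackThenProcess(res, () => {
//     if (RTMS_EVENTS.includes(event)) handleRTMSStarted(payload.object);
//   });
// });
```

Because errors are caught inside the deferred callback, a failing handler cannot delay or alter the response Zoom has already received.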

Step 2: Connect to Signaling WebSocket

const signalingWs = new WebSocket(payload.server_urls);

// Use meeting_uuid for meetings/webinars, session_id for Video SDK
const idValue = payload.meeting_uuid || payload.session_id;

signalingWs.on('open', () => {
  const signature = generateSignature(
    CLIENT_ID, 
    idValue, 
    payload.rtms_stream_id, 
    CLIENT_SECRET
  );

  signalingWs.send(JSON.stringify({
    msg_type: 1,                    // Handshake request
    protocol_version: 1,
    meeting_uuid: idValue,
    rtms_stream_id: payload.rtms_stream_id,
    signature: signature,
    media_type: 9                   // Audio(1) + Transcript(8)
  }));
});
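
The `media_type` value is a bitmask, which is why audio (1) plus transcript (8) gives 9. The sketch below combines flags with bitwise OR. Only the audio and transcript values are confirmed by this guide; the video, share, and chat values are assumptions that should be checked against the RTMS media type reference before use.

```javascript
// Media type flags. AUDIO and TRANSCRIPT match the values used in this guide;
// VIDEO, DESKSHARE, and CHAT are ASSUMED values - verify before relying on them.
const MEDIA_TYPE = Object.freeze({
  AUDIO: 1,
  VIDEO: 2,       // assumption
  DESKSHARE: 4,   // assumption
  TRANSCRIPT: 8,
  CHAT: 16        // assumption
});

// Combine any number of flags into a single media_type value.
function combineMediaTypes(...flags) {
  return flags.reduce((mask, flag) => mask | flag, 0);
}

// Check whether a combined value includes a given flag.
function hasMediaType(mask, flag) {
  return (mask & flag) === flag;
}

const audioAndTranscript = combineMediaTypes(MEDIA_TYPE.AUDIO, MEDIA_TYPE.TRANSCRIPT);
console.log(audioAndTranscript); // 9
```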

Step 3: Handle Signaling Response

signalingWs.on('message', (data) => {
  const msg = JSON.parse(data);
  
  switch (msg.msg_type) {
    case 2:  // Handshake response
      if (msg.status_code === 0) {
        // Extract media server URL
        const mediaUrl = msg.media_server.server_urls.all;
        connectToMediaServer(mediaUrl);
      } else {
        console.error('Handshake failed:', msg.status_code);
      }
      break;
      
    case 12:  // Keep alive request
      signalingWs.send(JSON.stringify({
        msg_type: 13,
        timestamp: msg.timestamp
      }));
      break;
  }
});

Step 4: Connect to Media WebSocket

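// idValue, streamId, and signature are the values computed in Step 2
// (streamId is payload.rtms_stream_id); they are assumed to be in scope here.
function connectToMediaServer(mediaUrl) {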
  const mediaWs = new WebSocket(mediaUrl);
  
  mediaWs.on('open', () => {
    mediaWs.send(JSON.stringify({
      msg_type: 3,                  // Media handshake request
      protocol_version: 1,
      meeting_uuid: idValue,        // meeting_uuid or session_id
      rtms_stream_id: streamId,
      signature: signature,
      media_type: 9,                // Audio + Transcript
      payload_encryption: false,
      media_params: {
        audio: {
          content_type: 2,          // RAW_AUDIO
          sample_rate: 1,           // 16kHz
          channel: 1,               // Mono
          codec: 1,                 // L16 (PCM)
          data_opt: 1,              // Mixed stream
          send_rate: 20             // 20ms chunks
        },
        transcript: {
          content_type: 5,          // TEXT
          src_language: 9,          // English
          enable_lid: false         // Fixed language, no auto-switch
        }
      }
    }));
  });
}

Step 5: Start Streaming

After media handshake succeeds, tell signaling you're ready:

mediaWs.on('message', (data) => {
  const msg = JSON.parse(data);
  
  if (msg.msg_type === 4 && msg.status_code === 0) {
    // Media handshake success - tell signaling we're ready
    signalingWs.send(JSON.stringify({
      msg_type: 7,                  // Client ready
      rtms_stream_id: streamId
    }));
  }
});
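
The ordering in Steps 2 through 5 (msg_type 1, 2, 3, 4, then 7) is easy to get wrong when two sockets share state. A minimal sketch of a tracker that rejects out-of-order steps follows; the class name and step labels are illustrative and not part of the RTMS protocol.

```javascript
// Expected handshake order, keyed by the msg_type values used in this guide.
const HANDSHAKE_ORDER = [
  { msgType: 1, label: 'signaling handshake request' },
  { msgType: 2, label: 'signaling handshake response' },
  { msgType: 3, label: 'media handshake request' },
  { msgType: 4, label: 'media handshake response' },
  { msgType: 7, label: 'client ready' }
];

// Illustrative helper: tracks how far one stream has progressed.
class HandshakeTracker {
  constructor() {
    this.position = 0;
  }

  // Record a sent or received handshake message; throws if out of order.
  // Returns true once the final step (client ready) has been recorded.
  advance(msgType) {
    const expected = HANDSHAKE_ORDER[this.position];
    if (!expected) {
      throw new Error(`Handshake already complete, got msg_type ${msgType}`);
    }
    if (expected.msgType !== msgType) {
      throw new Error(`Expected ${expected.label} (${expected.msgType}), got ${msgType}`);
    }
    this.position += 1;
    return this.isReady();
  }

  isReady() {
    return this.position === HANDSHAKE_ORDER.length;
  }
}
```

Calling `advance()` wherever a handshake message is sent or received turns a silent ordering mistake into an immediate, descriptive error.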

Step 6: Receive Media Data

mediaWs.on('message', (data) => {
  const msg = JSON.parse(data);
  
  switch (msg.msg_type) {
    case 14:  // Audio
      const audioBuffer = Buffer.from(msg.content, 'base64');
      processAudio(audioBuffer, msg.user_name, msg.timestamp);
      break;
      
    case 15:  // Video
      const videoBuffer = Buffer.from(msg.content, 'base64');
      processVideo(videoBuffer, msg.user_name, msg.timestamp);
      break;
      
    case 16:  // Screen share
      const shareBuffer = Buffer.from(msg.content, 'base64');
      processScreenShare(shareBuffer, msg.user_name, msg.timestamp);
      break;
      
    case 17:  // Transcript
      console.log(`${msg.user_name}: ${msg.content}`);
      break;
      
    case 18:  // Chat
      console.log(`[Chat] ${msg.user_name}: ${msg.content}`);
      break;
      
    case 12:  // Keep alive
      mediaWs.send(JSON.stringify({
        msg_type: 13,
        timestamp: msg.timestamp
      }));
      break;
  }
});
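
With the audio parameters from Step 4 (16 kHz, mono, 16-bit L16, 20 ms chunks), each decoded audio message should carry about 640 bytes. The sketch below computes that expectation and decodes a payload, assuming `content` is base64 as in Step 6. The helper names are hypothetical.

```javascript
const BYTES_PER_SAMPLE = 2; // L16 = 16-bit signed PCM

// Bytes expected for one chunk of uncompressed PCM.
function expectedChunkBytes(sampleRateHz, channels, chunkMs) {
  return (sampleRateHz * channels * BYTES_PER_SAMPLE * chunkMs) / 1000;
}

// Duration in milliseconds represented by a decoded PCM buffer.
function pcmDurationMs(buffer, sampleRateHz, channels) {
  const frames = buffer.length / (BYTES_PER_SAMPLE * channels);
  return (frames * 1000) / sampleRateHz;
}

// Decode the base64 `content` field of an audio message into raw bytes.
function decodeAudioContent(content) {
  return Buffer.from(content, 'base64');
}

const chunk = decodeAudioContent(Buffer.alloc(640).toString('base64'));
console.log(expectedChunkBytes(16000, 1, 20));   // 640
console.log(pcmDurationMs(chunk, 16000, 1));     // 20
```

Comparing the decoded length against the expected size is a cheap way to notice that a different sample rate or chunk duration was negotiated than the one requested.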

Step 6A: Track Available Participant Video Streams

When using the new single-individual-video mode, the signaling socket tells you whose camera is currently available.

const activeVideoUsers = new Set();

function handleEventUpdate(msg) {
  const eventType = msg.event?.event_type;
  const participants = msg.event?.participants || [];

  if (eventType === 8) { // PARTICIPANT_VIDEO_ON
    for (const participant of participants) activeVideoUsers.add(participant.user_id);
  }

  if (eventType === 9) { // PARTICIPANT_VIDEO_OFF
    for (const participant of participants) activeVideoUsers.delete(participant.user_id);
  }
}

Use these events as the control-plane signal for which participant video streams are currently subscribable.

Step 6B: Select One Participant Video Stream

function subscribeToParticipantVideo(streamId, userId) {
  const signalingWs = signalingConnections.get(streamId);
  if (!signalingWs) return;

  signalingWs.send(JSON.stringify({
    msg_type: 28, // VIDEO_SUBSCRIPTION_REQ
    user_id: userId,
    subscribe: true,
    timestamp: Date.now()
  }));
}

Important constraint:

only one participant stream can be active at a time
the newest successful subscription replaces the previous selection
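
Combining Steps 6A and 6B, a backend can keep one small piece of state per stream: which users currently have video on, and which single user is selected. The sketch below is one possible shape, assuming the `msg_type: 28` request from Step 6B. `VideoSelector` and its `send` callback are hypothetical, and the selection is recorded optimistically, before any response arrives.

```javascript
// Hypothetical per-stream selector honoring the one-stream-at-a-time constraint.
class VideoSelector {
  constructor(send) {
    this.send = send;              // function that writes a message to signaling
    this.available = new Set();    // user_ids whose video is currently on
    this.selected = null;          // the single active selection, if any
  }

  videoOn(userId) {
    this.available.add(userId);
  }

  videoOff(userId) {
    this.available.delete(userId);
    if (this.selected === userId) this.selected = null; // selection no longer valid
  }

  // Request a participant's video; returns false if their camera is not on.
  select(userId) {
    if (!this.available.has(userId)) return false;
    this.send({
      msg_type: 28,                // VIDEO_SUBSCRIPTION_REQ
      user_id: userId,
      subscribe: true,
      timestamp: Date.now()
    });
    this.selected = userId;        // optimistic: newest selection replaces the old one
    return true;
  }
}
```

In use, `send` would wrap the signaling socket, for example `(msg) => signalingWs.send(JSON.stringify(msg))`.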

Step 7: Handle Session End

const RTMS_STOP_EVENTS = ['meeting.rtms_stopped', 'webinar.rtms_stopped', 'session.rtms_stopped'];

// Via webhook
app.post('/webhook', (req, res) => {
  res.status(200).send();
  
  const { event, payload } = req.body;
  
  if (RTMS_STOP_EVENTS.includes(event)) {
    const streamId = payload.object.rtms_stream_id;  // RTMS fields live under payload.object
    
    // Close connections
    signalingConnections.get(streamId)?.close();
    mediaConnections.get(streamId)?.close();
    
    // Cleanup
    signalingConnections.delete(streamId);
    mediaConnections.delete(streamId);
  }
});

// Also handle WebSocket close events
signalingWs.on('close', (code, reason) => {
  console.log('Signaling closed:', code, reason);
  // Implement reconnection if needed
});

Optional: Client-Initiated Graceful Close

The backend can now ask RTMS to terminate the stream cleanly:

function closeStream(streamId) {
  const signalingWs = signalingConnections.get(streamId);
  if (!signalingWs) return;

  signalingWs.send(JSON.stringify({
    msg_type: 21, // STREAM_CLOSE_REQ
    rtms_stream_id: streamId
  }));
}

Expect a STREAM_CLOSE_RESP followed by normal socket teardown.

Session Tracking

CRITICAL: Track active sessions to prevent duplicate connections!

const activeSessions = new Map();

function handleRTMSStarted(payload) {
  const streamId = payload.rtms_stream_id;
  
  // Check for existing connection
  if (activeSessions.has(streamId)) {
    console.log('Already connected to this stream, ignoring');
    return;
  }
  
  // Mark as active (meeting_uuid for meetings/webinars, session_id for Video SDK)
  activeSessions.set(streamId, {
    startTime: Date.now(),
    idValue: payload.meeting_uuid || payload.session_id
  });
  
  // Connect
  connectToRTMS(payload);
}

function handleRTMSStopped(payload) {
  const streamId = payload.rtms_stream_id;
  activeSessions.delete(streamId);
  // ... cleanup
}

Error Handling

// SDK state management (from Arlo sample)
try {
  client.join(payload);
} catch (error) {
  if (error.message?.includes('Invalid status')) {
    console.warn('SDK in invalid state, waiting to retry...');
    
    setTimeout(() => {
      handleRTMSStarted(payload);
    }, 2000);
  }
}
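
The retry above fires after a fixed 2 seconds with no upper bound on attempts. A bounded variant with exponential backoff might look like the sketch below. The attempt limit, delay cap, and function names are assumptions chosen for illustration, not values from the RTMS SDK.

```javascript
const BASE_DELAY_MS = 2000;   // matches the 2s delay used above
const MAX_DELAY_MS = 30000;   // assumed cap
const MAX_ATTEMPTS = 5;       // assumed limit

// Delay before retry number `attempt` (0-based): 2s, 4s, 8s, ... up to the cap.
function retryDelayMs(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Call `join`; on an "Invalid status" error, retry with backoff until the
// attempt limit is reached. `schedule` is injectable so tests can avoid timers.
function joinWithRetry(join, { attempt = 0, schedule = setTimeout, onGiveUp = console.error } = {}) {
  try {
    join();
  } catch (error) {
    const retryable = Boolean(error.message?.includes('Invalid status'));
    if (!retryable || attempt + 1 >= MAX_ATTEMPTS) {
      onGiveUp(error);          // report instead of throwing from a timer callback
      return;
    }
    schedule(
      () => joinWithRetry(join, { attempt: attempt + 1, schedule, onGiveUp }),
      retryDelayMs(attempt)
    );
  }
}

// Usage: joinWithRetry(() => client.join(payload));
```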

Next Steps

[SDK Quickstart](../examples/sdk-quickstart.md) - SDK handles all this automatically
[Manual WebSocket](../examples/manual-websocket.md) - Full implementation code
[Common Issues](../troubleshooting/common-issues.md) - Debugging connection problems

AI Integration Patterns

Patterns for integrating RTMS with AI services for transcription, analysis, and meeting assistants. These examples work with meetings, webinars, and Video SDK sessions.

Audio Transcription with External Services

Deepgram Integration

import rtms from "@zoom/rtms";
import { createClient } from "@deepgram/sdk";

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const deepgram = createClient(process.env.DEEPGRAM_API_KEY);

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  
  // Configure for Deepgram-compatible audio
  client.setAudioParams({
    codec: 1,          // L16 (PCM)
    sampleRate: 1,     // 16kHz
    channel: 1,        // Mono
    dataOpt: 1         // Mixed stream
  });

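  // Create live transcription connection
  // Raw PCM has no header, so the encoding and sample rate are declared explicitly
  const connection = deepgram.listen.live({
    model: "nova-2",
    language: "en",
    smart_format: true,
    punctuate: true,
    encoding: "linear16",
    sample_rate: 16000,
    channels: 1,
  });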

  connection.on("Results", (data) => {
    const transcript = data.channel.alternatives[0].transcript;
    if (transcript) {
      console.log(`[Deepgram]: ${transcript}`);
    }
  });

  client.onAudioData((buffer, timestamp, metadata) => {
    // Send audio to Deepgram
    connection.send(buffer);
  });

  client.onLeave(() => {
    connection.finish();
  });

  client.join(payload);
});

AssemblyAI Integration

import rtms from "@zoom/rtms";
import { AssemblyAI } from "assemblyai";

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const aai = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  
  client.setAudioParams({
    codec: 1,          // L16 (PCM)
    sampleRate: 1,     // 16kHz
    channel: 1         // Mono
  });

  const transcriber = aai.realtime.createService({
    sampleRate: 16000,
  });

  transcriber.connect();

  transcriber.on("transcript", (transcript) => {
    if (transcript.text) {
      console.log(`[AssemblyAI]: ${transcript.text}`);
    }
  });

  client.onAudioData((buffer, timestamp, metadata) => {
    transcriber.sendAudio(buffer);
  });

  client.onLeave(() => {
    transcriber.close();
  });

  client.join(payload);
});

Whisper (Local) Integration

import rtms from "@zoom/rtms";
import { Whisper } from "whisper-node";

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const whisper = new Whisper("base.en");

let audioBuffer = Buffer.alloc(0);
const BUFFER_SIZE = 16000 * 2 * 10; // 10 seconds at 16kHz, 16-bit mono (2 bytes per sample)

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  
  client.setAudioParams({
    codec: 1,          // L16 (PCM)
    sampleRate: 1,     // 16kHz
    channel: 1         // Mono
  });

  client.onAudioData(async (buffer, timestamp, metadata) => {
    // Accumulate audio
    audioBuffer = Buffer.concat([audioBuffer, buffer]);
    
    // Transcribe when buffer is full
    if (audioBuffer.length >= BUFFER_SIZE) {
      const transcript = await whisper.transcribe(audioBuffer);
      console.log(`[Whisper]: ${transcript}`);
      audioBuffer = Buffer.alloc(0);
    }
  });

  client.join(payload);
});
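
Many local speech models expect normalized 32-bit float samples rather than 16-bit PCM. Whether the Whisper binding used above needs this is not stated here, so treat the sketch below as an optional conversion step. `pcm16ToFloat32` is a hypothetical helper and assumes little-endian 16-bit samples.

```javascript
// Convert 16-bit signed little-endian PCM (assumed byte order) to floats in [-1, 1).
function pcm16ToFloat32(buffer) {
  const sampleCount = Math.floor(buffer.length / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = buffer.readInt16LE(i * 2) / 32768;
  }
  return out;
}

// Bytes needed to hold `seconds` of 16-bit mono audio at `sampleRateHz`.
function bytesForSeconds(seconds, sampleRateHz) {
  return seconds * sampleRateHz * 2;
}

const pcm = Buffer.alloc(4);
pcm.writeInt16LE(16384, 0);   // half of full scale
pcm.writeInt16LE(-32768, 2);  // most negative sample
console.log(Array.from(pcm16ToFloat32(pcm))); // [ 0.5, -1 ]
console.log(bytesForSeconds(10, 16000));      // 320000
```

The second helper makes the buffer arithmetic explicit: buffer lengths are measured in bytes, so ten seconds of 16 kHz, 16-bit mono audio is 320,000 bytes, not 160,000.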

Meeting Summarization

OpenAI/GPT Integration

import rtms from "@zoom/rtms";
import OpenAI from "openai";

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  // Per-stream state, so concurrent meetings do not share a transcript or timer
  const transcripts = [];
  let summaryInterval;

  const client = new rtms.Client();

  client.onTranscriptData((buffer, timestamp, metadata) => {
    const text = buffer.toString('utf8');
    transcripts.push({
      speaker: metadata.userName,
      text: text,
      time: new Date(timestamp)
    });
  });

  // Generate summary every 5 minutes
  summaryInterval = setInterval(async () => {
    if (transcripts.length === 0) return;

    const fullTranscript = transcripts
      .map(t => `${t.speaker}: ${t.text}`)
      .join('\n');

    const summary = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: "Summarize this meeting transcript. Include key points, decisions, and action items."
        },
        {
          role: "user",
          content: fullTranscript
        }
      ]
    });

    console.log("Meeting Summary:", summary.choices[0].message.content);
  }, 5 * 60 * 1000);

  client.onLeave(async () => {
    clearInterval(summaryInterval);
    
    // Generate final summary
    const fullTranscript = transcripts
      .map(t => `${t.speaker}: ${t.text}`)
      .join('\n');

    const summary = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: `Create a comprehensive meeting summary with:
- Key topics discussed
- Decisions made
- Action items with owners
- Follow-up items`
        },
        {
          role: "user",
          content: fullTranscript
        }
      ]
    });

    console.log("Final Summary:", summary.choices[0].message.content);
  });

  client.join(payload);
});

Real-Time Sentiment Analysis

import rtms from "@zoom/rtms";

async function analyzeSentiment(text) {
  // Use any sentiment API (OpenAI, HuggingFace, etc.)
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [{
        role: 'user',
        content: `Analyze sentiment (positive/neutral/negative): "${text}"`
      }]
    })
  });
  
  const data = await response.json();
  return data.choices[0].message.content;
}

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  let recentTranscripts = [];

  client.onTranscriptData(async (buffer, timestamp, metadata) => {
    const text = buffer.toString('utf8');
    recentTranscripts.push(text);

    // Analyze every 10 segments
    if (recentTranscripts.length >= 10) {
      const combinedText = recentTranscripts.join(' ');
      const sentiment = await analyzeSentiment(combinedText);
      console.log(`Sentiment: ${sentiment}`);
      recentTranscripts = [];
    }
  });

  client.join(payload);
});

Audio Recording with Gap Filling

For continuous playback, fill audio gaps with silence:

import rtms from "@zoom/rtms";
import fs from 'fs';

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit
const MS_PER_FRAME = 20;
const BYTES_PER_FRAME = SAMPLE_RATE * BYTES_PER_SAMPLE * MS_PER_FRAME / 1000;

function generateSilentFrame(durationMs) {
  const samples = SAMPLE_RATE * durationMs / 1000;
  return Buffer.alloc(samples * BYTES_PER_SAMPLE);
}

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  const streamId = payload.rtms_stream_id;
  
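  fs.mkdirSync('recordings', { recursive: true });  // createWriteStream fails if the directory is missing
  const audioStream = fs.createWriteStream(`recordings/${streamId}.pcm`);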
  let lastTimestamp = null;

  client.setAudioParams({
    codec: 1,          // L16 (PCM)
    sampleRate: 1,     // 16kHz
    channel: 1,        // Mono
    dataOpt: 1,        // Mixed stream
    duration: 20       // 20ms chunks
  });

  client.onAudioData((buffer, timestamp, metadata) => {
    if (lastTimestamp !== null) {
      const gap = timestamp - lastTimestamp;
      
      // Fill gaps >= 500ms with silence
      if (gap >= 500) {
        const silentFrames = Math.floor(gap / MS_PER_FRAME);
        console.log(`Gap detected: ${gap}ms, filling ${silentFrames} frames`);
        
        for (let i = 0; i < silentFrames; i++) {
          audioStream.write(generateSilentFrame(MS_PER_FRAME));
        }
      }
    }
    
    lastTimestamp = timestamp;
    audioStream.write(buffer);
  });

  client.onLeave(() => {
    audioStream.end();
    console.log(`Recording saved: recordings/${streamId}.pcm`);
  });

  client.join(payload);
});
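
The recording above is headerless PCM, which most players will not open directly. A 44-byte WAV header can be prepended once the data length is known. The sketch below builds the standard PCM header; `buildWavHeader` is a hypothetical helper whose defaults assume the 16 kHz, mono, 16-bit settings used in this example.

```javascript
// Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM.
function buildWavHeader(dataBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;  // bytes per sample frame
  const byteRate = sampleRate * blockAlign;           // bytes per second
  const header = Buffer.alloc(44);

  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataBytes, 4);   // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);              // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);               // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataBytes, 40);
  return header;
}
```

For example, after reading the finished `.pcm` file into a buffer named `pcm`, `Buffer.concat([buildWavHeader(pcm.length), pcm])` yields the bytes of a playable WAV file.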

Multi-Format Transcript Output

Generate VTT, SRT, and TXT simultaneously:

import rtms from "@zoom/rtms";
import fs from 'fs';

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

function formatVttTimestamp(ms) {
  const s = Math.floor(ms / 1000);
  const m = Math.floor(s / 60);
  const h = Math.floor(m / 60);
  const msec = ms % 1000;
  return `${String(h).padStart(2, '0')}:${String(m % 60).padStart(2, '0')}:${String(s % 60).padStart(2, '0')}.${String(msec).padStart(3, '0')}`;
}

function formatSrtTimestamp(ms) {
  return formatVttTimestamp(ms).replace('.', ',');
}

rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();
  const streamId = payload.rtms_stream_id;
  
  const baseDir = `recordings/${streamId}`;
  fs.mkdirSync(baseDir, { recursive: true });
  
  fs.writeFileSync(`${baseDir}/transcript.vtt`, 'WEBVTT\n\n');
  let srtIndex = 1;
  let startTimestamp = null;

  client.onTranscriptData((buffer, timestamp, metadata) => {
    const text = buffer.toString('utf8');
    const userName = metadata.userName;
    
    if (startTimestamp === null) {
      startTimestamp = timestamp;
    }
    
    const relative = timestamp - startTimestamp;
    const endTime = relative + 2000; // 2 second duration
    
    // VTT format
    const vttLine = `${formatVttTimestamp(relative)} --> ${formatVttTimestamp(endTime)}\n${userName}: ${text}\n\n`;
    fs.appendFileSync(`${baseDir}/transcript.vtt`, vttLine);
    
    // SRT format
    const srtLine = `${srtIndex++}\n${formatSrtTimestamp(relative)} --> ${formatSrtTimestamp(endTime)}\n${userName}: ${text}\n\n`;
    fs.appendFileSync(`${baseDir}/transcript.srt`, srtLine);
    
    // Plain text
    const txtLine = `[${new Date(timestamp).toISOString()}] ${userName}: ${text}\n`;
    fs.appendFileSync(`${baseDir}/transcript.txt`, txtLine);
  });

  client.join(payload);
});

Environment Variables

# Zoom RTMS
ZM_RTMS_CLIENT=your_client_id
ZM_RTMS_SECRET=your_client_secret

# AI Services
OPENAI_API_KEY=sk-...
DEEPGRAM_API_KEY=...
ASSEMBLYAI_API_KEY=...

# OpenRouter (free models)
OPENROUTER_API_KEY=sk-or-...

Free AI Model Considerations

When using free models (Gemma, Qwen, DeepSeek via OpenRouter):

Limitation	Impact	Solution
No image support	Can't analyze screen shares	Use paid model or skip image analysis
Context limits	Long transcripts may fail	Chunk transcripts, summarize incrementally
Rate limiting	May get 429 errors	Implement retry with backoff, stagger requests

Recommended for production: OpenRouter with google/gemini-2.5-pro - supports vision + XML tagging.
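
The context-limit row in the table above suggests chunking transcripts and summarizing incrementally. A minimal sketch of the chunking half follows. The character budget is an arbitrary stand-in for a real token limit, and `chunkTranscript` is a hypothetical helper.

```javascript
// Group transcript lines into chunks whose joined length stays within maxChars.
// A single line longer than maxChars becomes its own (oversized) chunk.
function chunkTranscript(lines, maxChars) {
  const chunks = [];
  let current = [];
  let currentLength = 0;

  for (const line of lines) {
    const added = line.length + (current.length > 0 ? 1 : 0); // +1 for the newline
    if (current.length > 0 && currentLength + added > maxChars) {
      chunks.push(current.join('\n'));
      current = [];
      currentLength = 0;
    }
    currentLength += line.length + (current.length > 0 ? 1 : 0);
    current.push(line);
  }

  if (current.length > 0) chunks.push(current.join('\n'));
  return chunks;
}

const lines = ['Ana: hello', 'Ben: hi there', 'Ana: agenda?'];
console.log(chunkTranscript(lines, 25));
// [ 'Ana: hello\nBen: hi there', 'Ana: agenda?' ]
```

Each chunk can then be summarized on its own, and the partial summaries combined in a final request, which also spreads calls out enough to ease rate limits.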

Next Steps

[SDK Quickstart](sdk-quickstart.md) - Basic RTMS setup
[Manual WebSocket](manual-websocket.md) - Protocol details
[Media Types](../references/media-types.md) - Audio/video configuration

Manual WebSocket Implementation

Full RTMS protocol implementation without the SDK. Use this for:

Languages without SDK support
Custom protocol requirements
Learning the underlying protocol

Overview

RTMS requires two WebSocket connections:

1. Signaling WebSocket - Control plane (handshake, heartbeat, start/stop)
2. Media WebSocket - Data plane (audio, video, transcript, chat, share)

Complete Implementation

const WebSocket = require('ws');
const crypto = require('crypto');
const express = require('express');

const app = express();
app.use(express.json());

// Configuration
const CLIENT_ID = process.env.ZOOM_CLIENT_ID;
const CLIENT_SECRET = process.env.ZOOM_CLIENT_SECRET;
const SECRET_TOKEN = process.env.ZOOM_SECRET_TOKEN;

// Active connections
const signalingConnections = new Map();
const mediaConnections = new Map();
const activeSessions = new Map();
const activeVideoUsers = new Map();

// ============================================
// SIGNATURE GENERATION
// Uses meeting_uuid for meetings/webinars, session_id for Video SDK
// ============================================

function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret)
    .update(message)
    .digest('hex');
}

// ============================================
// WEBHOOK HANDLER
// ============================================

const RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started'];
const RTMS_STOP_EVENTS = ['meeting.rtms_stopped', 'webinar.rtms_stopped', 'session.rtms_stopped'];

app.post('/webhook', (req, res) => {
  const { event, payload } = req.body;
  
  // Handle URL validation challenge.
  // This runs before the generic 200 because it must send a JSON body,
  // and a response can only be sent once.
  if (event === 'endpoint.url_validation') {
    const hash = crypto
      .createHmac('sha256', SECRET_TOKEN)
      .update(payload.plainToken)
      .digest('hex');
    return res.status(200).json({ 
      plainToken: payload.plainToken, 
      encryptedToken: hash 
    });
  }
  
  // CRITICAL: Respond 200 IMMEDIATELY before any RTMS processing!
  res.status(200).send();
  
  // Handle RTMS events (meetings, webinars, and Video SDK)
  if (RTMS_EVENTS.includes(event)) {
    handleRTMSStarted(payload.object);
  } else if (RTMS_STOP_EVENTS.includes(event)) {
    handleRTMSStopped(payload.object);
  }
});

// ============================================
// RTMS START HANDLER
// ============================================

function handleRTMSStarted(payload) {
  const { rtms_stream_id, server_urls } = payload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = payload.meeting_uuid || payload.session_id;
  
  // Prevent duplicate connections
  if (activeSessions.has(rtms_stream_id)) {
    console.log('Already connected to this stream, ignoring');
    return;
  }
  
  activeSessions.set(rtms_stream_id, {
    idValue: idValue,
    startTime: Date.now()
  });
  
  connectToSignaling(idValue, rtms_stream_id, server_urls);
}

// ============================================
// SIGNALING WEBSOCKET
// ============================================

function connectToSignaling(idValue, streamId, serverUrl) {
  console.log('Connecting to signaling:', serverUrl);
  
  const signature = generateSignature(CLIENT_ID, idValue, streamId, CLIENT_SECRET);
  const ws = new WebSocket(serverUrl);
  
  signalingConnections.set(streamId, ws);
  
  ws.on('open', () => {
    console.log('Signaling connected, sending handshake');
    
    ws.send(JSON.stringify({
      msg_type: 1,                    // SIGNALING_HAND_SHAKE_REQ
      protocol_version: 1,
      meeting_uuid: idValue,          // Works for both meeting_uuid and session_id
      rtms_stream_id: streamId,
      sequence: Math.floor(Math.random() * 1000000),
      signature: signature,
      media_type: 9                   // AUDIO(1) | TRANSCRIPT(8)
    }));
  });
  
  ws.on('message', (data) => {
    const msg = JSON.parse(data.toString());
    handleSignalingMessage(msg, idValue, streamId);
  });
  
  ws.on('close', (code, reason) => {
    console.log('Signaling closed:', code, reason.toString());
    signalingConnections.delete(streamId);
    // Implement reconnection logic if needed
  });
  
  ws.on('error', (error) => {
    console.error('Signaling error:', error);
  });
}

function handleSignalingMessage(msg, idValue, streamId) {
  switch (msg.msg_type) {
    case 2:  // SIGNALING_HAND_SHAKE_RESP
      if (msg.status_code === 0) {
        console.log('Signaling handshake success');
        
        // Extract media server URL and connect
        const mediaUrl = msg.media_server.server_urls.all;
        connectToMedia(idValue, streamId, mediaUrl);
      } else {
        console.error('Signaling handshake failed:', msg.status_code);
      }
      break;
      
    case 6:  // EVENT_UPDATE
      handleEventUpdate(msg, streamId);
      break;
      
    case 8:  // STREAM_STATE_UPDATE
      console.log('Stream state:', msg.state);
      break;
      
    case 9:  // SESSION_STATE_UPDATE
      console.log('Session state:', msg.state);
      break;
      
    case 12:  // KEEP_ALIVE_REQ
      const signalingWs = signalingConnections.get(streamId);
      if (signalingWs) {
        signalingWs.send(JSON.stringify({
          msg_type: 13,               // KEEP_ALIVE_RESP
          timestamp: msg.timestamp
        }));
      }
      break;
  }
}

function handleEventUpdate(msg, streamId) {
  const eventType = msg.event?.event_type ?? msg.event_type;
  const participants = msg.event?.participants ?? [];

  switch (eventType) {
    case 2:  // ACTIVE_SPEAKER_CHANGE
      console.log('Active speaker:', msg.user_name);
      break;
    case 3:  // PARTICIPANT_JOIN
      console.log('Participant joined:', msg.user_name);
      break;
    case 4:  // PARTICIPANT_LEAVE
      console.log('Participant left:', msg.user_name);
      break;
    case 5:  // SHARING_START
      console.log('Sharing started by:', msg.user_name);
      break;
    case 6:  // SHARING_STOP
      console.log('Sharing stopped');
      break;
    case 8:  // PARTICIPANT_VIDEO_ON
      for (const participant of participants) {
        const set = activeVideoUsers.get(streamId) || new Set();
        set.add(participant.user_id);
        activeVideoUsers.set(streamId, set);
      }
      break;
    case 9:  // PARTICIPANT_VIDEO_OFF
      for (const participant of participants) {
        activeVideoUsers.get(streamId)?.delete(participant.user_id);
      }
      break;
  }
}

// ============================================
// MEDIA WEBSOCKET
// ============================================

function connectToMedia(idValue, streamId, mediaUrl) {
  console.log('Connecting to media:', mediaUrl);
  
  const signature = generateSignature(CLIENT_ID, idValue, streamId, CLIENT_SECRET);
  const ws = new WebSocket(mediaUrl);
  
  mediaConnections.set(streamId, ws);
  
  ws.on('open', () => {
    console.log('Media connected, sending handshake');
    
    ws.send(JSON.stringify({
      msg_type: 3,                    // DATA_HAND_SHAKE_REQ
      protocol_version: 1,
      meeting_uuid: idValue,          // Works for both meeting_uuid and session_id
      rtms_stream_id: streamId,
      signature: signature,
      media_type: 9,                  // AUDIO(1) | TRANSCRIPT(8)
      payload_encryption: false,
      media_params: {
        audio: {
          content_type: 2,            // RAW_AUDIO
          sample_rate: 1,             // 16kHz
          channel: 1,                 // Mono
          codec: 1,                   // L16 (PCM)
          data_opt: 1,                // Mixed stream
          send_rate: 20               // 20ms chunks
        },
        transcript: {
          content_type: 5,            // TEXT
          src_language: 9,            // English
          enable_lid: false           // Fixed language, no auto-switch
        }
      }
    }));
  });
  
  ws.on('message', (data) => {
    const msg = JSON.parse(data.toString());
    handleMediaMessage(msg, streamId);
  });
  
  ws.on('close', (code, reason) => {
    console.log('Media closed:', code, reason.toString());
    mediaConnections.delete(streamId);
  });
  
  ws.on('error', (error) => {
    console.error('Media error:', error);
  });
}

function handleMediaMessage(msg, streamId) {
  switch (msg.msg_type) {
    case 4:  // DATA_HAND_SHAKE_RESP
      if (msg.status_code === 0) {
        console.log('Media handshake success, sending client ready');
        
        // Tell signaling we're ready to receive
        const signalingWs = signalingConnections.get(streamId);
        if (signalingWs) {
          signalingWs.send(JSON.stringify({
            msg_type: 7,              // CLIENT_READY_ACK
            rtms_stream_id: streamId
          }));
        }
      } else {
        console.error('Media handshake failed:', msg.status_code);
      }
      break;
      
    case 12:  // KEEP_ALIVE_REQ
      const mediaWs = mediaConnections.get(streamId);
      if (mediaWs) {
        mediaWs.send(JSON.stringify({
          msg_type: 13,               // KEEP_ALIVE_RESP
          timestamp: msg.timestamp
        }));
      }
      break;
      
    case 14:  // MEDIA_DATA_AUDIO
      handleAudioData(msg);
      break;
      
    case 15:  // MEDIA_DATA_VIDEO
      handleVideoData(msg);
      break;
      
    case 16:  // MEDIA_DATA_SHARE
      handleShareData(msg);
      break;
      
    case 17:  // MEDIA_DATA_TRANSCRIPT
      handleTranscriptData(msg);
      break;
      
    case 18:  // MEDIA_DATA_CHAT
      handleChatData(msg);
      break;
  }
}

// ============================================
// MEDIA DATA HANDLERS
// ============================================

function handleAudioData(msg) {
  const audioBuffer = Buffer.from(msg.content, 'base64');
  console.log(`Audio: ${audioBuffer.length} bytes from ${msg.user_name || 'mixed'}`);
  
  // Process audio:
  // - Send to transcription service
  // - Save to file
  // - Stream to output
}

function handleVideoData(msg) {
  const videoBuffer = Buffer.from(msg.content, 'base64');
  console.log(`Video: ${videoBuffer.length} bytes from ${msg.user_name}`);
  
  // Process video:
  // - Decode H.264/JPG
  // - Save frames
  // - AI analysis
}

function handleShareData(msg) {
  const shareBuffer = Buffer.from(msg.content, 'base64');
  console.log(`Share: ${shareBuffer.length} bytes from ${msg.user_name}`);
}

function handleTranscriptData(msg) {
  console.log(`[${msg.user_name}]: ${msg.content}`);
  
  // Save transcript, process with AI, etc.
}

function handleChatData(msg) {
  console.log(`[Chat] ${msg.user_name}: ${msg.content}`);
}

// ============================================
// RTMS STOP HANDLER
// ============================================

function handleRTMSStopped(payload) {
  const streamId = payload.rtms_stream_id;
  
  console.log('RTMS stopped:', streamId);
  
  // Close connections
  const signalingWs = signalingConnections.get(streamId);
  const mediaWs = mediaConnections.get(streamId);
  
  if (signalingWs) signalingWs.close();
  if (mediaWs) mediaWs.close();
  
  // Cleanup
  signalingConnections.delete(streamId);
  mediaConnections.delete(streamId);
  activeSessions.delete(streamId);
}

// ============================================
// START SERVER
// ============================================

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`RTMS server running on port ${PORT}`);
});

Message Type Reference

Signaling Messages

msg_type	Name	Direction	Description
1	SIGNALING_HAND_SHAKE_REQ	Client -> Server	Initial handshake
2	SIGNALING_HAND_SHAKE_RESP	Server -> Client	Handshake response with media URL
5	EVENT_SUBSCRIPTION	Client -> Server	Subscribe to events
6	EVENT_UPDATE	Server -> Client	Event notification
7	CLIENT_READY_ACK	Client -> Server	Ready to receive media
8	STREAM_STATE_UPDATE	Server -> Client	Stream state changed
9	SESSION_STATE_UPDATE	Server -> Client	Session state changed
12	KEEP_ALIVE_REQ	Server -> Client	Heartbeat ping
13	KEEP_ALIVE_RESP	Client -> Server	Heartbeat pong

Media Messages

msg_type	Name	Direction	Description
3	DATA_HAND_SHAKE_REQ	Client -> Server	Media handshake with params
4	DATA_HAND_SHAKE_RESP	Server -> Client	Media handshake response
12	KEEP_ALIVE_REQ	Server -> Client	Heartbeat ping
13	KEEP_ALIVE_RESP	Client -> Server	Heartbeat pong
14	MEDIA_DATA_AUDIO	Server -> Client	Audio data
15	MEDIA_DATA_VIDEO	Server -> Client	Video data
16	MEDIA_DATA_SHARE	Server -> Client	Screen share data
17	MEDIA_DATA_TRANSCRIPT	Server -> Client	Transcript data
18	MEDIA_DATA_CHAT	Server -> Client	Chat message
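
For logging and debugging, the numeric types in the two tables above can be collected into one lookup object. This is a minimal sketch: the names and numbers are copied from the tables, while `MSG_TYPE`, `MSG_NAME`, and `describeMessage` are hypothetical helper names and not part of any Zoom SDK.

```javascript
// Numeric msg_type values copied from the signaling and media tables.
const MSG_TYPE = Object.freeze({
  SIGNALING_HAND_SHAKE_REQ: 1,
  SIGNALING_HAND_SHAKE_RESP: 2,
  DATA_HAND_SHAKE_REQ: 3,
  DATA_HAND_SHAKE_RESP: 4,
  EVENT_SUBSCRIPTION: 5,
  EVENT_UPDATE: 6,
  CLIENT_READY_ACK: 7,
  STREAM_STATE_UPDATE: 8,
  SESSION_STATE_UPDATE: 9,
  KEEP_ALIVE_REQ: 12,
  KEEP_ALIVE_RESP: 13,
  MEDIA_DATA_AUDIO: 14,
  MEDIA_DATA_VIDEO: 15,
  MEDIA_DATA_SHARE: 16,
  MEDIA_DATA_TRANSCRIPT: 17,
  MEDIA_DATA_CHAT: 18
});

// Reverse map (number -> name) for readable log lines.
const MSG_NAME = Object.freeze(
  Object.fromEntries(Object.entries(MSG_TYPE).map(([name, id]) => [id, name]))
);

// Hypothetical helper: returns the symbolic name for a parsed message.
function describeMessage(msg) {
  return MSG_NAME[msg.msg_type] || `UNKNOWN(${msg.msg_type})`;
}
```

With this in place, a `switch` on `msg.msg_type` can use `MSG_TYPE.KEEP_ALIVE_REQ` instead of a bare `12`, and unexpected types can be logged by name.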

Media Parameters

Audio Parameters

{
  content_type: 2,     // 1=RTP, 2=RAW_AUDIO
  sample_rate: 1,      // 0=8kHz, 1=16kHz, 2=32kHz, 3=48kHz
  channel: 1,          // 1=Mono, 2=Stereo (OPUS only)
  codec: 1,            // 1=L16, 2=G.711, 3=G.722, 4=OPUS
  data_opt: 1,         // 1=Mixed, 2=Multi-streams
  send_rate: 20        // Chunk size in ms (multiple of 20)
}

function subscribeToParticipantVideo(streamId, userId) {
  const signalingWs = signalingConnections.get(streamId);
  if (!signalingWs) return;

  signalingWs.send(JSON.stringify({
    msg_type: 28, // VIDEO_SUBSCRIPTION_REQ
    user_id: userId,
    subscribe: true,
    timestamp: Date.now()
  }));
}

function closeStream(streamId) {
  const signalingWs = signalingConnections.get(streamId);
  if (!signalingWs) return;

  signalingWs.send(JSON.stringify({
    msg_type: 21, // STREAM_CLOSE_REQ
    rtms_stream_id: streamId
  }));
}

March 2026 Notes

The new PARTICIPANT_VIDEO_ON / PARTICIPANT_VIDEO_OFF events tell you which participants currently have subscribable camera streams.
To receive one participant camera feed, use VIDEO_SINGLE_INDIVIDUAL_STREAM in the video media handshake and then send VIDEO_SUBSCRIPTION_REQ.
RTMS currently supports only one individual participant video stream at a time. A new subscription replaces the previous one.
STREAM_CLOSE_REQ / STREAM_CLOSE_RESP let the backend terminate a stream cleanly.
Exact numeric values:
PARTICIPANT_VIDEO_ON = 8
PARTICIPANT_VIDEO_OFF = 9
STREAM_CLOSE_REQ = 21
STREAM_CLOSE_RESP = 22
VIDEO_SUBSCRIPTION_REQ = 28
VIDEO_SUBSCRIPTION_RESP = 29
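
Because a new subscription replaces the previous one, it can help to track the single active subscription and skip redundant requests. The following is a minimal sketch that reuses the `VIDEO_SUBSCRIPTION_REQ` shape from `subscribeToParticipantVideo` above. `createVideoSubscriber` and its `send` callback are hypothetical names, not SDK APIs.

```javascript
// Tracks the one participant camera stream RTMS allows at a time.
// `send` is a caller-supplied function (hypothetical) that writes a string
// to the signaling WebSocket.
function createVideoSubscriber(send) {
  let currentUserId = null;

  return {
    // The user ID currently subscribed to, or null.
    get current() {
      return currentUserId;
    },

    // Returns true if a request was sent, false if userId is already active.
    subscribe(userId) {
      if (userId === currentUserId) return false;
      send(JSON.stringify({
        msg_type: 28, // VIDEO_SUBSCRIPTION_REQ
        user_id: userId,
        subscribe: true,
        timestamp: Date.now()
      }));
      currentUserId = userId; // replaces any previous subscription
      return true;
    },

    // Call on PARTICIPANT_VIDEO_OFF so a later re-subscribe is not skipped.
    videoOff(userId) {
      if (userId === currentUserId) currentUserId = null;
    }
  };
}
```

A caller would typically create one subscriber per stream, for example `createVideoSubscriber((text) => signalingWs.send(text))`.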

Video Parameters

{
  content_type: 3,     // 3=RAW_VIDEO
  codec: 7,            // 5=JPG, 6=PNG, 7=H.264
  resolution: 2,       // 1=SD, 2=HD, 3=FHD, 4=QHD
  fps: 25,             // 1-30 (JPG/PNG max 5)
  data_opt: 3          // 3=Single active speaker
}

Screen Share Parameters

{
  content_type: 3,     // 3=RAW_VIDEO
  codec: 5,            // 5=JPG, 6=PNG, 7=H.264
  resolution: 3,       // 1=SD, 2=HD, 3=FHD, 4=QHD
  fps: 1               // 1-30 (JPG/PNG max 1)
}
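
The frame-rate limits in the comments above are easy to violate when switching codecs. The following sketch checks a parameter object before it is sent. `checkFrameParams` is a hypothetical helper, and the limits are the ones stated in the comments (1-30 fps, with JPG/PNG capped at 5 for video and 1 for screen share).

```javascript
const CODEC = Object.freeze({ JPG: 5, PNG: 6, H264: 7 });

// kind: 'video' or 'share'.
// Returns a list of problems (empty when the params look valid).
function checkFrameParams(kind, params) {
  const problems = [];
  const stillImage = params.codec === CODEC.JPG || params.codec === CODEC.PNG;
  const stillMaxFps = kind === 'share' ? 1 : 5;

  if (!Object.values(CODEC).includes(params.codec)) {
    problems.push(`unknown codec ${params.codec}`);
  }
  if (![1, 2, 3, 4].includes(params.resolution)) {
    problems.push(`unknown resolution ${params.resolution}`);
  }
  if (!Number.isInteger(params.fps) || params.fps < 1 || params.fps > 30) {
    problems.push(`fps ${params.fps} is outside 1-30`);
  } else if (stillImage && params.fps > stillMaxFps) {
    problems.push(`fps ${params.fps} exceeds ${stillMaxFps} for JPG/PNG ${kind}`);
  }
  return problems;
}
```

For example, the H.264 video block above passes, while the same block with `codec: 5` and `fps: 25` reports one problem.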

Transcript Parameters

{
  content_type: 5,     // 5=TEXT
  src_language: 9,     // 9=English
  enable_lid: false    // Fixed language, no auto-switch
}

Status Codes

Code	Name	Description
0	STATUS_OK	Success
3	STATUS_INVALID_SIGNATURE	Invalid signature
8	STATUS_DUPLICATE_SIGNAL_REQUEST	Duplicate signaling connection
16	STATUS_DUPLICATE_MEDIA_DATA_CONNECTION	Duplicate media connection
40	STATUS_INVALID_RTMS_SESSION_ID	Invalid RTMS session ID
43	STATUS_INVALID_MEDIA_TRANSCRIPT_SROUCE_LANGUAGE	Invalid transcript source language

See Data Types for complete list.

Error Handling

// Implement exponential backoff for reconnection
let retryDelay = 1000;

ws.on('close', (code, reason) => {
  console.log('Connection closed:', code, reason);
  
  // Don't reconnect if intentionally closed
  if (code === 1000) return;
  
  setTimeout(() => {
    reconnect();
  }, retryDelay);
  
  retryDelay = Math.min(retryDelay * 2, 30000);
});

ws.on('error', (error) => {
  console.error('WebSocket error:', error);
  // Connection will close, triggering reconnection
});

Gap-Filled Audio Recording

Fill gaps with silence for continuous playback:

const lastTimestamps = new Map();  // streamId -> timestamp (ms) of the last audio chunk

function handleAudioData(msg, streamId) {
  const now = msg.timestamp;
  const last = lastTimestamps.get(streamId) || now;
  const gap = now - last;
  
  // Fill gaps >= 500ms with silence
  if (gap >= 500) {
    const silentFrames = Math.floor(gap / 20);
    console.log(`Filling ${silentFrames} silent frames`);
    
    for (let i = 0; i < silentFrames; i++) {
      const silentFrame = Buffer.alloc(640); // 20ms @ 16kHz mono
      writeToFile(silentFrame);
    }
  }
  
  lastTimestamps.set(streamId, now);
  
  const audioBuffer = Buffer.from(msg.content, 'base64');
  writeToFile(audioBuffer);
}
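
The 640-byte figure follows from the audio parameters: 16,000 samples/s × 0.020 s × 2 bytes per sample × 1 channel = 640 bytes. The sketch below generalizes that arithmetic. `pcmFrameBytes` and `silentFrameCount` are hypothetical helpers, and the byte calculation assumes uncompressed 16-bit PCM (codec L16).

```javascript
// Bytes in one frame of uncompressed 16-bit PCM (assumes codec L16).
function pcmFrameBytes(sampleRateHz, channels, frameMs) {
  const bytesPerSample = 2; // 16-bit samples
  return (sampleRateHz * frameMs / 1000) * bytesPerSample * channels;
}

// Whole frames needed to cover a gap, using the same 500 ms threshold
// as the handler above. Gaps below the threshold are not filled.
function silentFrameCount(gapMs, frameMs = 20, thresholdMs = 500) {
  return gapMs >= thresholdMs ? Math.floor(gapMs / frameMs) : 0;
}

console.log(pcmFrameBytes(16000, 1, 20)); // 640
console.log(silentFrameCount(1000));      // 50
```

If the audio handshake requests a different sample rate or `send_rate`, the silent buffer size has to change with it, otherwise the filled gaps will play back at the wrong duration.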

Next Steps

[SDK Quickstart](sdk-quickstart.md) - SDK handles all this complexity
[AI Integration](ai-integration.md) - Transcription and analysis
[Data Types](../references/data-types.md) - All enums and constants

RTMS Bot (Real-Time Media Streams)

Build resilient RTMS bots that access meeting audio, video, transcription, screen share, and chat without joining as a visible participant.

Overview

RTMS bots are invisible read-only services that subscribe to meeting media streams via WebSockets. They do NOT appear in the participant list.

Use this approach when:

You only need to observe/transcribe (no interaction needed)
You want invisible operation
You're processing external meetings (with permission)
You want minimal resource usage

Alternative: See Meeting SDK Bot (Linux) for visible participant bots with full meeting control.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        RTMS BOT FLOW                                 │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│ 1. Trigger RTMS: REST API or In-Meeting Start                       │
│    └── POST /meetings/{meetingId}/rtms                              │
│    └── Or: Start RTMS manually from Zoom client                     │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 2. Wait for Webhook: meeting.rtms_started                           │
│    └── Zoom sends signaling + media WebSocket URLs                  │
│    └── No webhook = RTMS unavailable (no polling fallback)          │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 3. Connect: Signaling WebSocket (Handshake with HMAC)               │
│    └── Generate HMAC-SHA256 signature                               │
│    └── Send handshake message                                       │
│    └── Receive session confirmation                                 │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 4. Connect: Media WebSocket (Subscribe to Streams)                  │
│    └── Subscribe to: audio, video, transcription, share, chat       │
│    └── Send keep-alive pings                                        │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 5. Process Media Data                                               │
│    └── Audio: Opus/PCM streams per speaker                          │
│    └── Video: H.264 encoded frames                                  │
│    └── Transcription: Real-time text with speaker labels            │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│ 6. Mid-Stream: Connection Monitoring                                │
│    └── Detect WebSocket close → Exponential backoff retry           │
│    └── Stop after N reconnection attempts                           │
└─────────────────────────────────────────────────────────────────────┘

Skills Required

Skill	Purpose
zoom-rest-api	Trigger RTMS start (optional - can also start manually)
rtms	WebSocket connection, media processing
webhooks	Receive `meeting.rtms_started` event

Prerequisites

Zoom app with RTMS features enabled
Webhook endpoint (HTTPS, publicly accessible)
Event subscriptions: meeting.rtms_started, meeting.rtms_stopped
Scopes: meeting:read:admin, meeting:write:admin (if triggering via API)
RTMS SDK or native WebSocket implementation

Configuration

Retry Parameters (Customizable)

// config.js or environment variables
const rtmsConfig = {
    // WebSocket connection (initial)
    connection_timeout_ms: 10000,        // Handshake timeout (default: 10s)
    connection_max_attempts: 5,          // Max connection attempts (default: 5)
    connection_retry_delay_ms: 5000,     // Constant retry: 5s (default: 5s)
    
    // Mid-stream reconnection (network failures)
    reconnect_max_attempts: 3,           // Max reconnection attempts (default: 3)
    reconnect_base_delay_ms: 2000,       // Initial delay: 2s (default: 2s)
    // Exponential backoff: 2s, 4s, 8s...
    
    // Keep-alive ping
    keepalive_interval_ms: 5000,         // Send ping every 5s (default: 5s, min: 3s)
    keepalive_timeout_ms: 15000,         // Expect pong within 15s (default: 15s)
    
    // Webhook wait timeout
    webhook_wait_timeout_ms: 300000      // Wait 5min for webhook (default: 5min)
};

// Load from environment variables (recommended for production)
function loadConfig() {
    return {
        connection_timeout_ms: 
            parseInt(process.env.RTMS_CONNECTION_TIMEOUT_MS) || 10000,
        connection_max_attempts: 
            parseInt(process.env.RTMS_CONNECTION_MAX_ATTEMPTS) || 5,
        connection_retry_delay_ms: 
            parseInt(process.env.RTMS_CONNECTION_RETRY_DELAY_MS) || 5000,
        reconnect_max_attempts: 
            parseInt(process.env.RTMS_RECONNECT_MAX_ATTEMPTS) || 3,
        reconnect_base_delay_ms: 
            parseInt(process.env.RTMS_RECONNECT_BASE_DELAY_MS) || 2000,
        keepalive_interval_ms: 
            Math.max(parseInt(process.env.RTMS_KEEPALIVE_INTERVAL_MS) || 5000, 3000),
        keepalive_timeout_ms: 
            parseInt(process.env.RTMS_KEEPALIVE_TIMEOUT_MS) || 15000,
        webhook_wait_timeout_ms: 
            parseInt(process.env.RTMS_WEBHOOK_WAIT_TIMEOUT_MS) || 300000
    };
}

Customization Guide

Parameter	Default	When to Increase	When to Decrease
`connection_max_attempts`	5	Slow/congested networks	Fast failure detection needed
`connection_retry_delay_ms`	5000 (5s)	High network latency	Local network, low latency
`reconnect_max_attempts`	3	Critical meetings, unstable network	Cost-sensitive, batch processing
`reconnect_base_delay_ms`	2000 (2s)	International connections	Local network
`keepalive_interval_ms`	5000 (5s)	Reduce bandwidth overhead	Aggressive connection monitoring
`webhook_wait_timeout_ms`	300000 (5min)	Meetings may start late	Fast failure detection

Recommended Ranges:

Connection attempts: 3-10
Connection retry delay: 2s-15s
Reconnect attempts: 2-5
Reconnect base delay: 1s-5s
Keep-alive interval: 3s-30s (min: 3s per Zoom docs)
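
These ranges can be checked at load time. A minimal sketch, assuming the config keys from `rtmsConfig` above: `RECOMMENDED_RANGES` and `outOfRangeSettings` are hypothetical names. The helper only reports values outside the ranges and does not change them, since some setups (such as a fail-fast development profile) deliberately go below the recommended minimum.

```javascript
// [min, max] per config key, taken from the recommended ranges above.
const RECOMMENDED_RANGES = Object.freeze({
  connection_max_attempts: [3, 10],
  connection_retry_delay_ms: [2000, 15000],
  reconnect_max_attempts: [2, 5],
  reconnect_base_delay_ms: [1000, 5000],
  keepalive_interval_ms: [3000, 30000]
});

// Returns one entry per setting that falls outside its recommended range.
// Keys missing from the config are ignored.
function outOfRangeSettings(config) {
  const findings = [];
  for (const [key, [min, max]] of Object.entries(RECOMMENDED_RANGES)) {
    const value = config[key];
    if (typeof value !== 'number' || Number.isNaN(value)) continue;
    if (value < min || value > max) {
      findings.push({ key, value, min, max });
    }
  }
  return findings;
}
```

A caller might log each finding as a warning at startup, for example `outOfRangeSettings(loadConfig()).forEach((f) => console.warn('[CONFIG]', f))`.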

Examples:

# High-priority production bot (aggressive)
export RTMS_CONNECTION_MAX_ATTEMPTS=10
export RTMS_CONNECTION_RETRY_DELAY_MS=3000      # 3s
export RTMS_RECONNECT_MAX_ATTEMPTS=5
export RTMS_RECONNECT_BASE_DELAY_MS=1000        # 1s
export RTMS_KEEPALIVE_INTERVAL_MS=3000          # 3s (minimum)

# Cost-sensitive batch processing (conservative)
export RTMS_CONNECTION_MAX_ATTEMPTS=3
export RTMS_CONNECTION_RETRY_DELAY_MS=10000     # 10s
export RTMS_RECONNECT_MAX_ATTEMPTS=2
export RTMS_RECONNECT_BASE_DELAY_MS=5000        # 5s
export RTMS_KEEPALIVE_INTERVAL_MS=15000         # 15s

# Development/testing (fail fast)
export RTMS_CONNECTION_MAX_ATTEMPTS=2
export RTMS_CONNECTION_RETRY_DELAY_MS=2000      # 2s
export RTMS_RECONNECT_MAX_ATTEMPTS=1
export RTMS_RECONNECT_BASE_DELAY_MS=1000        # 1s
export RTMS_WEBHOOK_WAIT_TIMEOUT_MS=60000       # 1min

Step 1: Trigger RTMS (Optional - REST API)

You can start RTMS programmatically or manually from the Zoom client.

Option A: REST API Trigger

# Start RTMS for a meeting
curl -X POST "https://api.zoom.us/v2/meetings/{meetingId}/rtms" \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "meeting"
  }'

Response:

{
  "rtms_id": "abc123def456",
  "status": "starting"
}

Option B: Manual Start (In-Meeting)

Host clicks Apps → Your RTMS App → Start RTMS

Both trigger the same webhook → meeting.rtms_started
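
The curl call in Option A translates directly to Node. This is a sketch only: the endpoint, request body, and response are taken from the example above, `buildRtmsStartRequest` and `triggerRTMSStart` are hypothetical helper names, the access token parameter is an assumption about how the caller supplies credentials, and the default `fetch` assumes Node.js 18 or later.

```javascript
// Builds the request described in Option A without sending it.
function buildRtmsStartRequest(meetingId, accessToken) {
  return {
    url: `https://api.zoom.us/v2/meetings/${encodeURIComponent(meetingId)}/rtms`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ type: 'meeting' })
    }
  };
}

// Sends the request. fetchImpl is injectable so the call can be stubbed in tests.
async function triggerRTMSStart(meetingId, accessToken, fetchImpl = fetch) {
  const { url, options } = buildRtmsStartRequest(meetingId, accessToken);
  const res = await fetchImpl(url, options);
  if (!res.ok) {
    throw new Error(`RTMS start failed: HTTP ${res.status}`);
  }
  return res.json(); // e.g. the rtms_id/status object shown above
}
```

Splitting request construction from the network call keeps the URL and headers easy to inspect or log before anything is sent.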

Step 2: Wait for Webhook (Required)

CRITICAL: RTMS requires a webhook. There is NO polling alternative. If the webhook doesn't arrive, RTMS is unavailable.

Webhook Handler

const express = require('express');
const crypto = require('crypto');
const app = express();

const config = loadConfig();
const pendingConnections = new Map();  // Track webhook waiters

app.post('/webhook', express.json(), async (req, res) => {
    // 1. Verify webhook signature
    const signature = req.headers['x-zm-signature'];
    const timestamp = req.headers['x-zm-request-timestamp'];
    
    if (!verifyWebhookSignature(req.body, signature, timestamp)) {
        return res.status(403).send('Invalid signature');
    }
    
    // 2. Respond immediately (Zoom expects 200 within 3s)
    res.status(200).send();
    
    // 3. Process webhook asynchronously
    const event = req.body;
    
    if (event.event === 'meeting.rtms_started') {
        console.log('[WEBHOOK] RTMS started for meeting:', event.payload.object.uuid);
        
        const rtmsInfo = {
            meetingUuid: event.payload.object.uuid,
            signalingUrl: event.payload.object.signaling_url,
            mediaUrl: event.payload.object.media_url,
            sessionKey: event.payload.object.session_key
        };
        
        // Notify waiting connection
        const waiter = pendingConnections.get(rtmsInfo.meetingUuid);
        if (waiter) {
            waiter.resolve(rtmsInfo);
            pendingConnections.delete(rtmsInfo.meetingUuid);
        } else {
            // No waiter - proactive start
            connectToRTMS(rtmsInfo);
        }
    }
});

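function verifyWebhookSignature(body, signature, timestamp) {
    // Reject requests that are missing either header
    if (typeof signature !== 'string' || typeof timestamp !== 'string') return false;
    
    const message = `v0:${timestamp}:${JSON.stringify(body)}`;
    const hmac = crypto.createHmac('sha256', WEBHOOK_SECRET_TOKEN);
    const computed = 'v0=' + hmac.update(message).digest('hex');
    
    // timingSafeEqual throws when lengths differ, so compare lengths first
    const received = Buffer.from(signature);
    const expected = Buffer.from(computed);
    return received.length === expected.length &&
        crypto.timingSafeEqual(received, expected);
}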

Wait for Webhook (with Timeout)

async function waitForRTMSWebhook(meetingUuid) {
    return new Promise((resolve, reject) => {
        const timeoutId = setTimeout(() => {
            pendingConnections.delete(meetingUuid);
            reject(new Error(
                `RTMS webhook not received within ${config.webhook_wait_timeout_ms}ms. ` +
                `Possible causes: ` +
                `(1) Meeting hasn't started, ` +
                `(2) RTMS not enabled for this meeting, ` +
                `(3) Webhook endpoint unreachable.`
            ));
        }, config.webhook_wait_timeout_ms);
        
        pendingConnections.set(meetingUuid, {
            resolve: (rtmsInfo) => {
                clearTimeout(timeoutId);
                resolve(rtmsInfo);
            },
            reject
        });
    });
}

// Usage
try {
    console.log('[RTMS] Waiting for webhook...');
    const rtmsInfo = await waitForRTMSWebhook(MEETING_UUID);
    console.log('[RTMS] Webhook received, connecting...');
    await connectToRTMS(rtmsInfo);
} catch (error) {
    console.error('[RTMS] ABORT:', error.message);
}

Error if no webhook: ABORT. No webhook = RTMS unavailable. No polling alternative.

Step 3: Connect to Signaling WebSocket

Connection with Retry

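const WebSocket = require('ws');

// Promise-based delay used by the retry loops
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));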

async function connectSignalingWithRetry(signalingUrl, sessionKey) {
    for (let attempt = 1; attempt <= config.connection_max_attempts; attempt++) {
        console.log(`[SIGNALING] Attempt ${attempt}/${config.connection_max_attempts}`);
        
        try {
            const ws = await connectSignalingSocket(signalingUrl, sessionKey);
            console.log('[SIGNALING] Connected successfully');
            return ws;
        } catch (error) {
            console.error(`[SIGNALING] Attempt ${attempt} failed:`, error.message);
            
            if (attempt < config.connection_max_attempts) {
                const delay = config.connection_retry_delay_ms;
                console.log(`[SIGNALING] Retrying in ${delay}ms...`);
                await sleep(delay);
            }
        }
    }
    
    throw new Error(
        `Failed to connect signaling WebSocket after ${config.connection_max_attempts} attempts`
    );
}

function connectSignalingSocket(signalingUrl, sessionKey) {
    return new Promise((resolve, reject) => {
        const ws = new WebSocket(signalingUrl);
        const timeoutId = setTimeout(() => {
            ws.close();
            reject(new Error('Signaling connection timeout'));
        }, config.connection_timeout_ms);
        
        ws.on('open', () => {
            console.log('[SIGNALING] WebSocket opened, sending handshake...');
            
            // Generate HMAC signature
            const timestamp = Date.now();
            const message = `${timestamp}:${sessionKey}`;
            const signature = crypto
                .createHmac('sha256', WEBHOOK_SECRET_TOKEN)
                .update(message)
                .digest('hex');
            
            // Send handshake
            ws.send(JSON.stringify({
                type: 'handshake',
                timestamp,
                signature
            }));
        });
        
        ws.on('message', (data) => {
            const msg = JSON.parse(data);
            
            if (msg.type === 'handshake_response') {
                clearTimeout(timeoutId);
                
                if (msg.status === 'success') {
                    console.log('[SIGNALING] Handshake successful');
                    resolve(ws);
                } else {
                    ws.close();
                    reject(new Error(`Handshake failed: ${msg.error}`));
                }
            }
        });
        
        ws.on('error', (error) => {
            clearTimeout(timeoutId);
            reject(error);
        });
        
        ws.on('close', (code, reason) => {
            clearTimeout(timeoutId);
            reject(new Error(`Connection closed: ${code} ${reason}`));
        });
    });
}

Step 4: Connect to Media WebSocket

async function connectMediaWithRetry(mediaUrl, signalingWs) {
    for (let attempt = 1; attempt <= config.connection_max_attempts; attempt++) {
        console.log(`[MEDIA] Attempt ${attempt}/${config.connection_max_attempts}`);
        
        try {
            const ws = await connectMediaSocket(mediaUrl);
            console.log('[MEDIA] Connected successfully');
            subscribeToStreams(ws);
            setupKeepAlive(ws);
            return ws;
        } catch (error) {
            console.error(`[MEDIA] Attempt ${attempt} failed:`, error.message);
            
            if (attempt < config.connection_max_attempts) {
                const delay = config.connection_retry_delay_ms;
                console.log(`[MEDIA] Retrying in ${delay}ms...`);
                await sleep(delay);
            }
        }
    }
    
    throw new Error(
        `Failed to connect media WebSocket after ${config.connection_max_attempts} attempts`
    );
}

function subscribeToStreams(mediaWs) {
    // Subscribe to all available streams
    mediaWs.send(JSON.stringify({
        type: 'subscribe',
        streams: ['audio', 'video', 'transcription', 'share', 'chat']
    }));
    
    console.log('[MEDIA] Subscribed to: audio, video, transcription, share, chat');
}

Step 5: Keep-Alive Management

function setupKeepAlive(ws) {
    let lastPongReceived = Date.now();
    let keepAliveInterval;
    let timeoutCheck;
    
    // Send ping periodically
    keepAliveInterval = setInterval(() => {
        if (ws.readyState === WebSocket.OPEN) {
            ws.ping();
            console.log('[KEEPALIVE] Ping sent');
        }
    }, config.keepalive_interval_ms);
    
    // Check for pong timeout
    timeoutCheck = setInterval(() => {
        const timeSinceLastPong = Date.now() - lastPongReceived;
        
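        if (timeSinceLastPong > config.keepalive_timeout_ms) {
            console.error('[KEEPALIVE] Pong timeout, closing connection');
            clearInterval(keepAliveInterval);
            clearInterval(timeoutCheck);
            // 1000 signals a normal closure; use an application-defined code
            // (4000-4999) so close handlers can treat the timeout as a failure
            ws.close(4000, 'Keep-alive timeout');
        }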
    }, 1000);
    
    ws.on('pong', () => {
        lastPongReceived = Date.now();
        console.log('[KEEPALIVE] Pong received');
    });
    
    ws.on('close', () => {
        clearInterval(keepAliveInterval);
        clearInterval(timeoutCheck);
    });
}

Step 6: Mid-Stream Reconnection

class ResilientRTMSConnection {
    constructor(rtmsInfo, config) {
        this.rtmsInfo = rtmsInfo;
        this.config = config;
        this.reconnectionAttempt = 0;
        this.signalingWs = null;
        this.mediaWs = null;
    }
    
    async connect() {
        try {
            this.signalingWs = await connectSignalingWithRetry(
                this.rtmsInfo.signalingUrl,
                this.rtmsInfo.sessionKey
            );
            
            this.mediaWs = await connectMediaWithRetry(
                this.rtmsInfo.mediaUrl,
                this.signalingWs
            );
            
            this.setupReconnectionHandlers();
            
        } catch (error) {
            console.error('[RTMS] Initial connection failed:', error);
            throw error;
        }
    }
    
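    setupReconnectionHandlers() {
        // Bind each handler to its socket so events from replaced sockets are ignored
        const handleDisconnection = (wsType, ws) => (code, reason) => {
            if (ws !== this.signalingWs && ws !== this.mediaWs) return;
            console.error(`[${wsType}] Disconnected: ${code} ${reason}`);
            this.reconnect();
        };
        
        this.signalingWs.on('close', handleDisconnection('SIGNALING', this.signalingWs));
        this.mediaWs.on('close', handleDisconnection('MEDIA', this.mediaWs));
        
        this.signalingWs.on('error', (error) => {
            console.error('[SIGNALING] Error:', error.message);
        });
        
        this.mediaWs.on('error', (error) => {
            console.error('[MEDIA] Error:', error.message);
        });
    }
    
    async reconnect() {
        // Both sockets may report the same outage; run one reconnect loop at a time
        if (this.reconnecting || this.stopped) return;
        this.reconnecting = true;
        this.closeSockets();  // Rebuild signaling and media together
        
        while (!this.stopped &&
               this.reconnectionAttempt < this.config.reconnect_max_attempts) {
            this.reconnectionAttempt++;
            
            // Exponential backoff: 2s, 4s, 8s...
            const delay = this.config.reconnect_base_delay_ms 
                          * Math.pow(2, this.reconnectionAttempt - 1);
            
            console.log(
                `[RECONNECT] Attempt ${this.reconnectionAttempt}/` +
                `${this.config.reconnect_max_attempts} in ${delay}ms...`
            );
            
            await sleep(delay);
            if (this.stopped) break;
            
            try {
                await this.connect();
                console.log('[RECONNECT] Successfully reconnected');
                this.reconnectionAttempt = 0;  // Reset counter
                this.reconnecting = false;
                return;
            } catch (error) {
                console.error('[RECONNECT] Failed:', error.message);
                this.closeSockets();  // Drop any half-open connection, then retry
            }
        }
        
        this.reconnecting = false;
        if (!this.stopped) {
            console.error(
                `[RECONNECT] Giving up after ${this.reconnectionAttempt} attempts`
            );
            this.cleanup();
        }
    }
    
    closeSockets() {
        // Clear the references first so close events from these sockets are ignored
        const sockets = [this.signalingWs, this.mediaWs];
        this.signalingWs = null;
        this.mediaWs = null;
        for (const ws of sockets) {
            if (ws) ws.close();
        }
    }
    
    cleanup() {
        this.stopped = true;  // Prevents further reconnection attempts
        this.closeSockets();
    }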
}

Customizing Reconnection Behavior

// Each variant has its own name so the snippets can coexist; use one of them as `delay`.

// Example: Capped exponential backoff (max 30s)
const cappedDelay = Math.min(
    config.reconnect_base_delay_ms * Math.pow(2, reconnectionAttempt - 1),
    30000  // Cap at 30s
);

// Example: Linear backoff instead of exponential
const linearDelay = config.reconnect_base_delay_ms * reconnectionAttempt;

// Example: Jittered backoff (avoid thundering herd)
const baseDelay = config.reconnect_base_delay_ms * Math.pow(2, reconnectionAttempt - 1);
const jitter = Math.random() * 1000;  // Random 0-1000ms
const jitteredDelay = baseDelay + jitter;
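
The capped and jittered variants can also be combined into a single helper so the resulting schedule is easy to inspect. This is a sketch with hypothetical names (`reconnectDelay`, `capMs`, `jitterMs`). The `random` parameter exists only so the jitter can be made deterministic in tests.

```javascript
// Capped exponential backoff with optional additive jitter.
// attempt is 1-based: attempt 1 waits baseMs, attempt 2 waits 2 * baseMs, and so on.
function reconnectDelay(
  attempt,
  { baseMs = 2000, capMs = 30000, jitterMs = 0, random = Math.random } = {}
) {
  const exponential = baseMs * Math.pow(2, attempt - 1);
  const jitter = random() * jitterMs;
  return Math.min(exponential, capMs) + jitter;
}

// Schedule for the default settings without jitter.
console.log([1, 2, 3, 4, 5].map((n) => reconnectDelay(n)));
// [ 2000, 4000, 8000, 16000, 30000 ]
```

The cap is applied before the jitter, so the longest possible wait is `capMs + jitterMs`.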

Complete Resilient Bot Example

const config = loadConfig();

async function main() {
    try {
        // 1. Register the webhook waiter first so a fast webhook is not missed
        const webhookPromise = waitForRTMSWebhook(MEETING_UUID);
        
        // 2. Optional: Trigger RTMS via REST API, then wait for the webhook
        console.log('[RTMS] Triggering RTMS start...');
        await triggerRTMSStart(MEETING_ID);
        
        console.log('[RTMS] Waiting for meeting.rtms_started webhook...');
        const rtmsInfo = await webhookPromise;
        
        // 3. Connect with resilience
        const rtms = new ResilientRTMSConnection(rtmsInfo, config);
        await rtms.connect();
        
        console.log('[RTMS] Bot is running, processing streams...');
        
        // 4. Process media data
        rtms.mediaWs.on('message', (data) => {
            const frame = parseMediaFrame(data);
            processMediaFrame(frame);
        });
        
        // 5. Handle graceful shutdown
        process.on('SIGINT', () => {
            console.log('[RTMS] Shutting down...');
            rtms.cleanup();
            process.exit(0);
        });
        
    } catch (error) {
        console.error('[RTMS] ABORT:', error.message);
        process.exit(1);
    }
}

main();

Comparison: RTMS Bot vs Meeting SDK Bot

Aspect	RTMS Bot	Meeting SDK Bot
Visibility	Invisible (read-only service)	Visible participant
Authentication	REST API trigger + webhook	JWT + OBF token
Join Dependency	No dependency on participants	Owner must be present
Retry Logic	Not applicable (webhook-based)	Required (owner presence)
Media Access	Audio/video/text/share/chat via WebSocket	Raw audio/video/share via SDK
Recording Control	None (read-only)	Full (local, cloud, raw)
Interaction	Cannot interact	Can send chat, reactions
Resource Usage	Lower (WebSocket only)	Higher (full SDK)
Use Case	Passive transcription, analytics	Interactive bots, recording, moderation

Choose RTMS Bot when:

You only need to observe/transcribe
You want minimal resource usage
You prefer invisible operation
You're processing external meetings (with permission)

Choose Meeting SDK Bot when:

You need to interact with the meeting (chat, reactions)
You need local recording control
You want to be visible in participant list
You're processing your own meetings

Troubleshooting

Webhook Never Arrives

Symptom: waitForRTMSWebhook() times out

Solution:

1. Verify webhook endpoint is HTTPS and publicly accessible
2. Check Event Subscriptions in Zoom Marketplace: meeting.rtms_started enabled
3. Verify RTMS was actually started (check Zoom client or REST API response)
4. Increase webhook_wait_timeout_ms if meeting starts later than expected
5. Test webhook delivery: curl -X POST YOUR_WEBHOOK_URL

Signaling Handshake Fails

Symptom: Connection closes immediately after handshake

Solution:

1. Verify HMAC signature generation matches Zoom docs
2. Check timestamp is current (not stale)
3. Verify WEBHOOK_SECRET_TOKEN matches Zoom Marketplace config
4. Check signaling URL hasn't expired (short TTL)

Keep-Alive Timeout

Symptom: Connection closes with "Keep-alive timeout"

Solution:

1. Network congestion - increase keepalive_timeout_ms
2. Server overloaded - increase keepalive_interval_ms
3. Verify ping/pong implementation is correct
4. Check firewall/proxy not blocking WebSocket pings

Frequent Reconnections

Symptom: Bot reconnects multiple times, then gives up

Solution:

1. Increase reconnect_max_attempts (e.g., 5 instead of 3)
2. Increase reconnect_base_delay_ms if network is slow
3. Monitor server resources (CPU/memory/network)
4. Check for rate limiting (too many connection attempts)

Resources

RTMS Docs: https://developers.zoom.us/docs/rtms/
RTMS WebSocket Guide: https://developers.zoom.us/docs/api/websockets/
RTMS SDK: https://github.com/zoom/rtms
Webhook Reference: ../references/webhooks.md
Connection Architecture: ../concepts/connection-architecture.md
Meeting SDK Bot Alternative: Meeting SDK Bot (Linux)

SDK Quickstart

The fastest way to receive RTMS media using the official @zoom/rtms SDK.

Installation

# Requires Node.js 20.3.0+ (24 LTS recommended)
npm install @zoom/rtms express

Environment Setup

# .env
ZM_RTMS_CLIENT=your_client_id
ZM_RTMS_SECRET=your_client_secret

Multi-Product Support

The SDK accepts both meeting_uuid (meetings/webinars) and session_id (Video SDK) via client.join(payload) transparently. You only need to handle the different webhook event names -- the rest of the protocol is identical.

// These constants cover all RTMS products
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const RTMS_STOP_EVENTS = ["meeting.rtms_stopped", "webinar.rtms_stopped", "session.rtms_stopped"];
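
One way to route these events is a small classifier that maps an event name to a lifecycle phase and product. This is a sketch only: `classifyRtmsEvent` is a hypothetical helper built on the event names listed above, not an SDK function.

```javascript
const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const RTMS_STOP_EVENTS = ["meeting.rtms_stopped", "webinar.rtms_stopped", "session.rtms_stopped"];

// Returns { product, phase } for RTMS lifecycle events, or null for anything else.
function classifyRtmsEvent(eventName) {
  const phase = RTMS_EVENTS.includes(eventName) ? 'start'
    : RTMS_STOP_EVENTS.includes(eventName) ? 'stop'
    : null;
  if (!phase) return null;

  // Prefix is "meeting", "webinar", or "session" (Video SDK).
  const product = eventName.split('.')[0];
  return { product, phase };
}
```

A webhook handler can then branch on `phase` alone and keep `product` for logging, which matches the point above that only the event names differ between products.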

Minimal Example

import rtms from "@zoom/rtms";

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Handle webhook events - SDK starts webhook server automatically
rtms.onWebhookEvent(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;

  const client = new rtms.Client();

  client.onTranscriptData((data, timestamp, metadata) => {
    const text = data.toString('utf8');
    console.log(`${metadata.userName}: ${text}`);
  });

  // SDK handles all WebSocket complexity
  // Accepts both meeting_uuid and session_id transparently
  client.join(payload);
});

Complete Example with All Media Types

import rtms from "@zoom/rtms";
import fs from 'fs';

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];
const RTMS_STOP_EVENTS = ["meeting.rtms_stopped", "webinar.rtms_stopped", "session.rtms_stopped"];

const clients = new Map();

rtms.onWebhookEvent(({ event, payload }) => {
  const streamId = payload?.rtms_stream_id;

  // Handle session end (meetings, webinars, and Video SDK)
  if (RTMS_STOP_EVENTS.includes(event)) {
    const client = clients.get(streamId);
    if (client) {
      client.leave();
      clients.delete(streamId);
    }
    return;
  }

  if (!RTMS_EVENTS.includes(event)) return;

  // Prevent duplicate connections
  if (clients.has(streamId)) {
    console.log('Already connected to this stream');
    return;
  }

  const client = new rtms.Client();
  clients.set(streamId, client);

  // Join confirmation
  client.onJoinConfirm((reason) => {
    console.log(`Joined meeting: ${reason}`);
  });

  // Audio data
  client.onAudioData((buffer, timestamp, metadata) => {
    console.log(`Audio from ${metadata.userName}: ${buffer.length} bytes`);
    // Save to file, send to transcription service, etc.
  });

  // Video data
  client.onVideoData((buffer, timestamp, trackId, metadata) => {
    console.log(`Video from ${metadata.userName}: ${buffer.length} bytes`);
    // H.264 NAL units or JPG/PNG frames
  });

  // Transcript (real-time speech-to-text from Zoom)
  client.onTranscriptData((buffer, timestamp, metadata) => {
    const text = buffer.toString('utf8');
    console.log(`[${metadata.userName}]: ${text}`);
  });

  // Chat messages
  client.onChatData((buffer, timestamp, metadata) => {
    const text = buffer.toString('utf8');
    console.log(`[Chat] ${metadata.userName}: ${text}`);
  });

  // Screen share
  client.onShareData((buffer, timestamp, metadata) => {
    console.log(`Screen share from ${metadata.userName}: ${buffer.length} bytes`);
  });

  // Participant events
  client.onParticipantEvent((event, timestamp, participants) => {
    participants.forEach(p => {
      console.log(`Participant ${event}: ${p.userName}`);
    });
  });

  // Active speaker changed
  client.onActiveSpeakerEvent((timestamp, userId, userName) => {
    console.log(`Active speaker: ${userName}`);
  });

  // Screen sharing started/stopped
  client.onSharingEvent((event, timestamp, userId, userName) => {
    console.log(`Sharing ${event}: ${userName}`);
  });

  // Session ended
  client.onLeave((reason) => {
    console.log(`Left meeting: ${reason}`);
    clients.delete(streamId);
  });

  // Join the meeting
  client.join(payload);
});

Configuring Audio Parameters

import rtms from "@zoom/rtms";

const client = new rtms.Client();

// Set audio parameters before joining
client.setAudioParams({
  contentType: 2,    // RAW_AUDIO
  codec: 4,          // OPUS (default)
  sampleRate: 3,     // 48kHz
  channel: 2,        // Stereo (only with OPUS)
  dataOpt: 2,        // AUDIO_MULTI_STREAMS (per-participant)
  duration: 20,      // 20ms chunks
  frameSize: 960     // Samples per frame
});

client.join(payload);

Audio Parameter Options

Parameter	Options
`contentType`	1=RTP, 2=RAW_AUDIO
`codec`	1=L16 (PCM), 2=G.711, 3=G.722, 4=OPUS
`sampleRate`	0=8kHz, 1=16kHz, 2=32kHz, 3=48kHz
`channel`	1=Mono, 2=Stereo (OPUS only!)
`dataOpt`	1=Mixed stream, 2=Multi-streams (per participant)
`duration`	Chunk size in ms (multiple of 20, max 1000)
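
The constraints in this table can be checked before calling `setAudioParams`. The sketch below is a hypothetical pre-flight check, not an SDK feature; it assumes `frameSize` means samples per channel per chunk, which matches the 960 used with 48kHz and 20ms in the example above.

```javascript
// Sample rates in Hz, keyed by the sampleRate option value from the table.
const SAMPLE_RATE_HZ = { 0: 8000, 1: 16000, 2: 32000, 3: 48000 };
const CODEC_OPUS = 4;

// Hypothetical helper: returns a list of problems; an empty list means
// the params pass these particular checks.
function checkAudioParams({ codec, sampleRate, channel, duration }) {
  const problems = [];
  if (!(sampleRate in SAMPLE_RATE_HZ)) problems.push("sampleRate must be 0-3");
  if (channel === 2 && codec !== CODEC_OPUS) problems.push("stereo requires OPUS");
  if (!Number.isInteger(duration) || duration < 20 || duration > 1000 || duration % 20 !== 0) {
    problems.push("duration must be a multiple of 20 ms, at most 1000");
  }
  return problems;
}

// Samples per channel in one chunk: rate (Hz) * duration (ms) / 1000.
function samplesPerFrame(sampleRate, duration) {
  return (SAMPLE_RATE_HZ[sampleRate] * duration) / 1000;
}

console.log(samplesPerFrame(3, 20)); // 960
```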

Configuring Video Parameters

client.setVideoParams({
  contentType: 3,    // RAW_VIDEO
  codec: 7,          // H.264
  resolution: 2,     // HD (720p)
  fps: 25,
  dataOpt: 3         // Single active speaker
});

Video Parameter Options

Parameter	Options
`codec`	5=JPG, 6=PNG, 7=H.264
`resolution`	1=SD (480p), 2=HD (720p), 3=FHD (1080p), 4=QHD (1440p)
`fps`	1-30 (JPG/PNG max 5, H.264 max 30)
`dataOpt`	3=Single active speaker
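
The fps limits in this table depend on the codec, so a small guard can catch a mismatch before `setVideoParams` is called. This is a sketch with hypothetical helper names; the limits are taken directly from the table.

```javascript
const VIDEO_CODEC = { JPG: 5, PNG: 6, H264: 7 };

// Hypothetical helper: highest fps the table allows for a codec,
// or null for a codec value the table does not list.
function maxFpsForCodec(codec) {
  if (codec === VIDEO_CODEC.JPG || codec === VIDEO_CODEC.PNG) return 5;
  if (codec === VIDEO_CODEC.H264) return 30;
  return null;
}

function isValidVideoFps(codec, fps) {
  const max = maxFpsForCodec(codec);
  return max !== null && Number.isInteger(fps) && fps >= 1 && fps <= max;
}
```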

With Express Webhook Handler

import rtms from "@zoom/rtms";
import express from "express";

const app = express();
app.use(express.json());

const RTMS_EVENTS = ["meeting.rtms_started", "webinar.rtms_started", "session.rtms_started"];

// Use SDK's webhook handler
app.post('/webhook', rtms.createWebhookHandler(({ event, payload }) => {
  if (!RTMS_EVENTS.includes(event)) return;
  
  const client = new rtms.Client();
  
  client.onTranscriptData((data, timestamp, metadata) => {
    console.log(`${metadata.userName}: ${data.toString('utf8')}`);
  });
  
  client.join(payload);
}, '/webhook'));

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

Class-Based Approach (Multiple Connections)

For applications needing multiple concurrent connections:

import rtms from "@zoom/rtms";

// Initialize SDK once
rtms.Client.initialize();

// Create multiple clients
const client1 = new rtms.Client();
const client2 = new rtms.Client();

client1.onTranscriptData((data, ts, meta) => {
  console.log(`[Meeting 1] ${meta.userName}: ${data.toString('utf8')}`);
});

client2.onTranscriptData((data, ts, meta) => {
  console.log(`[Meeting 2] ${meta.userName}: ${data.toString('utf8')}`);
});

// Join different meetings
client1.join(meeting1Payload);
client2.join(meeting2Payload);

Error Handling

client.onJoinConfirm((reason) => {
  if (reason !== 0) {
    console.error(`Join failed with reason: ${reason}`);
    // Handle error
  }
});

client.onLeave((reason) => {
  console.log(`Left meeting with reason: ${reason}`);
  
  // Cleanup
  clients.delete(streamId);
  
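  // Optionally reconnect.
  // isUnexpectedDisconnect() and reconnect() are app-defined placeholders:
  // decide which leave reason codes should trigger a retry.
  if (isUnexpectedDisconnect(reason)) {
    setTimeout(() => reconnect(), 2000);
  }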
});

Python SDK

import rtms
from dotenv import load_dotenv

load_dotenv()

RTMS_EVENTS = ['meeting.rtms_started', 'webinar.rtms_started', 'session.rtms_started']
RTMS_STOP_EVENTS = ['meeting.rtms_stopped', 'webinar.rtms_stopped', 'session.rtms_stopped']

clients = {}

@rtms.onWebhookEvent
def handle_webhook(webhook):
    event = webhook.get('event')
    payload = webhook.get('payload', {})
    stream_id = payload.get('rtms_stream_id')

    if event in RTMS_STOP_EVENTS:
        if stream_id in clients:
            clients[stream_id].leave()
            del clients[stream_id]
        return

    if event not in RTMS_EVENTS:
        return

    client = rtms.Client()
    clients[stream_id] = client

    @client.onTranscriptData
    def on_transcript(data, size, timestamp, metadata):
        text = data.decode('utf-8')
        print(f'[{metadata.userName}]: {text}')

    @client.onJoinConfirm
    def on_join(reason):
        print(f'Joined: {reason}')

    @client.onLeave
    def on_leave(reason):
        print(f'Left: {reason}')

    # SDK accepts both meeting_uuid and session_id transparently
    client.join(payload)

# Main loop
if __name__ == '__main__':
    print('Webhook server running...')
    rtms.run()

Environment Variables Reference

# Required
ZM_RTMS_CLIENT=your_client_id
ZM_RTMS_SECRET=your_client_secret

# Optional
ZM_RTMS_PORT=8080           # Webhook server port
ZM_RTMS_PATH=/webhook       # Webhook endpoint path

# Logging
ZM_RTMS_LOG_LEVEL=info      # error, warn, info, debug, trace
ZM_RTMS_LOG_FORMAT=progressive  # progressive or json
ZM_RTMS_LOG_ENABLED=true

Common Issues

Issue	Solution
Segmentation fault	Upgrade to Node.js 20.3.0+ (24 LTS recommended)
Audio metadata missing userId	Use `onActiveSpeakerEvent` for speaker identification with mixed stream
Video params ignored	Call `setVideoParams` BEFORE `setAudioParams`

Next Steps

[Manual WebSocket](manual-websocket.md) - Full protocol control without SDK
[AI Integration](ai-integration.md) - Transcription and analysis patterns
[Media Types](../references/media-types.md) - All configuration options

RTMS - Connection

WebSocket connection protocol details.

Connection Flow

1. Receive meeting/webinar/session.rtms_started webhook
           ↓
2. Extract server_urls, stream_id, and meeting_uuid or session_id
           ↓
3. Generate signature (HMAC-SHA256) using meeting_uuid or session_id
           ↓
4. Connect to signaling WebSocket
           ↓
5. Send handshake request (msg_type 1)
           ↓
6. Receive handshake response (msg_type 2) with media server URL
           ↓
7. Connect to media WebSocket(s)
           ↓
8. Send media handshake (msg_type 3)
           ↓
9. Receive media handshake response (msg_type 4)
           ↓
10. Send ready to receive (msg_type 7)
           ↓
11. Receive media data (msg_type 14-18)
           ↓
12. Respond to heartbeats (msg_type 12 → 13)
           ↓
13. Optionally react to `PARTICIPANT_VIDEO_ON/OFF`, send `VIDEO_SUBSCRIPTION_REQ`, or gracefully terminate with `STREAM_CLOSE_REQ`

Signature Generation

const crypto = require('crypto');

// For meetings and webinars: use meeting_uuid
// For Video SDK: use session_id
// Webinars still use meeting_uuid (NOT webinar_uuid)
function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// Extract the correct ID from any product's webhook payload
const idValue = payload.meeting_uuid || payload.session_id;
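
A quick usage sketch of the signature function, with placeholder credentials and a made-up Video SDK payload (none of these values are real). It repeats the function so the snippet runs on its own.

```javascript
const crypto = require('node:crypto');

function generateSignature(clientId, idValue, streamId, clientSecret) {
  const message = `${clientId},${idValue},${streamId}`;
  return crypto.createHmac('sha256', clientSecret).update(message).digest('hex');
}

// Placeholder payload: a Video SDK session has session_id and no meeting_uuid.
const payload = { session_id: 'example-session', rtms_stream_id: 'example-stream' };
const idValue = payload.meeting_uuid || payload.session_id;

const signature = generateSignature(
  'example-client-id',
  idValue,
  payload.rtms_stream_id,
  'example-client-secret'
);

console.log(signature.length); // 64 (hex-encoded SHA-256 digest)
```

The same inputs always produce the same signature, so a handshake rejection usually points at a wrong ID field or credential rather than at timing.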

Signaling Message Types

msg_type	Name	Direction	Description
1	Handshake Request	Client → Server	Initiate connection
2	Handshake Response	Server → Client	Returns media server URL
3	Media Handshake Request	Client → Server	Request specific media types
4	Media Handshake Response	Server → Client	Confirms media subscription
7	Ready to Receive	Client → Server	Signal ready for data
12	Keep Alive Request	Server → Client	Heartbeat ping
13	Keep Alive Response	Client → Server	Heartbeat pong

Media Message Types

msg_type	Media Type
14	Audio
15	Video
16	Screen Share
17	Transcript
18	Chat
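
The two tables can be folded into one lookup so that a message handler branches on a name rather than a bare number. This is a sketch; `routeMessage` is a hypothetical helper, and the numeric values are only those listed in the tables above.

```javascript
// msg_type values from the signaling table.
const CONTROL_TYPES = {
  1: 'handshake_request',
  2: 'handshake_response',
  3: 'media_handshake_request',
  4: 'media_handshake_response',
  7: 'ready_to_receive',
  12: 'keep_alive_request',
  13: 'keep_alive_response',
};

// msg_type values from the media table.
const MEDIA_TYPES = {
  14: 'audio',
  15: 'video',
  16: 'screen_share',
  17: 'transcript',
  18: 'chat',
};

// Hypothetical helper: classify an incoming msg_type.
function routeMessage(msgType) {
  if (msgType in MEDIA_TYPES) return { kind: 'media', name: MEDIA_TYPES[msgType] };
  if (msgType in CONTROL_TYPES) return { kind: 'control', name: CONTROL_TYPES[msgType] };
  return { kind: 'unknown', name: null };
}
```

Unknown types return `kind: 'unknown'` instead of throwing, so message types added to the protocol later do not crash the handler.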

Critical Gotchas

1. Only ONE Connection Per Stream!

// WRONG - Connecting twice kicks out first connection
connectToRTMS(serverUrl, streamId);  // Connection 1
connectToRTMS(serverUrl, streamId);  // Connection 2 - kicks out Connection 1!

// CORRECT - Only connect once
if (!activeConnections.has(streamId)) {
  // Assumes connectToRTMS returns the WebSocket it opened
  const ws = connectToRTMS(serverUrl, streamId);
  activeConnections.set(streamId, ws);
}

2. Heartbeat is MANDATORY

When you receive msg_type 12, you MUST respond with msg_type 13:

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  
  if (msg.msg_type === 12) {  // Keep Alive Request
    ws.send(JSON.stringify({ 
      msg_type: 13,  // Keep Alive Response
      timestamp: msg.timestamp 
    }));
  }
});

3. Reconnection is YOUR Responsibility

RTMS does NOT auto-reconnect. Implement your own retry logic:

Server Type	Timeout
Media Server	65 seconds keep-alive tolerance before timeout
Signaling Server	60 seconds to reconnect

ws.on('close', () => {
  // Implement exponential backoff
  setTimeout(() => reconnect(), retryDelay);
  retryDelay = Math.min(retryDelay * 2, 30000);
});
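
The doubling-with-cap schedule can be written as a pure function, which makes it easy to see how many attempts fit inside a reconnect window. In this sketch the 1000 ms starting delay is an assumption; the 30000 ms cap comes from the snippet above and the 60-second window from the signaling row of the table. It ignores the time each connection attempt itself takes.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time up to a cap.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Hypothetical helper: the delays whose cumulative sum fits in the window.
function delaysWithinWindow(windowMs, baseMs = 1000, capMs = 30000) {
  const delays = [];
  let elapsed = 0;
  for (let attempt = 0; ; attempt++) {
    const delay = backoffDelayMs(attempt, baseMs, capMs);
    if (elapsed + delay > windowMs) break;
    delays.push(delay);
    elapsed += delay;
  }
  return delays;
}

console.log(delaysWithinWindow(60000)); // [ 1000, 2000, 4000, 8000, 16000 ]
```

Under these assumptions only five retries fit inside the signaling window, which argues for a short starting delay.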

Transcript LID Control

The transcript media handshake now supports explicit Language Identification control.

mediaWs.send(JSON.stringify({
  msg_type: 3,
  protocol_version: 1,
  meeting_uuid: idValue,
  rtms_stream_id: streamId,
  signature,
  media_type: 8, // TRANSCRIPT
  media_params: {
    transcript: {
      content_type: 5,   // TEXT
      src_language: 9,   // English
      enable_lid: false  // Lock to src_language instead of auto-switching
    }
  }
}));

Use enable_lid: false when:

the meeting should stay on a known language
language-switching is undesirable
you want more predictable downstream transcript processing
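
One way to keep that decision in a single place is a small builder for the `transcript` block of `media_params`. This is a sketch: `buildTranscriptParams` is a hypothetical helper, and omitting `src_language` when LID stays on is an assumption to confirm against the protocol definitions.

```javascript
const CONTENT_TYPE_TEXT = 5;

// Hypothetical helper: pass a language ID to lock the transcript to it,
// or pass nothing to leave Language Identification enabled.
function buildTranscriptParams(fixedLanguageId) {
  if (fixedLanguageId === undefined) {
    return { content_type: CONTENT_TYPE_TEXT, enable_lid: true };
  }
  return {
    content_type: CONTENT_TYPE_TEXT,
    src_language: fixedLanguageId,
    enable_lid: false,
  };
}

// Lock to English (ID 9, as in the handshake example above).
const lockedToEnglish = buildTranscriptParams(9);
```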

Single Individual Video Subscription Flow

RTMS now supports subscribing to one participant camera stream at a time.

1. Open a video media socket with data_opt = VIDEO_SINGLE_INDIVIDUAL_STREAM
2. Subscribe to PARTICIPANT_VIDEO_ON and PARTICIPANT_VIDEO_OFF
3. When an event arrives, choose the user_id you want
4. Send VIDEO_SUBSCRIPTION_REQ on the signaling socket
5. Wait for VIDEO_SUBSCRIPTION_RESP
6. Expect the newest successful subscription to replace the previous participant stream

// Signaling socket: subscribe to control-plane events
signalingWs.send(JSON.stringify({
  msg_type: 5, // EVENT_SUBSCRIPTION
  events: [
    { event_type: 8, subscribe: true }, // PARTICIPANT_VIDEO_ON
    { event_type: 9, subscribe: true }  // PARTICIPANT_VIDEO_OFF
  ]
}));

// Signaling socket: select a participant stream
signalingWs.send(JSON.stringify({
  msg_type: 28, // VIDEO_SUBSCRIPTION_REQ
  user_id: selectedUserId,
  subscribe: true,
  timestamp: Date.now()
}));

The March 2026 changelog did not publish the numeric values for the new message types, so treat the msg_type and event_type numbers in these snippets as unconfirmed. Check them against the protocol definitions before hard-coding them.

Graceful Stream Closure

The backend can now request clean shutdown over the signaling socket:

signalingWs.send(JSON.stringify({
  msg_type: 21, // STREAM_CLOSE_REQ
  rtms_stream_id: streamId
}));

Expect:

STREAM_CLOSE_RESP
then normal connection shutdown / cleanup

Use this when your app wants deterministic teardown instead of waiting for a stop webhook or socket failure.

Split vs Unified Mode

Mode	Description	Best For
Split	One connection per media type	Most use cases. Media server supports multiple connections with different media types
Unified	One connection for all media	Real-time audio+video streaming/muxing where sync matters

Low-Level Connection Example

const WebSocket = require('ws');
const crypto = require('crypto');

async function connectRTMS(webhookPayload) {
  const { server_urls, rtms_stream_id } = webhookPayload;
  // meeting_uuid for meetings/webinars, session_id for Video SDK
  const idValue = webhookPayload.meeting_uuid || webhookPayload.session_id;
  
  // Generate signature
  const signature = crypto
    .createHmac('sha256', process.env.ZOOM_CLIENT_SECRET)
    .update(`${process.env.ZOOM_CLIENT_ID},${idValue},${rtms_stream_id}`)
    .digest('hex');
  
  // Connect to signaling server
  const signalingWs = new WebSocket(server_urls, {
    headers: {
      'X-Zoom-RTMS-Stream-Id': rtms_stream_id,
      'X-Zoom-RTMS-Signature': signature
    }
  });
  
  signalingWs.on('open', () => {
    // Send handshake request
    signalingWs.send(JSON.stringify({
      msg_type: 1,
      protocol_version: 1,
      client_id: process.env.ZOOM_CLIENT_ID,
      meeting_uuid: idValue,          // Works for both meeting_uuid and session_id
      rtms_stream_id,                 // Same field name as the media handshake
      signature: signature,
      media_type: 9  // AUDIO(1) | TRANSCRIPT(8)
    }));
  });
  
  signalingWs.on('message', (data) => {
    const msg = JSON.parse(data);
    
    switch (msg.msg_type) {
      case 2:  // Handshake response
        // Connect to media server from msg.media_server_url
        connectMediaServer(msg.media_server_url);
        break;
      case 12:  // Keep alive request
        signalingWs.send(JSON.stringify({ msg_type: 13, timestamp: msg.timestamp }));
        break;
    }
  });
  
  signalingWs.on('error', (error) => {
    console.error('Signaling error:', error);
  });
  
  signalingWs.on('close', (code, reason) => {
    console.log('Signaling closed:', code, reason);
    // Implement reconnection logic
  });
}

Resources

RTMS_CONNECTION_FLOW.md: https://github.com/zoom/rtms-samples/blob/main/RTMS_CONNECTION_FLOW.md
ARCHITECTURE.md: https://github.com/zoom/rtms-samples/blob/main/ARCHITECTURE.md
TROUBLESHOOTING.md: https://github.com/zoom/rtms-samples/blob/main/TROUBLESHOOTING.md

Zoom RTMS Environment Variables

Standard `.env` keys

Variable	Required	Used for	Where to find
`ZOOM_CLIENT_ID`	OAuth mode	RTMS subscription/auth (Meetings/Webinars mode)	Zoom Marketplace -> OAuth app credentials
`ZOOM_CLIENT_SECRET`	OAuth mode	RTMS subscription/auth (Meetings/Webinars mode)	Zoom Marketplace -> OAuth app credentials
`ZOOM_ACCOUNT_ID`	S2S OAuth mode	Account-level RTMS token grants	Zoom Marketplace -> Server-to-Server OAuth app credentials
`ZOOM_VIDEO_SDK_KEY`	Video SDK RTMS mode	RTMS with Video SDK sessions	Zoom Marketplace -> Video SDK app credentials
`ZOOM_VIDEO_SDK_SECRET`	Video SDK RTMS mode	Video SDK session auth/signing	Zoom Marketplace -> Video SDK app credentials
`ZOOM_SECRET_TOKEN` or `WEBHOOK_SECRET_TOKEN`	Yes when validating events	Event signature verification	Zoom Marketplace -> Event Subscriptions -> Secret Token

Connection tuning (optional)

RTMS_CONNECTION_TIMEOUT_MS
RTMS_CONNECTION_MAX_ATTEMPTS
RTMS_CONNECTION_RETRY_DELAY_MS
RTMS_RECONNECT_MAX_ATTEMPTS
RTMS_RECONNECT_BASE_DELAY_MS
RTMS_KEEPALIVE_INTERVAL_MS
RTMS_KEEPALIVE_TIMEOUT_MS

Notes

Choose one credential mode per deployment: OAuth or Video SDK credentials.
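
A startup check can enforce that rule before any connection is attempted. The sketch below uses the variable names from the table; `resolveCredentialMode` and the mode labels are hypothetical.

```javascript
// Hypothetical helper: pick exactly one credential mode from an env-like object.
function resolveCredentialMode(env) {
  const hasOAuth = Boolean(env.ZOOM_CLIENT_ID && env.ZOOM_CLIENT_SECRET);
  const hasVideoSdk = Boolean(env.ZOOM_VIDEO_SDK_KEY && env.ZOOM_VIDEO_SDK_SECRET);

  if (hasOAuth && hasVideoSdk) {
    throw new Error('Set OAuth or Video SDK credentials, not both');
  }
  if (hasOAuth) return 'oauth';
  if (hasVideoSdk) return 'video-sdk';
  throw new Error('No complete RTMS credential pair found');
}

// Typical use at startup: const mode = resolveCredentialMode(process.env);
```

Failing at startup is usually easier to diagnose than a handshake rejection later.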

RTMS - Media Types

Audio, video, transcript, chat, and screen share data formats.

Media Type Bitmask

Use bitwise OR to combine types:

Type	Value	Event Name	Description
Audio	1	`audio`	PCM audio samples
Video	2	`video`	H.264 encoded frames
Screen Share	4	`sharescreen`	Separate from video!
Transcript	8	`transcript`	Real-time speech-to-text
Chat	16	`chat`	In-meeting chat messages
All	32	all events	All media types

Example: Audio + Transcript = 1 | 8 = 9

const mediaTypes = RTMSManager.MEDIA.AUDIO | RTMSManager.MEDIA.TRANSCRIPT;  // 9
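
For code that does not use `RTMSManager`, the same bitmask arithmetic can be sketched with a plain object. The `MEDIA` map and both helpers are hypothetical; the values come from the table above. Note that `All` (32) is its own value in the table, not the OR of the other five, which would be 31.

```javascript
const MEDIA = { AUDIO: 1, VIDEO: 2, SHARESCREEN: 4, TRANSCRIPT: 8, CHAT: 16 };

// Combine named media types into one bitmask with bitwise OR.
function combineMediaTypes(...names) {
  return names.reduce((mask, name) => mask | MEDIA[name], 0);
}

// List the names whose bit is set in the mask.
function decodeMediaTypes(mask) {
  return Object.keys(MEDIA).filter((name) => (mask & MEDIA[name]) !== 0);
}

console.log(combineMediaTypes('AUDIO', 'TRANSCRIPT')); // 9
console.log(decodeMediaTypes(9)); // [ 'AUDIO', 'TRANSCRIPT' ]
```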

Audio

Property	Options
Sample Rate	8kHz (0), 16kHz (1), 32kHz (2), 48kHz (3)
Codec	L16/PCM (1), G.711 (2), G.722 (3), Opus (4)
Channels	Mono (1), Stereo (2)
Data Option	Mixed (1), Multi-stream (2)
Send Rate	20ms (recommended)

Important: Stereo is ONLY supported with Opus codec!

Audio Configuration Example

const audioParams = {
  content_type: 1,  // MEDIA_CONTENT_TYPE_RTP
  sample_rate: 1,   // 16kHz
  channel: 1,       // Mono
  codec: 1,         // L16 (PCM)
  data_opt: 1,      // Mixed stream (all participants)
  send_rate: 20     // 20ms intervals
};

Processing Audio

RTMSManager.on('audio', ({ buffer, userName, timestamp }) => {
  // buffer = PCM 16-bit samples
  // Send to transcription service, save to file, etc.
  transcriptionService.process(buffer);
});

Video

Property	Options
Codec	H.264 (7), JPG (5), PNG (6)
Resolution	SD (1), HD 720p (2), FHD 1080p (3), QHD 2K (4)
FPS	1-30 (typically 25)
Data Option	Single active (3), Speaker view (4), Gallery view (5), Single individual stream (March 2026)

Rule: Use JPG/PNG when fps <= 5, H.264 when fps > 5

Video Configuration Example

const videoParams = {
  codec: 7,         // H.264
  resolution: 2,    // HD 720p
  fps: 25,
  data_opt: 3       // Single active speaker
};

Single Individual Participant Video

March 2026 added a new pattern for selecting one participant camera stream at a time.

Use it when you need:

per-user vision processing
a moderator-selected camera feed
deterministic participant focus instead of active speaker switching

Configuration rules:

set the video data_opt to VIDEO_SINGLE_INDIVIDUAL_STREAM
subscribe to PARTICIPANT_VIDEO_ON / PARTICIPANT_VIDEO_OFF
send VIDEO_SUBSCRIPTION_REQ with the chosen user_id
a new subscription overrides the previous participant stream

This is not a multi-participant subscription feature. RTMS currently supports only one individual participant video stream at a time.
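
The one-stream-at-a-time rule can be mirrored in client state. The sketch below is a hypothetical tracker, not part of RTMS: it records which cameras are on and which single `user_id` is currently selected. A real client would send `VIDEO_SUBSCRIPTION_REQ` with the returned ID and confirm the change on `VIDEO_SUBSCRIPTION_RESP`.

```javascript
// Hypothetical tracker for the single-individual video pattern.
class SingleVideoSelector {
  constructor() {
    this.currentUserId = null;   // the one participant currently selected
    this.camerasOn = new Set();  // participants whose video is on
  }

  // Call when PARTICIPANT_VIDEO_ON arrives.
  onVideoOn(userId) {
    this.camerasOn.add(userId);
  }

  // Call when PARTICIPANT_VIDEO_OFF arrives.
  onVideoOff(userId) {
    this.camerasOn.delete(userId);
    if (this.currentUserId === userId) this.currentUserId = null;
  }

  // Returns the user_id to subscribe to, or null if that camera is off.
  select(userId) {
    if (!this.camerasOn.has(userId)) return null;
    this.currentUserId = userId; // replaces any previous selection
    return userId;
  }
}
```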

Processing Video

RTMSManager.on('video', ({ buffer, userName, timestamp }) => {
  // buffer = H.264 NAL units
  // Decode with FFmpeg, save, or stream
  videoDecoder.decode(buffer);
});

Screen Share (SEPARATE from Video!)

Screen share has a different event from regular video (msg_type 16 vs 15).

Property	Options
Codec	JPG (5), PNG (6), H.264 (7)
Resolution	SD (1), HD 720p (2), FHD 1080p (3), QHD 2K (4)
FPS	1-5 for static content, 15-30 for animations

Screen Share Configuration

const deskshareParams = {
  codec: 5,         // JPG (good for static slides)
  resolution: 2,    // HD
  fps: 1            // Low FPS for slides
};

Processing Screen Share

RTMSManager.on('sharescreen', ({ buffer, userName, timestamp }) => {
  // buffer = JPG/PNG image or H.264 frame
  saveScreenCapture(buffer);
});

Transcript

Property	Value
Format	JSON text
Content Type	5 (MEDIA_CONTENT_TYPE_TEXT)
Languages	36 supported (see below)
`src_language`	Fixed requested language
`enable_lid`	Toggle Language Identification (default enabled)

Language IDs (Common)

Language	ID
English	9
Chinese (Simplified)	4
Chinese (Traditional)	5
Japanese	20
Korean	21
Spanish	28
French (France)	13
German	14

Tip: Use src_language plus enable_lid: false to force a fixed language. Leave enable_lid enabled when you want automatic language switching.

Transcript Structure

{
  "user_id": "user_id",
  "user_name": "Speaker Name",
  "text": "Transcribed text content",
  "timestamp": 1234567890,
  "is_final": true
}
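
A defensive parser for a payload of this shape might look like the sketch below. `parseTranscript` is a hypothetical helper, and it assumes the transcript arrives as a UTF-8 buffer containing exactly the JSON fields shown above.

```javascript
// Hypothetical parser: returns a normalized object, or null when the
// payload is not valid JSON or lacks a text field.
function parseTranscript(buffer) {
  let msg;
  try {
    msg = JSON.parse(buffer.toString('utf8'));
  } catch {
    return null;
  }
  if (msg === null || typeof msg !== 'object' || typeof msg.text !== 'string') {
    return null;
  }
  return {
    userId: msg.user_id,
    userName: msg.user_name,
    text: msg.text,
    timestamp: msg.timestamp,
    isFinal: msg.is_final === true,
  };
}
```

Filtering on `isFinal` before saving avoids storing interim segments that a later message supersedes.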

Processing Transcript

RTMSManager.on('transcript', ({ text, userName, timestamp }) => {
  // text = transcribed speech
  // is_final = true for finalized segments
  saveTranscript(userName, text);
});

Chat

Property	Value
Format	JSON text
Content Type	5 (MEDIA_CONTENT_TYPE_TEXT)

Processing Chat

RTMSManager.on('chat', ({ text, userName, timestamp }) => {
  console.log(`[Chat] ${userName}: ${text}`);
  saveChatMessage(userName, text);
});

Complete Media Configuration

const mediaParams = {
  audio: {
    content_type: 1,  // RTP
    sample_rate: 1,   // 16kHz
    channel: 1,       // Mono
    codec: 1,         // L16 (PCM)
    data_opt: 1,      // Mixed stream
    send_rate: 20
  },
  video: {
    codec: 7,         // H.264
    resolution: 2,    // HD 720p
    fps: 25,
    data_opt: 3       // Single active speaker
  },
  deskshare: {
    codec: 5,         // JPG
    resolution: 2,    // HD
    fps: 1
  },
  transcript: {
    content_type: 5,  // TEXT
    src_language: 9,  // English
    enable_lid: false
  },
  chat: {
    content_type: 5   // TEXT
  }
};

Resources

Data types: https://developers.zoom.us/docs/rtms/data-types/
Media params: https://developers.zoom.us/docs/rtms/media-parameter-definition/
MEDIA_PARAMETERS.md: https://github.com/zoom/rtms-samples/blob/main/MEDIA_PARAMETERS.md

RTMS 5-Minute Preflight Runbook

Use this before deep debugging. It catches the highest-frequency RTMS issues fast.

Skill Doc Standard Note

Agent-skill standard entrypoint is SKILL.md.
This runbook is an operational convention (recommended), not a required skill file.
SKILL.md is also a navigation convention for larger skill docs.

1) Confirm Architecture Assumption

RTMS is backend-first media ingestion.
Frontend is optional and should consume backend outputs (WebSocket/SSE/etc).

If implementation assumes frontend-only RTMS behavior, redesign first.

2) Confirm Event-Triggered Kickoff

Processing starts only after RTMS lifecycle start events:
meeting.rtms_started
webinar.rtms_started
session.rtms_started
Stop events should deactivate pipeline.

If media handling starts before lifecycle start, session gating is wrong.

3) Confirm Product-Specific IDs

Meetings/Webinars: use meeting_uuid
Video SDK: use session_id
Use rtms_stream_id from payload for stream context

Using wrong ID field commonly breaks handshake/signature.

4) Confirm Webhook Handling Pattern

Respond 200 immediately.
Do heavy work asynchronously.
Verify webhook signature if secret token is configured.

Slow webhook responses can trigger retries and duplicate stream attempts.
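
The pattern above can be sketched as a framework-independent handler core. The signature check assumes Zoom's standard webhook verification scheme (HMAC-SHA256 over `v0:{timestamp}:{raw body}`, compared with the `x-zm-signature` header); confirm it against the current webhook docs. `isValidZoomSignature`, `handleWebhook`, and `onEvent` are hypothetical names.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: verify the webhook signature against the secret token.
function isValidZoomSignature(secretToken, timestamp, rawBody, signatureHeader) {
  const message = `v0:${timestamp}:${rawBody}`;
  const expected = 'v0=' + crypto.createHmac('sha256', secretToken).update(message).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(String(signatureHeader));
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Hypothetical handler core: returns the HTTP status to send right away
// and defers the heavy work so the caller can respond first.
function handleWebhook({ headers, rawBody, secretToken, onEvent }) {
  const ok = isValidZoomSignature(
    secretToken,
    headers['x-zm-request-timestamp'],
    rawBody,
    headers['x-zm-signature']
  );
  if (!ok) return 401;

  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return 400;
  }
  setImmediate(() => onEvent(body)); // heavy work runs after this call returns
  return 200;
}
```

Verifying against the raw request body, rather than a re-serialized object, avoids mismatches caused by key order or whitespace.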

5) Confirm Connection and Heartbeat

Track one active connection per stream/session reference.
Handle heartbeat ping/pong per protocol.
Implement reconnection strategy explicitly.

No heartbeat handling means unexpected disconnects.

6) Confirm Media Subscription/Gating

Ensure requested media types match your processing path.
Reject/ignore media packets for inactive sessions.
Expose pipeline status endpoint for observability.

This avoids silent packet handling when lifecycle is not active.
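
A minimal sketch of such a gate, keyed by stream ID. `PipelineGate` is a hypothetical class; its `status()` result is the kind of value a pipeline status endpoint could return.

```javascript
// Hypothetical gate: media is accepted only for streams that have
// received a start event and not yet a stop event.
class PipelineGate {
  constructor() {
    this.active = new Set();
  }

  start(streamId) {
    this.active.add(streamId);
  }

  stop(streamId) {
    this.active.delete(streamId);
  }

  // True when media for this stream should be processed.
  accept(streamId) {
    return this.active.has(streamId);
  }

  status() {
    return { activeSessions: this.active.size };
  }
}
```

Calling `accept()` before `start()` returns false, so media that arrives ahead of the lifecycle event is dropped rather than processed.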

7) Quick Probe Checklist

GET /api/health returns service alive.
GET /api/pipeline/status shows expected active session count.
Mock/media probes show:
media before start -> rejected
start event -> pipeline active
media after start -> accepted

Copy/Paste Validation Commands

curl -sS "$RTMS_BASE_URL/api/health"
curl -sS "$RTMS_BASE_URL/api/pipeline/status"

Expected: healthy service JSON and correct active pipeline visibility.

8) Fast Decision Tree

No media at all -> lifecycle event not received or wrong webhook route.
Duplicate streams -> delayed webhook response or no active-session guard.
Handshake/auth errors -> wrong credential pair or wrong session ID field.
Frontend appears idle -> backend bridge not connected, not an RTMS source issue.

Forks & variants (1)

Zoom Rtms has 1 known copy in the catalog totaling 15 installs. They canonicalize to this original listing.

zoom - 15 installs

How it compares

Pick zoom-rtms over general Zoom debugging when the integration uses Realtime Media Streams (RTMS) and you need a fast architecture preflight.

FAQ

What does zoom-rtms do?

Reference skill for Zoom RTMS. Use after routing to a live-media workflow when processing real-time audio, video, chat, transcripts, screen share, or contact-center voice streams.

When should I use zoom-rtms?

Use it when the user asks about Zoom RTMS or related SKILL.md workflows.

Is zoom-rtms safe to install?

Review the Security Audits panel on this page before installing in production.
