
Scribe
Integrate Zoom Build Scribe speech-to-text with correct JWT auth and choose fast synchronous vs batch job transcription flows.
Overview
scribe is an agent skill for the Build phase that documents Zoom Scribe API JWT authentication and fast versus batch transcription integration patterns.
Install
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill scribe
What is this skill?
- HS256 JWT bearer auth with the issuer claim matching the Build-platform API key and an expiration of one hour or less
- Documents credential label drift (API key/secret vs SDK key/secret vs Build platform credentials) so you can verify the exact labels in the portal
- Fast mode: POST /transcribe for short interactive files with synchronous JSON
- Batch mode: POST /jobs with status or webhook for archives and long or multi-file workloads
- Config surface includes language, diarization, word_time_offsets, channel_separation, timestamps, and output_format
- JWT expiration capped at one hour or less in examples
- Two transport modes: fast synchronous and batch asynchronous
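The JWT shape listed above (HS256, an issuer claim, and an expiry of one hour or less) can be sketched without third-party packages using Node's built-in `crypto` module. This is a minimal illustration, not the skill's own helper, which uses `jsrsasign`. The function names `signScribeJwt` and `decodeClaims` are hypothetical, and the claim set (`iss`, `iat`, `exp`) is assumed from the skill's Node example rather than confirmed against the Scribe API reference.

```javascript
const crypto = require('node:crypto');

// JWT segments are base64url-encoded without padding.
function base64url(input) {
  return Buffer.from(input).toString('base64url');
}

// Sign an HS256 JWT carrying iss/iat/exp. The requested lifetime is
// capped at 3600 seconds so the token never outlives one hour.
function signScribeJwt(apiKey, apiSecret, ttlSeconds = 3600, nowMs = Date.now()) {
  const ttl = Math.min(ttlSeconds, 3600);
  const iat = Math.floor(nowMs / 1000) - 30; // backdated 30 s, as in the skill's example
  const exp = iat + ttl;
  const header = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = base64url(JSON.stringify({ iss: apiKey, iat, exp }));
  const signature = crypto
    .createHmac('sha256', apiSecret)
    .update(`${header}.${payload}`)
    .digest('base64url');
  return `${header}.${payload}.${signature}`;
}

// Decode the payload segment for inspection only; this does NOT verify the signature.
function decodeClaims(token) {
  const payload = token.split('.')[1];
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
}

// Placeholder credentials; real values belong in a secrets manager.
const token = signScribeJwt('demo-key', 'demo-secret');
const claims = decodeClaims(token);
console.log(claims.iss, claims.exp - claims.iat);
```

Decoding the claims before sending a request is a cheap way to confirm that the issuer and lifetime match what the portal credentials expect.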
Adoption & trust: 847 installs on skills.sh; 19.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to add speech-to-text via Zoom Scribe, but the auth labels, JWT shape, and sync vs async endpoints are easy to mix up across documentation pages.
Who is it for?
Developers embedding Zoom Scribe transcription into a SaaS feature, agent pipeline, or internal workflow with Node or similar JWT signing.
Skip if: you want a no-code transcription UI, general meeting summarization without the Scribe API, or a skill that only edits markdown documents.
When should I use this skill?
When implementing Zoom Scribe transcription: JWT generation, POST /transcribe, or POST /jobs with webhooks and config options.
What do I get? / Deliverables
You implement correct JWT signing, pick fast or batch transport, and send valid file-plus-config requests aligned with Scribe’s documented modes.
- Working JWT auth helper pattern for Scribe requests
- Chosen fast or batch integration flow with documented config fields
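One way to make the fast-versus-batch choice explicit is a small decision helper. The sketch below is illustrative only: `chooseScribeMode` and its input fields are hypothetical names, and the 60-second cutoff is a placeholder assumption, not a documented Scribe limit. The endpoint strings are the ones named in the skill's mode table.

```javascript
// Hypothetical cutoff for "short" audio; NOT a documented Scribe limit.
const ASSUMED_FAST_MAX_SECONDS = 60;

// Endpoints as named in the skill's fast/batch mode table.
const ENDPOINTS = { fast: 'POST /transcribe', batch: 'POST /jobs' };

// Pick fast mode only for a single short file whose caller needs an
// immediate result; everything else goes to batch mode.
function chooseScribeMode({ fileCount, durationSeconds, needsImmediateResult }) {
  const singleShortFile =
    fileCount === 1 && durationSeconds <= ASSUMED_FAST_MAX_SECONDS;
  const mode = singleShortFile && needsImmediateResult ? 'fast' : 'batch';
  return { mode, endpoint: ENDPOINTS[mode] };
}

console.log(
  chooseScribeMode({ fileCount: 1, durationSeconds: 20, needsImmediateResult: true }),
);
```

Keeping the rule in one function makes it easy to adjust the cutoff once you know the size and duration limits that apply to your account.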
Journey fit
The Scribe documentation skill belongs under Build/integrations because it encodes HTTP API contracts, auth, and job lifecycle for embedding transcription in your product. Integrations is the right shelf for third-party SaaS APIs (JWT, /transcribe, /jobs) rather than frontend UI or ship-time security review alone.
How it compares
Use as an API integration reference for Zoom Scribe, not as a generic writing or documentation scribe workflow skill.
Common Questions / FAQ
Who is scribe for?
Solo builders and small teams implementing Zoom Build Scribe HTTP transcription in code or agent automation who need JWT and endpoint clarity.
When should I use scribe?
Use it in Build (integrations) while coding upload handlers, job pollers, or webhook consumers for Scribe—phase-specific to third-party speech API wiring.
Is scribe safe to install?
Review the Security Audits panel on this Prism page; never commit API secrets—store Build-platform credentials in your secrets manager and rotate per Zoom guidance.
SKILL.md
# Auth and Processing Modes

## Authentication Model

Scribe uses a Build-platform JWT bearer token.

JWT shape:

- algorithm: `HS256`
- issuer claim: Build-platform credential identifier used by the Scribe API
- expiration: keep to one hour or less

Node example:

```js
import { KJUR } from 'jsrsasign';

export function generateJWT(apiKey, apiSecret) {
  const iat = Math.round(Date.now() / 1000) - 30;
  const exp = iat + 60 * 60;
  return KJUR.jws.JWS.sign(
    'HS256',
    JSON.stringify({ alg: 'HS256', typ: 'JWT' }),
    JSON.stringify({ iss: apiKey, iat, exp }),
    apiSecret,
  );
}
```

## Credential Naming Drift

Zoom docs currently use inconsistent labels across AI Services pages:

- `API key` / `API secret`
- `SDK key` / `SDK secret`
- `Build platform credentials`

For implementation, treat them as the Build-platform JWT issuer/secret pair used to sign Scribe requests. Verify the exact labels in the current portal UI before shipping.

## Fast Mode vs Batch Mode

| Mode | Best for | Transport | Result timing |
|------|----------|-----------|---------------|
| Fast mode | One short file, interactive UX | `POST /transcribe` | Immediate synchronous JSON |
| Batch mode | Archives, long media, many files | `POST /jobs` then status/webhook | Asynchronous |

## Fast Mode Request Shape

- required: `file`, `config`
- common config: `language`, `word_time_offsets`, `channel_separation`, `timestamps`, `output_format`, `profanity_filter`, `diarization`

## Batch Mode Request Shape

- required: `input`, `output`, `config`
- input modes: `SINGLE`, `PREFIX`, `MANIFEST`
- storage provider currently surfaced in the OpenAPI as `S3`
- optional webhook callback: `notifications.webhook_url` + `notifications.secret`

## Operational Choice

Choose fast mode when:

- user uploads one file
- latency matters more than throughput
- file size and duration are manageable
- you are building pseudo-streaming over short microphone chunks from a browser UI

Choose batch mode when:

- many files must be processed
- transcripts can arrive later
- storage-centric workflows fit better than direct upload

## Browser Microphone Pseudo-Streaming

Scribe is file-oriented, so a browser microphone UX should be modeled as repeated short uploads, not a long-lived stream.

Recommended pattern:

1. capture browser microphone audio with `MediaRecorder`
2. flush short chunks to your backend
3. submit each chunk through the async fast-mode wrapper
4. poll by request ID
5. append transcript chunks in order

Recommended starting values:

- chunk size: `5 seconds`
- acceptable range: `5-10 seconds`
- concurrent in-flight chunks: `2-3`

Why this works:

- lowers the chance of frontend `504` on longer synchronous requests
- gives incremental transcript updates without waiting for one long request

Guardrail:

- this is pseudo-streaming over file uploads
- this is not the preferred production design for live audio capture
- use it only when a lightweight browser demo or rough incremental transcript is acceptable
- avoid it when you need stable low-latency live transcription, lower overhead, or stronger continuity across utterances
- for true live media streams, low-latency server ingest, or continuous in-meeting audio, use `rtms`

# Batch Job + Webhook Pipeline

Use batch mode when you need to process stored archives asynchronously.

## Flow

```text
submit batch job
  -> receive job_id
  -> poll /jobs or wait for webhook
  -> inspect /jobs/{jobId}/files
  -> ingest transcript outputs
```

## Submit Example

```bash
curl -X POST https://api.zoom.us/v2/aiservices/scribe/jobs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "mode": "PREFIX",
      "source": "S3",
      "uri": "s3://example-bucket/audio/",
      "auth": {
        "aws": {
          "access_key_id": "...",
          "secret_access_key": "...",
          "session_token": "..."
        }
      }
    },
    "output": {
      "destination": "S3",
      "uri": "s3://ex