
OpenAI Whisper API
Transcribe local audio files through OpenAI’s `/v1/audio/transcriptions` using a curl-based shell script and your API key.
Overview
`openai-whisper-api` is an agent skill for the Build phase that transcribes audio through OpenAI’s transcriptions API using curl and a packaged shell script.
Install
npx skills add https://github.com/steipete/clawdis --skill openai-whisper-api
What is this skill?
- Single `transcribe.sh` entrypoint with sensible defaults (`gpt-4o-transcribe`, `<input>.txt` output)
- Supports `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, `gpt-4o-transcribe-diarize`, and `whisper-1`
- `OPENAI_BASE_URL` for OpenAI-compatible proxies or local gateways
- Flags for language, custom prompt, JSON diarization output, and explicit `--out` path
- Requires `curl`, `node`, and `OPENAI_API_KEY` per skill metadata
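The model list above maps onto different response formats. Judging from the visible part of the packaged script, the selection can be sketched as below; `request_format_for` is a hypothetical helper name used only for illustration, not part of the skill.

```bash
#!/usr/bin/env bash
# Sketch of how the response format appears to be chosen from the model
# and the --json flag. request_format_for is a hypothetical name.
request_format_for() {
  local model="$1" json_output="${2:-0}"
  local format="text"
  if [[ "$json_output" == "1" ]]; then
    format="json"
  fi
  case "$model" in
    gpt-4o-transcribe | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe-*)
      format="json"            # these models are requested as JSON either way
      ;;
    gpt-4o-transcribe-diarize)
      format="diarized_json"   # speaker-labelled output
      ;;
  esac
  printf '%s\n' "$format"
}
```

Under this sketch, `whisper-1` is the only listed model that is requested as plain `text` when `--json` is not given.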
Adoption & trust: 1.9k installs on skills.sh; 378k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have audio on disk but no quick, agent-repeatable path to OpenAI transcription models with configurable output.
Who is it for?
Indie builders adding voice-note or meeting-audio transcription to agents with minimal Python/JS boilerplate.
Skip if: you need real-time streaming transcription or on-device Whisper without network access, or your team is blocked from sending audio to third-party APIs.
When should I use this skill?
You need to transcribe a local audio file through OpenAI `/v1/audio/transcriptions` with `transcribe.sh` and `OPENAI_API_KEY`.
What do I get? / Deliverables
You run one script and get a text (or JSON diarized) transcript file beside your input or at a chosen `--out` path.
- Plain-text transcript file (default `<input>.txt`)
- Optional JSON diarization output when using the diarize model with `--json`
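As a rough illustration of where the transcript lands when `--out` is omitted, the sketch below mirrors the path derivation visible in the packaged script: the input's extension is stripped and `.txt` (or `.json` with `--json`) is appended. `default_out_path` is a hypothetical helper name introduced here.

```bash
#!/usr/bin/env bash
# Sketch of the default output path: strip the last extension from the
# input path, then append .txt or .json. default_out_path is hypothetical.
default_out_path() {
  local input="$1" json_output="${2:-0}"
  local base="${input%.*}"   # remove the shortest trailing ".something"
  if [[ "$json_output" == "1" ]]; then
    printf '%s.json\n' "$base"
  else
    printf '%s.txt\n' "$base"
  fi
}
```

For example, `default_out_path /path/to/audio.m4a` prints `/path/to/audio.txt`, so `<input>.txt` here means the input name with its extension replaced, not appended to.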
Journey fit
Speech-to-text gets wired in when the product or agent pipeline needs audio ingestion, typically while building features rather than as a launch-only SEO task. This skill documents a direct HTTP integration (curl plus optional node) with OpenAI transcription models, which makes it a canonical artifact of the integrations subphase.
How it compares
A skill package for curl-based HTTP transcription, not a hosted MCP media server or a full podcast production pipeline.
Common Questions / FAQ
Who is openai-whisper-api for?
Solo builders and OpenClaw-style agent setups that already have `curl`, `node`, and an OpenAI API key and want file-based transcription.
When should I use openai-whisper-api?
In Build integrations when you need batch transcription from `.m4a`, `.ogg`, or similar files into text for agents, docs, or content pipelines.
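A batch run over a folder could look like the sketch below. It is only an outline under stated assumptions: `TRANSCRIBE` is an assumed location for the packaged script (adjust it to wherever `transcribe.sh` lives), `batch_transcribe` is a hypothetical helper, and the documented 25 MB upload limit is interpreted as 25 × 1024 × 1024 bytes.

```bash
#!/usr/bin/env bash
# Assumed path to the packaged script; override via the TRANSCRIBE variable.
TRANSCRIBE="${TRANSCRIBE:-./scripts/transcribe.sh}"
# Assumption: the hosted API's "25 MB" limit is treated as 25 MiB here.
MAX_BYTES=$((25 * 1024 * 1024))

# Transcribe every .m4a and .ogg file in a directory, one at a time.
# batch_transcribe is a hypothetical helper, not part of the skill.
batch_transcribe() {
  local dir="$1" file
  for file in "$dir"/*.m4a "$dir"/*.ogg; do
    [[ -f "$file" ]] || continue   # an unmatched glob stays literal; skip it
    if (( $(wc -c < "$file") > MAX_BYTES )); then
      echo "Skipping (over upload limit): $file" >&2
      continue
    fi
    "$TRANSCRIBE" "$file" || echo "Failed: $file" >&2
  done
}
```

Calling `batch_transcribe ~/voice-notes` would then leave one transcript next to each input file, since the script's default output sits beside the audio.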
Is openai-whisper-api safe to install?
Check this page’s Security Audits panel. Audio is sent to OpenAI (or to your `OPENAI_BASE_URL`), and you need to safeguard `OPENAI_API_KEY`.
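To check where audio would be sent for a given environment, the base-URL handling can be sketched as follows. The default host and trailing-slash trimming follow the packaged script; the `/audio/transcriptions` suffix comes from the documented endpoint path, and `transcriptions_url` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: resolve the endpoint that would receive the upload.
# transcriptions_url is a hypothetical helper for inspection only.
transcriptions_url() {
  local api_base="${OPENAI_BASE_URL:-https://api.openai.com/v1}"
  api_base="${api_base%/}"   # drop one trailing slash, if present
  printf '%s/audio/transcriptions\n' "$api_base"
}
```

With `OPENAI_BASE_URL=http://localhost:8080/v1/` exported, `transcriptions_url` prints `http://localhost:8080/v1/audio/transcriptions`; with the variable unset it falls back to the hosted OpenAI endpoint.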
SKILL.md
# OpenAI transcriptions API

Transcribe audio through `/v1/audio/transcriptions`. Set `OPENAI_BASE_URL` for an OpenAI-compatible proxy or local gateway.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:

- Model: `gpt-4o-transcribe`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-mini-transcribe
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe-diarize --json
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```

Notes:

- Supported upload formats include `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
- 25 MB upload limit on the hosted API.
- Use diarize for speaker labels; script sends `chunking_strategy=auto` and rejects `--prompt`.

## API key

Set `OPENAI_API_KEY`, or configure it in the active OpenClaw config file (`$OPENCLAW_CONFIG_PATH`, default `~/.openclaw/openclaw.json`).
Optionally set `OPENAI_BASE_URL`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE",
    },
  },
}
```

#!/usr/bin/env bash
set -euo pipefail

usage() {
  cat >&2 <<'EOF'
Usage: transcribe.sh <audio-file> [--model gpt-4o-transcribe] [--out /path/to/out.txt] [--language en] [--prompt "hint"] [--json]
EOF
  exit 2
}

if [[ "${1:-}" == "" || "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  usage
fi

in="${1:-}"
shift || true

model="gpt-4o-transcribe"
out=""
language=""
prompt=""
json_output=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --model)
      model="${2:-}"
      shift 2
      ;;
    --out)
      out="${2:-}"
      shift 2
      ;;
    --language)
      language="${2:-}"
      shift 2
      ;;
    --prompt)
      prompt="${2:-}"
      shift 2
      ;;
    --json)
      json_output=1
      shift 1
      ;;
    *)
      echo "Unknown arg: $1" >&2
      usage
      ;;
  esac
done

if [[ ! -f "$in" ]]; then
  echo "File not found: $in" >&2
  exit 1
fi

if [[ "${OPENAI_API_KEY:-}" == "" ]]; then
  echo "Missing OPENAI_API_KEY" >&2
  exit 1
fi

if [[ "$out" == "" ]]; then
  base="${in%.*}"
  if [[ "$json_output" == "1" ]]; then
    out="${base}.json"
  else
    out="${base}.txt"
  fi
fi

mkdir -p "$(dirname "$out")"

api_base="${OPENAI_BASE_URL:-https://api.openai.com/v1}"
api_base="${api_base%/}"

request_format="text"
if [[ "$json_output" == "1" ]]; then
  request_format="json"
fi

diarize=0
case "$model" in
  gpt-4o-transcribe | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe-*)
    request_format="json"
    ;;
  gpt-4o-transcribe-diarize)
    diarize=1
    request_format="diarized_json"
    ;;
esac

if [[ "$diarize" == "1" && "$prompt" != "" ]]; then
  echo "--prompt is not supported with gpt-4o-transcribe-diarize" >&2
  exit 2
fi

target="$out"
tmp=""
if [[ "$json_output" == "0" && ( "$request_format" == "json" || "$request_format" == "diarized_json" ) ]]; then
  tmp="$(mktemp)"
  trap '[[ "$tmp" == "" ]] || rm -f "$tmp"' EXIT
  target="$tmp"
fi

curl_args=( -sS "${api_base}/audio/tr