Openai Whisper Api

Name: Openai Whisper Api
Author: steipete

steipete/clawdis

Transcribe local audio files through OpenAI’s `/v1/audio/transcriptions` using a curl-based shell script and your API key.

Overview

Openai-whisper-api is an agent skill for the Build phase that transcribes audio via OpenAI’s transcriptions API using curl and a packaged shell script.

Install

npx skills add https://github.com/steipete/clawdis --skill openai-whisper-api

What is this skill?

Single `transcribe.sh` entrypoint with sensible defaults (`gpt-4o-transcribe`, `<input>.txt` output)
Supports `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, diarize model, and `whisper-1`
`OPENAI_BASE_URL` for OpenAI-compatible proxies or local gateways
Flags for language, custom prompt, JSON diarization output, and explicit `--out` path
Requires `curl`, `node`, and `OPENAI_API_KEY` per skill metadata

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.9k installs on skills.sh; 378k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have audio on disk but no quick, agent-repeatable path to OpenAI transcription models with configurable output.

Who is it for?

Indie builders adding voice-note or meeting-audio transcription to agents with minimal Python/JS boilerplate.

Skip if: Real-time streaming transcription, on-device whisper without network, or teams blocked from sending audio to third-party APIs.

When should I use this skill?

You need to transcribe a local audio file through OpenAI `/v1/audio/transcriptions` with `transcribe.sh` and `OPENAI_API_KEY`.

What do I get? / Deliverables

You run one script and get a text (or JSON diarized) transcript file beside your input or at a chosen `--out` path.

Plain-text transcript file (default `<input>.txt`)
Optional JSON diarization output when using diarize model with `--json`

Recommended Skills

Microsoft Foundrymicrosoft/azure-skills

Microsoft Foundry skill guides agents through the full Azure AI Foundry lifecycle—containerizing agents, pushing to ACR,…377k installs·1.2k stars

Azure Aimicrosoft/azure-skills

azure-ai is a Prism-oriented quick reference for Microsoft Azure AI work, with the published body centered on the Azure …375k installs·1.2k stars

Azure Hosted Copilot Sdkmicrosoft/azure-skills

Azure Hosted Copilot SDK is Microsoft's entry skill for repos using @github/copilot-sdk—it detects CopilotClient usage, …346k installs·1.2k stars

Lark Eventlarksuite/cli

Lark real-time subscription skill via lark-cli event consume for building bots and streaming webhook-style agent workers…208k installs·13.7k stars

Running Claude Code Via Litellm Copilotxixu-me/skills

Running Claude Code via LiteLLM Copilot walks through pointing Claude Code at a local LiteLLM proxy that forwards Anthro…200k installs·61 stars

Setup Matt Pocock Skillsmattpocock/skills

One-time per-repo setup so Matt Pocock engineering skills share correct issue tracker, triage strings, and domain docume…180k installs·121k stars

Journey fit

Primary fit

BuildIntegrations & version control

Speech-to-text is wired when the product or agent pipeline needs audio ingestion, typically while building features—not as a launch-only SEO task. This skill documents a direct HTTP integration (curl + optional node) to OpenAI transcription models—a canonical integrations subphase artifact.

Also useful

GrowContent & marketing

How it compares

Skill package for HTTP transcription curls—not a hosted MCP media server or a full podcast production pipeline.

Common Questions / FAQ

Who is openai-whisper-api for?

Solo builders and OpenClaw-style agent setups that already have `curl`, `node`, and an OpenAI API key and want file-based transcription.

When should I use openai-whisper-api?

In Build integrations when you need batch transcription from `.m4a`, `.ogg`, or similar files into text for agents, docs, or content pipelines.

Is openai-whisper-api safe to install?

Check this page’s Security Audits panel; audio is sent to OpenAI (or your `OPENAI_BASE_URL`) and requires safeguarding `OPENAI_API_KEY`.

SKILL.md

READMESKILL.md - Openai Whisper Api

# OpenAI transcriptions API

Transcribe audio through `/v1/audio/transcriptions`. Set `OPENAI_BASE_URL` for an OpenAI-compatible proxy or local gateway.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:

- Model: `gpt-4o-transcribe`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-mini-transcribe
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe-diarize --json
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```

Notes:

- Supported upload formats include `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
- 25 MB upload limit on the hosted API.
- Use diarize for speaker labels; script sends `chunking_strategy=auto` and rejects `--prompt`.

## API key

Set `OPENAI_API_KEY`, or configure it in the active OpenClaw config file (`$OPENCLAW_CONFIG_PATH`, default `~/.openclaw/openclaw.json`). Optionally set `OPENAI_BASE_URL`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE",
    },
  },
}
```


#!/usr/bin/env bash
set -euo pipefail

usage() {
  cat >&2 <<'EOF'
Usage:
  transcribe.sh <audio-file> [--model gpt-4o-transcribe] [--out /path/to/out.txt] [--language en] [--prompt "hint"] [--json]
EOF
  exit 2
}

if [[ "${1:-}" == "" || "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  usage
fi

in="${1:-}"
shift || true

model="gpt-4o-transcribe"
out=""
language=""
prompt=""
json_output=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --model)
      model="${2:-}"
      shift 2
      ;;
    --out)
      out="${2:-}"
      shift 2
      ;;
    --language)
      language="${2:-}"
      shift 2
      ;;
    --prompt)
      prompt="${2:-}"
      shift 2
      ;;
    --json)
      json_output=1
      shift 1
      ;;
    *)
      echo "Unknown arg: $1" >&2
      usage
      ;;
  esac
done

if [[ ! -f "$in" ]]; then
  echo "File not found: $in" >&2
  exit 1
fi

if [[ "${OPENAI_API_KEY:-}" == "" ]]; then
  echo "Missing OPENAI_API_KEY" >&2
  exit 1
fi

if [[ "$out" == "" ]]; then
  base="${in%.*}"
  if [[ "$json_output" == "1" ]]; then
    out="${base}.json"
  else
    out="${base}.txt"
  fi
fi

mkdir -p "$(dirname "$out")"

api_base="${OPENAI_BASE_URL:-https://api.openai.com/v1}"
api_base="${api_base%/}"

request_format="text"
if [[ "$json_output" == "1" ]]; then
  request_format="json"
fi

diarize=0
case "$model" in
  gpt-4o-transcribe | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe-*)
    request_format="json"
    ;;
  gpt-4o-transcribe-diarize)
    diarize=1
    request_format="diarized_json"
    ;;
esac

if [[ "$diarize" == "1" && "$prompt" != "" ]]; then
  echo "--prompt is not supported with gpt-4o-transcribe-diarize" >&2
  exit 2
fi

target="$out"
tmp=""
if [[ "$json_output" == "0" && ( "$request_format" == "json" || "$request_format" == "diarized_json" ) ]]; then
  tmp="$(mktemp)"
  trap '[[ "$tmp" == "" ]] || rm -f "$tmp"' EXIT
  target="$tmp"
fi

curl_args=(
  -sS "${api_base}/audio/tr

What is this skill?

Single `transcribe.sh` entrypoint with sensible defaults (`gpt-4o-transcribe`, `<input>.txt` output)

Supports `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, diarize model, and `whisper-1`

`OPENAI_BASE_URL` for OpenAI-compatible proxies or local gateways

Flags for language, custom prompt, JSON diarization output, and explicit `--out` path

Requires `curl`, `node`, and `OPENAI_API_KEY` per skill metadata

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.9k installs on skills.sh; 378k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

BuildIntegrations & version control

Also useful

GrowContent & marketing

SKILL.md

READMESKILL.md - Openai Whisper Api

# OpenAI transcriptions API

Transcribe audio through `/v1/audio/transcriptions`. Set `OPENAI_BASE_URL` for an OpenAI-compatible proxy or local gateway.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:

- Model: `gpt-4o-transcribe`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-mini-transcribe
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model gpt-4o-transcribe-diarize --json
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```

Notes:

- Supported upload formats include `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
- 25 MB upload limit on the hosted API.
- Use diarize for speaker labels; script sends `chunking_strategy=auto` and rejects `--prompt`.

## API key

Set `OPENAI_API_KEY`, or configure it in the active OpenClaw config file (`$OPENCLAW_CONFIG_PATH`, default `~/.openclaw/openclaw.json`). Optionally set `OPENAI_BASE_URL`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE",
    },
  },
}
```


#!/usr/bin/env bash
set -euo pipefail

usage() {
  cat >&2 <<'EOF'
Usage:
  transcribe.sh <audio-file> [--model gpt-4o-transcribe] [--out /path/to/out.txt] [--language en] [--prompt "hint"] [--json]
EOF
  exit 2
}

if [[ "${1:-}" == "" || "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
  usage
fi

in="${1:-}"
shift || true

model="gpt-4o-transcribe"
out=""
language=""
prompt=""
json_output=0

while [[ $# -gt 0 ]]; do
  case "$1" in
    --model)
      model="${2:-}"
      shift 2
      ;;
    --out)
      out="${2:-}"
      shift 2
      ;;
    --language)
      language="${2:-}"
      shift 2
      ;;
    --prompt)
      prompt="${2:-}"
      shift 2
      ;;
    --json)
      json_output=1
      shift 1
      ;;
    *)
      echo "Unknown arg: $1" >&2
      usage
      ;;
  esac
done

if [[ ! -f "$in" ]]; then
  echo "File not found: $in" >&2
  exit 1
fi

if [[ "${OPENAI_API_KEY:-}" == "" ]]; then
  echo "Missing OPENAI_API_KEY" >&2
  exit 1
fi

if [[ "$out" == "" ]]; then
  base="${in%.*}"
  if [[ "$json_output" == "1" ]]; then
    out="${base}.json"
  else
    out="${base}.txt"
  fi
fi

mkdir -p "$(dirname "$out")"

api_base="${OPENAI_BASE_URL:-https://api.openai.com/v1}"
api_base="${api_base%/}"

request_format="text"
if [[ "$json_output" == "1" ]]; then
  request_format="json"
fi

diarize=0
case "$model" in
  gpt-4o-transcribe | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe-*)
    request_format="json"
    ;;
  gpt-4o-transcribe-diarize)
    diarize=1
    request_format="diarized_json"
    ;;
esac

if [[ "$diarize" == "1" && "$prompt" != "" ]]; then
  echo "--prompt is not supported with gpt-4o-transcribe-diarize" >&2
  exit 2
fi

target="$out"
tmp=""
if [[ "$json_output" == "0" && ( "$request_format" == "json" || "$request_format" == "diarized_json" ) ]]; then
  tmp="$(mktemp)"
  trap '[[ "$tmp" == "" ]] || rm -f "$tmp"' EXIT
  target="$tmp"
fi

curl_args=(
  -sS "${api_base}/audio/tr

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Who is openai-whisper-api for?

When should I use openai-whisper-api?

Is openai-whisper-api safe to install?

SKILL.md

This week for builders

Overview

Install

What is this skill?

What problem does it solve?

Who is it for?

When should I use this skill?

What do I get? / Deliverables

Recommended Skills

Journey fit

Who is openai-whisper-api for?

When should I use openai-whisper-api?

Is openai-whisper-api safe to install?

SKILL.md