
Local Whisper
Transcribe voice memos, meeting audio, and agent-recorded WAV files locally without sending audio to a cloud STT API.
Overview
Local Whisper is an agent skill for the Build phase that transcribes audio files with OpenAI Whisper entirely offline after models are downloaded.
Install
npx skills add https://github.com/thinkfleetai/thinkfleet-engine --skill local-whisper
What is this skill?
- Fully offline transcription after one-time Whisper model download
- CLI with five documented model tiers, from tiny (39M) through large-v3 (1.5GB)
- Optional word timestamps and JSON output for pipelines
- base (74M) is the default model; turbo is positioned as the best speed/quality tradeoff
- uv-managed Python 3.12 venv, with ffmpeg as the required binary
Adoption & trust: 1 install on skills.sh; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You have audio you need as text but cloud STT adds cost, privacy risk, and network dependency for every run.
Who is it for?
Builders who want repeatable local transcription on macOS/Linux agents with ffmpeg and room for up to ~1.5GB model weights.
Skip if: you need real-time streaming dictation, GPU-only training pipelines, or transcription without installing Python, torch, and ffmpeg.
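Before installing, it can help to confirm that the prerequisites named above are on the machine. The following is a minimal sketch, not part of the skill: the function name `missing_tools` is illustrative, and the tool list (ffmpeg, uv) is taken from the description on this page.

```bash
#!/usr/bin/env bash
# Preflight sketch: print the name of every tool that is NOT found on PATH.
# missing_tools is a hypothetical helper, not something the skill ships.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# ffmpeg and uv are the prerequisites named in the skill description.
missing="$(missing_tools ffmpeg uv)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names print on one line.
  echo "Missing required tools:" $missing >&2
else
  echo "All required tools found."
fi
```

The check only looks for binaries on PATH; it does not verify free disk space for the model weights or the Python version inside the venv.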
When should I use this skill?
You need high-quality speech-to-text from audio files and want it to run fully offline after model download.
What do I get? / Deliverables
You get plain-text or JSON transcripts with optional timestamps from a local CLI, ready to paste into docs, tickets, or the next agent skill in your chain.
- Plain-text transcript on stdout
- Optional JSON transcript with word-level timestamps
Journey fit
Local Whisper is a concrete integration skill—install models, run the bundled script, and pipe transcripts into downstream build or agent workflows. Speech-to-text is wired as an offline capability your agent stack calls during product build, not a launch or growth distribution task.
How it compares
Use instead of always-on cloud Whisper APIs when offline runs and data residency matter more than managed scaling.
Common Questions / FAQ
Who is local-whisper for?
Solo and indie developers running Claude Code, Cursor, or similar agents who need WAV-to-text locally for notes, content, or automation without a paid STT service.
When should I use local-whisper?
During build when you are ingesting voice recordings into specs or agent memory, or in grow/operate when you batch-transcribe support or meeting audio on your own machine.
Is local-whisper safe to install?
Review the Security Audits panel on this Prism page before installing; the skill runs shell scripts, downloads ML weights, and needs filesystem access for venv and models.
SKILL.md
# Local Whisper STT

Local speech-to-text using OpenAI's Whisper. **Fully offline** after initial model download.

## Usage

```bash
# Basic
~/.thinkfleetbot/skills/local-whisper/scripts/local-whisper audio.wav

# Better model
~/.thinkfleetbot/skills/local-whisper/scripts/local-whisper audio.wav --model turbo

# With timestamps
~/.thinkfleetbot/skills/local-whisper/scripts/local-whisper audio.wav --timestamps --json
```

## Models

| Model | Size | Notes |
|-------|------|-------|
| `tiny` | 39M | Fastest |
| `base` | 74M | **Default** |
| `small` | 244M | Good balance |
| `turbo` | 809M | Best speed/quality |
| `large-v3` | 1.5GB | Maximum accuracy |

## Options

- `--model/-m`: Model size (default: base)
- `--language/-l`: Language code (auto-detect if omitted)
- `--timestamps/-t`: Include word timestamps
- `--json/-j`: JSON output
- `--quiet/-q`: Suppress progress

## Setup

Uses uv-managed venv at `.venv/`. To reinstall:

```bash
cd ~/.thinkfleetbot/skills/local-whisper
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python click openai-whisper torch --index-url https://download.pytorch.org/whl/cpu
```
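
## Batch example (sketch)

The usage commands above handle one file at a time. For a folder of recordings, a small wrapper loop may be enough. This is a hedged sketch rather than part of the skill: `transcribe_dir` and the `LOCAL_WHISPER_BIN` override variable are hypothetical names, and it assumes that `--quiet` leaves only the transcript on stdout.

```bash
#!/usr/bin/env bash
# Batch sketch: transcribe every .wav in a directory to a sibling .txt file.
# LOCAL_WHISPER_BIN is a hypothetical override, not defined by the skill;
# it defaults to the script path shown in the Usage section.
LOCAL_WHISPER_BIN="${LOCAL_WHISPER_BIN:-$HOME/.thinkfleetbot/skills/local-whisper/scripts/local-whisper}"

transcribe_dir() {
  local dir="$1" model="${2:-base}" wav
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    # Assumption: with --quiet, only the transcript is written to stdout.
    "$LOCAL_WHISPER_BIN" "$wav" --model "$model" --quiet > "${wav%.wav}.txt" \
      || echo "failed: $wav" >&2
  done
}

# Usage: transcribe_dir ./memos turbo
```

A failed file is reported on stderr and the loop moves on, so one corrupt recording does not stop the rest of the batch.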