Whisper

Name: Whisper
Author: davila7

davila7/claude-code-templates

1.2k installs
29.9k repo stars
Updated July 27, 2026

Whisper is an agent skill for OpenAI's general-purpose speech recognition model. It supports 99 languages, transcription, translation to English, and language identification, and comes in six model sizes starting at tiny (39M parameters).

About

The whisper skill covers OpenAI's general-purpose, multilingual speech recognition model. The model supports 99 languages, transcription, translation to English, and language identification, in six model sizes starting at tiny (39M parameters). Invoke the skill when the user asks about Whisper or related SKILL.md workflows.

Speech-to-text transcription (99 languages).
Podcast/video transcription.
Meeting notes automation.
Translation to English.
Noisy audio transcription.

Whisper by the numbers

1,243 all-time installs (skills.sh)
+25 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #403 of 1,881 Marketing & SEO skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

whisper capabilities & compatibility

Capabilities: speech-to-text transcription (99 languages) · podcast/video transcription · meeting notes automation · translation to English
Use cases: SEO

From the docs

What whisper says it does

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params)

SKILL.md

npx skills add https://github.com/davila7/claude-code-templates --skill whisper

Installs	1.2k
repo stars	★ 29.9k
Security audit	3 / 3 scanners passed
Last updated	July 27, 2026
Repository	davila7/claude-code-templates ↗

How do I use OpenAI's general-purpose speech recognition model?

Whisper is OpenAI's general-purpose speech recognition model. It supports 99 languages, transcription, translation to English, and language identification, in six model sizes starting at tiny (39M parameters).

Who is it for?

Developers using whisper workflows documented in SKILL.md.

Skip if: the task falls outside Whisper's scope or needs a different stack.

When should I use this skill?

User asks about whisper or related SKILL.md workflows.

What you get

Completed whisper workflow with documented commands, files, and expected deliverables.

Transcription output
Language selection guidance

By the numbers

Documents 99 supported Whisper languages
Top-tier languages target WER under 10%
Good-support languages target WER between 10% and 20%

Files

SKILL.md · Markdown · GitHub ↗

Whisper - Robust Speech Recognition

OpenAI's multilingual speech recognition model.

When to use Whisper

Use when:

Speech-to-text transcription (99 languages)
Podcast/video transcription
Meeting notes automation
Translation to English
Noisy audio transcription
Multilingual audio processing

Metrics:

72,900+ GitHub stars
99 languages supported
Trained on 680,000 hours of audio
MIT License

Use alternatives instead:

AssemblyAI: Managed API, speaker diarization
Deepgram: Real-time streaming ASR
Google Speech-to-Text: Cloud-based

Quick start

Installation

# Requires Python 3.8-3.11
pip install -U openai-whisper

# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg

Basic transcription

import whisper

# Load model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3")

# Print text
print(result["text"])

# Access segments
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
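The segment loop above can be extended into a small subtitle writer. The following is a minimal sketch, assuming segments are dicts with `start`, `end`, and `text` keys as in the example; `format_timestamp` and `segments_to_srt` are hypothetical helper names, not part of the whisper package, and the sample data is made up.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Sample data shaped like result["segments"]; the values are invented.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3661.5, "text": " Goodbye."},
]
print(segments_to_srt(sample))
```

In practice you would pass `result["segments"]` instead of `sample` and write the returned string to a `.srt` file.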

Model sizes

# Available models
models = ["tiny", "base", "small", "medium", "large", "turbo"]

# Load specific model
model = whisper.load_model("turbo")  # Fastest, good quality

Model	Parameters	English-only	Multilingual	Speed	VRAM
tiny	39M	✓	✓	~32x	~1 GB
base	74M	✓	✓	~16x	~1 GB
small	244M	✓	✓	~6x	~2 GB
medium	769M	✓	✓	~2x	~5 GB
large	1550M	✗	✓	1x	~10 GB
turbo	809M	✗	✓	~8x	~6 GB

Recommendation: Use turbo for best speed/quality, base for prototyping
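One way to apply the table is to pick a model from the VRAM you have. This is a rough sketch, assuming the approximate VRAM figures in the table; `pick_model` and the preference order are illustrative assumptions, not part of the whisper package.

```python
# Approximate VRAM needs in GB, copied from the model table above.
VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10, "turbo": 6}

# Assumed preference order (turbo first, following the recommendation above).
PREFERENCE = ["turbo", "large", "medium", "small", "base", "tiny"]


def pick_model(available_vram_gb: float, preference=PREFERENCE):
    """Return the first preferred model whose VRAM estimate fits, or None."""
    for name in preference:
        if VRAM_GB[name] <= available_vram_gb:
            return name
    return None


for budget in (0.5, 1, 2, 5, 8):
    print(f"{budget} GB -> {pick_model(budget)}")
```

The returned name can be passed straight to `whisper.load_model`. Reorder `PREFERENCE` if accuracy matters more than speed for your workload.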

Transcription options

Language specification

# Auto-detect language
result = model.transcribe("audio.mp3")

# Specify language (faster)
result = model.transcribe("audio.mp3", language="en")

# Supported: en, es, fr, de, it, pt, ru, ja, ko, zh, and 89 more

Task selection

# Transcription (default)
result = model.transcribe("audio.mp3", task="transcribe")

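# Translation to English
# Note: turbo is not trained for translation; load a multilingual model
# such as "medium" or "large" before using task="translate"
result = model.transcribe("spanish.mp3", task="translate")
# Input: Spanish audio → Output: English text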

Initial prompt

# Improve accuracy with context
result = model.transcribe(
    "audio.mp3",
    initial_prompt="This is a technical podcast about machine learning and AI."
)

# Helps with:
# - Technical terms
# - Proper nouns
# - Domain-specific vocabulary

Timestamps

# Word-level timestamps
result = model.transcribe("audio.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
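Word-level timestamps are often regrouped into short caption lines. The sketch below assumes each word dict has `word`, `start`, and `end` keys as in the loop above, and that each word string carries its own leading space, as Whisper output usually does. `group_words` and `max_chars` are hypothetical names chosen for illustration.

```python
def _finish(group):
    """Collapse a list of word dicts into one caption line."""
    text = "".join(w["word"] for w in group).strip()
    return {"start": group[0]["start"], "end": group[-1]["end"], "text": text}


def group_words(words, max_chars=32):
    """Group word dicts into caption lines of at most max_chars characters."""
    lines = []
    current = []
    for word in words:
        candidate = "".join(w["word"] for w in current + [word]).strip()
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the line, so close it first.
            lines.append(_finish(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(_finish(current))
    return lines


# Invented sample shaped like segment["words"].
words = [
    {"word": " The", "start": 0.0, "end": 0.3},
    {"word": " quick", "start": 0.3, "end": 0.6},
    {"word": " brown", "start": 0.6, "end": 1.0},
    {"word": " fox", "start": 1.0, "end": 1.3},
]
for line in group_words(words, max_chars=10):
    print(line)
```

A single word longer than `max_chars` still gets its own line rather than being split.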

Temperature fallback

# Fallback: retry at the next temperature when decoding looks unreliable
# (by default, a high compression ratio or a low average log-probability)
result = model.transcribe(
    "audio.mp3",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
)

Command line usage

# Basic transcription
whisper audio.mp3

# Specify model
whisper audio.mp3 --model turbo

# Output formats
whisper audio.mp3 --output_format txt     # Plain text
whisper audio.mp3 --output_format srt     # Subtitles
whisper audio.mp3 --output_format vtt     # WebVTT
whisper audio.mp3 --output_format json    # JSON with timestamps

# Language
whisper audio.mp3 --language Spanish

# Translation (pick a multilingual model; turbo is not trained for translation)
whisper spanish.mp3 --model medium --task translate
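When the CLI is driven from a script, it helps to build the argument list in one place. This is a small sketch using only the flags shown above (`--model`, `--output_format`, `--language`, `--task`); `build_whisper_cmd` is a hypothetical helper, and the command is only printed here, not executed.

```python
import shlex


def build_whisper_cmd(audio_path, model=None, output_format=None,
                      language=None, task=None):
    """Build the argument list for the whisper CLI, skipping unset options."""
    cmd = ["whisper", audio_path]
    options = (
        ("--model", model),
        ("--output_format", output_format),
        ("--language", language),
        ("--task", task),
    )
    for flag, value in options:
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = build_whisper_cmd("talk.mp3", model="turbo",
                        output_format="srt", language="Spanish")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.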

Batch processing

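import os
import whisper

model = whisper.load_model("turbo")  # load once, reuse for every file

audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]

for audio_file in audio_files:
    print(f"Transcribing {audio_file}...")
    result = model.transcribe(audio_file)

    # Save the transcript next to the audio, swapping the extension for .txt
    output_file = os.path.splitext(audio_file)[0] + ".txt"
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(result["text"])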

Real-time transcription

# faster-whisper is a faster reimplementation; it still processes files, not live streams
# pip install faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")

# segments is a generator: transcription runs lazily as you iterate,
# so each segment can be printed as soon as it is decoded
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

GPU acceleration

import whisper

# Automatically uses GPU if available
model = whisper.load_model("turbo")

# Force CPU
model = whisper.load_model("turbo", device="cpu")

# Force GPU
model = whisper.load_model("turbo", device="cuda")

# Roughly 10-20× faster on GPU, depending on hardware and model size

Integration with other tools

Subtitle generation

# Generate SRT subtitles
whisper video.mp4 --output_format srt --language English

# Output: video.srt

With LangChain

import whisper
from langchain_core.documents import Document

# Transcribe, then wrap the text in a LangChain Document
model = whisper.load_model("turbo")
result = model.transcribe("audio.mp3")
docs = [Document(page_content=result["text"], metadata={"source": "audio.mp3"})]

# Use transcription in RAG
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

Extract audio from video

# Use ffmpeg to extract audio
ffmpeg -i video.mp4 -vn -acodec pcm_s16le audio.wav

# Then transcribe
whisper audio.wav

Best practices

1. Use the turbo model - Best speed/quality trade-off for English transcription
2. Specify language - Faster than auto-detect
3. Add initial prompt - Improves technical terms
4. Use GPU - Roughly 10-20× faster, depending on hardware
5. Batch process - Load the model once and reuse it across files
6. Convert to WAV - Better compatibility
7. Split long audio - <30 min chunks
8. Check language support - Quality varies by language
9. Use faster-whisper - Up to 4× faster than openai-whisper
10. Monitor VRAM - Scale model size to hardware
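The advice to split long audio into chunks under 30 minutes can be planned with a few lines of arithmetic. This is a minimal sketch: `chunk_spans` is a hypothetical helper, and fixed offsets are an assumption for simplicity (a real pipeline might prefer to cut at silences so words are not split).

```python
def chunk_spans(duration_s: float, chunk_s: float = 1800.0):
    """Return (start, end) spans in seconds covering duration_s in chunk_s steps."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


# A 4000-second recording split into chunks of at most 30 minutes.
for start, end in chunk_spans(4000.0):
    print(f"{start:.0f}s -> {end:.0f}s")
```

Each span can then be cut with ffmpeg's `-ss` (start) and `-t` (duration) options before transcription, and segment timestamps shifted by the span's start when the transcripts are merged.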

Performance

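Model	Real-time factor (CPU)	Real-time factor (GPU)
tiny	~0.03	~0.01
base	~0.06	~0.01
turbo	~0.13	~0.01
large	~1.0	~0.05

Real-time factor: processing time divided by audio duration, so lower is faster; 0.1 = 10× faster than real-time. The CPU values follow from the relative speeds in the model table (tiny ~32×, base ~16×, turbo ~8×), taking large as ~1.0. All figures are approximate and depend on hardware.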
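The real-time factor is easy to measure for your own hardware. The sketch below shows only the arithmetic, with made-up timings; `real_time_factor` and `speedup` are hypothetical helper names.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# Example: 60 seconds of audio transcribed in 6 seconds.
rtf = real_time_factor(processing_s=6.0, audio_s=60.0)
print(f"RTF {rtf:.2f}, {speedup(rtf):.0f}x faster than real time")
```

To measure a real run, wrap the `model.transcribe(...)` call with `time.perf_counter()` and divide the elapsed time by the audio length in seconds.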

Language support

Top-supported languages:

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)

Full list: 99 languages total

Limitations

1. Hallucinations - May repeat or invent text
2. Long-form accuracy - Degrades on >30 min audio
3. Speaker identification - No diarization
4. Accents - Quality varies
5. Background noise - Can affect accuracy
6. Real-time latency - Not suitable for live captioning
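Repeated or invented text can often be spotted from the per-segment statistics that `transcribe` returns (`compression_ratio`, `avg_logprob`, `no_speech_prob`). The following is a heuristic sketch, not a guaranteed detector: `flag_suspect_segments` is a hypothetical helper, the thresholds mirror the defaults that `transcribe` uses for its own fallback checks, and the sample segments are invented.

```python
def flag_suspect_segments(segments, compression_ratio_max=2.4,
                          avg_logprob_min=-1.0, no_speech_prob_max=0.6):
    """Return (start, text, reasons) for segments that look unreliable."""
    flagged = []
    previous_text = None
    for seg in segments:
        reasons = []
        text = seg["text"].strip()
        if seg.get("compression_ratio", 0.0) > compression_ratio_max:
            reasons.append("repetitive")
        if seg.get("avg_logprob", 0.0) < avg_logprob_min:
            reasons.append("low confidence")
        if seg.get("no_speech_prob", 0.0) > no_speech_prob_max:
            reasons.append("probably silence")
        if text and text == previous_text:
            reasons.append("duplicate of previous segment")
        if reasons:
            flagged.append((seg["start"], text, reasons))
        previous_text = text
    return flagged


# Invented sample shaped like result["segments"].
sample_segments = [
    {"start": 0.0, "text": " Welcome back.", "avg_logprob": -0.2,
     "compression_ratio": 1.3, "no_speech_prob": 0.01},
    {"start": 4.0, "text": " Thank you.", "avg_logprob": -1.4,
     "compression_ratio": 1.1, "no_speech_prob": 0.8},
    {"start": 6.0, "text": " Thank you.", "avg_logprob": -0.3,
     "compression_ratio": 1.1, "no_speech_prob": 0.1},
]
for start, text, reasons in flag_suspect_segments(sample_segments):
    print(f"{start:.1f}s {text!r}: {', '.join(reasons)}")
```

Flagged segments are candidates for manual review or for re-transcription with a larger model, rather than automatic deletion.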

Resources

GitHub: https://github.com/openai/whisper ⭐ 72,900+
Paper: https://arxiv.org/abs/2212.04356
Model Card: https://github.com/openai/whisper/blob/main/model-card.md
Colab: Available in repo
License: MIT

Whisper Language Support Guide

Complete guide to Whisper's multilingual capabilities.

Supported languages (99 total)

Top-tier support (WER < 10%)

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Dutch (nl)
Polish (pl)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)

Good support (WER 10-20%)

Arabic (ar)
Turkish (tr)
Vietnamese (vi)
Swedish (sv)
Finnish (fi)
Czech (cs)
Romanian (ro)
Hungarian (hu)
Danish (da)
Norwegian (no)
Thai (th)
Hebrew (he)
Greek (el)
Indonesian (id)
Malay (ms)

Full list (99 languages)

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba

Note: Cantonese was added with large-v3 and is not counted in the 99. Myanmar, Pushto, and Moldavian are alternate names for Burmese, Pashto, and Romanian.

Usage examples

Auto-detect language

import whisper

model = whisper.load_model("turbo")

# Auto-detect language
result = model.transcribe("audio.mp3")

print(f"Detected language: {result['language']}")
print(f"Text: {result['text']}")

Specify language (faster)

# Specify language for faster transcription
result = model.transcribe("audio.mp3", language="es")  # Spanish
result = model.transcribe("audio.mp3", language="fr")  # French
result = model.transcribe("audio.mp3", language="ja")  # Japanese

Translation to English

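# Translate any language to English
# turbo is not trained for translation, so load a multilingual model
model = whisper.load_model("medium")

result = model.transcribe(
    "spanish_audio.mp3",
    task="translate"  # Translates to English
)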

print(f"Original language: {result['language']}")
print(f"English translation: {result['text']}")

Language-specific tips

Chinese

# Chinese works well with larger models
model = whisper.load_model("large")

result = model.transcribe(
    "chinese_audio.mp3",
    language="zh",
    initial_prompt="这是一段关于技术的讨论"  # Context helps
)

Japanese

# Japanese benefits from initial prompt
result = model.transcribe(
    "japanese_audio.mp3",
    language="ja",
    initial_prompt="これは技術的な会議の録音です"
)

Arabic

# Arabic: Use large model for best results
model = whisper.load_model("large")

result = model.transcribe(
    "arabic_audio.mp3",
    language="ar"
)

Model size recommendations

Language Tier	Recommended Model	WER
Top-tier (en, es, fr, de)	base/turbo	< 10%
Good (ar, tr, vi)	medium/large	10-20%
Lower-resource	large	20-30%
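The tier table can be turned into a small lookup. This sketch assumes the tier membership listed earlier in this guide; `recommended_models` is a hypothetical helper and the tiers are this guide's groupings, not an official classification.

```python
# Language codes from the "Top-tier" and "Good support" lists in this guide.
TOP_TIER = {"en", "es", "fr", "de", "it", "pt", "nl", "pl", "ru", "ja", "ko", "zh"}
GOOD = {"ar", "tr", "vi", "sv", "fi", "cs", "ro", "hu", "da", "no",
        "th", "he", "el", "id", "ms"}


def recommended_models(language_code: str):
    """Return the model names the table recommends for a language code."""
    code = language_code.lower()
    if code in TOP_TIER:
        return ("base", "turbo")
    if code in GOOD:
        return ("medium", "large")
    return ("large",)  # lower-resource languages


for code in ("es", "ar", "sw"):
    print(code, "->", "/".join(recommended_models(code)))
```

Treat the result as a starting point and test on a sample of your own audio, since quality varies by accent and recording conditions.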

Performance by language

English

tiny: WER ~15%
base: WER ~8%
small: WER ~5%
medium: WER ~4%
large: WER ~3%
turbo: WER ~3.5%

Spanish

tiny: WER ~20%
base: WER ~12%
medium: WER ~6%
large: WER ~4%

Chinese

small: WER ~15%
medium: WER ~8%
large: WER ~5%

Best practices

1. Use English-only models - For English audio, the .en variants help most at small sizes (tiny.en, base.en)
2. Specify language - Faster than auto-detect
3. Add initial prompt - Improves accuracy for technical terms
4. Use larger models - For low-resource languages
5. Test on sample - Quality varies by accent/dialect
6. Consider audio quality - Clear audio = better results
7. Check language codes - Use ISO 639-1 codes (mostly 2 letters, e.g. es, ja)

Language detection

# Detect language only (no transcription)
import whisper

model = whisper.load_model("base")

# Load audio
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Make log-Mel spectrogram (n_mels must match the model: 80 for most, 128 for large-v3)
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Detect language
_, probs = model.detect_language(mel)
detected_language = max(probs, key=probs.get)

print(f"Detected language: {detected_language}")
print(f"Confidence: {probs[detected_language]:.2%}")
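The `probs` dictionary maps language codes to probabilities, so it can also show runner-up languages or guard against a weak detection. This is a small sketch on invented probabilities; `top_languages`, `choose_language`, and the 0.5 cutoff are illustrative assumptions.

```python
def top_languages(probs, k=3):
    """Return the k most likely (code, probability) pairs, best first."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def choose_language(probs, min_confidence=0.5, fallback=None):
    """Return the best code if it is confident enough, else the fallback."""
    code, probability = top_languages(probs, k=1)[0]
    return code if probability >= min_confidence else fallback


# Invented sample shaped like the probs dict from detect_language.
sample_probs = {"en": 0.91, "de": 0.05, "nl": 0.03, "fr": 0.01}
for code, probability in top_languages(sample_probs):
    print(f"{code}: {probability:.2%}")
print("chosen:", choose_language(sample_probs))
```

Returning `None` as the fallback fits `model.transcribe(..., language=None)`, which leaves Whisper to auto-detect the language.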

Resources

Paper: https://arxiv.org/abs/2212.04356
GitHub: https://github.com/openai/whisper
Model Card: https://github.com/openai/whisper/blob/main/model-card.md

Forks & variants (4)

Whisper has 4 known copies in the catalog totaling 523 installs. They canonicalize to this original listing.

orchestra-research - 434 installs
ovachiever - 61 installs
nousresearch - 16 installs
firecrawl - 12 installs

FAQ

What does whisper do?

It wraps OpenAI's general-purpose speech recognition model, which supports 99 languages, transcription, translation to English, and language identification, in six model sizes starting at tiny (39M parameters).

When should I use whisper?

User asks about whisper or related SKILL.md workflows.

Is whisper safe to install?

Review the Security Audits panel on this page before installing in production.
