Whisper

Name: Whisper
Author: orchestra-research

orchestra-research/ai-research-skills

434 installs
11.2k repo stars
Updated June 16, 2026

This is a copy of whisper by davila7 - installs and ranking accrue to the original listing.

whisper is an agent skill that maps Whisper language codes and quality tiers across 99 supported languages for developers who add multilingual speech-to-text to applications, agents, or audio pipelines.

About

whisper is a language support reference skill from orchestra-research/ai-research-skills for OpenAI Whisper multilingual speech-to-text. It documents 99 supported languages grouped by expected word-error rate (WER): top-tier WER under 10% for English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Korean, and Chinese; good support at WER 10–20% for Arabic, Turkish, Vietnamese, Swedish, Finnish, and additional locales; plus a full alphabetical list from Afrikaans to Yoruba. Developers reach for whisper when picking `language` parameters, estimating transcription quality, or planning multilingual agent voice input before deploying STT in production pipelines.

Documents 99 supported Whisper languages end to end
Top-tier WER under 10% called out for 12 major locales including English, Spanish, and Chinese
Good-tier WER 10–20% bucket for 15 additional languages
Full alphabetical language list for locale and subtitle pipeline planning
Guides multilingual ASR scope before shipping voice features

Whisper by the numbers

434 all-time installs (skills.sh)
+35 installs in the week ending Jul 26, 2026 (Skillselion tracking)
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill whisper


Security audit	3 / 3 scanners passed

Which languages does Whisper support for transcription?

Choose Whisper language codes and quality tiers when adding multilingual speech-to-text to apps, agents, or pipelines.

Who is it for?

Developers integrating Whisper STT who must choose language codes and set quality expectations across multilingual audio sources.

Skip if: Teams building custom acoustic models from scratch instead of using Whisper's pretrained multilingual STT.

When should I use this skill?

An agent must select Whisper language codes, compare WER tiers, or plan multilingual speech-to-text integration.

What you get

Selected Whisper language codes, documented WER quality tier, and multilingual STT integration parameters.

Language code selection
WER tier quality expectations

By the numbers

Documents 99 supported Whisper languages
Lists 12 top-tier languages with WER under 10%

Files

SKILL.md (Markdown)

Whisper - Robust Speech Recognition

OpenAI's multilingual speech recognition model.

When to use Whisper

Use when:

Speech-to-text transcription (99 languages)
Podcast/video transcription
Meeting notes automation
Translation to English
Noisy audio transcription
Multilingual audio processing

Metrics:

72,900+ GitHub stars
99 languages supported
Trained on 680,000 hours of audio
MIT License

Use alternatives instead:

AssemblyAI: Managed API, speaker diarization
Deepgram: Real-time streaming ASR
Google Speech-to-Text: Cloud-based

Quick start

Installation

# Requires Python 3.8-3.11
pip install -U openai-whisper

# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg

Basic transcription

import whisper

# Load model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3")

# Print text
print(result["text"])

# Access segments
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")

Model sizes

# Available models
models = ["tiny", "base", "small", "medium", "large", "turbo"]

# Load specific model
model = whisper.load_model("turbo")  # Optimized large-v3: much faster, small accuracy loss

Model	Parameters	English-only	Multilingual	Relative speed (vs. large)	VRAM
tiny	39M	✓	✓	~32x	~1 GB
base	74M	✓	✓	~16x	~1 GB
small	244M	✓	✓	~6x	~2 GB
medium	769M	✓	✓	~2x	~5 GB
large	1550M	✗	✓	1x	~10 GB
turbo	809M	✗	✓	~8x	~6 GB

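Recommendation: Use turbo for the best speed/quality balance and base for prototyping. Note that turbo is not trained for translation; use medium or large with task="translate".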
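The table can be turned into a simple selection rule. A minimal sketch, assuming the approximate VRAM figures above hold for your hardware and that accuracy grows in the order listed in the code (turbo placed just below large); `MODEL_VRAM_GB` and `pick_model` are hypothetical names, not part of the `whisper` package.

```python
# Approximate VRAM (GB) per multilingual model, copied from the table above.
# Ordered from least to most accurate (assumption: turbo sits just below large).
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("turbo", 6),
    ("large", 10),
]


def pick_model(vram_budget_gb: float) -> str:
    """Return the most accurate model whose approximate VRAM need fits the budget."""
    chosen = None
    for name, need_gb in MODEL_VRAM_GB:
        if need_gb <= vram_budget_gb:
            chosen = name  # later entries are assumed more accurate
    if chosen is None:
        raise ValueError(f"No Whisper model fits in {vram_budget_gb} GB")
    return chosen


print(pick_model(8))  # turbo
print(pick_model(4))  # small
```

The result can be passed straight to `whisper.load_model(pick_model(budget))`; leave headroom, since the VRAM figures are rough.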

Transcription options

Language specification

# Auto-detect language
result = model.transcribe("audio.mp3")

# Specify language (faster)
result = model.transcribe("audio.mp3", language="en")

# Supported: en, es, fr, de, it, pt, ru, ja, ko, zh, and 89 more
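Whisper accepts either a code such as `"es"` or, as in the CLI example further down, a name such as `Spanish`. To validate user input before a long job, the sketch below normalizes either form to a code. It prefers the tables shipped in openai-whisper (`whisper.tokenizer.LANGUAGES` maps code to lowercase name, `whisper.tokenizer.TO_LANGUAGE_CODE` maps name to code) and falls back to a hypothetical subset (this guide's top-tier languages) when the package is not installed; `normalize_language` is an illustrative helper.

```python
try:
    # Tables shipped with openai-whisper: code -> name and name -> code
    from whisper.tokenizer import LANGUAGES, TO_LANGUAGE_CODE
except ImportError:
    # Hypothetical fallback subset: the top-tier languages from this guide
    LANGUAGES = {
        "en": "english", "es": "spanish", "fr": "french", "de": "german",
        "it": "italian", "pt": "portuguese", "nl": "dutch", "pl": "polish",
        "ru": "russian", "ja": "japanese", "ko": "korean", "zh": "chinese",
    }
    TO_LANGUAGE_CODE = {name: code for code, name in LANGUAGES.items()}


def normalize_language(value: str) -> str:
    """Map inputs such as 'es', 'ES', 'Spanish' or 'spanish' to the code 'es'."""
    key = value.strip().lower()
    if key in LANGUAGES:
        return key
    if key in TO_LANGUAGE_CODE:
        return TO_LANGUAGE_CODE[key]
    raise ValueError(f"Unsupported language: {value!r}")


print(normalize_language("Spanish"))  # es
print(normalize_language("JA"))       # ja
```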

Task selection

# Transcription (default)
result = model.transcribe("audio.mp3", task="transcribe")

# Translation to English (use a multilingual medium or large model; turbo is not trained for translation)
result = model.transcribe("spanish.mp3", task="translate")
# Input: Spanish audio → Output: English text
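Because the `turbo` model was not trained for translation (per the openai-whisper README, it returns the original language even with `task="translate"`), a pipeline that lets callers pick the task may want a guard. A minimal sketch; `choose_model_for_task` and its fallback choice of `medium` are illustrative assumptions.

```python
# Per the openai-whisper README, turbo returns the original language even when
# task="translate", so these names are steered away from translation.
NO_TRANSLATE_MODELS = {"turbo", "large-v3-turbo"}


def choose_model_for_task(task: str, preferred: str = "turbo",
                          translate_fallback: str = "medium") -> str:
    """Return a model name suited to task ('transcribe' or 'translate')."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"Unknown task: {task!r}")
    # English-only ".en" models have no translate task, so fall back there too
    if task == "translate" and (preferred in NO_TRANSLATE_MODELS
                                or preferred.endswith(".en")):
        return translate_fallback
    return preferred


print(choose_model_for_task("transcribe"))  # turbo
print(choose_model_for_task("translate"))   # medium
```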

Initial prompt

# Improve accuracy with context
result = model.transcribe(
    "audio.mp3",
    initial_prompt="This is a technical podcast about machine learning and AI."
)

# Helps with:
# - Technical terms
# - Proper nouns
# - Domain-specific vocabulary
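One way to apply this is to assemble the prompt from a short description plus a glossary of expected terms. openai-whisper keeps only the tail of an over-long prompt (about half of its 448-token text context), so the hypothetical helper below puts the glossary last and caps the length by characters, an approximation since characters and tokens do not map one to one.

```python
from typing import List


def build_initial_prompt(description: str, terms: List[str], max_chars: int = 600) -> str:
    """Join a context sentence and a de-duplicated glossary into one prompt.

    max_chars is a rough character stand-in for Whisper's token limit (assumption).
    """
    # Keep first-seen order while dropping blanks and duplicates
    glossary = ", ".join(dict.fromkeys(t.strip() for t in terms if t.strip()))
    prompt = description.strip()
    if glossary:
        prompt = f"{prompt} Terms: {glossary}."
    # Whisper keeps the tail of a long prompt, so trim from the front as well
    return prompt if len(prompt) <= max_chars else prompt[len(prompt) - max_chars:]


prompt = build_initial_prompt(
    "A technical podcast about machine learning.",
    ["PyTorch", "LoRA", "PyTorch", " "],
)
print(prompt)  # A technical podcast about machine learning. Terms: PyTorch, LoRA.
```

The result is passed as `model.transcribe("audio.mp3", initial_prompt=prompt)`.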

Timestamps

# Word-level timestamps
result = model.transcribe("audio.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        print(f"{word['word']} ({word['start']:.2f}s - {word['end']:.2f}s)")
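With `word_timestamps=True`, each word dict also carries a `probability` field, which is handy for flagging words worth a human check. A minimal sketch; the 0.5 threshold is an arbitrary assumption to tune on your own audio, and `low_confidence_words` is an illustrative name.

```python
def low_confidence_words(result: dict, threshold: float = 0.5) -> list:
    """Return (word, start, end, probability) tuples for words below threshold."""
    flagged = []
    for segment in result["segments"]:
        # Segments only have "words" when word_timestamps=True was used
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((word["word"].strip(), word["start"],
                                word["end"], word["probability"]))
    return flagged


# Usage with a real result:
# for w, start, end, p in low_confidence_words(result):
#     print(f"{w!r} at {start:.2f}s (p={p:.2f})")
```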

Temperature fallback

# Decode at 0.0 first; retry at higher temperatures when the output looks repetitive or low-confidence (this tuple is the default)
result = model.transcribe(
    "audio.mp3",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
)
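Roughly, openai-whisper decodes each 30-second window at the first temperature and retries at the next one when the result looks degenerate: its zlib compression ratio exceeds `compression_ratio_threshold` (default 2.4, a sign of repetitive text) or its average log-probability falls below `logprob_threshold` (default -1.0). The sketch below re-implements only that decision rule for illustration; it is a simplification, not the library's code (for instance, it ignores the case where the library skips retrying on likely silence).

```python
import zlib


def compression_ratio(text: str) -> float:
    """Bytes of text divided by bytes after zlib compression (high = repetitive)."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_fallback(text: str, avg_logprob: float,
                   compression_ratio_threshold: float = 2.4,
                   logprob_threshold: float = -1.0) -> bool:
    """Simplified check for retrying a window at the next temperature."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_unlikely = avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


print(needs_fallback("thank you " * 50, avg_logprob=-0.3))  # True (looping text)
```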

Command line usage

# Basic transcription
whisper audio.mp3

# Specify model
whisper audio.mp3 --model turbo

# Output formats
whisper audio.mp3 --output_format txt     # Plain text
whisper audio.mp3 --output_format srt     # Subtitles
whisper audio.mp3 --output_format vtt     # WebVTT
whisper audio.mp3 --output_format json    # JSON with timestamps

# Language
whisper audio.mp3 --language Spanish

# Translation
whisper spanish.mp3 --task translate

Batch processing

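import os

import whisper

model = whisper.load_model("turbo")  # load once, reuse for every file

audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]

for audio_file in audio_files:
    print(f"Transcribing {audio_file}...")
    result = model.transcribe(audio_file)

    # Save next to the input with a .txt extension (works for any audio format)
    output_file = os.path.splitext(audio_file)[0] + ".txt"
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(result["text"])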

Faster transcription with faster-whisper

# For faster inference, use faster-whisper (a CTranslate2 reimplementation of Whisper)
# pip install faster-whisper

from faster_whisper import WhisperModel

model = WhisperModel("base", device="cuda", compute_type="float16")

# segments is a generator: decoding runs lazily as you iterate over it
segments, info = model.transcribe("audio.mp3", beam_size=5)
print(f"Detected language: {info.language} ({info.language_probability:.0%})")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

GPU acceleration

import whisper

# Automatically uses GPU if available
model = whisper.load_model("turbo")

# Force CPU
model = whisper.load_model("turbo", device="cpu")

# Force GPU
model = whisper.load_model("turbo", device="cuda")

# 10-20× faster on GPU

Integration with other tools

Subtitle generation

# Generate SRT subtitles
whisper video.mp4 --output_format srt --language English

# Output: video.srt
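The CLI writes `video.srt` for you; when transcribing from Python, the segments returned by `model.transcribe` can be turned into SRT text directly. A hand-rolled sketch (openai-whisper also includes subtitle writers in `whisper.utils`); `srt_timestamp` and `segments_to_srt` are illustrative names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Build SRT text from Whisper-style segments with start, end and text keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)


print(srt_timestamp(3661.5))  # 01:01:01,500
```

For example, write `segments_to_srt(result["segments"])` to `audio.srt` opened with `encoding="utf-8"`.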

With LangChain

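import whisper
from langchain_core.documents import Document

# Transcribe with openai-whisper, then wrap each segment as a LangChain Document
result = whisper.load_model("turbo").transcribe("audio.mp3")
docs = [
    Document(
        page_content=seg["text"].strip(),
        metadata={"source": "audio.mp3", "start": seg["start"], "end": seg["end"]},
    )
    for seg in result["segments"]
]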

# Use transcription in RAG
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

Extract audio from video

# Use ffmpeg to extract audio
ffmpeg -i video.mp4 -vn -acodec pcm_s16le audio.wav

# Then transcribe
whisper audio.wav
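openai-whisper resamples all input to 16 kHz mono when it loads audio, so extracting at that rate up front keeps intermediate files small. A sketch that builds the same ffmpeg call with the standard `-ar 16000 -ac 1` options added and runs it through `subprocess`; `ffmpeg_extract_cmd` and `extract_audio` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video: str, wav: str) -> list:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",          # overwrite the output if it exists
        "-i", video,
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to mono
        wav,
    ]


def extract_audio(video: str) -> str:
    """Run ffmpeg (must be on PATH) and return the extracted WAV path."""
    wav = str(Path(video).with_suffix(".wav"))
    subprocess.run(ffmpeg_extract_cmd(video, wav), check=True)
    return wav
```

`extract_audio("video.mp4")` writes `video.wav`, which can then be passed to `model.transcribe`.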

Best practices

1. Use turbo model - Best speed/quality for English
2. Specify language - Faster than auto-detect
3. Add initial prompt - Improves technical terms
4. Use GPU - 10-20× faster
5. Batch process - Load the model once and reuse it across files
6. Convert to WAV - Better compatibility
7. Split long audio - <30 min chunks
8. Check language support - Quality varies by language
9. Use faster-whisper - Up to 4× faster than openai-whisper, per its README
10. Monitor VRAM - Scale model size to hardware
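For the long-audio tip, chunk boundaries can be computed up front. A sketch assuming 30-minute chunks with a 5-second overlap, so a word cut at a boundary appears in both chunks (both values are assumptions); `chunk_bounds` only computes times, and cutting the files (for example with ffmpeg `-ss` and `-t`) and de-duplicating words in the overlap are left to the caller.

```python
def chunk_bounds(duration_s: float, chunk_s: float = 1800.0,
                 overlap_s: float = 5.0) -> list:
    """Return (start, end) pairs in seconds that cover the whole duration."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # step back so neighbouring chunks overlap
    return bounds


print(chunk_bounds(4000))  # [(0.0, 1800.0), (1795.0, 3595.0), (3590.0, 4000)]
```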

Performance

Model	Real-time factor (CPU)	Real-time factor (GPU)
tiny	~0.32	~0.01
base	~0.16	~0.01
turbo	~0.08	~0.01
large	~1.0	~0.05

Real-time factor (RTF) = processing time ÷ audio duration; 0.1 means 10× faster than real time
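These figures are easy to reproduce on your own hardware: time the transcription and divide by the audio length. A sketch; `real_time_factor` and `timed_transcribe` are hypothetical helpers, the timing covers only `model.transcribe` (not model loading), and the second helper relies on `whisper.load_audio` (which returns 16 kHz samples) and `whisper.audio.SAMPLE_RATE`.

```python
import time


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def timed_transcribe(model, path: str):
    """Transcribe with openai-whisper and return (result, RTF)."""
    import whisper  # imported lazily so real_time_factor has no dependencies

    audio = whisper.load_audio(path)  # float32 samples at 16 kHz
    audio_s = len(audio) / whisper.audio.SAMPLE_RATE
    t0 = time.perf_counter()
    result = model.transcribe(audio)
    return result, real_time_factor(time.perf_counter() - t0, audio_s)
```

For example, `result, rtf = timed_transcribe(whisper.load_model("base"), "audio.mp3")`; an RTF of 0.16 would match the CPU row for base.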

Language support

Top-supported languages:

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)

Full list: 99 languages total

Limitations

1. Hallucinations - May repeat or invent text
2. Long-form accuracy - Degrades on >30 min audio
3. Speaker identification - No diarization
4. Accents - Quality varies
5. Background noise - Can affect accuracy
6. Real-time latency - Not suitable for live captioning
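Some hallucinations can be caught after the fact from segment metadata. openai-whisper segments include `no_speech_prob` and `avg_logprob`, and the library treats a window as silence when `no_speech_prob` exceeds 0.6 while `avg_logprob` is not above -1.0. The sketch below applies that same pair of default thresholds to finished segments and also drops exact consecutive repeats, a common looping symptom. The thresholds are assumptions to tune; this reduces hallucinated text rather than eliminating it, and `filter_segments` is an illustrative name.

```python
def filter_segments(segments: list, no_speech_threshold: float = 0.6,
                    logprob_threshold: float = -1.0) -> list:
    """Drop likely-silent segments and exact consecutive repeats."""
    kept = []
    previous_text = None
    for seg in segments:
        likely_silence = (seg["no_speech_prob"] > no_speech_threshold
                          and seg["avg_logprob"] <= logprob_threshold)
        text = seg["text"].strip()
        if likely_silence or text == previous_text:
            continue
        kept.append(seg)
        previous_text = text
    return kept
```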

Resources

GitHub: https://github.com/openai/whisper ⭐ 72,900+
Paper: https://arxiv.org/abs/2212.04356
Model Card: https://github.com/openai/whisper/blob/main/model-card.md
Colab: Available in repo
License: MIT

Whisper Language Support Guide

Complete guide to Whisper's multilingual capabilities.

Supported languages (99 total)

Top-tier support (WER < 10%)

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Dutch (nl)
Polish (pl)
Russian (ru)
Japanese (ja)
Korean (ko)
Chinese (zh)

Good support (WER 10-20%)

Arabic (ar)
Turkish (tr)
Vietnamese (vi)
Swedish (sv)
Finnish (fi)
Czech (cs)
Romanian (ro)
Hungarian (hu)
Danish (da)
Norwegian (no)
Thai (th)
Hebrew (he)
Greek (el)
Indonesian (id)
Malay (ms)

Full list (99 languages)

Alternate names Whisper also accepts are shown in parentheses. Cantonese was added with large-v3, bringing that model to 100.

Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese (Myanmar), Cantonese (large-v3 and later), Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Nynorsk, Occitan, Pashto (Pushto), Persian, Polish, Portuguese, Punjabi, Romanian (Moldavian), Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba

Usage examples

Auto-detect language

import whisper

model = whisper.load_model("turbo")

# Auto-detect language
result = model.transcribe("audio.mp3")

print(f"Detected language: {result['language']}")
print(f"Text: {result['text']}")

Specify language (faster)

# Specify language for faster transcription
result = model.transcribe("audio.mp3", language="es")  # Spanish
result = model.transcribe("audio.mp3", language="fr")  # French
result = model.transcribe("audio.mp3", language="ja")  # Japanese

Translation to English

# Translate any language to English
result = model.transcribe(
    "spanish_audio.mp3",
    task="translate"  # Translates to English
)

print(f"Original language: {result['language']}")
print(f"English translation: {result['text']}")

Language-specific tips

Chinese

# Chinese works well with larger models
model = whisper.load_model("large")

result = model.transcribe(
    "chinese_audio.mp3",
    language="zh",
    initial_prompt="这是一段关于技术的讨论"  # Context helps
)

Japanese

# Japanese benefits from initial prompt
result = model.transcribe(
    "japanese_audio.mp3",
    language="ja",
    initial_prompt="これは技術的な会議の録音です"
)

Arabic

# Arabic: Use large model for best results
model = whisper.load_model("large")

result = model.transcribe(
    "arabic_audio.mp3",
    language="ar"
)

Model size recommendations

Language Tier	Recommended Model	WER
Top-tier (en, es, fr, de)	base/turbo	< 10%
Good (ar, tr, vi)	medium/large	10-20%
Lower-resource	large	20-30%
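The tier table can be encoded as a lookup so a pipeline picks a model from the requested language. A minimal sketch: the code sets are copied from this guide's top-tier and good-tier lists, everything else is treated as lower-resource, and choosing `medium` rather than `large` for the good tier is an assumption to revisit if your samples fall short. `recommend_model` is an illustrative name.

```python
# Language codes copied from this guide's tier lists; anything else is "lower"
TOP_TIER = {"en", "es", "fr", "de", "it", "pt", "nl", "pl", "ru", "ja", "ko", "zh"}
GOOD_TIER = {"ar", "tr", "vi", "sv", "fi", "cs", "ro", "hu", "da", "no",
             "th", "he", "el", "id", "ms"}


def recommend_model(language: str) -> tuple:
    """Return (tier, model name) for a Whisper language code."""
    code = language.strip().lower()
    if code in TOP_TIER:
        return ("top", "turbo")     # table: base/turbo, WER < 10%
    if code in GOOD_TIER:
        return ("good", "medium")   # table: medium/large, WER 10-20%
    return ("lower", "large")       # table: large, WER 20-30%


print(recommend_model("es"))  # ('top', 'turbo')
print(recommend_model("sw"))  # ('lower', 'large')
```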

Performance by language

English

tiny: WER ~15%
base: WER ~8%
small: WER ~5%
medium: WER ~4%
large: WER ~3%
turbo: WER ~3.5%

Spanish

tiny: WER ~20%
base: WER ~12%
medium: WER ~6%
large: WER ~4%

Chinese

small: WER ~15%
medium: WER ~8%
large: WER ~5%
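To check figures like these on your own audio, WER is (substitutions + deletions + insertions) divided by the number of reference words, computed from a word-level edit distance. A minimal pure-Python sketch; it only lowercases and splits on whitespace, whereas published Whisper numbers apply the text normalizers shipped in `whisper.normalizers`, so results will not match them exactly. Libraries such as jiwer implement the same metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via Levenshtein distance over words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```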

Best practices

1. Use English-only models - Better for small models (tiny/base)
2. Specify language - Faster than auto-detect
3. Add initial prompt - Improves accuracy for technical terms
4. Use larger models - For low-resource languages
5. Test on sample - Quality varies by accent/dialect
6. Consider audio quality - Clear audio = better results
7. Check language codes - Mostly ISO 639-1 two-letter codes; exceptions include haw (Hawaiian), yue (Cantonese), and jw (Javanese, instead of ISO's jv)

Language detection

# Detect language only (no transcription)
import whisper

model = whisper.load_model("base")

# Load audio
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Make log-Mel spectrogram (n_mels must match the model: 80, or 128 for large-v3/turbo)
mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)

# Detect language
_, probs = model.detect_language(mel)
detected_language = max(probs, key=probs.get)

print(f"Detected language: {detected_language}")
print(f"Confidence: {probs[detected_language]:.2%}")
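`detect_language` returns a probability for every supported language, so an application can ask the user to confirm when the top guesses are close. A sketch over the `probs` dict from the code above; the 0.2 margin is an arbitrary assumption, and `top_languages` and `is_ambiguous` are illustrative names.

```python
def top_languages(probs: dict, k: int = 3) -> list:
    """Return the k most likely (code, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def is_ambiguous(probs: dict, margin: float = 0.2) -> bool:
    """True when the best and second-best languages are within margin."""
    top = top_languages(probs, k=2)
    return len(top) == 2 and top[0][1] - top[1][1] < margin


example = {"en": 0.55, "nl": 0.40, "de": 0.05}
print(top_languages(example, k=2))  # [('en', 0.55), ('nl', 0.4)]
print(is_ambiguous(example))        # True
```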

Resources

Paper: https://arxiv.org/abs/2212.04356
GitHub: https://github.com/openai/whisper
Model Card: https://github.com/openai/whisper/blob/main/model-card.md


How it compares

Use whisper for language-code and WER-tier planning in Whisper STT; use guidance when the task is constrained LLM text generation rather than audio transcription.

FAQ

How many languages does Whisper support?

whisper documents 99 supported languages with ISO codes, grouped into top-tier (WER under 10%), good (WER 10–20%), and a full alphabetical list from Afrikaans to Yoruba.

Which Whisper languages have the best transcription quality?

whisper lists 12 top-tier languages—English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Korean, and Chinese—with expected WER under 10% for high-confidence STT integration.

Is Whisper safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

