
Speech Recognition
Plan and implement iOS live speech transcription with SpeechAnalyzer and SpeechTranscriber, including asset install, buffer conversion, volatile vs final results, and session cleanup.
Overview
speech-recognition is an agent skill for the Build phase that guides iOS live transcription work with SpeechAnalyzer and SpeechTranscriber, covering asset setup and volatile versus final result handling.
Install
npx skills add https://github.com/dpearson2699/swift-ios-skills --skill speech-recognition
What is this skill?
- SpeechAnalyzer + SpeechTranscriber with presets such as .timeIndexedProgressiveTranscription (not legacy offline-only pa
- Pre-flight checks: SpeechTranscriber.isAvailable and supportedLocale(equivalentTo:)
- AssetInventory.assetInstallationRequest(supporting:) before analysis starts
- AVAudioEngine buffers converted via SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:) into AnalyzerInput
- Volatile vs finalized result handling to avoid duplicated live transcript text
- Eval speechanalyzer-live-transcription asserts 6 implementation requirements including volatile vs final handling and se
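The volatile versus finalized handling listed above can be sketched in plain Swift. This is a minimal model, not the Speech framework's own API: `TranscriptUpdate` and `LiveTranscript` are hypothetical types introduced for illustration, and the sketch assumes each result carries its text plus a flag saying whether it is final.

```swift
import Foundation

/// Hypothetical stand-in for a transcription result; not a Speech framework type.
struct TranscriptUpdate {
    let text: String
    let isFinal: Bool
}

/// Keeps committed text and the current volatile guess in separate buffers.
struct LiveTranscript {
    private(set) var finalized = ""
    private(set) var volatile = ""

    /// Text to show in the UI: committed text followed by the in-flight guess.
    var displayText: String { finalized + volatile }

    mutating func apply(_ update: TranscriptUpdate) {
        if update.isFinal {
            // A final result supersedes the volatile guess, so commit it and clear the guess.
            finalized += update.text
            volatile = ""
        } else {
            // Each volatile result replaces the previous one instead of being appended.
            volatile = update.text
        }
    }
}

var transcript = LiveTranscript()
transcript.apply(TranscriptUpdate(text: "Hello wor", isFinal: false))
transcript.apply(TranscriptUpdate(text: "Hello world", isFinal: false))
transcript.apply(TranscriptUpdate(text: "Hello world. ", isFinal: true))
transcript.apply(TranscriptUpdate(text: "Next", isFinal: false))
print(transcript.displayText) // prints "Hello world. Next"
```

Appending every volatile result would instead produce "Hello worHello worldHello world. Next", which is the duplication the skill is meant to prevent.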
Adoption & trust: 1.7k installs on skills.sh; 713 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are building a live dictation or meeting transcript feature on iOS, but legacy SFSpeechRecognizer patterns duplicate partial text or skip model assets and format conversion.
Who is it for?
Indie iOS developers shipping recorder or voice-note apps who target recent iOS Speech frameworks and need agent guidance aligned to SpeechAnalyzer eval criteria.
Skip if: Android or cross-platform ASR, server-side Whisper pipelines, or apps that only need static file transcription without live microphone streaming.
When should I use this skill?
Building iOS live speech transcription, SpeechAnalyzer/SpeechTranscriber setup, or reviewing dictation plans against current Apple speech APIs.
What do I get? / Deliverables
You leave with an implementation plan that uses current SpeechAnalyzer APIs, installs assets, streams converted audio buffers, updates UI from volatile then final results, and closes the session cleanly.
- SpeechAnalyzer integration architecture
- Live transcript UI update strategy (volatile vs final)
- Session lifecycle and cleanup checklist
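One way to make the session lifecycle checklist enforceable is a small state machine that rejects out-of-order steps. The sketch below is an assumption about how such a plan could be modeled. `SessionPhase`, `InvalidTransition`, and `SessionLifecycle` are hypothetical names, and no Speech framework calls are made.

```swift
/// Hypothetical phases for one live transcription session.
enum SessionPhase: Equatable {
    case idle, assetsReady, streaming, finished
}

/// Thrown when a step is attempted out of order.
struct InvalidTransition: Error {
    let from: SessionPhase
    let to: SessionPhase
}

struct SessionLifecycle {
    private(set) var phase: SessionPhase = .idle

    /// Moves to `next` only if the step is legal from the current phase.
    mutating func advance(to next: SessionPhase) throws {
        switch (phase, next) {
        case (.idle, .assetsReady),       // assets installed or verified
             (.assetsReady, .streaming),  // audio input started
             (.assetsReady, .finished),   // cancelled before streaming
             (.streaming, .finished):     // input ended, session finalized
            phase = next
        default:
            throw InvalidTransition(from: phase, to: next)
        }
    }
}

var session = SessionLifecycle()
do {
    try session.advance(to: .assetsReady)
    try session.advance(to: .streaming)
    try session.advance(to: .finished)
    try session.advance(to: .streaming) // throws: a finished session cannot restart
} catch {
    print("Rejected transition: \(error)")
}
```

In a real app each transition would wrap the corresponding framework call, so that streaming cannot begin before assets are ready and cleanup runs exactly once.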
Journey fit
Speech UI and on-device transcription are mobile product features assembled during Build, before App Store ship and growth iterations. Meeting recorders and dictation UIs are client-side Swift work—audio engine, live text, playback highlighting—not backend-only APIs.
How it compares
Opinionated SpeechAnalyzer path—not a generic copy-paste of deprecated SFSpeechRecognizer tutorials.
Common Questions / FAQ
Who is speech-recognition for?
Solo builders creating SwiftUI or UIKit apps that need live microphone transcription and playback-aligned transcripts on Apple platforms.
When should I use speech-recognition?
During Build frontend mobile work when implementing meeting recorders, live captions, or dictation with SpeechAnalyzer, asset download, and AVAudioEngine input.
Is speech-recognition safe to install?
Review Security Audits on this Prism page; microphone and on-device model usage still require your Info.plist privacy copy and App Store compliance review.
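As a hedged illustration of the Info.plist point, the helper below reports which privacy usage strings are missing or empty. The two keys are the ones named in the skill's SFSpeechRecognizer eval; `requiredPrivacyKeys`, `missingPrivacyKeys(in:)`, and `sampleInfo` are hypothetical names, and whether a given SpeechAnalyzer setup needs both keys is an assumption to verify against Apple's documentation.

```swift
import Foundation

/// Usage-description keys named in the skill's eval for live microphone transcription.
let requiredPrivacyKeys = [
    "NSSpeechRecognitionUsageDescription",
    "NSMicrophoneUsageDescription",
]

/// Returns the required keys that are absent, non-string, or blank in `info`.
func missingPrivacyKeys(in info: [String: Any]) -> [String] {
    requiredPrivacyKeys.filter { key in
        guard let value = info[key] as? String else { return true }
        return value.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

// Sample dictionary standing in for an app's Info.plist contents.
let sampleInfo: [String: Any] = [
    "NSSpeechRecognitionUsageDescription": "Transcribes your meetings on device."
]
print(missingPrivacyKeys(in: sampleInfo)) // prints ["NSMicrophoneUsageDescription"]
```

Inside an app target, the same check could be run in a debug build with `missingPrivacyKeys(in: Bundle.main.infoDictionary ?? [:])`.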
SKILL.md - Speech Recognition
{
  "skill_name": "speech-recognition",
  "evals": [
    {
      "id": 0,
      "name": "speechanalyzer-live-transcription",
      "prompt": "I'm building an iOS 26 meeting recorder with live transcript text and word highlighting during playback. Sketch the SpeechAnalyzer implementation shape, including model setup, audio input, result handling, and cleanup.",
      "expected_output": "An iOS 26 SpeechAnalyzer plan that uses current SpeechTranscriber APIs, installs assets, converts live audio buffers, handles volatile/final results, and explicitly finishes the analyzer session.",
      "files": [],
      "assertions": [
        "Uses SpeechAnalyzer with SpeechTranscriber and a documented preset such as .timeIndexedProgressiveTranscription, not .offlineTranscription.",
        "Checks SpeechTranscriber.isAvailable and supportedLocale(equivalentTo:) before starting transcription.",
        "Installs or verifies model assets with AssetInventory.assetInstallationRequest(supporting:) before analysis.",
        "Converts AVAudioEngine microphone buffers to SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:) before yielding AnalyzerInput.",
        "Handles volatile results separately from finalized results so live text is replaced rather than duplicated.",
        "Explicitly finalizes or cancels the analyzer session after input ends."
      ]
    },
    {
      "id": 1,
      "name": "sfspeechrecognizer-live-review",
      "prompt": "Review this iOS live dictation plan: request only NSSpeechRecognitionUsageDescription, start AVAudioEngine immediately, use SFSpeechRecognizer forever in one recognition task, force requiresOnDeviceRecognition for every locale, and ignore availability changes. Give corrected guidance and focused Swift snippets.",
      "expected_output": "A correction-focused SFSpeechRecognizer review that covers speech and microphone authorization, live audio setup, availability changes, on-device checks, task cleanup, and recognition duration limits.",
      "files": [],
      "assertions": [
        "Requires both NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription for live microphone transcription.",
        "Requests speech recognition authorization and microphone record permission before activating the audio session or starting AVAudioEngine.",
        "Uses SFSpeechRecognizerDelegate or availability handling for recognition service changes.",
        "Checks supportsOnDeviceRecognition before setting requiresOnDeviceRecognition and gives a fallback when unsupported.",
        "Accounts for SFSpeechRecognizer recognition duration/service limits instead of running one unbounded task.",
        "Stops AVAudioEngine, removes the input tap, ends audio, and cancels or clears the recognition task during cleanup."
      ]
    },
    {
      "id": 2,
      "name": "speech-boundary-routing",
      "prompt": "A notes feature records a spoken note, transcribes it, detects the text language and sentiment, translates a summary to Spanish, plays back the recording with captions, and optionally uses Apple Intelligence to summarize it. Which parts belong in the speech-recognition skill and which should be handed to sibling skills?",
      "expected_output": "A boundary-aware routing answer that keeps speech-to-text and microphone capture in Speech, and routes text analysis, translation, playback UI, and generative summarization to the correct sibling skills.",
      "files": [],
      "assertions": [
        "Keeps microphone capture, speech authorization, SpeechAnalyzer or SFSpeechRecognizer transcription, and transcript result handling in the speech-recognition scope.",
        "Routes language detection, sentiment, text embeddings, and translation to the natural-language skill after transcription exists.",
        "Routes playback UI, captions during media playback, or AVPlayer/AVKit concerns to the avkit skill.",
        "Routes Apple Intelligence or Foundation Models summarization to the apple-on-devic