
Type4Me macOS Voice Input
Extend or ship Type4Me on macOS with local Sherpa or cloud ASR, LLM polish, and privacy-first local storage.
Overview
Type4Me macOS Voice Input is an agent skill for the Build phase that guides Swift developers through architecting, extending, and deploying a local-first macOS voice dictation app with pluggable ASR and optional LLM opti
Install
npx skills add https://github.com/aradotso/trending-skills --skill type4me-macos-voice-input
What is this skill?
- Global hotkey capture with injection into any macOS app
- Local ASR via SherpaOnnx (Paraformer/Zipformer) and cloud via Volcengine and Deepgram
- Optional LLM post-processing with custom prompt modes
- Plugin registry and SpeechRecognizer protocol for new ASR vendors
- Credentials and history stored locally—no telemetry or cloud sync
- Documents local SherpaOnnx/Paraformer/Zipformer and cloud Volcengine/Deepgram ASR paths
- Architecture spans ASR/, Bridge/, and per-vendor provider modules under a registry pattern
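The registry pattern mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not Type4Me's actual code: `SketchRecognizer`, `EchoRecognizer`, and `RecognizerRegistry` are hypothetical names, and the single synchronous `transcribe` method is a deliberate simplification of the app's async `SpeechRecognizer` protocol.

```swift
import Foundation

// Hypothetical, simplified stand-in for an ASR client protocol.
// The real SpeechRecognizer protocol in Type4Me is async and buffer-based.
protocol SketchRecognizer: AnyObject {
    func transcribe(_ samples: [Float]) -> String
}

// Toy provider that only reports how many samples it received.
final class EchoRecognizer: SketchRecognizer {
    func transcribe(_ samples: [Float]) -> String {
        "received \(samples.count) samples"
    }
}

// Hypothetical registry: maps a provider ID to a factory closure,
// so new vendors can be added without touching the calling code.
final class RecognizerRegistry {
    private var factories: [String: () -> SketchRecognizer] = [:]

    func register(_ id: String, factory: @escaping () -> SketchRecognizer) {
        factories[id] = factory
    }

    // Returns nil when no provider was registered under `id`.
    func makeRecognizer(for id: String) -> SketchRecognizer? {
        factories[id]?()
    }

    var registeredIDs: [String] { factories.keys.sorted() }
}

let registry = RecognizerRegistry()
registry.register("echo") { EchoRecognizer() }

let recognizer = registry.makeRecognizer(for: "echo")
print(recognizer?.transcribe([0.0, 0.1, 0.2]) ?? "unknown provider")
```

Storing factories rather than instances keeps each recognition session on a fresh client, which is one plausible reason to use a registry here; the real project may make a different choice.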
Adoption & trust: 762 installs on skills.sh; 31 GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want macOS-wide voice typing with your choice of local or cloud recognition, but integrating Sherpa, Volcengine, and LLM modes in one Swift codebase is easy to get wrong.
Who is it for?
Solo macOS builders extending Type4Me or cloning its ASR plugin pattern for a privacy-focused voice agent input layer.
Skip if: Teams needing Windows/Linux voice input, a hosted SaaS dictation product with no local Swift build, or one-click install without touching ASR configuration.
When should I use this skill?
Add ASR providers, build/deploy from source, configure Sherpa or Volcengine, add custom prompt modes, implement SpeechRecognizer protocol, troubleshoot voice input, or extend cloud ASR for Type4Me.
What do I get? / Deliverables
You can add ASR providers, configure engines, build from source, and fix voice-input failures using the documented module boundaries and protocols.
- Working ASR provider integration or configuration
- Buildable macOS voice-input deployment from documented layout
Journey fit
Voice-input tooling is built and integrated while assembling agent-adjacent desktop workflows, not during launch or growth. Covers ASR provider plugins, Swift architecture, and hotkey-to-injection pipelines—classic agent-tooling build work.
How it compares
Use for native macOS voice pipelines with pluggable ASR—not a generic Whisper API wrapper skill or a browser-only dictation snippet.
Common Questions / FAQ
Who is type4me-macos-voice-input for?
Swift/macOS developers and indie builders who maintain or fork Type4Me and need to wire local Sherpa or cloud ASR with optional LLM text cleanup.
When should I use type4me-macos-voice-input?
During Build (agent-tooling) when adding an ASR provider, setting up Sherpa or Volcengine, implementing custom prompt modes, or debugging hotkey transcription; also when preparing a from-source deploy of the macOS app.
Is type4me-macos-voice-input safe to install?
The skill describes local credential storage and no telemetry in Type4Me itself; review the Security Audits panel on this Prism page before trusting third-party ASR keys or binaries.
SKILL.md
# Type4Me macOS Voice Input

> Skill by [ara.so](https://ara.so) - Daily 2026 Skills collection.

Type4Me is a macOS voice input tool that captures audio via global hotkey, transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes text via LLM, and injects the result into any app. All credentials and history are stored locally: no telemetry, no cloud sync.

## Architecture Overview

```
Type4Me/
├── ASR/                              # ASR engine abstraction
│   ├── ASRProvider.swift             # Provider enum + protocols
│   ├── ASRProviderRegistry.swift     # Plugin registry
│   ├── Providers/                    # Per-vendor config files
│   ├── SherpaASRClient.swift         # Local streaming ASR
│   ├── SherpaOfflineASRClient.swift
│   ├── VolcASRClient.swift           # Volcengine streaming ASR
│   └── DeepgramASRClient.swift       # Deepgram streaming ASR
├── Bridge/                           # SherpaOnnx C API Swift bridge
├── Audio/                            # Audio capture
├── Session/                          # Core state machine: record→ASR→inject
├── Input/                            # Global hotkey management
├── Services/                         # Credentials, hotwords, model manager
├── Protocol/                         # Volcengine WebSocket codec
└── UI/                               # SwiftUI (FloatingBar + Settings)
```

## Installation

### Prerequisites

```bash
# Xcode Command Line Tools
xcode-select --install

# CMake (for local ASR engine)
brew install cmake
```

### Build & Deploy from Source

```bash
git clone https://github.com/joewongjc/type4me.git
cd type4me

# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)
bash scripts/build-sherpa.sh

# Step 2: Build, bundle, sign, install to /Applications, and launch
bash scripts/deploy.sh
```

### Download Pre-built App

Download `Type4Me-v1.2.3.dmg` from releases (cloud ASR only, no local engine):

```
https://github.com/joewongjc/type4me/releases/tag/v1.2.3
```

If macOS blocks the app:

```bash
xattr -d com.apple.quarantine /Applications/Type4Me.app
```

### Download Local ASR Models

```bash
mkdir -p ~/Library/Application\ Support/Type4Me/Models

# Option A: Lightweight ~20MB
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/

# Option B: Balanced ~236MB (recommended)
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/

# Option C: Bilingual Chinese+English ~1GB
tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
  -C ~/Library/Application\ Support/Type4Me/Models/
```

Expected structure for Paraformer model:

```
~/Library/Application Support/Type4Me/Models/
└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    └── tokens.txt
```

## Key Protocols

### SpeechRecognizer Protocol

Every ASR client must implement this protocol:

```swift
protocol SpeechRecognizer: AnyObject {
    /// Start a new recognition session
    func startRecognition() async throws

    /// Feed raw PCM audio data
    func appendAudio(_ buffer: AVAudioPCMBuffer) async

    /// Stop and get final result
    func stopRecognition() async throws -> String

    /// Cancel without result
    func cancelRecognition() async

    /// Streaming partial results (optional)
    var partialResultHandler: ((String