Type4me Macos Voice Input

Name: Type4me Macos Voice Input
Author: aradotso

aradotso/trending-skills

830 installs
66 repo stars
Updated July 9, 2026

type4me-macos-voice-input is a development skill that helps developers extend, build, and configure the Type4Me macOS voice-input app with local or cloud ASR engines and LLM text optimization.

About

type4me-macos-voice-input is a development skill for engineers customizing Type4Me, a Swift macOS voice-input tool that captures speech through a global hotkey and inserts the transcribed text into any application. The skill covers adding ASR providers, implementing the SpeechRecognizer protocol, configuring local recognition with SherpaOnnx, wiring the Volcengine cloud speech API, and defining custom prompt modes that run LLM text optimization on transcripts. It also documents building and deploying Type4Me from source and troubleshooting voice input failures. Credentials and history are stored locally by default, which makes the skill relevant when you need privacy-preserving dictation or a branded voice layer inside internal macOS tooling. Reach for it when extending Type4Me with a new cloud ASR service, tuning local offline models, or debugging the hotkey capture pipeline. It is not meant for building iOS or cross-platform mobile dictation from scratch without the Type4Me codebase.

Supports local ASR engines (SherpaOnnx, Paraformer, Zipformer) and cloud providers (Volcengine, Deepgram)
LLM-powered text optimization after transcription
Fully local credential and history storage with zero telemetry
Global hotkey audio capture that injects text into any foreground app
Plugin-style provider registry for adding new ASR services

Type4me Macos Voice Input by the numbers

830 all-time installs (skills.sh)
+7 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #1,266 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: CRITICAL risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/aradotso/trending-skills --skill type4me-macos-voice-input



How do you add a new ASR provider to Type4Me on macOS?

Add reliable voice-to-text input with local or cloud ASR engines directly into any macOS application via a global hotkey.

Who is it for?

Swift macOS developers extending Type4Me with local Sherpa or cloud ASR engines and LLM transcript optimization.

Skip if: you are building Windows or Linux dictation apps or iOS-only voice projects, or your team does not use the Type4Me codebase.

When should I use this skill?

Use it when the user adds an ASR provider to Type4Me, configures Sherpa or Volcengine speech recognition, builds Type4Me from source, or debugs macOS voice input.

What you get

Configured Type4Me build, SpeechRecognizer provider implementation, and working global-hotkey voice-to-text insertion.

ASR provider implementation
Configured Type4Me build

Files

SKILL.md (Markdown)

Type4Me macOS Voice Input

Skill by ara.so — Daily 2026 Skills collection.

Type4Me is a macOS voice input tool that captures audio via global hotkey, transcribes it using local (SherpaOnnx/Paraformer/Zipformer) or cloud (Volcengine/Deepgram) ASR engines, optionally post-processes text via LLM, and injects the result into any app. All credentials and history are stored locally — no telemetry, no cloud sync.

Architecture Overview

Type4Me/
├── ASR/                    # ASR engine abstraction
│   ├── ASRProvider.swift          # Provider enum + protocols
│   ├── ASRProviderRegistry.swift  # Plugin registry
│   ├── Providers/                 # Per-vendor config files
│   ├── SherpaASRClient.swift      # Local streaming ASR
│   ├── SherpaOfflineASRClient.swift
│   ├── VolcASRClient.swift        # Volcengine streaming ASR
│   └── DeepgramASRClient.swift    # Deepgram streaming ASR
├── Bridge/                 # SherpaOnnx C API Swift bridge
├── Audio/                  # Audio capture
├── Session/                # Core state machine: record→ASR→inject
├── Input/                  # Global hotkey management
├── Services/               # Credentials, hotwords, model manager
├── Protocol/               # Volcengine WebSocket codec
└── UI/                     # SwiftUI (FloatingBar + Settings)

Installation

Prerequisites

# Xcode Command Line Tools
xcode-select --install

# CMake (for local ASR engine)
brew install cmake

Build & Deploy from Source

git clone https://github.com/joewongjc/type4me.git
cd type4me

# Step 1: Compile SherpaOnnx local engine (~5 min, one-time)
bash scripts/build-sherpa.sh

# Step 2: Build, bundle, sign, install to /Applications, and launch
bash scripts/deploy.sh

Download Pre-built App

Download Type4Me-v1.2.3.dmg from releases (cloud ASR only, no local engine):

https://github.com/joewongjc/type4me/releases/tag/v1.2.3

If macOS blocks the app:

xattr -d com.apple.quarantine /Applications/Type4Me.app

Download Local ASR Models

# The commands below assume the model archives are already in ~/Downloads
mkdir -p ~/Library/Application\ Support/Type4Me/Models

# Option A: Lightweight ~20MB
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01.tar.bz2 \
    -C ~/Library/Application\ Support/Type4Me/Models/

# Option B: Balanced ~236MB (recommended)
tar xjf ~/Downloads/sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13.tar.bz2 \
    -C ~/Library/Application\ Support/Type4Me/Models/

# Option C: Bilingual Chinese+English ~1GB
tar xjf ~/Downloads/sherpa-onnx-streaming-paraformer-bilingual-zh-en.tar.bz2 \
    -C ~/Library/Application\ Support/Type4Me/Models/

Expected structure for Paraformer model:

~/Library/Application Support/Type4Me/Models/
└── sherpa-onnx-streaming-paraformer-bilingual-zh-en/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    └── tokens.txt
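
The expected layout above can be checked programmatically before pointing Settings at a model. The following is a minimal sketch, not code from the Type4Me source: `missingModelFiles` is a hypothetical helper, and the three file names are simply the ones listed for the Paraformer model.

```swift
import Foundation

/// Hypothetical helper (not part of Type4Me): returns the expected files
/// that are absent from a model directory.
func missingModelFiles(
    in modelDirectory: URL,
    expected: [String] = ["encoder.int8.onnx", "decoder.int8.onnx", "tokens.txt"]
) -> [String] {
    expected.filter { name in
        let path = modelDirectory.appendingPathComponent(name).path
        return !FileManager.default.fileExists(atPath: path)
    }
}

// Usage: check the Paraformer model under Application Support
let modelsRoot = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Application Support/Type4Me/Models")
let paraformer = modelsRoot
    .appendingPathComponent("sherpa-onnx-streaming-paraformer-bilingual-zh-en")
let missing = missingModelFiles(in: paraformer)
print(missing.isEmpty ? "Model looks complete" : "Missing: \(missing.joined(separator: ", "))")
```

An empty result only means the files exist; it does not verify that they are valid ONNX models.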

Key Protocols

SpeechRecognizer Protocol

Every ASR client must implement this protocol:

import AVFoundation  // provides AVAudioPCMBuffer

protocol SpeechRecognizer: AnyObject {
    /// Start a new recognition session
    func startRecognition() async throws
    
    /// Feed raw PCM audio data
    func appendAudio(_ buffer: AVAudioPCMBuffer) async
    
    /// Stop and get final result
    func stopRecognition() async throws -> String
    
    /// Cancel without result
    func cancelRecognition() async
    
    /// Streaming partial results (optional)
    var partialResultHandler: ((String) -> Void)? { get set }
}

ASRProviderConfig Protocol

Each vendor's credential definition:

protocol ASRProviderConfig {
    /// Unique identifier string
    static var providerID: String { get }
    
    /// Display name in Settings UI
    static var displayName: String { get }
    
    /// Credential fields shown in Settings
    static var credentialFields: [CredentialField] { get }
    
    /// Validate credentials before use
    static func validate(_ credentials: [String: String]) -> Bool
    
    /// Create the recognizer instance
    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer
}

Adding a New ASR Provider

Step 1: Create Provider Config

Create Type4Me/ASR/Providers/OpenAIWhisperProvider.swift:

import Foundation

struct OpenAIWhisperProvider: ASRProviderConfig {
    static let providerID = "openai_whisper"
    static let displayName = "OpenAI Whisper"
    
    static let credentialFields: [CredentialField] = [
        CredentialField(
            key: "api_key",
            label: "API Key",
            placeholder: "sk-...",
            isSecret: true
        ),
        CredentialField(
            key: "model",
            label: "Model",
            placeholder: "whisper-1",
            isSecret: false
        )
    ]
    
    static func validate(_ credentials: [String: String]) -> Bool {
        guard let apiKey = credentials["api_key"], !apiKey.isEmpty else {
            return false
        }
        return apiKey.hasPrefix("sk-")
    }
    
    static func createClient(
        credentials: [String: String],
        config: RecognitionConfig
    ) throws -> SpeechRecognizer {
        guard let apiKey = credentials["api_key"] else {
            throw ASRError.missingCredential("api_key")
        }
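        // Fall back to the default model when the field is missing or left blank
        let model = credentials["model"].flatMap { $0.isEmpty ? nil : $0 } ?? "whisper-1"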
        return OpenAIWhisperASRClient(apiKey: apiKey, model: model, config: config)
    }
}

Step 2: Implement the ASR Client

Create Type4Me/ASR/OpenAIWhisperASRClient.swift:

import Foundation
import AVFoundation

final class OpenAIWhisperASRClient: SpeechRecognizer {
    var partialResultHandler: ((String) -> Void)?

    private let apiKey: String
    private let model: String
    private let config: RecognitionConfig
    private var audioData = Data()
    // Sample rate of the incoming audio, updated from each buffer (16 kHz is only a starting default)
    private var sampleRate: Double = 16_000

    init(apiKey: String, model: String, config: RecognitionConfig) {
        self.apiKey = apiKey
        self.model = model
        self.config = config
    }

    func startRecognition() async throws {
        audioData = Data()
    }

    func appendAudio(_ buffer: AVAudioPCMBuffer) async {
        // Only Float32 buffers are handled; the first channel is used as mono audio
        guard let channelData = buffer.floatChannelData?[0] else { return }
        sampleRate = buffer.format.sampleRate
        let samples = UnsafeBufferPointer(start: channelData, count: Int(buffer.frameLength))
        // Clamp to [-1, 1] before scaling so the conversion to Int16 stays in range
        let int16Samples = samples.map { sample -> Int16 in
            Int16(max(-1.0, min(1.0, sample)) * 32767)
        }
        int16Samples.withUnsafeBytes { ptr in
            audioData.append(contentsOf: ptr)
        }
    }

    /// Builds the 44-byte header for 16-bit mono PCM so the upload is a valid WAV file
    private func wavHeader(dataSize: Int) -> Data {
        var header = Data()
        func append<T: FixedWidthInteger>(_ value: T) {
            withUnsafeBytes(of: value.littleEndian) { header.append(contentsOf: $0) }
        }
        let rate = UInt32(sampleRate)
        header.append(contentsOf: Array("RIFF".utf8))
        append(UInt32(36 + dataSize))   // remaining file size after this field
        header.append(contentsOf: Array("WAVEfmt ".utf8))
        append(UInt32(16))              // fmt chunk size
        append(UInt16(1))               // audio format: PCM
        append(UInt16(1))               // channels: mono
        append(rate)                    // sample rate
        append(rate * 2)                // byte rate: rate * channels * 2 bytes per sample
        append(UInt16(2))               // block align
        append(UInt16(16))              // bits per sample
        header.append(contentsOf: Array("data".utf8))
        append(UInt32(dataSize))
        return header
    }

    func stopRecognition() async throws -> String {
        // Build multipart form request to Whisper API
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/audio/transcriptions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")

        let boundary = UUID().uuidString
        request.setValue("multipart/form-data; boundary=\(boundary)",
                        forHTTPHeaderField: "Content-Type")

        var body = Data()
        // Append audio file part; the API expects a container format such as WAV, not headerless PCM
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"file\"; filename=\"audio.wav\"\r\n".data(using: .utf8)!)
        body.append("Content-Type: audio/wav\r\n\r\n".data(using: .utf8)!)
        body.append(wavHeader(dataSize: audioData.count))
        body.append(audioData)
        body.append("\r\n".data(using: .utf8)!)
        // Append model part
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"model\"\r\n\r\n".data(using: .utf8)!)
        body.append("\(model)\r\n".data(using: .utf8)!)
        body.append("--\(boundary)--\r\n".data(using: .utf8)!)

        request.httpBody = body

        let (data, response) = try await URLSession.shared.data(for: request)
        guard let httpResponse = response as? HTTPURLResponse,
              httpResponse.statusCode == 200 else {
            throw ASRError.networkError("Whisper API returned error")
        }

        let result = try JSONDecoder().decode(WhisperResponse.self, from: data)
        return result.text
    }

    func cancelRecognition() async {
        audioData = Data()
    }
}

private struct WhisperResponse: Codable {
    let text: String
}

Step 3: Register the Provider

In Type4Me/ASR/ASRProviderRegistry.swift, add to the all array:

struct ASRProviderRegistry {
    static let all: [any ASRProviderConfig.Type] = [
        SherpaParaformerProvider.self,
        VolcengineProvider.self,
        DeepgramProvider.self,
        OpenAIWhisperProvider.self,   // ← Add your provider here
    ]
}
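
Because the registry is a plain array of provider types, looking a provider up by identifier and checking for duplicate identifiers are both short operations. The sketch below illustrates the idea under stated assumptions: `ProviderIdentifying` is a hypothetical stand-in for `ASRProviderConfig` reduced to `providerID`, the two stub types and the `"sherpa_paraformer"` identifier are invented for the example, and neither helper function is taken from the Type4Me source.

```swift
// Hypothetical stand-in for ASRProviderConfig, reduced to the one member used here.
protocol ProviderIdentifying {
    static var providerID: String { get }
}

// Stub providers for illustration only; the first identifier is made up.
enum SherpaStub: ProviderIdentifying { static let providerID = "sherpa_paraformer" }
enum WhisperStub: ProviderIdentifying { static let providerID = "openai_whisper" }

/// Finds a provider type by its identifier, or nil when none is registered.
func provider(
    withID id: String,
    in all: [any ProviderIdentifying.Type]
) -> (any ProviderIdentifying.Type)? {
    all.first { $0.providerID == id }
}

/// Returns identifiers that are registered more than once, sorted alphabetically.
func duplicateProviderIDs(in all: [any ProviderIdentifying.Type]) -> [String] {
    var counts: [String: Int] = [:]
    for entry in all {
        counts[entry.providerID, default: 0] += 1
    }
    return counts.filter { $0.value > 1 }.keys.sorted()
}

// Usage
let registered: [any ProviderIdentifying.Type] = [SherpaStub.self, WhisperStub.self]
print(provider(withID: "openai_whisper", in: registered) != nil)
print(duplicateProviderIDs(in: registered))
```

A duplicate check like this could run in a unit test or a debug assertion, since a repeated `providerID` is one reason a new provider may not show up in Settings.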

Credentials Storage

Credentials are stored at ~/Library/Application Support/Type4Me/credentials.json with permissions 0600. Never hardcode secrets — always load via CredentialStore:

// Reading credentials
let store = CredentialStore.shared
let apiKey = store.get(providerID: "openai_whisper", key: "api_key")

// Writing credentials  
store.set(providerID: "openai_whisper", key: "api_key", value: userInputKey)

// Checking if configured
let isConfigured = store.isConfigured(providerID: "openai_whisper", 
                                       fields: OpenAIWhisperProvider.credentialFields)
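
For readers curious how a JSON file with 0600 permissions can be produced, here is a minimal sketch using only Foundation. It is not the `CredentialStore` implementation: the `writeCredentials` and `readCredentials` helpers are hypothetical, and the nested layout (provider ID, then key, then value) is an assumption about the file format.

```swift
import Foundation

/// Hypothetical helper: writes credentials as JSON, then restricts the file to its owner.
/// The [providerID: [key: value]] layout is an assumption, not the documented format.
func writeCredentials(_ credentials: [String: [String: String]], to url: URL) throws {
    let data = try JSONEncoder().encode(credentials)
    try data.write(to: url, options: .atomic)
    // 0o600: read and write for the owner, no access for group or others
    try FileManager.default.setAttributes([.posixPermissions: 0o600], ofItemAtPath: url.path)
}

/// Hypothetical helper: reads the same layout back.
func readCredentials(from url: URL) throws -> [String: [String: String]] {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode([String: [String: String]].self, from: data)
}

// Usage with a temporary file rather than the real credentials path
let demoURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("credentials-demo-\(UUID().uuidString).json")
do {
    try writeCredentials(["openai_whisper": ["api_key": "sk-example"]], to: demoURL)
    let stored = try readCredentials(from: demoURL)
    print(stored["openai_whisper"]?["api_key"] ?? "missing")
} catch {
    print("Credential round trip failed: \(error)")
}
```

Note that the file briefly exists with default permissions between the write and the permission change; a stricter version would create the file with the restricted mode up front.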

Custom Processing Modes with Prompt Variables

Processing modes use LLM post-processing with three context variables:

| Variable | Value |
| --- | --- |
| `{text}` | Recognized speech text |
| `{selected}` | Text selected in active app at record start |
| `{clipboard}` | Clipboard content at record start |
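
How the three variables are substituted is not shown in the source, so the following is only a sketch of one reasonable approach. `renderPrompt` is a hypothetical function. It walks the template once, so a placeholder that happens to appear inside the spoken text or clipboard content is left alone instead of being expanded a second time.

```swift
/// Hypothetical renderer: replaces {text}, {selected} and {clipboard} in a prompt template.
func renderPrompt(
    _ template: String,
    text: String,
    selected: String = "",
    clipboard: String = ""
) -> String {
    let values = ["{text}": text, "{selected}": selected, "{clipboard}": clipboard]
    var result = ""
    var rest = Substring(template)
    // Single pass: substituted values are appended to the output and never re-scanned
    while let open = rest.firstIndex(of: "{") {
        result.append(contentsOf: rest[..<open])
        let tail = rest[open...]
        if let match = values.first(where: { tail.hasPrefix($0.key) }) {
            result.append(contentsOf: match.value)
            rest = tail.dropFirst(match.key.count)
        } else {
            // Not a known placeholder: keep the brace and continue after it
            result.append("{")
            rest = tail.dropFirst()
        }
    }
    result.append(contentsOf: rest)
    return result
}

// Usage
let rendered = renderPrompt(
    "Selected: {selected}\nCommand: {text}",
    text: "translate to French",
    selected: "Good morning"
)
print(rendered)
```

Chained string replacement would be shorter, but it would also expand a literal `{clipboard}` that the user dictated, which is why this sketch scans the template instead.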

Example custom mode prompts:

// Translate selection using voice command
let translatePrompt = """
The user selected this text: {selected}
Voice command: {text}
Execute the command on the selected text. Output only the result.
"""

// Code review via voice
let codeReviewPrompt = """
Code to review:
{clipboard}

Review instruction: {text}

Provide focused feedback addressing the instruction.
"""

// Email reply drafting
let emailPrompt = """
Original email: {selected}
My reply intent (spoken): {text}
Write a professional email reply. Output only the email body.
"""

Built-in Processing Modes

enum ProcessingMode {
    case fast           // Direct ASR output, no LLM step
    case performance    // Dual-channel: streaming + offline refinement
    case englishTranslation  // Chinese speech → English text
    case promptOptimize // Raw prompt → optimized prompt via LLM
    case command        // Voice command + selected/clipboard context → LLM action
    case custom(prompt: String)  // User-defined prompt template
}

Session State Machine

The core recording flow in Session/:

[Idle]
  → hotkey pressed → [Recording] → audio streams to ASR client
  → hotkey released/pressed again → [Processing]
  → ASR returns text → [LLM Post-processing] (if mode requires)
  → [Injecting] → text injected into active app
  → [Idle]
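
The flow above maps naturally onto an enum of states and a pure transition function. The sketch below is an illustration, not the code in Session/: the state names follow the diagram, while `SessionEvent`, its cases, and `nextState` are hypothetical names chosen for the example.

```swift
/// States named after the diagram above.
enum SessionState: Equatable {
    case idle, recording, processing, postProcessing, injecting
}

/// Hypothetical events; the real session code may model these differently.
enum SessionEvent {
    case hotkeyPressed
    case hotkeyReleased
    case asrFinished(needsLLM: Bool)
    case llmFinished
    case injected
}

/// Returns the next state, or nil when the event is not valid in the current state.
func nextState(from state: SessionState, on event: SessionEvent) -> SessionState? {
    switch (state, event) {
    case (.idle, .hotkeyPressed):
        return .recording
    case (.recording, .hotkeyReleased), (.recording, .hotkeyPressed):
        // Releasing the hotkey or pressing it again both end the recording
        return .processing
    case (.processing, .asrFinished(let needsLLM)):
        // The LLM stage is skipped when the mode does not require it
        return needsLLM ? .postProcessing : .injecting
    case (.postProcessing, .llmFinished):
        return .injecting
    case (.injecting, .injected):
        return .idle
    default:
        return nil
    }
}

// Usage: a full pass through a mode that needs LLM post-processing
var state = SessionState.idle
let events: [SessionEvent] = [
    .hotkeyPressed, .hotkeyReleased, .asrFinished(needsLLM: true), .llmFinished, .injected
]
for event in events {
    if let next = nextState(from: state, on: event) {
        state = next
    }
}
print(state)
```

Keeping the transition logic free of side effects makes invalid sequences, such as an injection event arriving while idle, easy to cover in unit tests.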

Updating After Source Changes

cd type4me
git pull
bash scripts/deploy.sh
# SherpaOnnx does NOT need recompiling unless engine version changed

Troubleshooting

App won't open (security warning)

xattr -d com.apple.quarantine /Applications/Type4Me.app

Local model not recognized in Settings

Verify the directory structure exactly matches:

ls ~/Library/Application\ Support/Type4Me/Models/sherpa-onnx-streaming-paraformer-bilingual-zh-en/
# Must show: encoder.int8.onnx  decoder.int8.onnx  tokens.txt

SherpaOnnx build fails

# Ensure cmake is installed
brew install cmake
# Clean and retry
rm -rf Frameworks/
bash scripts/build-sherpa.sh

New ASR provider not appearing in Settings

Confirm the provider type is added to ASRProviderRegistry.all
Ensure providerID is unique across all providers
Clean build: swift package clean && bash scripts/deploy.sh

Audio not captured / no floating bar

Grant microphone permission: System Settings → Privacy & Security → Microphone → Type4Me ✓
Grant Accessibility permission for text injection: System Settings → Privacy & Security → Accessibility → Type4Me ✓

Credentials not saving

# Check file exists and has correct permissions
ls -la ~/Library/Application\ Support/Type4Me/credentials.json
# Should show: -rw------- (0600)
# Fix permissions if needed:
chmod 0600 ~/Library/Application\ Support/Type4Me/credentials.json

Export history to CSV

Open Settings → History → select date range → Export CSV. The SQLite database is at:

# Database path: ~/Library/Application Support/Type4Me/history.db
# Direct query:
sqlite3 ~/Library/Application\ Support/Type4Me/history.db \
  "SELECT datetime(timestamp,'unixepoch'), text FROM records ORDER BY timestamp DESC LIMIT 20;"

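If you export rows yourself instead of using the Settings button, transcripts need CSV quoting because dictated text often contains commas and quotation marks. The sketch below shows RFC 4180 style quoting; `csvField` and `csvLine` are hypothetical helpers, and the two columns simply mirror the `timestamp` and `text` columns used in the query above.

```swift
import Foundation

/// Hypothetical helper: quotes a field when it contains a comma, a double quote, or a line break.
func csvField(_ value: String) -> String {
    let needsQuoting = value.contains { $0 == "," || $0 == "\"" || $0.isNewline }
    guard needsQuoting else { return value }
    // Embedded double quotes are escaped by doubling them
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Joins already-ordered fields into one CSV line.
func csvLine(_ fields: [String]) -> String {
    fields.map(csvField).joined(separator: ",")
}

// Usage with a made-up record
print(csvLine(["timestamp", "text"]))
print(csvLine(["2026-07-09 10:00:00", "hello, \"world\""]))
```
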
System Requirements

macOS 14.0 (Sonoma) or later
Apple Silicon (M1/M2/M3/M4) recommended for local ASR inference
Xcode Command Line Tools + CMake for source builds
Internet connection only needed for cloud ASR providers


How it compares

Pick this over generic macOS speech APIs when you are specifically extending or deploying the Type4Me open-source voice-input app.

FAQ

What does type4me-macos-voice-input help developers build?

type4me-macos-voice-input helps developers extend Type4Me—a Swift macOS app—with new ASR providers, Sherpa local recognition, Volcengine cloud speech, and LLM-powered transcript optimization via global hotkey.

Which ASR engines does the Type4Me skill cover?

The Type4Me skill covers SherpaOnnx (Paraformer and Zipformer models) for local on-device recognition and Volcengine and Deepgram for cloud ASR, plus patterns for implementing additional providers through the SpeechRecognizer protocol.

Does Type4Me store voice data in the cloud?

Type4Me stores credentials and history locally, with no telemetry and no cloud sync. Recognition with a cloud ASR provider (Volcengine or Deepgram) does send audio to that provider, while local SherpaOnnx recognition runs on-device and needs no internet connection.

Is Type4me Macos Voice Input safe to install?

skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
