
OmniVoice TTS
Install this when you need zero-shot multilingual speech, voice cloning from reference audio, or text-defined voice design inside a Python product pipeline.
Overview
OmniVoice TTS is an agent skill for the Build phase that guides zero-shot multilingual text-to-speech, voice cloning, voice design, and Python inference setup for OmniVoice.
Install
npx skills add https://github.com/aradotso/trending-skills --skill omnivoice-tts
What is this skill?
- Zero-shot TTS with voice cloning from reference audio and voice design via text attributes
- Supports 600+ languages, built on a diffusion language model-style architecture, with auto voice generation
- Documented RTF as low as 0.025 for efficient inference when hardware matches requirements
- Install paths: pip (PyTorch 2.8+ platform wheels), source/git install, and uv-friendly clone flows
- Triggers cover batch inference, installation, and multilingual Python TTS workflows
- Python 3.9+
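For a sense of scale, real-time factor (RTF) is conventionally synthesis time divided by the duration of the audio produced, so lower is faster. Below is a minimal sketch of that arithmetic, assuming the documented figure follows this convention. The helper names are illustrative and are not part of OmniVoice.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate compute time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# At the documented best-case RTF of 0.025, one minute of speech
# would take roughly a second and a half to synthesize.
print(estimated_synthesis_seconds(60.0, 0.025))
```

The 0.025 figure is a best case ("as low as"), so measured RTF on your own hardware may be noticeably higher.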
Adoption & trust: 566 installs on skills.sh; 31 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need natural speech in many languages or cloned voices in your app but lack a reliable local TTS setup and OmniVoice usage pattern.
Who is it for?
Builders shipping voice-enabled agents, narrated content tools, or multilingual demos who can run PyTorch locally or on GPU.
Skip if: your team wants a fully managed cloud TTS with no Python/PyTorch ops, or your product only needs a single static voice via a simple HTTP API.
When should I use this skill?
Use it when you need to clone a voice with OmniVoice, run multilingual TTS in Python, design a voice from text attributes, run batch inference, or install and use OmniVoice.
What do I get? / Deliverables
You get a working OmniVoice install, inference commands, and cloning/design flows so your pipeline outputs speech audio ready to embed in products or content.
- Installed OmniVoice environment
- Generated speech audio from text
- Voice clone or designed-voice inference scripts
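The cloning, design, and auto-voice flows differ mainly in which arguments are passed to `model.generate()`. A minimal sketch of one way a script might pick between them is shown here, using the parameter names documented in SKILL.md (`text`, `ref_audio`, `ref_text`, `instruct`). The helper `build_generate_kwargs` is hypothetical and not part of OmniVoice.

```python
from typing import Optional


def build_generate_kwargs(
    text: str,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for model.generate() (hypothetical helper).

    Voice cloning: ref_audio, plus ref_text if a transcript is available
    Voice design:  instruct
    Auto voice:    text only
    """
    if ref_text and not ref_audio:
        # A transcript describes the reference clip, so it needs one.
        raise ValueError("ref_text only makes sense together with ref_audio")
    kwargs = {"text": text}
    if ref_audio:
        kwargs["ref_audio"] = ref_audio
        if ref_text:
            kwargs["ref_text"] = ref_text
    if instruct:
        kwargs["instruct"] = instruct
    return kwargs


# Assumed usage, with `model` loaded as shown in SKILL.md:
# audio = model.generate(**build_generate_kwargs("Hello.", instruct="female, low pitch"))
print(build_generate_kwargs("Hello.", instruct="female, low pitch"))
```

Keeping the argument selection in plain Python means it can be unit-tested without loading the model.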
Journey fit
Voice and TTS integration happens while you are building product features (apps, agents, content tools), not during launch-phase SEO or operate-phase monitoring. OmniVoice is an external model/SDK integration (install PyTorch, wire up inference, run batch jobs) rather than frontend layout or PM docs.
How it compares
A local multilingual TTS integration skill, not a hosted ElevenLabs-style API wrapper or a generic ffmpeg audio-editing checklist.
Common Questions / FAQ
Who is omnivoice-tts for?
Solo developers and indie teams adding OmniVoice-powered speech, cloning, or voice design in Python backends, agents, or content generators.
When should I use omnivoice-tts?
During Build → integrations, when implementing TTS features, prototyping multilingual voices, or scripting batch inference jobs before you ship.
Is omnivoice-tts safe to install?
It drives pip/git installs and GPU binaries; review the Security Audits panel on this page, pin dependencies, and audit downloaded model weights and scripts before production use.
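One way to act on the advice to audit downloaded weights is to compare a file's SHA-256 digest against a value recorded from a source you trust. A minimal sketch using only the Python standard library follows. The function names, file path, and expected digest are placeholders for illustration, and nothing here is part of OmniVoice or skills.sh.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with a recorded value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage: both the path and the digest are placeholders.
# if not matches_expected("model.safetensors", "<digest you recorded>"):
#     raise SystemExit("checksum mismatch; do not load this file")
```

A matching digest only shows the file is the one you expected; it does not replace reviewing the scripts that load it.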
SKILL.md
# OmniVoice TTS Skill

> Skill by [ara.so](https://ara.so) - Daily 2026 Skills collection.

OmniVoice is a state-of-the-art zero-shot TTS model supporting 600+ languages, built on a diffusion language model-style architecture. It supports voice cloning (from reference audio), voice design (via text attributes), and auto voice generation with RTF as low as 0.025.

---

## Installation

### Requirements

- Python 3.9+
- PyTorch 2.8+
- CUDA (recommended) or Apple Silicon (MPS) or CPU

### pip (recommended)

```bash
# Step 1: Install PyTorch for your platform
# NVIDIA GPU (CUDA 12.8)
pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --extra-index-url https://download.pytorch.org/whl/cu128

# Apple Silicon
pip install torch==2.8.0 torchaudio==2.8.0

# Step 2: Install OmniVoice
pip install omnivoice

# Or from source (latest)
pip install git+https://github.com/k2-fsa/OmniVoice.git

# Or editable dev install
git clone https://github.com/k2-fsa/OmniVoice.git
cd OmniVoice
pip install -e .
```

### uv

```bash
git clone https://github.com/k2-fsa/OmniVoice.git
cd OmniVoice
uv sync
# With mirror:
uv sync --default-index "https://mirrors.aliyun.com/pypi/simple"
```

### HuggingFace Mirror (if blocked)

```bash
export HF_ENDPOINT="https://hf-mirror.com"
```

---

## Core Concepts

| Mode | What you provide | Use case |
|---|---|---|
| **Voice Cloning** | `ref_audio` + `ref_text` | Clone a speaker from a short audio clip |
| **Voice Design** | `instruct` string | Describe speaker attributes (no audio needed) |
| **Auto Voice** | nothing extra | Model picks a random voice |

---

## Python API

### Load the Model

```python
from omnivoice import OmniVoice
import torch
import torchaudio

# NVIDIA GPU
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cuda:0",
    dtype=torch.float16
)

# Apple Silicon
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="mps",
    dtype=torch.float16
)

# CPU (slower)
model = OmniVoice.from_pretrained(
    "k2-fsa/OmniVoice",
    device_map="cpu",
    dtype=torch.float32
)
```

### Voice Cloning

```python
# With manual reference transcription (faster, more accurate)
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
    ref_text="Transcription of the reference audio.",
)

# Without ref_text: Whisper auto-transcribes ref_audio
audio = model.generate(
    text="Hello, this is a test of zero-shot voice cloning.",
    ref_audio="ref.wav",
)

# audio is a list of torch.Tensor, shape (1, T) at 24kHz
torchaudio.save("out.wav", audio[0], 24000)
```

### Voice Design

```python
# Describe speaker via comma-separated attributes
audio = model.generate(
    text="Hello, this is a test of zero-shot voice design.",
    instruct="female, low pitch, british accent",
)
torchaudio.save("out.wav", audio[0], 24000)
```

**Supported attributes:**

- **Gender**: `male`, `female`
- **Age**: `child`, `young`, `middle-aged`, `elderly`
- **Pitch**: `very low pitch`, `low pitch`, `high pitch`, `very high pitch`
- **Style**: `whisper`
- **English accents**: `american accent`, `british accent`, `australian accent`, etc.
- **Chinese dialects**: `四川话`, `陕西话`, etc.

### Auto Voice

```python
audio = model.generate(text="This is a sentence without any voice prompt.")
torchaudio.save("out.wav", audio[0], 24000)
```

### Generation Parameters

```python
audio = model.generate(
    text="Hello world.",
    ref_audio="ref.wav",
    ref_text="Reference text.",
    num_step=32,