
Podcast Generation
Turn structured podcast scripts into stitched audio using Volcengine TTS with parallel per-line synthesis. Built for solo builders shipping audio content from agent workflows.
Overview
Podcast Generation is an agent skill for the Grow phase that converts JSON podcast scripts into Volcengine TTS audio with parallel per-line synthesis.
Install
npx skills add https://github.com/bytedance/deer-flow --skill podcast-generation
What is this skill?
- Script model with locale en or zh and lines keyed by male or female speaker roles
- Volcengine TTS integration via the VOLCENGINE_TTS_APPID, VOLCENGINE_TTS_ACCESS_TOKEN, and optional VOLCENGINE_TTS_CLUSTER environment variables
- Concurrent ThreadPoolExecutor processing for multiple script lines
- text_to_speech helper returns audio bytes per paragraph for downstream muxing
- Deer-flow skill package for automated spoken-content pipelines
- Supports script locales en and zh
- Script lines use two speaker roles: male and female
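As a rough illustration of the script model listed above, the sketch below builds a small script and checks it against the two locales and two speaker roles. The field names (locale, lines, speaker, paragraph) come from the skill's source; the validate_script helper and the sample text are hypothetical and not part of the skill.

```python
import json

# Hypothetical sample script; only the field names come from the skill.
raw = """
{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Welcome to the show."},
    {"speaker": "female", "paragraph": "Today we talk about agent workflows."}
  ]
}
"""

ALLOWED_LOCALES = {"en", "zh"}
ALLOWED_SPEAKERS = {"male", "female"}


def validate_script(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the script looks usable.

    This helper is an illustration only, not a function shipped by the skill.
    """
    problems = []
    if data.get("locale", "en") not in ALLOWED_LOCALES:
        problems.append(f"unsupported locale: {data.get('locale')!r}")
    for i, line in enumerate(data.get("lines", [])):
        if line.get("speaker", "male") not in ALLOWED_SPEAKERS:
            problems.append(f"line {i}: unknown speaker {line.get('speaker')!r}")
        if not line.get("paragraph", "").strip():
            problems.append(f"line {i}: empty paragraph")
    return problems


script_data = json.loads(raw)
problems = validate_script(script_data)
```

Checking the script before synthesis keeps a malformed line from costing a TTS request, though whether the skill itself performs such a check is not shown here.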
Adoption & trust: 1.4k installs on skills.sh; 70.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a structured multi-speaker script but no fast way to generate locale-aware podcast audio inside an agent pipeline.
Who is it for?
Builders already using Deer-flow or Volcengine TTS who batch-generate en or zh dialogue from agent-produced scripts.
Skip if: you need human voice talent, you use only non-Volcengine providers, or you need podcast hosting and RSS distribution without a separate distribution skill.
When should I use this skill?
You have a podcast Script (locale and lines with speaker and paragraph) and Volcengine TTS environment variables configured for synthesis.
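One way to confirm the environment is ready before invoking the skill is a small pre-flight check like the sketch below. The variable names match those read by the skill's source; the missing_tts_env helper is a hypothetical name introduced for illustration.

```python
import os
from typing import Mapping, Optional

# Required by the skill's text_to_speech function; VOLCENGINE_TTS_CLUSTER is
# optional and falls back to "volcano_tts" in the source.
REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def missing_tts_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: pass a mapping to test it, or omit it to inspect
    the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_tts_env()
    if missing:
        print("Set these variables before running synthesis:", ", ".join(missing))
```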
What do I get? / Deliverables
Each script line is synthesized to audio bytes via Volcengine TTS so you can mux or publish an episode without manual voice recording.
- Per-line TTS audio byte buffers from Volcengine
- Runnable podcast synthesis script for Deer-flow agent invocation
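Because lines are synthesized concurrently, results can finish out of order, so the buffers need to be put back in script order before muxing. The sketch below shows one way to do that with ThreadPoolExecutor. The synthesize_in_order and fake_synth names are hypothetical, fake_synth stands in for a real TTS call, and the final byte join is only a placeholder for whatever muxing step you use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional


def synthesize_in_order(
    paragraphs: list[str],
    synth: Callable[[str], Optional[bytes]],
    max_workers: int = 4,
) -> list[bytes]:
    """Run synth over all paragraphs concurrently and return chunks in script order."""
    results: dict[int, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember each future's line index so completion order does not matter.
        futures = {pool.submit(synth, text): i for i, text in enumerate(paragraphs)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # Skip failed lines (None or empty) while preserving the order of the rest.
    return [results[i] for i in range(len(paragraphs)) if results[i]]


def fake_synth(text: str) -> Optional[bytes]:
    """Stand-in for a TTS request: returns None for empty text."""
    return text.encode("utf-8") if text else None


chunks = synthesize_in_order(["Hello. ", "", "Goodbye."], fake_synth)
episode = b"".join(chunks)  # placeholder for a real muxing step
```

Keying results by line index, rather than appending as futures complete, is what keeps the dialogue in the right order regardless of which request returns first.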
Journey fit
Podcast audio production compounds audience reach after the product exists, so it belongs on the Grow shelf under content, not in initial Build or Ship hardening. The implementation centers on script lines, locale, and TTS rendering, which is core content production rather than distribution plumbing or analytics.
How it compares
This is a script-to-audio generator inside an agent repo, not a podcast host, editor UI, or MCP media server by itself.
Common Questions / FAQ
Who is podcast-generation for?
Solo builders and small content teams automating podcast production with Deer-flow, Python, and Volcengine TTS credentials.
When should I use podcast-generation?
During Grow content work when you have a Script JSON (speakers and paragraphs) and want parallel TTS before editing or publishing the episode.
Is podcast-generation safe to install?
It uses network access and Volcengine secrets from the environment; review the Security Audits panel on this Prism page and rotate TTS tokens if the repo is shared.
SKILL.md - Podcast Generation
import argparse
import base64
import json
import logging
import os
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Literal, Optional

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# Types
class ScriptLine:
    def __init__(self, speaker: Literal["male", "female"] = "male", paragraph: str = ""):
        self.speaker = speaker
        self.paragraph = paragraph


class Script:
    def __init__(self, locale: Literal["en", "zh"] = "en", lines: Optional[list[ScriptLine]] = None):
        self.locale = locale
        self.lines = lines or []

    @classmethod
    def from_dict(cls, data: dict) -> "Script":
        script = cls(locale=data.get("locale", "en"))
        for line in data.get("lines", []):
            script.lines.append(
                ScriptLine(
                    speaker=line.get("speaker", "male"),
                    paragraph=line.get("paragraph", ""),
                )
            )
        return script


def text_to_speech(text: str, voice_type: str) -> Optional[bytes]:
    """Convert text to speech using Volcengine TTS."""
    app_id = os.getenv("VOLCENGINE_TTS_APPID")
    access_token = os.getenv("VOLCENGINE_TTS_ACCESS_TOKEN")
    cluster = os.getenv("VOLCENGINE_TTS_CLUSTER", "volcano_tts")

    if not app_id or not access_token:
        raise ValueError(
            "VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_ACCESS_TOKEN environment variables must be set"
        )

    url = "https://openspeech.bytedance.com/api/v1/tts"

    # Authentication: Bearer token with semicolon separator
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer;{access_token}",
    }

    payload = {
        "app": {
            "appid": app_id,
            "token": "access_token",  # literal string, not the actual token
            "cluster": cluster,
        },
        "user": {"uid": "podcast-generator"},
        "audio": {
            "voice_type": voice_type,
            "encoding": "mp3",
            "speed_ratio": 1.2,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # must be unique UUID
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }

    try:
        response = requests.post(url, json=payload, headers=headers)
        if response.status_code != 200:
            logger.error(f"TTS API error: {response.status_code} - {response.text}")
            return None

        result = response.json()
        if result.get("code") != 3000:
            logger.error(f"TTS error: {result.get('message')} (code: {result.get('code')})")
            return None

        audio_data = result.get("data")
        if audio_data:
            return base64.b64decode(audio_data)

    except Exception as e:
        logger.error(f"TTS error: {str(e)}")

    return None


def _process_line(args: tuple[int, ScriptLine, int]) -> tuple[int, Optional[bytes]]:
    """Process a single script line for TTS. Returns (index, audio_bytes)."""
    i, line, total = args

    # Select voice based on speaker gender
    if line.speaker == "male":
        voice_type = "zh_male_yangguangqingnian_moon_bigtts"  # Male voice
    else:
        voice_type = "zh_female_sajiaonvyou_moon_bigtts"  # Female voice

    logger.info(f"Processing line {i + 1}/{total} ({line.speaker})")
    audio = text_to_speech(line.paragraph, voice_type)

    if not audio:
        logger.warning(f"Failed to generate audio for line {i + 1}")

    return (i, audio)


def tts_node(script: Script, max_workers: int = 4) -> list[bytes]:
    """Convert script lines to audio chunks using TTS with multi-threading."""
    logger.info(f"Converting script to audio using {max_workers} workers...")

    total = len(script.lines)

    # Handle empty script case
    if total == 0:
        raise ValueError("Script contains no lines to process")

    # Validate required environment variables be