Podcast Generation

Name: Podcast Generation
Author: bytedance

bytedance/deer-flow

Turn structured podcast scripts into stitched audio using Volcengine TTS with parallel line synthesis for solo builders shipping audio content from agent workflows.

Overview

Podcast Generation is an agent skill for the Grow phase that converts JSON podcast scripts into Volcengine TTS audio with parallel per-line synthesis.

Install

npx skills add https://github.com/bytedance/deer-flow --skill podcast-generation

What is this skill?

Script model with locale en or zh and lines keyed by male or female speaker roles
Volcengine TTS integration via VOLCENGINE_TTS_APPID, VOLCENGINE_TTS_ACCESS_TOKEN, and an optional VOLCENGINE_TTS_CLUSTER (defaults to volcano_tts)
Concurrent ThreadPoolExecutor processing for multiple script lines
text_to_speech helper returns MP3 audio bytes per paragraph (or None on failure) for downstream muxing
Deer-flow skill package for automated spoken-content pipelines
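Because text_to_speech raises a ValueError only once the first line is processed, it can help to check the environment up front. The sketch below is a hypothetical pre-flight helper (check_volcengine_env is not part of the skill); it reuses the variable names and the volcano_tts cluster default that appear in text_to_speech in the SKILL.md listing.

```python
import os
from typing import Mapping, Optional

# Variable names and the "volcano_tts" default mirror text_to_speech() in the skill's script.
REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")
DEFAULT_CLUSTER = "volcano_tts"


def check_volcengine_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical pre-flight check: return TTS settings or raise, listing every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "app_id": env["VOLCENGINE_TTS_APPID"],
        "access_token": env["VOLCENGINE_TTS_ACCESS_TOKEN"],
        # The cluster is optional; fall back to the same default the script uses.
        "cluster": env.get("VOLCENGINE_TTS_CLUSTER", DEFAULT_CLUSTER),
    }
```

Calling check_volcengine_env() with no argument reads os.environ, so an agent can fail fast before spinning up worker threads.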

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.4k installs on skills.sh; 70.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have a structured multi-speaker script but no fast way to generate locale-aware podcast audio inside an agent pipeline.

Who is it for?

Builders already using Deer-flow or Volcengine TTS who batch-generate en or zh dialogue from agent-produced scripts.

Skip if: you need human voice talent, can only use TTS providers other than Volcengine, or need podcast hosting and RSS distribution, which require a separate distribution skill.

When should I use this skill?

You have a podcast Script (locale and lines with speaker and paragraph) and Volcengine TTS environment variables configured for synthesis.
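The Script.from_dict method in the SKILL.md listing fills in defaults ("en", "male", an empty paragraph) but does not reject unknown locales or speakers. A minimal sketch of a stricter loader, assuming the same JSON shape; load_script, PodcastScript, and Line are hypothetical names chosen to avoid clashing with the skill's own classes.

```python
import json
from dataclasses import dataclass, field

VALID_LOCALES = {"en", "zh"}
VALID_SPEAKERS = {"male", "female"}


@dataclass
class Line:  # hypothetical stand-in for the skill's ScriptLine
    speaker: str
    paragraph: str


@dataclass
class PodcastScript:  # hypothetical stand-in for the skill's Script
    locale: str = "en"
    lines: list[Line] = field(default_factory=list)


def load_script(raw: str) -> PodcastScript:
    """Parse script JSON with the same defaults as Script.from_dict, but reject bad values."""
    data = json.loads(raw)
    locale = data.get("locale", "en")
    if locale not in VALID_LOCALES:
        raise ValueError(f"Unsupported locale: {locale!r}")
    script = PodcastScript(locale=locale)
    for n, item in enumerate(data.get("lines", []), start=1):
        speaker = item.get("speaker", "male")
        if speaker not in VALID_SPEAKERS:
            raise ValueError(f"Line {n}: unknown speaker {speaker!r}")
        paragraph = item.get("paragraph", "").strip()
        if not paragraph:  # an empty paragraph would waste a TTS request
            raise ValueError(f"Line {n}: empty paragraph")
        script.lines.append(Line(speaker=speaker, paragraph=paragraph))
    return script


EXAMPLE = """
{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Welcome back to the show."},
    {"speaker": "female", "paragraph": "Today we are talking about agent workflows."}
  ]
}
"""
```

Running load_script(EXAMPLE) yields a two-line English script; a typo such as "speaker": "mail" raises instead of silently falling through to the female voice branch.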

What do I get? / Deliverables

Each script line is synthesized to audio bytes via Volcengine TTS so you can mux or publish an episode without manual voice recording.

Per-line TTS audio byte buffers from Volcengine
Runnable podcast synthesis script for Deer-flow agent invocation
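One simple way to turn the per-line buffers into a single episode is to write them back to back in script order. This is a sketch, assuming each buffer is a complete MP3 stream as requested by the "encoding": "mp3" setting in the listing; stitch_mp3_chunks is a hypothetical helper, and a tool such as ffmpeg is the safer choice when you need accurate duration metadata or crossfades.

```python
from pathlib import Path
from typing import Optional, Sequence


def stitch_mp3_chunks(chunks: Sequence[Optional[bytes]], out_path: str) -> int:
    """Hypothetical helper: write per-line MP3 buffers back to back, skipping failed lines.

    Returns the number of chunks written.
    """
    usable = [chunk for chunk in chunks if chunk]  # None or b"" means TTS failed for that line
    if not usable:
        raise ValueError("No audio chunks to stitch")
    # MP3 is a sequence of self-contained frames, so byte concatenation usually plays,
    # although some players may report an inaccurate total duration.
    Path(out_path).write_bytes(b"".join(usable))
    return len(usable)
```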

Journey fit

Primary fit

Grow: Content & marketing

Podcast audio production compounds audience reach after the product exists; it belongs on the Grow shelf under content, not initial Build or Ship hardening. The implementation centers on script lines, locale, and TTS rendering—core content production rather than distribution plumbing or analytics.

Also useful

Launch: Distribution & launch channels

How it compares

This is a script-to-audio generator inside an agent repo, not a podcast host, editor UI, or MCP media server by itself.

Common Questions / FAQ

Who is podcast-generation for?

Solo builders and small content teams automating podcast production with Deer-flow, Python, and Volcengine TTS credentials.

When should I use podcast-generation?

During Grow content work when you have a Script JSON (speakers and paragraphs) and want parallel TTS before editing or publishing the episode.

Is podcast-generation safe to install?

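It makes network calls to the Volcengine TTS API and reads Volcengine credentials from environment variables. Review the skills.sh security audit results (2 of 3 scanners passed) before installing, and rotate your TTS access token if the repository or environment is shared.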

SKILL.md

SKILL.md - Podcast Generation

import argparse
import base64
import json
import logging
import os
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Literal, Optional

import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# Types
class ScriptLine:
    def __init__(self, speaker: Literal["male", "female"] = "male", paragraph: str = ""):
        self.speaker = speaker
        self.paragraph = paragraph


class Script:
    def __init__(self, locale: Literal["en", "zh"] = "en", lines: Optional[list[ScriptLine]] = None):
        self.locale = locale
        self.lines = lines or []

    @classmethod
    def from_dict(cls, data: dict) -> "Script":
        script = cls(locale=data.get("locale", "en"))
        for line in data.get("lines", []):
            script.lines.append(
                ScriptLine(
                    speaker=line.get("speaker", "male"),
                    paragraph=line.get("paragraph", ""),
                )
            )
        return script


def text_to_speech(text: str, voice_type: str) -> Optional[bytes]:
    """Convert text to speech using Volcengine TTS."""
    app_id = os.getenv("VOLCENGINE_TTS_APPID")
    access_token = os.getenv("VOLCENGINE_TTS_ACCESS_TOKEN")
    cluster = os.getenv("VOLCENGINE_TTS_CLUSTER", "volcano_tts")

    if not app_id or not access_token:
        raise ValueError(
            "VOLCENGINE_TTS_APPID and VOLCENGINE_TTS_ACCESS_TOKEN environment variables must be set"
        )

    url = "https://openspeech.bytedance.com/api/v1/tts"

    # Authentication: Bearer token with semicolon separator
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer;{access_token}",
    }

    payload = {
        "app": {
            "appid": app_id,
            "token": "access_token",  # literal string, not the actual token
            "cluster": cluster,
        },
        "user": {"uid": "podcast-generator"},
        "audio": {
            "voice_type": voice_type,
            "encoding": "mp3",
            "speed_ratio": 1.2,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # must be unique UUID
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }

    try:
        # A timeout keeps a stalled request from blocking a worker thread indefinitely
        response = requests.post(url, json=payload, headers=headers, timeout=60)

        if response.status_code != 200:
            logger.error(f"TTS API error: {response.status_code} - {response.text}")
            return None

        result = response.json()
        if result.get("code") != 3000:
            logger.error(f"TTS error: {result.get('message')} (code: {result.get('code')})")
            return None

        audio_data = result.get("data")
        if audio_data:
            return base64.b64decode(audio_data)

    except Exception as e:
        logger.error(f"TTS error: {str(e)}")

    return None


def _process_line(args: tuple[int, ScriptLine, int]) -> tuple[int, Optional[bytes]]:
    """Process a single script line for TTS. Returns (index, audio_bytes)."""
    i, line, total = args

    # Select voice based on speaker gender
    if line.speaker == "male":
        voice_type = "zh_male_yangguangqingnian_moon_bigtts"  # Male voice
    else:
        voice_type = "zh_female_sajiaonvyou_moon_bigtts"  # Female voice

    logger.info(f"Processing line {i + 1}/{total} ({line.speaker})")
    audio = text_to_speech(line.paragraph, voice_type)

    if not audio:
        logger.warning(f"Failed to generate audio for line {i + 1}")

    return (i, audio)


def tts_node(script: Script, max_workers: int = 4) -> list[bytes]:
    """Convert script lines to audio chunks using TTS with multi-threading."""
    logger.info(f"Converting script to audio using {max_workers} workers...")

    total = len(script.lines)
    
    # Handle empty script case
    if total == 0:
        raise ValueError("Script contains no lines to process")

    # Validate required environment variables be
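The listing is cut off inside tts_node, before its worker loop. Its imports (ThreadPoolExecutor, as_completed) and the (index, audio) tuple returned by _process_line suggest a fan-out pattern in which results are put back in script order by index. The following is a hedged sketch of that pattern, not the skill's actual code; synthesize_in_order is a hypothetical name, and synth stands in for a call like text_to_speech with a voice already chosen.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


def synthesize_in_order(
    paragraphs: Sequence[str],
    synth: Callable[[str], Optional[bytes]],
    max_workers: int = 4,
) -> list[Optional[bytes]]:
    """Hypothetical sketch: run synth() on each paragraph concurrently, keep script order.

    as_completed() yields futures in finishing order, so each future is mapped
    back to its line index and the output list is filled by position.
    """
    if not paragraphs:
        raise ValueError("Script contains no lines to process")
    results: list[Optional[bytes]] = [None] * len(paragraphs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(synth, text): i for i, text in enumerate(paragraphs)}
        for future in as_completed(futures):
            index = futures[future]
            try:
                results[index] = future.result()
            except Exception as exc:  # one failed request should not sink the other lines
                logger.warning("Line %d failed: %s", index + 1, exc)
    return results
```

Filling a preallocated list by index, rather than appending as futures finish, is what keeps speaker turns in the right order even when a short line returns before a long one.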
