Translate Book

Name: Translate Book
Author: deusyu

deusyu/translate-book

Run a full book translation pipeline from PDF/DOCX/EPUB through chunked Markdown and parallel sub-agents into translated HTML, DOCX, EPUB, or PDF.

Overview

Translate-book is an agent skill for the Build phase that translates entire books from PDF/DOCX/EPUB via Markdown chunks and parallel sub-agents into translated HTML, DOCX, EPUB, or PDF.

Install

npx skills add https://github.com/deusyu/translate-book --skill translate-book

What is this skill?

End-to-end pipeline: input file → Markdown chunks → parallel translation → multi-format export
Supports PDF, DOCX, and EPUB inputs with target language codes (default zh)
Orchestrates parallel sub-agents with configurable batch concurrency (default 8)
Uses convert.py preprocessing plus pandoc/calibre-style tooling for EPUB and exports
Collects optional cover image, export naming, temp roots, and custom translation instructions

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 662 installs on skills.sh; 755 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have a full book file in one language and no practical way to chunk, translate, and re-export it without manual copy-paste and broken formatting.

Who is it for?

Indie authors, course creators, and builders localizing long-form EPUBs or PDFs who already have Calibre/pandoc installed and can supervise agent file writes.

Skip if: you only need quick paragraph translation in chat or real-time collaborative editing, or your team lacks shell access and ebook conversion binaries.

When should I use this skill?

The user provides or will provide a book file path and wants a target-language export via the convert.py → chunk → parallel translate → assemble workflow.

What do I get? / Deliverables

You get chunked Markdown intermediates and assembled translated exports in your chosen formats, driven by a scripted preprocess and parallel agent batches.

Markdown chunk directory under configurable temp root
Translated HTML, DOCX, EPUB, and/or PDF with optional custom export stem

Journey fit

Primary fit

Canonical shelf is Build/docs because the skill manufactures translated book artifacts and chunked source files, not launch analytics or production monitoring. Docs captures book-length written deliverables and conversion outputs indie builders treat as documentation or publishable manuscripts.

Also useful

Launch: Distribution & launch channels

How it compares

A multi-step book pipeline skill—not a lightweight inline translator or MCP dictionary service.

Common Questions / FAQ

Who is translate-book for?

Solo builders and small teams translating whole books or manuals who want an orchestrated chunk-and-export workflow instead of one-shot LLM paste.

When should I use translate-book?

Use it in Build (docs) when you need localized book artifacts; optionally before Launch (distribution) if translated EPUB/PDF is your go-to-market asset.

Is translate-book safe to install?

It uses Read, Write, Edit, Bash, and Agent capabilities—check the Security Audits panel on this page and review scripts under the skill before running on sensitive manuscripts.

SKILL.md

# Book Translation Skill

You are a book translation assistant. You translate entire books from one language to another by orchestrating a multi-step pipeline.

## Workflow

### 1. Collect Parameters

Determine the following from the user's message:
- **file_path**: Path to the input file (PDF, DOCX, or EPUB) — REQUIRED
- **target_lang**: Target language code (default: `zh`) — e.g. zh, en, ja, ko, fr, de, es
- **concurrency**: Number of parallel sub-agents per batch (default: `8`)
- **temp_root**: Optional directory under which `{filename}_temp/` should be created
- **epub_cover**: Optional explicit cover image path for EPUB output
- **export_name**: Optional filename stem for user-facing output aliases
- **custom_instructions**: Any additional translation instructions from the user (optional)

If the file path is not provided, ask the user.
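
The parameters above can be captured in a small structure before the pipeline starts. The following is a minimal sketch, not part of the skill itself: `TranslateParams` and `batched` are hypothetical names, and the suffix check assumes the input format can be identified from the file extension. Only the field names and the defaults (`zh`, `8`) come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Sequence

SUPPORTED_SUFFIXES = {".pdf", ".docx", ".epub"}  # input formats named above


@dataclass
class TranslateParams:
    """Hypothetical container for the parameters collected in step 1."""
    file_path: str                      # required
    target_lang: str = "zh"             # default from the list above
    concurrency: int = 8                # sub-agents per batch
    temp_root: Optional[str] = None
    epub_cover: Optional[str] = None
    export_name: Optional[str] = None
    custom_instructions: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: the input format is identified by the file suffix.
        if Path(self.file_path).suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported input file: {self.file_path}")
        if self.concurrency < 1:
            raise ValueError("concurrency must be at least 1")


def batched(items: Sequence[str], size: int) -> List[List[str]]:
    """Split work items into consecutive batches of at most `size` items."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

With the default concurrency, `batched(chunk_names, params.concurrency)` would yield groups of up to eight chunks, one group per round of parallel sub-agents.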

### 2. Preprocess — Convert to Markdown Chunks

Run the conversion script to produce chunks:

```bash
python3 {baseDir}/scripts/convert.py "<file_path>" --olang "<target_lang>"
```

If the user provided `temp_root`, add `--temp-root "<temp_root>"`. The temp
directory leaf name remains `{filename}_temp/`; only the parent directory
changes.

This creates a `{filename}_temp/` directory containing:
- `input.html`, `input.md` — intermediate files
- `chunk0001.md`, `chunk0002.md`, ... — source chunks for translation
- `manifest.json` — chunk manifest for tracking and validation
- `config.txt` — pipeline configuration with metadata
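
As an illustration of this step, the sketch below assembles the `convert.py` command line and then checks that the files listed above exist. `build_convert_command` and `missing_outputs` are hypothetical helpers; the only flags used are `--olang` and `--temp-root`, as shown in this section, and the location of the temp directory is left to the caller because it is not spelled out here.

```python
from pathlib import Path
from typing import List, Optional

# Files that step 2 is described as producing, besides the chunk files.
EXPECTED_FILES = ("input.html", "input.md", "manifest.json", "config.txt")


def build_convert_command(base_dir: str, file_path: str, target_lang: str = "zh",
                          temp_root: Optional[str] = None) -> List[str]:
    """Build the argument list for scripts/convert.py (no shell quoting needed)."""
    cmd = ["python3", str(Path(base_dir) / "scripts" / "convert.py"),
           file_path, "--olang", target_lang]
    if temp_root:
        # Only the parent directory changes; the leaf stays {filename}_temp/.
        cmd += ["--temp-root", temp_root]
    return cmd


def missing_outputs(temp_dir: str) -> List[str]:
    """Return the expected outputs that are absent from the temp directory."""
    root = Path(temp_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not any(root.glob("chunk*.md")):
        missing.append("chunk*.md")
    return missing
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; an empty result from `missing_outputs` afterwards suggests the preprocess step completed.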

### 3. Discover Source Chunks

Use Glob to find all source chunks:

```
Glob: {filename}_temp/chunk*.md
```

Exclude `output_chunk*.md` from the source list. The selective re-translation
plan below decides which chunks actually need work.
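
Outside an agent's Glob tool, the same discovery could be done in plain Python. This is a sketch under the assumption that chunk files are named `chunk` followed by digits and `.md`, as in the listing in step 2; `discover_source_chunks` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import List

# Matches chunk0001.md but not output_chunk0001.md or input.md.
CHUNK_RE = re.compile(r"^chunk(\d+)\.md$")


def discover_source_chunks(temp_dir: str) -> List[Path]:
    """Return source chunk paths in numeric order, skipping translated outputs."""
    found = []
    for path in Path(temp_dir).iterdir():
        match = CHUNK_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]
```

Anchoring the pattern at the start of the file name is what keeps `output_chunk*.md` out of the source list.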

### 3.5. Build Glossary (term consistency)

A separate sub-agent translates each chunk with a fresh context. Without shared state, the same proper noun can drift across multiple translations. The glossary makes every sub-agent see the same canonical translation for the terms that appear in its chunk.
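
To make the idea concrete, here is one way the terms relevant to a single chunk might be selected before they are handed to a sub-agent. This is an illustrative sketch only: `terms_for_chunk` is a hypothetical function, it assumes the `source` and `aliases` fields of the v2 schema shown later in this section, and plain substring matching is an assumption rather than the skill's documented behavior.

```python
from typing import Dict, List


def terms_for_chunk(glossary: Dict, chunk_text: str) -> List[Dict]:
    """Return glossary terms whose source or any alias occurs in the chunk."""
    selected = []
    for term in glossary.get("terms", []):
        surfaces = [term["source"], *term.get("aliases", [])]
        # Assumption: case-sensitive substring matching is good enough here.
        if any(surface and surface in chunk_text for surface in surfaces):
            selected.append(term)
    return selected
```

Each sub-agent would then receive only the `source` and `target` pairs for its own chunk, so every occurrence of a name is rendered the same way.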

If `<temp_dir>/glossary.json` already exists, skip the rebuild — re-running the skill must not overwrite a hand-edited glossary. To force a rebuild, delete the file.

Otherwise:

1. **Sample chunks**: read `chunk0001.md`, the last chunk, and 3 evenly-spaced middle chunks. If `chunk_count < 5`, sample all of them.
2. **Extract terms**: from the samples, identify proper nouns and recurring domain terms that need consistent translation across the book — typically people, places, organizations, technical concepts. Translate each into the target language. Skip generic vocabulary that any translator would render the same way.
3. **Write `glossary.json`** in the temp dir, matching this v2 schema:

   ```json
   {
     "version": 2,
     "terms": [
       {"id": "Manhattan", "source": "Manhattan", "target": "曼哈顿",
        "category": "place", "aliases": [], "gender": "unknown",
        "confidence": "medium", "frequency": 0,
        "evidence_refs": [], "notes": ""}
     ],
     "high_frequency_top_n": 20,
     "applied_meta_hashes": {}
   }
   ```

   Existing v1 `glossary.json` files are auto-upgraded to v2 on first load. v2 forbids the same surface form (source or alias) appearing in two different terms; if a v1 file has polysemous duplicate sources, the upgrade aborts with a disambiguation message.

4. **Count frequencies** by running:

   ```bash
   python3 {baseDir}/scripts/glossary.py count-frequencies "<temp_dir>"
   ```

   This scans every `chunk*.md` (excluding `output_chunk*.md`), updates eac
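
The sampling rule in step 1 and the v2 uniqueness rule in step 3 can be sketched as follows. Both functions are hypothetical and are not taken from `glossary.py`. "Evenly spaced" is interpreted here as the quarter points between the first and last chunk, which is an assumption.

```python
from typing import Dict, List


def sample_chunk_numbers(chunk_count: int) -> List[int]:
    """Pick the first chunk, the last chunk, and 3 evenly spaced middle chunks."""
    if chunk_count < 5:
        return list(range(1, chunk_count + 1))  # sample everything
    # Assumption: "evenly spaced" means the 1/4, 2/4 and 3/4 positions.
    middle = [1 + (chunk_count - 1) * k // 4 for k in (1, 2, 3)]
    return [1, *middle, chunk_count]


def duplicate_surface_forms(glossary: dict) -> Dict[str, List[str]]:
    """Map each surface form used by more than one term to the term ids using it."""
    owners: Dict[str, List[str]] = {}
    for term in glossary.get("terms", []):
        # A set avoids double counting when an alias repeats the source.
        for surface in {term["source"], *term.get("aliases", [])}:
            owners.setdefault(surface, []).append(term["id"])
    return {surface: ids for surface, ids in owners.items() if len(ids) > 1}
```

An empty result from `duplicate_surface_forms` means no source or alias is shared between two terms, which is the condition the v2 schema requires.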
