
Translate Book
Run a full book translation pipeline from PDF/DOCX/EPUB through chunked Markdown and parallel sub-agents into translated HTML, DOCX, EPUB, or PDF.
Overview
Translate-book is an agent skill for the Build phase that translates entire books from PDF/DOCX/EPUB via Markdown chunks and parallel sub-agents into translated HTML, DOCX, EPUB, or PDF.
Install
npx skills add https://github.com/deusyu/translate-book --skill translate-book
What is this skill?
- End-to-end pipeline: input file → Markdown chunks → parallel translation → multi-format export
- Supports PDF, DOCX, and EPUB inputs with target language codes (default zh)
- Orchestrates parallel sub-agents with configurable batch concurrency (default 8)
- Uses convert.py preprocessing plus pandoc/calibre-style tooling for EPUB and exports
- Collects optional cover image, export naming, temp roots, and custom translation instructions
- Default parallel sub-agent concurrency per batch: 8
- Input formats: PDF, DOCX, EPUB
- Pipeline stages: preprocess to Markdown chunks → translate → export HTML/DOCX/EPUB/PDF
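The batch concurrency described in the list above can be illustrated with a minimal Python sketch. It is not the skill's own code: the function name `plan_batches` is hypothetical, and it assumes source chunks are named `chunk0001.md`, `chunk0002.md`, and so on, with translated output kept in `output_chunk*.md` files, as the SKILL.md workflow describes.

```python
from pathlib import Path
from typing import List

DEFAULT_CONCURRENCY = 8  # default batch size stated in the skill description


def plan_batches(temp_dir: Path, concurrency: int = DEFAULT_CONCURRENCY) -> List[List[Path]]:
    """Group source chunks into batches of at most `concurrency` files.

    Hypothetical helper for illustration only; the real skill may
    schedule its sub-agents differently.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # "chunk*.md" only matches names that start with "chunk", so
    # translated "output_chunk*.md" files are never picked up as sources.
    chunks = sorted(temp_dir.glob("chunk*.md"))
    return [chunks[i:i + concurrency] for i in range(0, len(chunks), concurrency)]
```

Under these assumptions, a book split into 10 chunks at the default concurrency would be handled as one batch of 8 chunks followed by one batch of 2.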
Adoption & trust: 662 installs on skills.sh; 755 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a full book file in one language and no practical way to chunk, translate, and re-export it without manual copy-paste and broken formatting.
Who is it for?
Indie authors, course creators, and builders localizing long-form EPUBs or PDFs who already have Calibre/pandoc installed and can supervise agent file writes.
Skip if: you only need quick paragraph translation in chat or real-time collaborative editing, or your team lacks shell access and ebook conversion binaries.
When should I use this skill?
The user provides or will provide a book file path and wants a target-language export via the convert.py → chunk → parallel translate → assemble workflow.
What do I get? / Deliverables
You get chunked Markdown intermediates and assembled translated exports in your chosen formats, driven by a scripted preprocess and parallel agent batches.
- Markdown chunk directory under configurable temp root
- Translated HTML, DOCX, EPUB, and/or PDF with optional custom export stem
Journey fit
The canonical shelf is Build/docs because the skill produces translated book artifacts and chunked source files, not launch analytics or production monitoring. The docs shelf covers book-length written deliverables and conversion outputs that indie builders treat as documentation or publishable manuscripts.
How it compares
A multi-step book pipeline skill, not a lightweight inline translator or MCP dictionary service.
Common Questions / FAQ
Who is translate-book for?
Solo builders and small teams translating whole books or manuals who want an orchestrated chunk-and-export workflow instead of one-shot LLM paste.
When should I use translate-book?
Use it in Build (docs) when you need localized book artifacts; optionally before Launch (distribution) if translated EPUB/PDF is your go-to-market asset.
Is translate-book safe to install?
It uses Read, Write, Edit, Bash, and Agent capabilities. Check the Security Audits panel on this page and review the scripts under the skill before running it on sensitive manuscripts.
SKILL.md
# Book Translation Skill

You are a book translation assistant. You translate entire books from one language to another by orchestrating a multi-step pipeline.

## Workflow

### 1. Collect Parameters

Determine the following from the user's message:

- **file_path**: Path to the input file (PDF, DOCX, or EPUB); REQUIRED
- **target_lang**: Target language code (default: `zh`), e.g. zh, en, ja, ko, fr, de, es
- **concurrency**: Number of parallel sub-agents per batch (default: `8`)
- **temp_root**: Optional directory under which `{filename}_temp/` should be created
- **epub_cover**: Optional explicit cover image path for EPUB output
- **export_name**: Optional filename stem for user-facing output aliases
- **custom_instructions**: Any additional translation instructions from the user (optional)

If the file path is not provided, ask the user.

### 2. Preprocess: Convert to Markdown Chunks

Run the conversion script to produce chunks:

```bash
python3 {baseDir}/scripts/convert.py "<file_path>" --olang "<target_lang>"
```

If the user provided `temp_root`, add `--temp-root "<temp_root>"`. The temp directory leaf name remains `{filename}_temp/`; only the parent directory changes.

This creates a `{filename}_temp/` directory containing:

- `input.html`, `input.md`: intermediate files
- `chunk0001.md`, `chunk0002.md`, ...: source chunks for translation
- `manifest.json`: chunk manifest for tracking and validation
- `config.txt`: pipeline configuration with metadata

### 3. Discover Source Chunks

Use Glob to find all source chunks:

```
Glob: {filename}_temp/chunk*.md
```

Exclude `output_chunk*.md` from the source list. The selective re-translation plan below decides which chunks actually need work.

### 3.5. Build Glossary (term consistency)

A separate sub-agent translates each chunk with a fresh context. Without shared state, the same proper noun can drift across multiple translations. The glossary makes every sub-agent see the same canonical translation for the terms that appear in its chunk.

If `<temp_dir>/glossary.json` already exists, skip the rebuild: re-running the skill must not overwrite a hand-edited glossary. To force a rebuild, delete the file. Otherwise:

1. **Sample chunks**: read `chunk0001.md`, the last chunk, and 3 evenly-spaced middle chunks. If `chunk_count < 5`, sample all of them.
2. **Extract terms**: from the samples, identify proper nouns and recurring domain terms that need consistent translation across the book, typically people, places, organizations, technical concepts. Translate each into the target language. Skip generic vocabulary that any translator would render the same way.
3. **Write `glossary.json`** in the temp dir, matching this v2 schema:

   ```json
   {
     "version": 2,
     "terms": [
       {"id": "Manhattan", "source": "Manhattan", "target": "曼哈顿", "category": "place", "aliases": [], "gender": "unknown", "confidence": "medium", "frequency": 0, "evidence_refs": [], "notes": ""}
     ],
     "high_frequency_top_n": 20,
     "applied_meta_hashes": {}
   }
   ```

   Existing v1 `glossary.json` files are auto-upgraded to v2 on first load. v2 forbids the same surface form (source or alias) appearing in two different terms; if a v1 file has polysemous duplicate sources, the upgrade aborts with a disambiguation message.
4. **Count frequencies** by running:

   ```bash
   python3 {baseDir}/scripts/glossary.py count-frequencies "<temp_dir>"
   ```

   This scans every `chunk*.md` (excluding `output_chunk*.md`), updates eac