
Wiki Dedup
Find and merge Obsidian wiki pages that describe the same concept under different titles before your knowledge base drifts into duplicates.
Install
npx skills add https://github.com/ar9av/obsidian-wiki --skill wiki-dedup
What is this skill?
- Scans vault inventory via index.md for page-level identity collisions (e.g. RSC vs React Server Components)
- Cheap candidate pass uses frontmatter and titles; full bodies open only for confirmed pairs per llm-wiki retrieval rules
- Destructive merge mode requires explicit confirmation—merges are not auto-undoable
- Resolves Obsidian config from CWD .env, ~/.obsidian-wiki/config, or setup prompt
- Distinct from wiki-lint (structure) and cross-linker (links)—this skill deletes/consolidates pages
Adoption & trust: 666 installs on skills.sh; 1.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Canonical shelf is Build → docs because dedup maintains the structured knowledge base solo builders curate while shipping features and agent skills. Wiki identity resolution is documentation hygiene—collapsing alias pages so cross-links and retrieval stay trustworthy.
Common Questions / FAQ
Is Wiki Dedup safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Wiki Dedup: Identity Resolution and Page-Level Deduplication

You are finding and merging wiki pages that cover the same concept under different names. This is a write-heavy, potentially destructive skill: page merges cannot be automatically undone. Work carefully and confirm before acting in merge mode.

**Follow the Retrieval Primitives table in `llm-wiki/SKILL.md`.** The candidate-detection pass uses only frontmatter and titles (cheap). Only open full page bodies for confirmed candidate pairs.

## Before You Start

1. **Resolve config**: follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `OBSIDIAN_LINK_FORMAT`.
2. Read `index.md` to get the full page inventory with one-line descriptions and tags.
3. Read `log.md` briefly; if a dedup run just happened, note what was already merged.

## Modes

| Mode | Flag | Behavior |
|---|---|---|
| **Audit** | *(default)* | Report candidates only; no writes |
| **Merge** | `--merge` | Show each confirmed pair, ask for confirmation before merging |
| **Auto-merge** | `--auto` | Merge all high-confidence pairs (`score ≥ 0.90`) non-interactively |

If the user doesn't specify, run in **Audit** mode and present findings before asking whether to proceed.

## Step 1: Build the Page Registry

Glob all `.md` files in the vault, excluding `_archives/`, `_raw/`, `.obsidian/`, `index.md`, `log.md`, `hot.md`, `_insights.md`, and any file that contains `redirects_to:` in its frontmatter (those are already-merged redirect stubs).

For each remaining page, extract from frontmatter:

- `node_id`: relative path from vault root, without `.md`
- `title`: frontmatter `title` field
- `aliases`: frontmatter `aliases` list (may be absent)
- `tags`: frontmatter `tags` list
- `category`: directory prefix

Build a lookup table: `node_id → {title, aliases, tags, category, summary}`.
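
As a rough illustration of Step 1, the registry build might be sketched as below. This is a minimal sketch, not the skill's actual implementation. The helper names (`parse_frontmatter`, `build_registry`) are hypothetical, and the frontmatter parser is deliberately simplified: it handles only scalars, simple lists, and the first line of a YAML block scalar. It also assumes that the excluded files live at the vault root, that `summary` comes from a frontmatter field of the same name, and that a page without a `title` falls back to its file stem.

```python
from pathlib import Path

# Exclusions listed in Step 1 (assumed here to apply at the vault root for files).
EXCLUDED_DIRS = {"_archives", "_raw", ".obsidian"}
EXCLUDED_FILES = {"index.md", "log.md", "hot.md", "_insights.md"}
BLOCK_SCALARS = {">", ">-", "|", "|-"}


def parse_frontmatter(text):
    """Parse a leading '---' block into a dict (simplified, hypothetical helper)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta, key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta
        if stripped.startswith("- ") and key is not None and isinstance(meta.get(key), list):
            # Item of a block list such as `aliases:` followed by `- name` lines.
            meta[key].append(stripped[2:].strip().strip("\"'"))
        elif key is not None and meta.get(key) is None and line.startswith((" ", "\t")) and stripped:
            # First indented line after a block scalar marker is the value.
            meta[key] = stripped
        elif ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value in BLOCK_SCALARS:
                meta[key] = None  # value is on the next indented line
            elif value.startswith("[") and value.endswith("]"):
                meta[key] = [v.strip().strip("\"'") for v in value[1:-1].split(",") if v.strip()]
            elif value == "":
                meta[key] = []  # assume a block list follows
            else:
                meta[key] = value.strip("\"'")
    return {}  # no closing '---': treat the page as having no frontmatter


def build_registry(vault_path):
    """Return node_id -> {title, aliases, tags, category, summary}."""
    vault = Path(vault_path)
    registry = {}
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        if EXCLUDED_DIRS & set(rel.parts[:-1]) or rel.as_posix() in EXCLUDED_FILES:
            continue
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if "redirects_to" in meta:
            continue  # already-merged redirect stub
        node_id = rel.with_suffix("").as_posix()
        registry[node_id] = {
            "title": meta.get("title") or path.stem,  # fallback is an assumption
            "aliases": meta.get("aliases") or [],
            "tags": meta.get("tags") or [],
            "category": rel.parts[0] if len(rel.parts) > 1 else "",
            "summary": meta.get("summary") or "",  # source field is an assumption
        }
    return registry
```

Calling `build_registry(vault_path)` with the resolved `OBSIDIAN_VAULT_PATH` returns a plain dictionary, so the pairwise comparison can iterate over it without reopening any files. A real vault with nested mappings or multi-line block scalars in its frontmatter would likely need a full YAML parser instead of this simplified one.
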
## Step 2: Detect Candidate Pairs

For every pair of pages in the registry, compute a **similarity score** using these signals:

### 2a. Title similarity signals

| Signal | How to assess | Max contribution |
|---|---|---|
| **Token overlap** | Jaccard similarity of lowercased title word-tokens (split on spaces, hyphens, underscores, punctuation) | 0.65 |
| **Edit distance** | Normalized edit distance on lowercased titles: `1 - (edits / max(len_a, len_b))` | 0.40 |
| **Substring containment** | One title is a substring of the other (e.g. "RSC" ⊂ "React Server Components") | 0.50 |
| **Alias cross-match** | Page A's title appears in page B's `aliases`, or vice versa | 0.65 |

Composite title score = `min(max(token_overlap, edit_distance, substring), 0.65) + alias_cross_bonus`. You don't need exact arithmetic; make a confident judgement about degree of similarity.

**Title extraction note:** Some pages use YAML block scalars (`title: >-` or `title: |`). When the `title:` value is `>-`, `>`, `|`, or `|-`, the actual title is on the next indented line; read it from there. Never compare the literal string `>-` as a title.

### 2b. Semantic signals (cheap pass)

| Signal | Points |
|---|---|
| Same `category` directory | +0.10 |
| Tag overlap ≥ 3 shared tags | +0.15 |
| Tag overlap ≥ 2 shared tags | +0.05 |
| Same first tag (dominant tag) | +0.05 |

### 2c. Threshold

Flag pairs with composite score ≥ **0