Wiki Dedup

Name: Wiki Dedup
Author: ar9av

ar9av/obsidian-wiki

Find and merge Obsidian wiki pages that describe the same concept under different titles before your knowledge base drifts into duplicates.

Install

npx skills add https://github.com/ar9av/obsidian-wiki --skill wiki-dedup

What is this skill?

Scans vault inventory via index.md for page-level identity collisions (e.g. RSC vs React Server Components)
Cheap candidate pass uses frontmatter and titles; full bodies open only for confirmed pairs per llm-wiki retrieval rules
Destructive merge mode requires explicit confirmation—merges are not auto-undoable
Resolves Obsidian config from CWD .env, ~/.obsidian-wiki/config, or setup prompt
Distinct from wiki-lint (structure) and cross-linker (links)—this skill deletes/consolidates pages

Adoption & trust: 666 installs on skills.sh; 1.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

Canonical shelf is Build → docs because dedup maintains the structured knowledge base solo builders curate while shipping features and agent skills. Wiki identity resolution is documentation hygiene—collapsing alias pages so cross-links and retrieval stay trustworthy.

Common Questions / FAQ

Is Wiki Dedup safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md


# Wiki Dedup — Identity Resolution and Page-Level Deduplication

You are finding and merging wiki pages that cover the same concept under different names. This is a write-heavy, potentially destructive skill — page merges cannot be automatically undone. Work carefully and confirm before acting in merge mode.

**Follow the Retrieval Primitives table in `llm-wiki/SKILL.md`.** The candidate-detection pass uses only frontmatter and titles (cheap). Only open full page bodies for confirmed candidate pairs.

## Before You Start

1. **Resolve config**: follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up from the CWD looking for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `OBSIDIAN_LINK_FORMAT`.
2. Read `index.md` to get the full page inventory with one-line descriptions and tags.
3. Read `log.md` briefly — if a dedup run just happened, note what was already merged.
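
The lookup order in step 1 can be sketched as below. This is only an illustration, not the protocol text from `llm-wiki/SKILL.md`: the `KEY=VALUE` file format, the rule that a `.env` without `OBSIDIAN_VAULT_PATH` is skipped, and the helper names `parse_env_file` and `resolve_config` are all assumptions.

```python
from pathlib import Path
from typing import Optional


def parse_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_config(start: Path, home: Path) -> Optional[dict]:
    """Nearest usable .env at or above `start`, else the home config, else None."""
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            values = parse_env_file(candidate)
            # Assumption: a .env that lacks the vault path does not count.
            if "OBSIDIAN_VAULT_PATH" in values:
                return values
    fallback = home / ".obsidian-wiki" / "config"
    if fallback.is_file():
        return parse_env_file(fallback)
    return None  # nothing found: the caller should prompt for setup
```

Calling `resolve_config(Path.cwd(), Path.home())` mirrors the three-stage order; a `None` result corresponds to the "prompt setup" stage.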

## Modes

| Mode | Flag | Behavior |
|---|---|---|
| **Audit** | *(default)* | Report candidates only — no writes |
| **Merge** | `--merge` | Show each confirmed pair, ask for confirmation before merging |
| **Auto-merge** | `--auto` | Merge all high-confidence pairs (`score ≥ 0.90`) non-interactively |

If the user doesn't specify, run in **Audit** mode and present findings before asking whether to proceed.
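
As a rough sketch of how the three modes differ, the function below maps each scored pair to an action. The function name `plan_actions` and the treatment of pairs below 0.90 in auto mode (reported only, or queued for confirmation when `--merge` is also set) are assumptions; the source does not say what happens to them.

```python
AUTO_THRESHOLD = 0.90  # high-confidence cutoff from the Modes table


def plan_actions(scored_pairs, merge=False, auto=False):
    """Map each (page_a, page_b, score) to 'report', 'confirm', or 'merge'."""
    plan = []
    for page_a, page_b, score in scored_pairs:
        if auto and score >= AUTO_THRESHOLD:
            action = "merge"    # non-interactive merge
        elif merge:
            action = "confirm"  # show the pair and ask before merging
        else:
            action = "report"   # audit: no writes
        plan.append((page_a, page_b, action))
    return plan
```

With no flags set, every pair comes back as `report`, which matches the default Audit behavior.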

## Step 1: Build the Page Registry

Glob all `.md` files in the vault (excluding `_archives/`, `_raw/`, `.obsidian/`, `index.md`, `log.md`, `hot.md`, `_insights.md`, and any file that contains `redirects_to:` in its frontmatter — those are already merged redirect stubs).

For each remaining page, extract from frontmatter:
- `node_id` — relative path from vault root, without `.md`
- `title` — frontmatter `title` field
- `aliases` — frontmatter `aliases` list (may be absent)
- `tags` — frontmatter `tags` list
- `category` — directory prefix

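Build a lookup table: `node_id → {title, aliases, tags, category, summary}`. The `summary` field is not in the frontmatter list above; it most plausibly comes from the one-line descriptions in `index.md`.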
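
A minimal sketch of the registry build, using only the standard library. Several points are assumptions: the tiny `read_frontmatter` helper handles only plain scalars, inline lists, and dash lists (block scalars such as `title: >-` need separate handling); excluded files are matched by name; `category` is taken as the first path component; and `summary` is left out because the frontmatter list does not name a source for it.

```python
from pathlib import Path

EXCLUDED_DIRS = {"_archives", "_raw", ".obsidian"}
EXCLUDED_FILES = {"index.md", "log.md", "hot.md", "_insights.md"}


def read_frontmatter(text: str) -> dict:
    """Tiny reader: top-level scalars, inline lists, and dash lists only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data, key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- ") and key is not None:
            if not isinstance(data[key], list):
                data[key] = []
            data[key].append(stripped[2:].strip().strip("'\""))
        elif ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value.startswith("[") and value.endswith("]"):
                data[key] = [v.strip().strip("'\"") for v in value[1:-1].split(",") if v.strip()]
            else:
                data[key] = value.strip("'\"")
    return data


def as_list(value) -> list:
    """Normalize a missing, scalar, or list value to a list."""
    if isinstance(value, list):
        return value
    return [value] if value else []


def build_registry(vault: Path) -> dict:
    """Return node_id -> {title, aliases, tags, category} for dedup candidates."""
    registry = {}
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        if path.name in EXCLUDED_FILES or EXCLUDED_DIRS.intersection(rel.parts[:-1]):
            continue
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        if "redirects_to" in meta:  # already-merged redirect stub
            continue
        registry[rel.with_suffix("").as_posix()] = {
            "title": meta.get("title", ""),
            "aliases": as_list(meta.get("aliases")),
            "tags": as_list(meta.get("tags")),
            "category": rel.parts[0] if len(rel.parts) > 1 else "",
        }
    return registry
```

A real vault would likely call for a full YAML parser; the hand-rolled reader here just keeps the sketch dependency-free.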

## Step 2: Detect Candidate Pairs

For every pair of pages in the registry, compute a **similarity score** using these signals:

### 2a. Title similarity signals

| Signal | How to assess | Max contribution |
|---|---|---|
| **Token overlap** | Jaccard similarity of lowercased title word-tokens (split on spaces, hyphens, underscores, punctuation) | 0.65 |
| **Edit distance** | Normalized edit distance on lowercased titles: `1 - (edits / max(len_a, len_b))` | 0.40 |
| **Substring containment** | One title is contained in the other, either literally or as its initialism (e.g. "RSC" ⊂ "React Server Components") | 0.50 |
| **Alias cross-match** | Page A's title appears in page B's `aliases`, or vice versa | 0.65 |

Composite title score = `min(max(token_overlap, edit_distance, substring), 0.65) + alias_cross_bonus`.

You don't need exact arithmetic — make a confident judgement about degree of similarity.
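
For readers who do want the arithmetic, here is one way the composite title score could be computed. It is a sketch under stated assumptions, not the skill's reference implementation: each raw similarity is scaled by its "max contribution", containment is all-or-nothing, the alias bonus is a flat 0.65, and an initials check is added because the "RSC" example is an initialism rather than a literal substring. Note that the formula allows scores above 1.0 when the alias bonus applies.

```python
import re


def tokens(title: str) -> list:
    """Lowercased word tokens, split on spaces, hyphens, underscores, punctuation."""
    return [t for t in re.split(r"[\W_]+", title.lower()) if t]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def title_score(a: dict, b: dict) -> float:
    """Composite title score for two registry entries with 'title' and 'aliases'."""
    ta, tb = a["title"].lower().strip(), b["title"].lower().strip()
    tok_a, tok_b = tokens(ta), tokens(tb)
    union = set(tok_a) | set(tok_b)
    jaccard = len(set(tok_a) & set(tok_b)) / len(union) if union else 0.0
    longest = max(len(ta), len(tb))
    edit_sim = 1 - edit_distance(ta, tb) / longest if longest else 0.0
    # Assumption: containment also covers initialisms such as "rsc".
    initials_a = "".join(t[0] for t in tok_a)
    initials_b = "".join(t[0] for t in tok_b)
    contained = bool(ta and tb) and (
        ta in tb or tb in ta or ta == initials_b or tb == initials_a
    )
    base = min(max(0.65 * jaccard, 0.40 * edit_sim, 0.50 * contained), 0.65)
    aliases_a = {x.lower() for x in a.get("aliases", [])}
    aliases_b = {x.lower() for x in b.get("aliases", [])}
    alias_bonus = 0.65 if (ta in aliases_b or tb in aliases_a) else 0.0
    return base + alias_bonus
```

Taking the maximum of the three title signals before adding the alias bonus means overlapping evidence (for example, high token overlap and low edit distance) is not double-counted.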

**Title extraction note:** Some pages use YAML block scalars (`title: >-` or `title: |`). When the `title:` value is `>-`, `>`, `|`, or `|-`, the actual title is on the next indented line — read it from there. Never compare the literal string `>-` as a title.
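
The block-scalar rule can be sketched as a small helper over the raw frontmatter lines. The name `extract_title` is hypothetical, and the helper follows the note literally: it reads only the first indented line after the indicator, so a title folded across several lines would be cut short.

```python
BLOCK_INDICATORS = {">-", ">", "|", "|-"}


def extract_title(frontmatter_lines: list) -> str:
    """Return the title, reading through YAML block-scalar indicators."""
    for i, line in enumerate(frontmatter_lines):
        if not line.startswith("title:"):
            continue
        value = line[len("title:"):].strip()
        if value in BLOCK_INDICATORS:
            # The real title sits on the next indented, non-blank line.
            for following in frontmatter_lines[i + 1:]:
                if not following.strip():
                    continue
                if following.startswith((" ", "\t")):
                    return following.strip()
                break  # next top-level key reached: the title is empty
            return ""
        return value.strip("'\"")
    return ""
```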

### 2b. Semantic signals (cheap pass)

| Signal | Points |
|---|---|
| Same `category` directory | +0.10 |
| Tag overlap ≥ 3 shared tags | +0.15 |
| Tag overlap ≥ 2 shared tags | +0.05 |
| Same first tag (dominant tag) | +0.05 |
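
The semantic signals translate into a small additive bonus. In this sketch the two tag-overlap tiers are assumed not to stack (three or more shared tags earn +0.15 only), tags are compared as exact strings, and pages at the vault root (empty `category`) are assumed not to count as sharing a category; the table does not settle any of these.

```python
def semantic_bonus(a: dict, b: dict) -> float:
    """Cheap-pass bonus from category and tags for two registry entries."""
    bonus = 0.0
    if a["category"] and a["category"] == b["category"]:
        bonus += 0.10
    shared = len(set(a["tags"]) & set(b["tags"]))
    if shared >= 3:
        bonus += 0.15
    elif shared >= 2:
        bonus += 0.05
    # "Dominant tag" is read as the first entry in each tags list.
    if a["tags"] and b["tags"] and a["tags"][0] == b["tags"][0]:
        bonus += 0.05
    return bonus
```

Under these assumptions the largest possible bonus is 0.30, so tags and category alone can tip a borderline title match but cannot flag a pair by themselves unless the threshold is very low.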

### 2c. Threshold

Flag pairs with composite score ≥ **0
