Indexion Segment

Name: Indexion Segment
Author: trkbt10

trkbt10/indexion-skills

Chunk documents with indexion segment CLI strategies before embedding them in a solo-developer RAG or search pipeline.

Overview

Indexion Segment is an agent skill for the Build phase that splits text into contextual chunks via the indexion CLI using window, TF-IDF, punctuation, or hybrid strategies for RAG and embedding pipelines.

Install

npx skills add https://github.com/trkbt10/indexion-skills --skill indexion-segment

What is this skill?

CLI: indexion segment with input file and output directory
Strategies: window divergence (default), tfidf, and punctuation, plus a hybrid NCD+TF-IDF mode enabled with --hybrid
Tunable min, max, and target segment sizes (example flags --min-size=200 --max-size=3000 --target-size=800)
Adaptive threshold mode and custom --window-size and --prefix
Intent-driven segmentation for RAG, similarity analysis, and section extraction

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 511 installs on skills.sh; 1 GitHub star; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have raw documents, but naive fixed-size chunking cuts them mid-topic, which destroys context and degrades retrieval quality in your indie RAG stack.

Who is it for?

Builders preprocessing local text files for custom RAG with explicit control over chunking strategy and sizes.

Skip if: your team only uses vendor-managed chunking inside a hosted vector DB and has no local CLI pipeline.

When should I use this skill?

User needs to chunk text for RAG or embedding pipelines, split documents into sections, or segment text for sub-document similarity analysis.

What do I get? / Deliverables

You run strategy-tuned indexion segment commands that write chunked files to an output dir, ready for embedding and index load.

Segmented chunk files in output directory
Documented CLI invocation with chosen strategy

Journey fit

Primary fit

Build: Agent skills & templates

Segmentation is build-time data prep for agents and RAG, not launch-phase SEO or growth analytics. Agent tooling covers the embedding pipelines and sub-document chunking that models consume.

How it compares

CLI segmentation skill—not a full embedding service or MCP retrieval server.

Common Questions / FAQ

Who is indexion-segment for?

Solo developers piping documents into embeddings who want agents to invoke indexion segment with the right strategy flags.

When should I use indexion-segment?

During the Build phase (agent tooling), when chunking text for RAG, splitting docs for similarity analysis, or preparing sections before vector indexing.

Is indexion-segment safe to install?

Review the Security Audits panel on this page; the CLI reads and writes local files—limit paths and avoid secrets in source documents.

SKILL.md

# indexion segment

Split text into contextual segments using divergence-based, TF-IDF, or punctuation strategies.

## When to Use

- User needs to chunk text for RAG or embedding pipelines
- User wants to split a document into meaningful sections
- User asks to segment text for processing
- Preparing text for similarity analysis at sub-document level

## Usage

```bash
# Default window divergence strategy
indexion segment <input-file> <output-dir>

# TF-IDF based segmentation
indexion segment --strategy=tfidf <input-file> <output-dir>

# Punctuation-based segmentation
indexion segment --strategy=punctuation <input-file> <output-dir>

# Custom segment sizes
indexion segment --min-size=200 --max-size=3000 --target-size=800 document.txt output/

# Custom divergence threshold
indexion segment --threshold=0.5 document.txt output/

# Adaptive threshold mode (default)
indexion segment --adaptive document.txt output/

# Hybrid NCD+TF-IDF mode
indexion segment --hybrid --ncd-weight=0.6 --tfidf-weight=0.4 document.txt output/

# Custom window size
indexion segment --window-size=5 document.txt output/

# Custom output prefix
indexion segment --prefix=chunk document.txt output/
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--strategy=NAME` | window | Strategy: window, tfidf, punctuation |
| `--min-size=INT` | 100 | Minimum segment characters |
| `--max-size=INT` | 2000 | Maximum segment characters |
| `--target-size=INT` | 500 | Target segment characters |
| `--threshold=FLOAT` | 0.42 | Divergence threshold |
| `--window-size=INT` | 3 | Window size |
| `--adaptive` | true | Adaptive threshold mode |
| `--hybrid` | false | NCD+TF-IDF hybrid mode |
| `--ncd-weight=FLOAT` | 0.5 | NCD weight in hybrid mode |
| `--tfidf-weight=FLOAT` | 0.5 | TF-IDF weight in hybrid mode |
| `--prefix=NAME` | segment | Output file prefix |
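
The three size options are easiest to reason about when they are mutually consistent. The following is a minimal pre-flight sketch, assuming (this is not stated in the table above) that sensible values satisfy min <= target <= max. `check_sizes` is a hypothetical helper name, and indexion itself may validate its inputs differently.

```bash
# Hypothetical pre-flight check; not part of indexion.
# Usage: check_sizes MIN TARGET MAX
check_sizes() {
  min=$1 target=$2 max=$3
  # Reject anything that is not a non-negative integer.
  for n in "$min" "$target" "$max"; do
    case $n in
      ''|*[!0-9]*) echo "not a non-negative integer: $n" >&2; return 2 ;;
    esac
  done
  # Assumed ordering: min <= target <= max.
  if [ "$min" -le "$target" ] && [ "$target" -le "$max" ]; then
    return 0
  fi
  echo "expected min <= target <= max, got $min/$target/$max" >&2
  return 1
}

# The documented defaults (100 / 500 / 2000) pass the check.
check_sizes 100 500 2000 && echo "defaults ok"
```

A guarded call could then look like `check_sizes 200 800 3000 && indexion segment --min-size=200 --max-size=3000 --target-size=800 document.txt output/`.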

## Strategies

| Strategy | Description |
|----------|-------------|
| `window` (default) | Sliding window divergence detection |
| `tfidf` | TF-IDF based topic change detection |
| `punctuation` | Punctuation/sentence boundary based |
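
One way to choose between the strategies above is to run all three on the same document and compare the results. The sketch below is illustrative only: `compare_strategies` and the `SEGMENT_CMD` override are hypothetical names, and the file count is only a meaningful signal under the assumption that each segment is written as a separate file in the output directory.

```bash
# Hypothetical helper: run each documented strategy into its own directory
# and report how many files each one produced.
# Usage: compare_strategies INPUT_FILE OUT_ROOT
compare_strategies() {
  input=$1
  out_root=$2
  # SEGMENT_CMD is an illustrative override hook (e.g. for a dry run).
  cmd=${SEGMENT_CMD:-indexion segment}
  for s in window tfidf punctuation; do
    mkdir -p "$out_root/$s" || return 1
    # $cmd is intentionally unquoted so it splits into command and subcommand.
    $cmd --strategy="$s" "$input" "$out_root/$s" >/dev/null || return 1
    # Count regular files written for this strategy.
    count=$(find "$out_root/$s" -type f | wc -l | tr -d ' ')
    echo "$s: $count files"
  done
}
```

For example, `compare_strategies document.txt output/` would leave the results in `output/window`, `output/tfidf`, and `output/punctuation` for side-by-side inspection.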

## Workflow

1. Run `indexion segment <input-file> <output-dir>` to split text with defaults
2. Adjust `--threshold` and `--target-size` to tune segmentation granularity
3. Use `--hybrid` mode for better accuracy on mixed-content documents
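
The workflow above handles one file at a time. For a folder of documents, a small wrapper can apply the same flags to every file. This is a sketch under assumptions: `segment_dir`, the `SEGMENT_CMD` override, the `.txt` filter, and the one-subdirectory-per-input layout are all hypothetical choices, not behavior provided by indexion.

```bash
# Hypothetical helper: run the segment command on every .txt file in a directory,
# writing each file's segments to its own subdirectory.
# Usage: segment_dir IN_DIR OUT_ROOT [flags...]
segment_dir() {
  in_dir=$1
  out_root=$2
  shift 2   # any remaining arguments are passed through as flags
  # SEGMENT_CMD is an illustrative override hook (e.g. "echo indexion segment").
  cmd=${SEGMENT_CMD:-indexion segment}
  for f in "$in_dir"/*.txt; do
    [ -e "$f" ] || continue          # the glob matched nothing
    name=$(basename "$f" .txt)
    mkdir -p "$out_root/$name" || return 1
    # Flags first, then <input-file> <output-dir>, matching the usage examples.
    # $cmd is intentionally unquoted so it splits into command and subcommand.
    $cmd "$@" "$f" "$out_root/$name" || return 1
  done
}
```

Setting `SEGMENT_CMD="echo indexion segment"` before calling `segment_dir docs chunks --hybrid --target-size=800` prints the commands that would run without executing them, which is a cheap way to review the invocations first.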

