
Indexion Segment
Chunk documents with indexion segment CLI strategies before embedding for a solo RAG or search pipeline.
Overview
Indexion Segment is an agent skill for the Build phase. It splits text into contextual chunks via the indexion CLI, using window, TF-IDF, or punctuation strategies (or a hybrid NCD+TF-IDF mode), for RAG and embedding pipelines.
Install
npx skills add https://github.com/trkbt10/indexion-skills --skill indexion-segment
What is this skill?
- CLI: indexion segment with input file and output directory
- Strategies: window divergence (default), tfidf, and punctuation, plus a hybrid NCD+TF-IDF mode enabled with --hybrid
- Tunable min, max, and target segment sizes (example flags --min-size=200 --max-size=3000 --target-size=800)
- Adaptive threshold mode and custom --window-size and --prefix
- Intent-driven segmentation for RAG, similarity analysis, and section extraction
Adoption & trust: 511 installs on skills.sh; 1 GitHub star; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have raw documents, but naive fixed-size chunks destroy context and degrade retrieval quality in your indie RAG stack.
Who is it for?
Builders preprocessing local text files for custom RAG with explicit control over chunking strategy and sizes.
Skip if: your team only uses vendor-managed chunking inside a hosted vector DB and has no local CLI pipeline.
When should I use this skill?
User needs to chunk text for RAG or embedding pipelines, split documents into sections, or segment text for sub-document similarity analysis.
What do I get? / Deliverables
You run strategy-tuned indexion segment commands that write chunk files to an output directory, ready for embedding and index loading.
- Segmented chunk files in output directory
- Documented CLI invocation with chosen strategy
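One way to sanity-check these deliverables is to count the chunk files and flag any whose size falls outside the configured bounds. The sketch below is a hypothetical helper, not part of indexion. It assumes chunk files sit directly in the output directory and that their names start with the `--prefix` value (default `segment`); the exact naming scheme is not documented here, so adjust the glob if yours differs. The default bounds mirror the documented `--min-size` and `--max-size` defaults.

```shell
#!/bin/sh
# Sketch: summarize chunk files written by `indexion segment`.
# report_chunks is a hypothetical helper; the file-name glob is an assumption.
set -eu

# report_chunks DIR [PREFIX] [MIN_CHARS] [MAX_CHARS]
report_chunks() {
  rc_dir=$1
  rc_prefix=${2:-segment}   # documented default for --prefix
  rc_min=${3:-100}          # documented default for --min-size
  rc_max=${4:-2000}         # documented default for --max-size
  rc_total=0
  rc_outliers=0
  for rc_file in "$rc_dir"/"$rc_prefix"*; do
    [ -f "$rc_file" ] || continue
    # wc -m counts characters according to the current locale.
    rc_chars=$(($(wc -m < "$rc_file")))
    rc_total=$((rc_total + 1))
    if [ "$rc_chars" -lt "$rc_min" ] || [ "$rc_chars" -gt "$rc_max" ]; then
      rc_outliers=$((rc_outliers + 1))
      echo "outlier: $rc_file ($rc_chars chars)"
    fi
  done
  echo "chunks=$rc_total outliers=$rc_outliers"
}
```

Calling `report_chunks output/` after a run prints one line per out-of-range file and a final summary line. An outlier is not necessarily an error (a short trailing chunk may be expected), so treat the report as a prompt to inspect rather than a hard failure.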
How it compares
A CLI segmentation skill, not a full embedding service or MCP retrieval server.
Common Questions / FAQ
Who is indexion-segment for?
Solo developers piping documents into embeddings who want agents to invoke indexion segment with the right strategy flags.
When should I use indexion-segment?
During the Build phase, when you are chunking text for RAG, splitting docs for similarity analysis, or preparing sections before vector indexing.
Is indexion-segment safe to install?
Review the Security Audits panel on this page. The CLI reads and writes local files, so limit the paths it can reach and keep secrets out of source documents.
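As an illustration of the "avoid secrets in source documents" advice, a small pre-flight check can refuse to segment files that contain secret-like text. This is a minimal sketch under stated assumptions: `looks_sensitive` and `check_before_segment` are hypothetical helpers, and the three patterns are examples only, not a complete secret scanner.

```shell
#!/bin/sh
# Sketch: refuse to segment files that contain secret-like text.
# The patterns below are illustrative assumptions, not an exhaustive list.
set -eu

# looks_sensitive FILE: succeeds if any line matches a secret-like pattern.
looks_sensitive() {
  grep -E -i -q \
    -e '-----BEGIN [A-Z ]*PRIVATE KEY-----' \
    -e 'AKIA[0-9A-Z]{16}' \
    -e '(api[_-]?key|secret|password)[[:space:]]*[:=]' \
    "$1"
}

# check_before_segment FILE: prints "ok: FILE" or fails with a message on stderr.
check_before_segment() {
  if looks_sensitive "$1"; then
    echo "refusing: $1 contains secret-like text" >&2
    return 1
  fi
  echo "ok: $1"
}
```

A typical use would be `check_before_segment document.txt && indexion segment document.txt output/`, so the segment command only runs when the check passes.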
SKILL.md - Indexion Segment
# indexion segment

Split text into contextual segments using divergence-based, TF-IDF, or punctuation strategies.

## When to Use

- User needs to chunk text for RAG or embedding pipelines
- User wants to split a document into meaningful sections
- User asks to segment text for processing
- Preparing text for similarity analysis at sub-document level

## Usage

```bash
# Default window divergence strategy
indexion segment <input-file> <output-dir>

# TF-IDF based segmentation
indexion segment --strategy=tfidf <input-file> <output-dir>

# Punctuation-based segmentation
indexion segment --strategy=punctuation <input-file> <output-dir>

# Custom segment sizes
indexion segment --min-size=200 --max-size=3000 --target-size=800 document.txt output/

# Custom divergence threshold
indexion segment --threshold=0.5 document.txt output/

# Adaptive threshold mode (default)
indexion segment --adaptive document.txt output/

# Hybrid NCD+TF-IDF mode
indexion segment --hybrid --ncd-weight=0.6 --tfidf-weight=0.4 document.txt output/

# Custom window size
indexion segment --window-size=5 document.txt output/

# Custom output prefix
indexion segment --prefix=chunk document.txt output/
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `--strategy=NAME` | window | Strategy: window, tfidf, punctuation |
| `--min-size=INT` | 100 | Minimum segment characters |
| `--max-size=INT` | 2000 | Maximum segment characters |
| `--target-size=INT` | 500 | Target segment characters |
| `--threshold=FLOAT` | 0.42 | Divergence threshold |
| `--window-size=INT` | 3 | Window size |
| `--adaptive` | true | Adaptive threshold mode |
| `--hybrid` | false | NCD+TF-IDF hybrid mode |
| `--ncd-weight=FLOAT` | 0.5 | NCD weight in hybrid mode |
| `--tfidf-weight=FLOAT` | 0.5 | TF-IDF weight in hybrid mode |
| `--prefix=NAME` | segment | Output file prefix |

## Strategies

| Strategy | Description |
|----------|-------------|
| `window` (default) | Sliding window divergence detection |
| `tfidf` | TF-IDF based topic change detection |
| `punctuation` | Punctuation/sentence boundary based |

## Workflow

1. Run `indexion segment <input-file> <output-dir>` to split text with defaults
2. Adjust `--threshold` and `--target-size` to tune segmentation granularity
3. Use `--hybrid` mode for better accuracy on mixed-content documents
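The workflow above can be sketched as a small batch wrapper that applies the same tuned flags to every `.txt` file in a directory. This is a hypothetical helper, not part of indexion: `segment_one`, `segment_dir`, `DRY_RUN`, and `SEGMENT_FLAGS` are names invented for the example, and only the indexion flags themselves come from the Options table. It defaults to a dry run that prints each command, and it creates each output directory itself because whether the CLI does so is not stated here.

```shell
#!/bin/sh
# Sketch: run `indexion segment` over every .txt file in a directory.
# DRY_RUN and SEGMENT_FLAGS are conventions of this example, not indexion features.
set -eu

: "${DRY_RUN:=1}"   # 1 = print commands only, 0 = actually run them
if [ -z "${SEGMENT_FLAGS:-}" ]; then
  # Workflow step 2: tune granularity. Set SEGMENT_FLAGS="--hybrid" to try step 3.
  SEGMENT_FLAGS="--threshold=0.5 --target-size=800"
fi

# segment_one INPUT_FILE OUTPUT_ROOT: one output subdirectory per input file.
segment_one() {
  so_name=$(basename "${1%.*}")
  so_out="$2/$so_name"
  if [ "$DRY_RUN" = "1" ]; then
    echo "indexion segment $SEGMENT_FLAGS $1 $so_out"
  else
    mkdir -p "$so_out"
    # SEGMENT_FLAGS is left unquoted on purpose so it splits into separate flags.
    indexion segment $SEGMENT_FLAGS "$1" "$so_out"
  fi
}

# segment_dir INPUT_DIR OUTPUT_ROOT
segment_dir() {
  for sd_file in "$1"/*.txt; do
    [ -e "$sd_file" ] || continue   # skip the literal pattern when nothing matches
    segment_one "$sd_file" "$2"
  done
}
```

Running `segment_dir docs/ output/` first in dry-run mode lets you review the exact commands; setting `DRY_RUN=0` before sourcing the script executes them.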