
Grepai Chunking
Tune the chunk size and overlap in GrepAI's `.grepai/config.yaml` so semantic code search matches how your repo is structured.
Install
npx skills add https://github.com/yoanbernabeu/grepai-skills --skill grepai-chunking
What is this skill?
- Explains token-based chunking with a visual split from large files into ~512-token segments
- Documents `chunking.size` and `chunking.overlap` in `.grepai/config.yaml`
- Covers tradeoffs: oversized chunks reduce precision, undersized chunks lose context
- Guidance for verbose vs concise code styles and troubleshooting weak search hits
- Clarifies that each chunk receives its own embedding for vector search
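The token-window split summarized above can be sketched in a few lines of Python. This is an illustration only, not GrepAI's actual implementation: the function name `chunk_spans` is hypothetical, and it assumes exact token counts, whereas the skill notes that GrepAI counts tokens approximately.

```python
def chunk_spans(total_tokens: int, size: int = 512, overlap: int = 50) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) token ranges for each chunk.

    Hypothetical helper for illustration; the defaults mirror the documented
    chunking.size and chunking.overlap values.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")

    stride = size - overlap  # how far each new chunk advances
    spans = []
    start = 1
    while start <= total_tokens:
        end = min(start + size - 1, total_tokens)  # last chunk may be short
        spans.append((start, end))
        if end == total_tokens:
            break
        start += stride
    return spans


print(chunk_spans(1000, size=512, overlap=50))
# [(1, 512), (463, 974), (925, 1000)]
```

Each chunk starts `size - overlap` tokens after the previous one, so a larger overlap yields more chunks for the same file and therefore a larger index.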
Adoption & trust: 497 installs on skills.sh; 17 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Journey fit
Primary fit
Indexing and chunking are configured while wiring agent-side code search into a repo during the Build phase. Agent-tooling is the right shelf because GrepAI chunking directly affects embeddings and retrieval quality for coding agents.
Common Questions / FAQ
Is Grepai Chunking safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# GrepAI Chunking Configuration

This skill covers how GrepAI splits code files into chunks for embedding, and how to optimize chunking for your codebase.

## When to Use This Skill

- Optimizing search accuracy
- Adjusting for code style (verbose vs. concise)
- Troubleshooting search results
- Understanding how indexing works

## What is Chunking?

Chunking is the process of splitting source files into smaller segments for embedding:

```
┌─────────────────────────────────────┐
│          Large Source File          │
│           (1000+ tokens)            │
└─────────────────────────────────────┘
                  ↓
┌─────────┐  ┌─────────┐  ┌─────────┐
│ Chunk 1 │  │ Chunk 2 │  │ Chunk 3 │
│  ~512   │  │  ~512   │  │  ~512   │
│ tokens  │  │ tokens  │  │ tokens  │
└─────────┘  └─────────┘  └─────────┘
                  ↓
  Each chunk gets its own embedding
```

## Why Chunking Matters

Embedding models have optimal input sizes:

- **Too large chunks:** Less precise search results
- **Too small chunks:** Lost context, fragmented results
- **Just right:** Good balance of precision and context

## Configuration

### Basic Settings

```yaml
# .grepai/config.yaml
chunking:
  size: 512     # Tokens per chunk
  overlap: 50   # Overlap between chunks
```

### Understanding Parameters

#### Chunk Size

The target number of tokens per chunk.

| Size | Effect |
|------|--------|
| 256 | More precise, less context |
| 512 | Balanced (default) |
| 1024 | More context, less precise |

#### Overlap

Tokens shared between adjacent chunks. Preserves context at boundaries.

| Overlap | Effect |
|---------|--------|
| 0 | No overlap, may lose context at boundaries |
| 50 | Standard overlap (default) |
| 100 | More context, larger index |

## Visualization

With size=512 and overlap=50:

```
File: auth.go (1000 tokens)

Chunk 1: tokens 1-512
┌────────────────────────────────────┐
│ func Login(user, pass)...          │
└────────────────────────────────────┘
        ↘ 50 token overlap ↙
Chunk 2: tokens 463-974
┌────────────────────────────────────┐
│ ...validate credentials...         │
└────────────────────────────────────┘
        ↘ 50 token overlap ↙
Chunk 3: tokens 925-1000
┌──────────────┐
│ ...return    │
└──────────────┘
```

## Recommended Settings by Language

### Verbose Languages (Java, C#)

```yaml
chunking:
  size: 768     # Larger to capture full methods
  overlap: 75
```

### Concise Languages (Go, Python)

```yaml
chunking:
  size: 512     # Standard size
  overlap: 50
```

### Very Concise (Rust, Zig)

```yaml
chunking:
  size: 384     # Smaller for precise results
  overlap: 40
```

## Recommended Settings by Codebase

### Small Functions (Microservices)

```yaml
chunking:
  size: 384     # Capture individual functions
  overlap: 40
```

### Large Classes (Monolith)

```yaml
chunking:
  size: 768     # Capture more context
  overlap: 100
```

### Mixed Codebase

```yaml
chunking:
  size: 512     # Balanced default
  overlap: 50
```

## How Tokens are Counted

GrepAI uses approximate token counting:

- ~4 characters = 1 token (for English text)
- Code varies based on identifiers and syntax

Example:

```go
func calculateTotal(items []Item) float64 {
    total := 0.0
    for _, item := range items {
        total += item.Price * float64(item.Quantity)
    }
    return total
}
```

≈ 45 tokens

## Impact on Index Size

Larger overlap = more chunks = larger index:

| Size | Overlap | Chunks per 10K tokens | Index Impact |
|------|---------|----------------------|--------------|
| 512 | 0 | ~20 | Smallest |
| 512 | 50 | ~22 | Standard