Grepai Chunking

Name: Grepai Chunking
Author: yoanbernabeu

yoanbernabeu/grepai-skills

Tune GrepAI `.grepai/config.yaml` chunk size and overlap so semantic code search matches how your repo is structured.

Install

npx skills add https://github.com/yoanbernabeu/grepai-skills --skill grepai-chunking

What is this skill?

Explains token-based chunking with a visual split from large files into ~512-token segments
Documents `chunking.size` and `chunking.overlap` in `.grepai/config.yaml`
Covers tradeoffs: oversized chunks reduce precision, undersized chunks lose context
Guidance for verbose vs concise code styles and troubleshooting weak search hits
Clarifies that each chunk receives its own embedding for vector search

Adoption & trust: 497 installs on skills.sh; 17 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Primary fit

Build: Agent skills & templates

Indexing and chunking are configured while wiring agent-side code search into a repo during the Build phase. Agent-tooling is the right shelf because GrepAI chunking directly affects embeddings and retrieval quality for coding agents.

Common Questions / FAQ

Is Grepai Chunking safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md


# GrepAI Chunking Configuration

This skill covers how GrepAI splits code files into chunks for embedding, and how to optimize chunking for your codebase.

## When to Use This Skill

- Optimizing search accuracy
- Adjusting for code style (verbose vs. concise)
- Troubleshooting search results
- Understanding how indexing works

## What is Chunking?

Chunking is the process of splitting source files into smaller segments for embedding:

```
┌─────────────────────────────────────┐
│         Large Source File           │
│         (1000+ tokens)              │
└─────────────────────────────────────┘
                  ↓
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │
│ ~512    │ │ ~512    │ │ ~512    │
│ tokens  │ │ tokens  │ │ tokens  │
└─────────┘ └─────────┘ └─────────┘
                  ↓
          Each chunk gets
          its own embedding
```
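
Because each chunk is embedded separately, a search compares the query's vector against every chunk vector and returns the closest chunks. The Go sketch below illustrates that step with toy three-dimensional vectors. The `Chunk` type, `bestChunk`, and the use of cosine similarity are assumptions for illustration; GrepAI's actual vector store and similarity metric may differ, and real embeddings come from whichever embedding model GrepAI is configured to use.

```go
package main

import (
	"fmt"
	"math"
)

// Chunk is a hypothetical record pairing a span of a file with its embedding.
type Chunk struct {
	File      string
	Start     int // first token (1-based, inclusive)
	End       int // last token (inclusive)
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
// A zero-length (all-zero) vector scores 0 against anything.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// bestChunk returns the index of the chunk most similar to the query vector,
// or -1 if there are no chunks.
func bestChunk(query []float64, chunks []Chunk) int {
	best, bestScore := -1, math.Inf(-1)
	for i, c := range chunks {
		if s := cosine(query, c.Embedding); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}

func main() {
	// Toy vectors standing in for real model embeddings.
	chunks := []Chunk{
		{"auth.go", 1, 512, []float64{0.9, 0.1, 0.0}},
		{"auth.go", 463, 974, []float64{0.2, 0.8, 0.1}},
		{"auth.go", 925, 1000, []float64{0.1, 0.2, 0.9}},
	}
	query := []float64{0.1, 0.9, 0.0} // e.g. "validate credentials"
	i := bestChunk(query, chunks)
	fmt.Printf("best match: %s tokens %d-%d\n", chunks[i].File, chunks[i].Start, chunks[i].End)
}
```

Since only the best-scoring chunk is returned, the span a chunk covers is also the granularity of a search hit, which is why the size settings below matter.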

## Why Chunking Matters

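Each chunk is embedded as a single vector, and embedding models work best within a limited input size, so chunk size decides how much code one vector has to represent:
- **Chunks too large:** unrelated code shares one vector, so results are less precise
- **Chunks too small:** functions are split across chunks and lose their surrounding context, so results are fragmented
- **Just right:** a good balance of precision and context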

## Configuration

### Basic Settings

```yaml
# .grepai/config.yaml
chunking:
  size: 512      # Tokens per chunk
  overlap: 50    # Overlap between chunks
```
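
The two keys are not independent. Assuming a sliding-window chunker like the one in the Visualization section, each new chunk starts `size - overlap` tokens after the previous one, so `overlap` must stay smaller than `size` for the window to move forward. A minimal validation sketch in Go; the `ChunkingConfig` type and its rules are hypothetical, not GrepAI's own code:

```go
package main

import (
	"errors"
	"fmt"
)

// ChunkingConfig mirrors the two keys under `chunking:` in .grepai/config.yaml.
// The Go type and the Validate rules are illustrative, not GrepAI's own code.
type ChunkingConfig struct {
	Size    int // target tokens per chunk
	Overlap int // tokens shared with the previous chunk
}

// Validate rejects settings a sliding-window chunker cannot use.
func (c ChunkingConfig) Validate() error {
	if c.Size <= 0 {
		return errors.New("chunking.size must be positive")
	}
	if c.Overlap < 0 {
		return errors.New("chunking.overlap must not be negative")
	}
	if c.Overlap >= c.Size {
		// The window advances by Size-Overlap tokens, so that must stay > 0.
		return fmt.Errorf("chunking.overlap (%d) must be smaller than chunking.size (%d)", c.Overlap, c.Size)
	}
	return nil
}

// Stride is how many tokens each new chunk starts after the previous one.
func (c ChunkingConfig) Stride() int { return c.Size - c.Overlap }

func main() {
	cfg := ChunkingConfig{Size: 512, Overlap: 50} // the defaults shown above
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("stride:", cfg.Stride()) // prints "stride: 462"
}
```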

### Understanding Parameters

#### Chunk Size

The target number of tokens per chunk.

| Size | Effect |
|------|--------|
| 256 | More precise, less context |
| 512 | Balanced (default) |
| 1024 | More context, less precise |

#### Overlap

The number of tokens shared between adjacent chunks. Overlap preserves context at chunk boundaries.

| Overlap | Effect |
|---------|--------|
| 0 | No overlap, may lose context at boundaries |
| 50 | Standard overlap (default) |
| 100 | More context, larger index |

## Visualization

With size=512 and overlap=50, each chunk starts 462 (512 - 50) tokens after the previous one:

```
File: auth.go (1000 tokens)

Chunk 1: tokens 1-512
         ┌────────────────────────────────────┐
         │ func Login(user, pass)...          │
         └────────────────────────────────────┘
                                    ↘
                              50 token overlap
                                    ↙
Chunk 2: tokens 463-974
         ┌────────────────────────────────────┐
         │ ...validate credentials...         │
         └────────────────────────────────────┘
                                    ↘
                              50 token overlap
                                    ↙
Chunk 3: tokens 925-1000
         ┌──────────────┐
         │ ...return    │
         └──────────────┘
```
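
The ranges above follow from a simple sliding window. This sketch reproduces them, assuming 1-based inclusive token positions and a final chunk that is cut off at the end of the file; `chunkRanges` and `Range` are illustrative names, not GrepAI functions:

```go
package main

import "fmt"

// Range is an inclusive, 1-based token span.
type Range struct{ Start, End int }

// chunkRanges splits totalTokens into windows of `size` tokens, each starting
// size-overlap tokens after the previous one. The last window is truncated
// at the end of the file. Assumes 0 <= overlap < size.
func chunkRanges(totalTokens, size, overlap int) []Range {
	var out []Range
	stride := size - overlap
	for start := 1; start <= totalTokens; start += stride {
		end := start + size - 1
		if end > totalTokens {
			end = totalTokens
		}
		out = append(out, Range{start, end})
		if end == totalTokens {
			break // this window already reaches the end of the file
		}
	}
	return out
}

func main() {
	for i, r := range chunkRanges(1000, 512, 50) {
		fmt.Printf("Chunk %d: tokens %d-%d\n", i+1, r.Start, r.End)
	}
}
```

Running it prints the three ranges from the diagram: 1-512, 463-974, and 925-1000.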

## Recommended Settings by Language

### Verbose Languages (Java, C#)

```yaml
chunking:
  size: 768    # Larger to capture full methods
  overlap: 75
```

### Concise Languages (Go, Python)

```yaml
chunking:
  size: 512    # Standard size
  overlap: 50
```

### Very Concise Languages (Rust, Zig)

```yaml
chunking:
  size: 384    # Smaller for precise results
  overlap: 40
```
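
If you apply these presets across several repositories, it can help to keep them in one place. The helper below is hypothetical (GrepAI does not ship it); it encodes the three language presets above and falls back to the balanced 512/50 default for languages this guide does not list:

```go
package main

import (
	"fmt"
	"strings"
)

// Preset holds the values of one `chunking:` block from the YAML above.
type Preset struct{ Size, Overlap int }

// presetsByLanguage encodes this guide's per-language recommendations.
// The map and presetFor are a hypothetical helper, not part of GrepAI.
var presetsByLanguage = map[string]Preset{
	"java":   {768, 75},
	"csharp": {768, 75},
	"go":     {512, 50},
	"python": {512, 50},
	"rust":   {384, 40},
	"zig":    {384, 40},
}

// presetFor returns the recommended settings for a repo's dominant language,
// falling back to the balanced 512/50 default for unlisted languages.
func presetFor(lang string) Preset {
	if p, ok := presetsByLanguage[strings.ToLower(lang)]; ok {
		return p
	}
	return Preset{512, 50}
}

func main() {
	p := presetFor("Java")
	// Emit a block that can be pasted into .grepai/config.yaml.
	fmt.Printf("chunking:\n  size: %d\n  overlap: %d\n", p.Size, p.Overlap)
}
```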

## Recommended Settings by Codebase

### Small Functions (Microservices)

```yaml
chunking:
  size: 384    # Capture individual functions
  overlap: 40
```

### Large Classes (Monolith)

```yaml
chunking:
  size: 768    # Capture more context
  overlap: 100
```

### Mixed Codebase

```yaml
chunking:
  size: 512    # Balanced default
  overlap: 50
```

## How Tokens are Counted

GrepAI uses approximate token counting:
- ~4 characters = 1 token (for English text)
- Code varies based on identifiers and syntax

Example:
```go
func calculateTotal(items []Item) float64 {
    total := 0.0
    for _, item := range items {
        total += item.Price * float64(item.Quantity)
    }
    return total
}
```
≈ 45 tokens
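
The ~4 characters per token rule can be written as a one-line estimate. This is a sketch of the rule of thumb only, assuming characters are counted as Unicode code points; it is not GrepAI's tokenizer, and real counts for code will differ:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens applies the ~4-characters-per-token rule of thumb.
// It is an approximation only; a real tokenizer splits code identifiers
// and punctuation unevenly.
func estimateTokens(s string) int {
	chars := utf8.RuneCountInString(s)
	return (chars + 3) / 4 // round up so any non-empty text counts as at least 1 token
}

func main() {
	snippet := `func calculateTotal(items []Item) float64 {
    total := 0.0
    for _, item := range items {
        total += item.Price * float64(item.Quantity)
    }
    return total
}`
	fmt.Println(estimateTokens(snippet))
}
```

For the `calculateTotal` snippet (about 170 characters including indentation), this gives a figure in the low 40s, close to the ≈ 45 quoted above.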

## Impact on Index Size

Larger overlap = more chunks = larger index:

| Size | Overlap | Chunks per 10K tokens | Index Impact |
|------|---------|----------------------|--------------|
| 512 | 0 | ~20 | Smallest |
| 512 | 50 | ~22 | Standard |
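
The chunk counts in the table can be derived from the same sliding-window arithmetic: the first chunk covers `size` tokens and each additional chunk advances by `size - overlap`. A sketch under that assumption (`chunkCount` is an illustrative name):

```go
package main

import "fmt"

// chunkCount returns how many sliding-window chunks cover totalTokens when
// each chunk holds `size` tokens and consecutive chunks share `overlap`.
// Assumes 0 <= overlap < size.
func chunkCount(totalTokens, size, overlap int) int {
	if totalTokens <= 0 {
		return 0
	}
	if totalTokens <= size {
		return 1
	}
	stride := size - overlap
	// One chunk for the first window, then ceil(remaining / stride) more.
	remaining := totalTokens - size
	return 1 + (remaining+stride-1)/stride
}

func main() {
	for _, overlap := range []int{0, 50} {
		fmt.Printf("size=512 overlap=%d -> %d chunks per 10K tokens\n",
			overlap, chunkCount(10000, 512, overlap))
	}
}
```

Its output matches the two rows above: 20 chunks with no overlap and 22 with an overlap of 50. Each extra chunk is one more embedding stored in the index.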
