Knowledge Base RAG

Name: Knowledge Base RAG
Author: itallstartedwithaidea

itallstartedwithaidea/agent-skills

Stand up a private knowledge-base RAG pipeline for your agent product: ingest, chunk, embed, index, retrieve, and answer with citations.

Overview

knowledge-base-rag is an agent skill for the Build phase that implements the full RAG pipeline from document ingestion to cited, retrieval-grounded answers.

Install

npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill knowledge-base-rag

What is this skill?

End-to-end RAG: document ingestion through grounded response generation with cited sources
Production chunking guidance: semantic, recursive structure-aware splits, and overlap windows
Addresses post-training cutoff and private data gaps via retrieval-injected context
Embedding generation and vector store indexing as first-class pipeline stages
Positions chunking strategy as higher leverage than raw model choice for answer quality
Covers semantic chunking, recursive structure-aware splitting, and overlap window strategies as production-tested approaches

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

Your agent cannot reliably answer from private or fresh documents because the model alone lacks that knowledge and hallucinates without retrieval context.

Who is it for?

Indie builders adding doc Q&A or support copilots to a SaaS or agent product with a defined corpus to ingest.

Skip if: you are building a simple chat wrapper with no private docs, you only need one-off summaries without a vector index, or your team plans to skip ingestion design and expects model upgrades alone to fix recall.

When should I use this skill?

Building or extending an agent that must answer questions from a private knowledge base using an ingest, chunk, embed, index, retrieve, and cite workflow.

What do I get? / Deliverables

You deploy an indexed knowledge base with intentional chunking and retrieval so the agent returns grounded answers with source citations and fewer fabrications.

Ingestion and chunking pipeline configuration
Indexed vector store with retrieval integration into agent prompts
Grounded response path with cited source snippets

Journey fit

Primary fit

Build: Agent skills & templates

RAG assembly is core Build work when you productize an agent that must answer from proprietary docs rather than model memory alone. Vector stores, chunking policies, and grounded generation are agent-tooling infrastructure, not Launch-phase SEO or Operate-phase monitoring.

Also useful

Operate: Iteration & experiments

How it compares

Skill-encoded RAG methodology for agents, not a turnkey hosted search appliance or a single MCP connector.

Common Questions / FAQ

Who is knowledge-base-rag for?

Solo developers and indie teams building LLM agents that must query organization-specific knowledge with citations and controlled hallucination risk.

When should I use knowledge-base-rag?

Use it during the Build phase (agent tooling) while designing ingestion, chunking, embeddings, and vector retrieval; revisit it during the Operate phase (iteration) when tuning recall and index freshness.

Is knowledge-base-rag safe to install?

RAG skills often imply external APIs and document processing—review the Security Audits panel on this Prism page and scope network and secret access before production ingest.

SKILL.md

# Knowledge Base RAG

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Knowledge Base RAG implements the complete Retrieval-Augmented Generation pipeline: document ingestion, intelligent chunking, embedding generation, vector store indexing, semantic retrieval, and grounded response generation. The agent builds RAG systems that answer questions from private knowledge bases with cited sources and reduced hallucination.

RAG addresses a fundamental limitation of large language models: they cannot access information created after their training cutoff or proprietary information they were never trained on. By retrieving relevant documents from a vector store and injecting them into the prompt context, RAG grounds the model's responses in up-to-date, organization-specific knowledge.
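
As a minimal sketch of the context-injection step described above, the snippet below formats retrieved passages into a numbered context block and asks the model to cite by number. The `RetrievedPassage` type, the `build_prompt` helper, the sample documents, and the prompt wording are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class RetrievedPassage:
    # Hypothetical container for one retrieval hit; not defined by the skill.
    source: str  # e.g. a file path or URL
    text: str


def build_prompt(question: str, passages: list[RetrievedPassage]) -> str:
    """Inject retrieved passages into the prompt as numbered, citable context."""
    context = "\n\n".join(
        f"[{i}] ({p.source})\n{p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up passages standing in for real retrieval results.
passages = [
    RetrievedPassage("docs/billing.md", "Refunds are issued within 14 days."),
    RetrievedPassage("docs/faq.md", "Contact support to request a refund."),
]
prompt = build_prompt("How long do refunds take?", passages)
print(prompt)
```

Numbering the passages gives the model a compact handle to cite, and the application can map each `[n]` back to its source when rendering the answer.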

The quality of a RAG system depends on chunking strategy more than model choice. This skill encodes production-tested chunking approaches: semantic chunking that preserves paragraph coherence, recursive splitting that respects document structure (headings, code blocks, tables), and overlap windows that maintain context across chunk boundaries. Each strategy is matched to the document type for optimal retrieval quality.
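
To make the overlap-window idea concrete, here is a small sketch that slides a fixed-size window over the text so that consecutive chunks share a few words. Whitespace splitting stands in for a real tokenizer, and the function name `sliding_window_chunks` and the sizes are hypothetical choices for illustration.

```python
def sliding_window_chunks(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()  # assumption: whitespace words approximate tokens
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
for chunk in sliding_window_chunks(text, size=4, overlap=1):
    print(chunk)
# one two three four
# four five six seven
# seven eight nine ten
```

The repeated word at each boundary ("four", "seven") is the overlap: a sentence that straddles two chunks stays at least partly visible in both.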

## Use When

- Building question-answering systems over private documents
- Creating a searchable knowledge base from documentation, wikis, or PDFs
- Reducing hallucination by grounding LLM responses in retrieved facts
- Implementing semantic search across large document collections
- Building customer support bots with product-specific knowledge
- The user asks about RAG, vector search, or document embedding

## How It Works

```mermaid
graph TD
    A[Documents: PDF, MD, HTML] --> B[Ingestion Pipeline]
    B --> C[Extract Text + Metadata]
    C --> D[Intelligent Chunking]
    D --> E[Generate Embeddings]
    E --> F[Index in Vector Store]
    G[User Query] --> H[Embed Query]
    H --> I[Semantic Search: Top-K]
    I --> J[Re-rank Results]
    J --> K[Construct Prompt with Context]
    K --> L[LLM Generation]
    L --> M[Response with Citations]
```

The pipeline has two phases: offline ingestion (documents to vectors) and online retrieval (query to answer). The re-ranking step applies a cross-encoder, which scores each query-passage pair jointly, to refine the initial vector search results and improve precision before generation.
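
The two phases can be sketched end to end with the standard library alone. This is a toy under stated assumptions: `embed` is a bag-of-words stand-in for an embedding model, `rerank_score` is a term-overlap placeholder for a cross-encoder, and `InMemoryIndex` and `retrieve` are hypothetical names rather than anything the skill defines.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_score(query: str, passage: str) -> float:
    # Placeholder for a cross-encoder: fraction of query terms found in the passage.
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


class InMemoryIndex:
    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Offline phase: embed once at ingestion time and store the vector.
        self.entries.append((text, embed(text)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Online phase, step 1: embed the query and take the top-k by cosine similarity.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def retrieve(index: InMemoryIndex, query: str, top_k: int = 3, final_k: int = 1) -> list[str]:
    candidates = index.search(query, top_k=top_k)
    # Online phase, step 2: re-score each (query, passage) pair and keep the best.
    return sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)[:final_k]


index = InMemoryIndex()
for doc in [
    "refunds are issued within 14 days of purchase",
    "password resets are sent by email",
    "refunds require the original receipt",
]:
    index.add(doc)

print(retrieve(index, "how are refunds issued", top_k=2, final_k=1))
# ['refunds are issued within 14 days of purchase']
```

Retrieving a wider `top_k` and then keeping a smaller `final_k` mirrors the diagram: the vector search is cheap and recall-oriented, while the re-ranker spends more effort on fewer candidates.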

## Implementation

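```python
from dataclasses import dataclass
import hashlib

@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

    @property
    def id(self) -> str:
        # Content-derived ID: identical chunk text always yields the same ID.
        return hashlib.sha256(self.text.encode()).hexdigest()[:16]

class RecursiveChunker:
    def __init__(self, max_tokens: int = 512, overlap: int = 64):
        self.max_tokens = max_tokens
        self.overlap = overlap
        # Tried in order, from coarsest (headings) to finest (single spaces).
        self.separators = ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]

    def chunk(self, text: str, metadata: dict) -> list[Chunk]:
        chunks = self._split(text, self.separators)
        return [
            Chunk(text=c.strip(), metadata={**metadata, "chunk_index": i})
            for i, c in enumerate(chunks) if c.strip()
        ]

    def _split(self, text: str, separators: list[str]) -> list[str]:
        # Stop when the text fits or no finer separator is left to try.
        if not separators or self._token_count(text) <= self.max_tokens:
            return [text]

        sep = separators[0]
        parts = text.split(sep)
        chunks, current = [], ""

        for part in parts:
            candidate = current + sep + part if current else part
            if self._token_count(candidate) > self.max_tokens and current:
                chunks.append(current)
                # NOTE: the published listing is cut off at this line. Everything
                # from here down is an assumed completion, not the author's code.
                overlap_text = self._tail(current)
                current = overlap_text + sep + part if overlap_text else part
            else:
                current = candidate
        if current:
            chunks.append(current)

        # Any chunk still over budget is split again with the finer separators.
        return [piece for c in chunks for piece in self._split(c, separators[1:])]

    def _tail(self, text: str) -> str:
        # Assumed overlap rule: carry the last `overlap` words into the next chunk.
        words = text.split()
        return " ".join(words[-self.overlap:]) if self.overlap > 0 else ""

    def _token_count(self, text: str) -> int:
        # Assumed proxy: whitespace-separated words instead of model tokens.
        return len(text.split())
```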
