
Knowledge Base RAG
Stand up a private knowledge-base RAG pipeline: ingest, chunk, embed, index, retrieve, and cite answers for your agent product.
Overview
knowledge-base-rag is an agent skill for the Build phase that implements the full RAG pipeline from document ingestion to cited, retrieval-grounded answers.
Install
npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill knowledge-base-rag
What is this skill?
- End-to-end RAG: document ingestion through grounded response generation with cited sources
- Production chunking guidance: semantic, recursive structure-aware splits, and overlap windows
- Addresses post-training cutoff and private data gaps via retrieval-injected context
- Embedding generation and vector store indexing as first-class pipeline stages
- Positions chunking strategy as higher leverage than raw model choice for answer quality
Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
Your agent cannot reliably answer from private or fresh documents because the model alone lacks that knowledge and hallucinates without retrieval context.
Who is it for?
Indie builders adding doc Q&A or support copilots to a SaaS or agent product with a defined corpus to ingest.
Skip if: Simple chat wrappers with no private docs, one-off summaries without a vector index, or teams skipping ingestion design and expecting model upgrades alone to fix recall.
When should I use this skill?
Building or extending an agent that must answer questions from a private knowledge base using an ingest, chunk, embed, index, retrieve, and cite workflow.
What do I get? / Deliverables
You deploy an indexed knowledge base with intentional chunking and retrieval so the agent returns grounded answers with source citations and fewer fabrications.
- Ingestion and chunking pipeline configuration
- Indexed vector store with retrieval integration into agent prompts
- Grounded response path with cited source snippets
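As a rough illustration of the retrieval and cited-answer deliverables above, the sketch below ranks a toy in-memory index by cosine similarity and builds a prompt with numbered sources. Everything in it is an assumption made for illustration: the function names `cosine`, `top_k`, and `build_prompt`, the file names, and the hand-made two-dimensional vectors are hypothetical, and a real pipeline would use an embedding model and a vector store instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple], k: int = 2) -> list[tuple]:
    """Return the k rows of (source, text, vector) most similar to the query."""
    ranked = sorted(index, key=lambda row: cosine(query_vec, row[2]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[tuple]) -> str:
    """Number each retrieved snippet so the model can cite it as [n]."""
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text, _vec) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them like [1].\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Hypothetical toy index: (source file, snippet, hand-made embedding).
index = [
    ("billing.md", "Refunds are issued within 5 days.", [1.0, 0.0]),
    ("setup.md", "Install the CLI with the setup script.", [0.0, 1.0]),
    ("faq.md", "Refund requests go through the billing page.", [0.8, 0.6]),
]

hits = top_k([1.0, 0.1], index, k=2)
prompt = build_prompt("How long do refunds take?", hits)
print(prompt)
```

Carrying the source name alongside each snippet is what lets the final answer point back to a document; the ranking method itself could be swapped for any vector store query.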
Journey fit
RAG assembly is core Build work when you productize an agent that must answer from proprietary docs rather than model memory alone. Vector stores, chunking policies, and grounded generation are agent-tooling infrastructure, not launch SEO or operate monitoring.
How it compares
Skill-encoded RAG methodology for agents, not a turnkey hosted search appliance or a single MCP connector.
Common Questions / FAQ
Who is knowledge-base-rag for?
Solo developers and indie teams building LLM agents that must query organization-specific knowledge with citations and controlled hallucination risk.
When should I use knowledge-base-rag?
During Build agent-tooling while designing ingestion, chunking, embeddings, and vector retrieval; revisit during Operate iterate when tuning recall and index freshness.
Is knowledge-base-rag safe to install?
RAG skills often involve external APIs and document processing. Review the Security Audits panel on this Prism page, and scope network and secret access before production ingest.
SKILL.md
# Knowledge Base RAG

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Knowledge Base RAG implements the complete Retrieval-Augmented Generation pipeline: document ingestion, intelligent chunking, embedding generation, vector store indexing, semantic retrieval, and grounded response generation. The agent builds RAG systems that answer questions from private knowledge bases with cited sources and reduced hallucination.

RAG solves the fundamental limitation of large language models: they cannot access information created after their training cutoff or proprietary information they were never trained on. By retrieving relevant documents from a vector store and injecting them into the prompt context, RAG grounds the model's responses in factual, up-to-date, organization-specific knowledge.

The quality of a RAG system depends on chunking strategy more than model choice. This skill encodes production-tested chunking approaches: semantic chunking that preserves paragraph coherence, recursive splitting that respects document structure (headings, code blocks, tables), and overlap windows that maintain context across chunk boundaries. Each strategy is matched to the document type for optimal retrieval quality.
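
The overlap-window idea mentioned above can be sketched in a few lines. This is a minimal illustration and not the skill's own implementation: the function name `overlap_windows` is hypothetical, and whitespace-separated words stand in for tokens, which a real pipeline would count with the embedding model's tokenizer.

```python
def overlap_windows(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split a word list into windows of `size` words, each sharing
    `overlap` words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return windows


# Assumption: words approximate tokens for the purpose of this sketch.
words = "a b c d e f g h i j".split()
windows = overlap_windows(words, size=4, overlap=2)
for window in windows:
    print(" ".join(window))
```

Each window repeats the last two words of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of indexing some text twice.
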
## Use When

- Building question-answering systems over private documents
- Creating a searchable knowledge base from documentation, wikis, or PDFs
- Reducing hallucination by grounding LLM responses in retrieved facts
- Implementing semantic search across large document collections
- Building customer support bots with product-specific knowledge
- The user asks about RAG, vector search, or document embedding

## How It Works

```mermaid
graph TD
    A[Documents: PDF, MD, HTML] --> B[Ingestion Pipeline]
    B --> C[Extract Text + Metadata]
    C --> D[Intelligent Chunking]
    D --> E[Generate Embeddings]
    E --> F[Index in Vector Store]
    G[User Query] --> H[Embed Query]
    H --> I[Semantic Search: Top-K]
    I --> J[Re-rank Results]
    J --> K[Construct Prompt with Context]
    K --> L[LLM Generation]
    L --> M[Response with Citations]
```

The pipeline has two phases: offline ingestion (documents to vectors) and online retrieval (query to answer). The re-ranking step applies a cross-encoder to refine the initial vector search results, improving precision before the generation step.

## Implementation

```python
from dataclasses import dataclass
import hashlib


@dataclass
class Chunk:
    text: str
    metadata: dict
    embedding: list[float] | None = None

    @property
    def id(self) -> str:
        return hashlib.sha256(self.text.encode()).hexdigest()[:16]


class RecursiveChunker:
    def __init__(self, max_tokens: int = 512, overlap: int = 64):
        self.max_tokens = max_tokens
        self.overlap = overlap
        self.separators = ["\n## ", "\n### ", "\n\n", "\n", ". ", " "]

    def chunk(self, text: str, metadata: dict) -> list[Chunk]:
        chunks = self._split(text, self.separators)
        return [
            Chunk(text=c.strip(), metadata={**metadata, "chunk_index": i})
            for i, c in enumerate(chunks)
            if c.strip()
        ]

    def _split(self, text: str, separators: list[str]) -> list[str]:
        if not separators or self._token_count(text) <= self.max_tokens:
            return [text]
        sep = separators[0]
        parts = text.split(sep)
        chunks, current = [], ""
        for part in parts:
            candidate = current + sep + part if current else part
            if self._token_count(candidate) > self.max_tokens and current:
                chunks.append(current)
                overlap_text = cur