
Rag Engineer
Design and tune retrieval-augmented generation pipelines—chunking, embeddings, vector stores, and hybrid search—for agent products that must cite real docs.
Overview
Rag-engineer is an agent skill most often used in Build (also Ship testing, Operate monitoring) that architects retrieval-augmented generation with embeddings, vector stores, chunking, and hybrid search optimization.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill rag-engineer
What is this skill?
- Principle: retrieval quality before generation quality—fix retrieval first
- Covers embedding selection, vector DB scaling, chunking by content type, and hybrid search
- Re-ranking, filtering, and context window management for production RAG
- Separate evaluation metrics for retrieval vs generation to avoid blind tuning
- Warns embeddings have blind spots—not a substitute for structured filters
Adoption & trust: 715 installs on skills.sh; 40.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your LLM app sounds confident but pulls wrong chunks, so answers ignore your actual docs and users stop trusting the agent.
Who is it for?
Indie builders adding document Q&A, support agents, or internal copilots who need a structured RAG pass before picking a vector database vendor.
Skip if: Simple one-shot prompts with no corpus, or teams that only need a hosted chat widget with vendor-managed retrieval and no custom index.
When should I use this skill?
When you are building or debugging an LLM feature that retrieves from documents, and retrieval mistakes are driving bad answers.
What do I get? / Deliverables
You leave with a retrieval-first design—chunking, embeddings, hybrid search, and eval hooks—so generation quality tracks measurable retrieval metrics.
- RAG architecture recommendations
- Chunking and hybrid search plan
- Retrieval evaluation approach
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
RAG is core build-time architecture for LLM features and agent knowledge layers before production hardening. Agent-tooling covers retrieval stacks, context assembly, and tooling that agents use to ground answers in your data.
Where it fits
Pick chunk size and metadata filters for a Stripe-docs-style API reference ingested into pgvector.
Wire hybrid BM25 plus dense retrieval for a Notion export with uneven heading structure.
Run retrieval-only benchmarks with held-out questions before enabling the agent in production.
Trace recall@k and latency regressions after re-embedding a corpus with a new model version.
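As a rough illustration of the hybrid-retrieval and recall@k items above, the sketch below fuses a keyword ranking and a dense ranking with Reciprocal Rank Fusion, then compares recall@k before and after fusion. The document IDs, the two input rankings, and the constant `k=60` are all assumptions chosen for the example; a real pipeline would take its rankings from a BM25 index and a vector store.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical rankings for one held-out question.
bm25_ranking = ["d3", "d7", "d1", "d2"]   # keyword search
dense_ranking = ["d5", "d1", "d3", "d9"]  # vector similarity
relevant = {"d1", "d3"}                   # hand-labelled ground truth

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])

for name, ranking in [("bm25", bm25_ranking), ("dense", dense_ranking), ("fused", fused)]:
    print(name, recall_at_k(ranking, relevant, k=2))
```

In this toy case each single retriever finds only one of the two relevant documents in its top 2, while the fused ranking finds both. Tracking the same numbers per retriever after a re-embedding run is one way to see which stage regressed.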
How it compares
Skill-level RAG systems design, not a drop-in MCP server or managed vector SaaS console.
Common Questions / FAQ
Who is rag-engineer for?
Solo developers and small teams building Claude Code or Cursor-backed agents who own their document index and need retrieval tuned before launch.
When should I use rag-engineer?
During Build when choosing chunk sizes and vector DB; at Ship when writing retrieval evals before release; at Operate when diagnosing stale embeddings or recall drops in production.
Is rag-engineer safe to install?
It guides architecture and code patterns that may use network and APIs for embeddings; review the Security Audits panel on this page and lock down secrets for vector DB credentials.
SKILL.md
# RAG Engineer

Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications.

**Role**: RAG Systems Architect

I bridge the gap between raw documents and LLM understanding. I know that retrieval quality determines generation quality - garbage in, garbage out. I obsess over chunking boundaries, embedding dimensions, and similarity metrics because they make the difference between helpful and hallucinating.

### Expertise

- Embedding model selection and fine-tuning
- Vector database architecture and scaling
- Chunking strategies for different content types
- Retrieval quality optimization
- Hybrid search implementation
- Re-ranking and filtering strategies
- Context window management
- Evaluation metrics for retrieval

### Principles

- Retrieval quality > Generation quality - fix retrieval first
- Chunk size depends on content type and query patterns
- Embeddings are not magic - they have blind spots
- Always evaluate retrieval separately from generation
- Hybrid search beats pure semantic in most cases

## Capabilities

- Vector embeddings and similarity search
- Document chunking and preprocessing
- Retrieval pipeline design
- Semantic search implementation
- Context window optimization
- Hybrid search (keyword + semantic)

## Prerequisites

- Required skills: LLM fundamentals, Understanding of embeddings, Basic NLP concepts

## Patterns

### Semantic Chunking

Chunk by meaning, not arbitrary token counts

**When to use**: Processing documents with natural sections

- Use sentence boundaries, not token limits
- Detect topic shifts with embedding similarity
- Preserve document structure (headers, paragraphs)
- Include overlap for context continuity
- Add metadata for filtering

### Hierarchical Retrieval

Multi-level retrieval for better precision

**When to use**: Large document collections with varied granularity

- Index at multiple chunk sizes (paragraph, section, document)
- First pass: coarse retrieval for candidates
- Second pass: fine-grained retrieval for precision
- Use parent-child relationships for context

### Hybrid Search

Combine semantic and keyword search

**When to use**: Queries may be keyword-heavy or semantic

- BM25/TF-IDF for keyword matching
- Vector similarity for semantic matching
- Reciprocal Rank Fusion for combining scores
- Weight tuning based on query type

### Query Expansion

Expand queries to improve recall

**When to use**: User queries are short or ambiguous

- Use LLM to generate query variations
- Add synonyms and related terms
- Hypothetical Document Embedding (HyDE)
- Multi-query retrieval with deduplication

### Contextual Compression

Compress retrieved context to fit window

**When to use**: Retrieved chunks exceed context limits

- Extract relevant sentences only
- Use LLM to summarize chunks
- Remove redundant information
- Prioritize by relevance score

### Metadata Filtering

Pre-filter by metadata before semantic search

**When to use**: Documents have structured metadata

- Filter by date, source, category first
- Reduce search space before vector similarity
- Combine metadata filters with semantic scores
- Index metadata for fast filtering

## Sharp Edges

### Fixed-size chunking breaks sentences and context

Severity: HIGH

Situation: Using fixed token/character limits for chunking

Symptoms:

- Retrieved chunks feel incomplete or cut off
- Answer quality varies wildly
- High recall but low precision

Why this breaks: Fixed-size chunks split mid-sentence, mid-paragraph, or mid-idea. The resulting embeddings represent incomplete thoughts, leading to poor retrieval quality.