
RAG Retrieval
Run a structured RAG quality checklist while building or hardening agentic retrieval so answers stay grounded and failures degrade gracefully.
Overview
RAG Retrieval is an agent skill most often used in Build (also Ship testing) that applies a multi-section RAG quality checklist to agentic retrieval and generation pipelines.
Install
npx skills add https://github.com/yonatangross/orchestkit --skill rag-retrieval
What is this skill?
- Seven checklist areas: retrieval, document grading, query transformation, web fallback, Self-RAG, generation, and errors
- Tuning defaults: 512–1024 token chunks, 10–20% overlap, top-k balance, 2–3 query retries
- Document grading with thresholds plus fallback when relevance is low
- CRAG web search with rate limits, filtering, and source attribution
- Generation guardrails: citations, hallucination checks, adaptive skip-retrieval for simple queries
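To make the chunking defaults above concrete, here is a minimal sketch of fixed-size chunking with fractional overlap. The function name `chunk_tokens` and its parameters are hypothetical and not part of the skill; it assumes the text has already been tokenized into a list.

```python
def chunk_tokens(tokens, chunk_size=512, overlap_ratio=0.15):
    """Split a token list into windows of chunk_size that overlap by overlap_ratio.

    Hypothetical helper for illustration; the skill does not prescribe an implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # at least 1, because overlap < chunk_size

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, so no chunk is a pure repeat of the tail.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With `chunk_size=512` and `overlap_ratio=0.15`, consecutive chunks share 76 tokens. Whether that overlap helps depends on the corpus, so treat it as a starting point to tune against your own retrieval metrics.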
Adoption & trust: 604 installs on skills.sh; 183 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent's RAG answers sound confident but are wrong because chunking, grading, retries, and fallbacks were never verified as a system.
Who is it for?
Indie builders implementing or reviewing production RAG in coding agents, support bots, or internal knowledge tools.
Skip if: You have a greenfield project with no retrieval layer yet. Start with ingestion and indexing skills first, then run this checklist.
When should I use this skill?
You are implementing, reviewing, or regression-testing an agentic RAG pipeline and need a structured quality pass.
What do I get? / Deliverables
You complete a categorized RAG QA pass with tuned retrieval, grading, CRAG/Self-RAG behavior, and documented degradation paths before users rely on answers.
- Completed RAG quality checklist with pass/fail notes per section
- Tuning recommendations for chunking, top-k, grading thresholds, and fallbacks
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Agentic RAG is assembled during build when you wire embeddings, chunking, and retrieval into your product or coding agent. Agent-tooling is the canonical shelf for retrieval pipelines, grading, CRAG web fallback, and Self-RAG control logic.
Where it fits
Configure chunk overlap and metadata filters before exposing a codebase Q&A agent to beta users.
Run grading thresholds and 2–3 query retry limits against a golden set of support questions before release.
Recalibrate Self-RAG skip-retrieval and web fallback after a spike in irrelevant corpus hits.
Smoke-test a prototype RAG demo for citation enforcement and graceful degradation on empty retrieval.
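The scenarios above mention grading thresholds, a 2–3 query retry limit, and graceful degradation on empty retrieval. A minimal sketch of how those pieces might fit together follows. Every name here (`retrieve_with_retries` and the `retrieve`, `grade`, and `rewrite` callables) is hypothetical, and the threshold of 0.5 is a placeholder, not a value taken from the skill.

```python
from typing import Callable, List, Tuple


def retrieve_with_retries(
    query: str,
    retrieve: Callable[[str], List[str]],   # assumed: returns candidate documents
    grade: Callable[[str, str], float],     # assumed: relevance score in [0, 1]
    rewrite: Callable[[str], str],          # assumed: produces a reformulated query
    threshold: float = 0.5,                 # placeholder; tune on a golden set
    max_retries: int = 2,
) -> Tuple[List[str], str]:
    """Return (relevant_docs, status), where status is "ok" or "fallback"."""
    current = query
    # One initial attempt plus up to max_retries rewritten attempts.
    for attempt in range(max_retries + 1):
        docs = retrieve(current)
        # Grade against the original query so rewrites cannot drift the intent.
        relevant = [d for d in docs if grade(query, d) >= threshold]
        if relevant:
            return relevant, "ok"
        if attempt < max_retries:
            current = rewrite(current)
    # Nothing passed grading: signal the caller to degrade gracefully
    # (for example, a web fallback or an explicit "no grounded answer" reply).
    return [], "fallback"
```

Returning an explicit status instead of an empty list alone lets the caller distinguish a deliberate fallback from a successful retrieval, which is what a golden-set test would assert on.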
How it compares
Checker-style skill package for RAG pipelines, not a hosted vector database or MCP retrieval server.
Common Questions / FAQ
Who is rag-retrieval for?
Solo developers and small teams building LLM agents who need a repeatable audit template for retrieval, grading, and grounded generation.
When should I use rag-retrieval?
In Build (agent-tooling) while wiring embeddings and graders; in Ship (testing) before launch to validate top-k, CRAG fallback, and citation enforcement; in Operate when tuning false positives after user-reported hallucinations.
Is rag-retrieval safe to install?
The skill implies network-backed web search in CRAG sections—review permissions and the Security Audits panel on this page before enabling APIs in your environment.
SKILL.md
# RAG Quality Checklist

Quality assurance for agentic RAG implementations.

## Retrieval Quality

- [ ] Semantic search configured with appropriate embedding model
- [ ] Chunk size optimized (512-1024 tokens typical)
- [ ] Chunk overlap configured (10-20% of chunk size)
- [ ] Metadata filtering implemented for scoping
- [ ] Top-k tuned for precision/recall balance

## Document Grading

- [ ] Relevance grading implemented (binary or scored)
- [ ] Grading prompt tested with diverse queries
- [ ] Threshold tuned for false positive/negative balance
- [ ] Fallback behavior defined for low-relevance results

## Query Transformation

- [ ] Query rewriting enabled for failed retrievals
- [ ] Maximum retry count configured (2-3 typical)
- [ ] Query decomposition for multi-concept queries
- [ ] HyDE integration for vocabulary mismatch

## Web Fallback (CRAG)

- [ ] Web search integration configured
- [ ] Rate limiting for web search API
- [ ] Result filtering and quality check
- [ ] Source attribution for web results

## Self-RAG Patterns

- [ ] Adaptive retrieval decision logic implemented
- [ ] Reflection tokens for quality assessment
- [ ] Skip retrieval path for simple queries
- [ ] Confidence thresholds calibrated

## Generation Quality

- [ ] Context formatting optimized
- [ ] Citation/source attribution enforced
- [ ] Hallucination detection enabled
- [ ] Output length appropriate

## Error Handling

- [ ] Graceful degradation on retrieval failure
- [ ] Fallback responses configured
- [ ] Retry logic with exponential backoff
- [ ] Error logging and alerting

## Performance

- [ ] Retrieval latency acceptable (<500ms)
- [ ] Caching for repeated queries
- [ ] Batch embedding for efficiency
- [ ] Async execution where possible

## Monitoring

- [ ] Retrieval metrics tracked (precision, recall)
- [ ] Query success/failure rates logged
- [ ] Web fallback frequency monitored
- [ ] User feedback integration

# PGVector Hybrid Search Implementation Checklist

Use this checklist when implementing semantic + keyword search with PGVector.

## Pre-Implementation

### Index Strategy Planning

- [ ] **Choose vector algorithm** - HNSW (recommended) or IVFFlat
- [ ] **Select embedding model** - OpenAI (1536), Voyage AI (1024), etc.
- [ ] **Determine dimensions** - Match model output dimensions
- [ ] **Plan distance metric** - Cosine (most common) or L2/Inner Product
- [ ] **Set HNSW parameters** - m=16, ef_construction=64 (good defaults)

### Embedding Model Selection

- [ ] **Test embedding quality** - Validate on sample queries
- [ ] **Measure embedding latency** - API call time
- [ ] **Budget embedding costs** - Track usage for bulk ingestion
- [ ] **Plan batch embedding** - Batch API calls for efficiency
- [ ] **Cache embeddings** - Store in database, don't re-compute

### RRF Configuration

- [ ] **Set fetch multiplier** - 3x (retrieve 30 for top-10 results)
- [ ] **Choose RRF constant (k)** - 60 (standard value)
- [ ] **Plan score normalization** - Use rank, not raw scores
- [ ] **Define boosting factors** - Section title (1.5x), path (1.15x), code (1.2x)
- [ ] **Set similarity threshold** - Minimum cosine similarity (e.g., 0.75)

### Schema Design

- [ ] **Define chunks table** - id, content, embedding, metadata
- [ ] **Add tsvector column** - Pre-computed for keyword search
- [ ] **Plan metadata fields** - section_title, section_path, content_type
- [ ] **Add timestamps** - created_at, updated_at
- [ ] **Foreign keys** - Link to documents/artifacts

## Implementation

### Database Schema

```sql
-- 1. Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Create chunks table
CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
    content TEXT NOT NULL,

    -- Vector embedding (match model dimensions)
    embedding vector(1024),  -- Voyage AI 1024 dims

    -- Pre-computed tsvector for full-text search
    content_tsvector tsvector GENERA