
RAG Implementation
Implement production RAG retrievers—hybrid BM25+vectors, multi-query expansion, and contextual compression—with LangChain-style patterns.
Overview
rag-implementation is an agent skill, most often used in the Build phase (and also when validating a prototype), that teaches hybrid, multi-query, and compression retriever patterns for RAG in LLM applications.
Install
npx skills add https://github.com/wshobson/agents --skill rag-implementation
What is this skill?
- Hybrid search with BM25 + dense vectors and Reciprocal Rank Fusion weighting (example 0.3 / 0.7)
- Multi-query retriever pattern to improve recall from a single user question
- Contextual compression retriever with LLMChainExtractor for smaller contexts
- Worked Python examples using LangChain retriever APIs
- Advanced RAG pattern catalog beyond naive vector-only search
- Hybrid ensemble example uses BM25 k=10 and dense k=10 with 0.3/0.7 RRF-style weights
- Documents three advanced patterns: hybrid RRF, multi-query, contextual compression
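To make the 0.3 / 0.7 weighting above concrete, here is a minimal pure-Python sketch of weighted Reciprocal Rank Fusion. It is not the skill's or LangChain's code: the function name `weighted_rrf`, the toy document IDs, and the smoothing constant `c=60` (a commonly used RRF default) are assumptions for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document earns weight / (c + rank) from every list it appears in;
    documents are returned in descending order of their summed score.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from a keyword (BM25) and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.3, 0.7])
print(fused)  # ['doc_c', 'doc_a', 'doc_d', 'doc_b']
```

With the dense list weighted at 0.7, its top hit `doc_c` edges out `doc_a`, and a dense-only hit (`doc_d`) outranks a keyword-only hit (`doc_b`).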
Adoption & trust: 9.1k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your RAG stack only does vanilla vector search and returns irrelevant chunks that blow up context windows and hurt answer quality.
Who is it for?
Builders shipping agent or SaaS features with Python LangChain stacks who need concrete retriever recipes beyond embed-and-query.
Skip if: Teams wanting a hosted vector DB tutorial only, or products with no document corpus to index.
When should I use this skill?
User is implementing or improving RAG retrieval (hybrid search, multi-query, compression) for an LLM or agent feature.
What do I get? / Deliverables
You implement hybrid RRF retrieval, optional query rewriting, and compression so agents retrieve sharper evidence with controllable token use.
- Hybrid and multi-query retriever configuration
- Compression retriever pipeline reducing prompt context
- Runnable pattern snippets adapted to your stack
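The multi-query idea can be sketched without any framework: run several rewrites of one question and keep the de-duplicated union of the hits. The helper name `multi_query_retrieve`, the `retrieve` callable, and the toy index below are hypothetical stand-ins, not part of the skill or of LangChain.

```python
def multi_query_retrieve(queries, retrieve, limit=5):
    """Run every query variant and return the de-duplicated union of results.

    `retrieve` is any callable mapping a query string to a ranked list of
    document IDs; first-seen order is preserved across variants.
    """
    seen = set()
    merged = []
    for query in queries:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:limit]


# Toy "index": each rewrite of the user question surfaces different documents.
fake_index = {
    "reset password": ["faq_12", "faq_07"],
    "forgot login credentials": ["faq_07", "faq_31"],
}

merged = multi_query_retrieve(list(fake_index), lambda q: fake_index.get(q, []))
print(merged)  # ['faq_12', 'faq_07', 'faq_31']
```

In a real stack the query variants would come from an LLM rewrite step, and `limit` caps how many merged chunks reach the prompt.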
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Retrieval architecture is chosen when building agent features and knowledge APIs, so the canonical shelf is Build agent-tooling. The content covers retriever composition and chunk compression for LLM apps, not generic CRUD backend work.
Where it fits
Compare naive vector-only recall vs hybrid RRF on a sample FAQ corpus before committing to architecture.
Wire EnsembleRetriever with BM25 and dense weights for a customer-support agent tool.
Add contextual compression so lifecycle emails and in-app help use shorter, relevant excerpts.
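One way to run the vector-only versus hybrid comparison mentioned above is to score each retriever's output with recall@k against a small hand-labelled set. The sketch below assumes that approach; the function name `recall_at_k`, the document IDs, and the relevance labels are made-up toy data, not measured results.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hand-labelled relevant FAQ entries for one test question (toy data).
relevant = {"faq_07", "faq_31"}

# Hypothetical ranked outputs from two retriever configurations.
vector_only = ["faq_02", "faq_07", "faq_15", "faq_40"]
hybrid = ["faq_07", "faq_31", "faq_02", "faq_15"]

vector_recall = recall_at_k(vector_only, relevant, k=3)
hybrid_recall = recall_at_k(hybrid, relevant, k=3)
print(vector_recall, hybrid_recall)  # 0.5 1.0
```

Averaging this score over a few dozen representative questions gives a rough basis for choosing between configurations before committing to one.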
How it compares
Skill-delivered RAG architecture patterns, not a managed vector database product or single MCP connector.
Common Questions / FAQ
Who is rag-implementation for?
Solo developers and indie AI products wiring retrieval into agents or APIs, especially with LangChain-style Python codebases.
When should I use rag-implementation?
In Validate prototype when testing recall on real docs; in Build agent-tooling when choosing retrievers; in Grow when improving support-bot answer quality with better chunk selection.
Is rag-implementation safe to install?
Review the Security Audits panel on this Prism page; the skill is documentation and code patterns—you still must secure your own API keys, indexes, and LLM calls.
SKILL.md
# rag-implementation: detailed patterns and worked examples

## Advanced RAG Patterns

The snippets below assume `documents`, `vectorstore`, and `llm` are already defined, and that calls using `await` run inside an async function.

### Pattern 1: Hybrid Search with RRF

```python
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Sparse retriever (BM25 for keyword matching)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Dense retriever (embeddings for semantic search)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Combine with Reciprocal Rank Fusion weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)
```

### Pattern 2: Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives for better recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=llm
)

# Single query → multiple variations → combined results
results = await multi_query_retriever.ainvoke("What is the main topic?")
```

### Pattern 3: Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Compressor extracts only relevant portions
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

# Returns only relevant parts of documents
compressed_docs = await compression_retriever.ainvoke("specific query")
```

### Pattern 4: Parent Document Retriever

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for precise retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Store for parent documents
docstore = InMemoryStore()

parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents (splits children, stores parents)
await parent_retriever.aadd_documents(documents)

# Retrieval returns parent documents with full context
results = await parent_retriever.ainvoke("query")
```

### Pattern 5: HyDE (Hypothetical Document Embeddings)

```python
from typing import TypedDict

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langgraph.graph import StateGraph, START, END

# Assumes `llm`, `retriever`, and a `generate` node function are defined elsewhere.


class HyDEState(TypedDict):
    question: str
    hypothetical_doc: str
    context: list[Document]
    answer: str


hyde_prompt = ChatPromptTemplate.from_template(
    """Write a detailed passage that would answer this question:

Question: {question}

Passage:"""
)


async def generate_hypothetical(state: HyDEState) -> dict:
    """Generate hypothetical document for better retrieval."""
    messages = hyde_prompt.format_messages(question=state["question"])
    response = await llm.ainvoke(messages)
    # Return only the state key this node updates
    return {"hypothetical_doc": response.content}


async def retrieve_with_hyde(state: HyDEState) -> dict:
    """Retrieve using hypothetical document."""
    # Use hypothetical doc for retrieval instead of original query
    docs = await retriever.ainvoke(state["hypothetical_doc"])
    return {"context": docs}


# Build HyDE RAG graph
builder = StateGraph(HyDEState)
builder.add_node("hypothetical", generate_hypothetical)
builder.add_node("retrieve", retrieve_with_hyde)
builder.add_node("generate", generate)
builder.add_edge(START, "hypothetical")
builder.add_edge("hypothetical", "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

hyde_rag = builder.compile()
```

## Document Chunking Strategies

### Recursive Character Text Splitter

```p