
Hybrid Search Implementation
Implement reciprocal rank fusion and weighted linear blends so vector and keyword retrieval rank the same document set coherently in RAG or product search.
Install
npx skills add https://github.com/wshobson/agents --skill hybrid-search-implementation
What is this skill?
- Reciprocal Rank Fusion template with configurable k constant and per-list weights
- Linear combination helper blending vector similarity scores with BM25 keyword scores via alpha
- Worked Python patterns for fusing multiple ranked (doc_id, score) lists
- Designed for pairing dense embeddings with sparse lexical retrieval in one ranking
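To make the rank-based fusion described above concrete, here is a minimal sketch of unweighted RRF on two small result lists. The document IDs and scores are hypothetical, and the `rrf` helper is an illustrative simplification, not the skill's own template. It shows that only rank positions matter, so cosine similarities and BM25 scores on different scales can be fused without normalization.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (doc_id, score) pairs, best first


def rrf(result_lists: List[Ranked], k: int = 60) -> Ranked:
    """Rank-only fusion: each list contributes 1 / (k + rank), rank starting at 1."""
    fused: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        # Raw scores are ignored; only the position in each list counts.
        for rank, (doc_id, _score) in enumerate(results, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: cosine similarities vs. BM25 scores, on different scales.
vector_hits = [("doc_a", 0.91), ("doc_b", 0.88), ("doc_c", 0.52)]
keyword_hits = [("doc_b", 14.2), ("doc_d", 9.7), ("doc_a", 3.1)]

fused = rrf([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

With these inputs, doc_b (ranks 2 and 1) edges out doc_a (ranks 1 and 3), and documents found by only one retriever trail both. A linear blend would instead need both score sets normalized to a common range before weighting them with alpha.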
Adoption & trust: 7k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Hybrid retrieval is implemented in the application backend once you have embeddings and a text index, not during idea or launch marketing work. Backend covers search services, ranking code, and fusion logic that agents or APIs call at query time.
Common Questions / FAQ
Is Hybrid Search Implementation safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# hybrid-search-implementation: templates and worked examples

## Templates

### Template 1: Reciprocal Rank Fusion

```python
from typing import List, Optional, Tuple
from collections import defaultdict

def reciprocal_rank_fusion(
    result_lists: List[List[Tuple[str, float]]],
    k: int = 60,
    weights: Optional[List[float]] = None
) -> List[Tuple[str, float]]:
    """
    Combine multiple ranked lists using RRF.

    Args:
        result_lists: One list of (doc_id, score) tuples per search method,
            each ordered best first (only the order is used, not the scores)
        k: RRF constant (higher = more weight to lower ranks)
        weights: Optional weights per result list

    Returns:
        Fused ranking as (doc_id, score) tuples
    """
    if weights is None:
        weights = [1.0] * len(result_lists)

    scores = defaultdict(float)

    for result_list, weight in zip(result_lists, weights):
        for rank, (doc_id, _) in enumerate(result_list):
            # RRF formula: 1 / (k + rank) with rank starting at 1;
            # enumerate() is 0-based, hence the + 1
            scores[doc_id] += weight * (1.0 / (k + rank + 1))

    # Sort by fused score
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


def linear_combination(
    vector_results: List[Tuple[str, float]],
    keyword_results: List[Tuple[str, float]],
    alpha: float = 0.5
) -> List[Tuple[str, float]]:
    """
    Combine results with linear interpolation.

    Args:
        vector_results: (doc_id, similarity_score) from vector search
        keyword_results: (doc_id, bm25_score) from keyword search
        alpha: Weight for vector search (1-alpha for keyword)
    """
    # Min-max normalize scores to [0, 1]
    def normalize(results):
        if not results:
            return {}
        scores = [s for _, s in results]
        min_s, max_s = min(scores), max(scores)
        # Avoid division by zero when all scores are equal
        range_s = max_s - min_s if max_s != min_s else 1
        return {doc_id: (score - min_s) / range_s for doc_id, score in results}

    vector_scores = normalize(vector_results)
    keyword_scores = normalize(keyword_results)

    # Combine; a document missing from one list scores 0 for that list
    all_docs = set(vector_scores.keys()) | set(keyword_scores.keys())
    combined = {}

    for doc_id in all_docs:
        v_score = vector_scores.get(doc_id, 0)
        k_score = keyword_scores.get(doc_id, 0)
        combined[doc_id] = alpha * v_score + (1 - alpha) * k_score

    return sorted(combined.items(), key=lambda x: x[1], reverse=True)
```

### Template 2: PostgreSQL Hybrid Search

```python
import asyncpg
from typing import List, Dict, Optional
import numpy as np

class PostgresHybridSearch:
    """Hybrid search with pgvector and full-text search."""

    def __init__(self, pool: asyncpg.Pool):
        self.pool = pool

    async def setup_schema(self):
        """Create tables and indexes."""
        async with self.pool.acquire() as conn:
            await conn.execute("""
                CREATE EXTENSION IF NOT EXISTS vector;

                CREATE TABLE IF NOT EXISTS documents (
                    id TEXT PRIMARY KEY,
                    content TEXT NOT NULL,
                    embedding vector(1536),
                    metadata JSONB DEFAULT '{}',
                    ts_content tsvector GENERATED ALWAYS AS (
                        to_tsvector('english', content)
                    ) STORED
                );

                -- Vector index (HNSW)
                CREATE INDEX IF NOT EXISTS documents_embedding_idx
                ON documents USING hnsw (embedding vector_cosine_ops);

                -- Full-text index (GIN)
                CREATE INDEX IF NOT EXISTS documents_fts_idx
                ON documents USING gin (ts_content);
            """)

    async def hybrid_search(
        self,
        query: str,
        query_embedding: List[float],
        limit: int = 10,
        vector_weight: float = 0.5,
        filter_metadata: Optional[Dict] = None
    ) -> List[Dict]:
        """
        Perform hybrid search combining vector and full-text.

        Uses RRF fusion for combining results.
        """
        async with self.pool.acquire() as conn: