
Vector Index Tuning
Benchmark HNSW M, ef_construction, and ef_search settings so RAG vector search hits your recall and latency budget before production.
Overview
vector-index-tuning is an agent skill for the Build phase that benchmarks HNSW vector index parameters for recall, latency, and memory before you lock RAG search settings.
Install
npx skills add https://github.com/wshobson/agents --skill vector-index-tuning
What is this skill?
- Grid-search template over M, ef_construction, and ef_search for hnswlib cosine indexes
- Measures build time, query latency, approximate memory, and recall@k against ground truth
- Parameterized lists default to M ∈ {8, 16, 32, 64} plus separate ef_construction and ef_search sweeps
- Python/numpy-oriented workflow for offline tuning before pinning production config
- Worked examples file pairs templates with vector-index-tuning domain knowledge
- Default M sweep: 8, 16, 32, 64
- Default ef_construction sweep: 64, 128, 256
- Default ef_search sweep: 32, 64, 128, 256
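As a rough sketch of what the default sweep costs, the snippet below enumerates the grid. The variable names are illustrative. It assumes, as the skill's template does, that one index is built per (M, ef_construction) pair and then re-queried once per ef_search value.

```python
from itertools import product

# Default sweep values listed above.
m_values = [8, 16, 32, 64]
ef_construction_values = [64, 128, 256]
ef_search_values = [32, 64, 128, 256]

# One index build per (M, ef_construction) pair...
builds = list(product(m_values, ef_construction_values))
# ...and one timed query pass per ef_search value on each built index.
runs = list(product(m_values, ef_construction_values, ef_search_values))

print(len(builds))  # 12
print(len(runs))    # 48
```

Index builds usually dominate the wall-clock time, so trimming the M or ef_construction lists shortens a sweep far more than trimming ef_search.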
Adoption & trust: 6.9k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your embedding index uses default HNSW knobs and you cannot explain recall versus query latency for your real vector volume.
Who is it for?
Indie builders running hnswlib (or similar ANN) behind a self-hosted RAG API who need data-driven index configs.
Skip if: Builders on fully managed vector DBs with vendor auto-tuning only, or teams without ground-truth query labels for recall measurement.
When should I use this skill?
You are implementing vector retrieval and need to sweep HNSW build and search parameters against labeled queries before production.
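If you have no labeled neighbors yet, one common way to get ground truth for recall is exact brute-force search over the same vectors. The sketch below is a minimal version of that idea, not code from the skill. `exact_cosine_neighbors` is a hypothetical helper, and it assumes the index labels equal row positions in `vectors`.

```python
import numpy as np


def exact_cosine_neighbors(vectors: np.ndarray, queries: np.ndarray, k: int = 10) -> np.ndarray:
    """Return the row indices of the k most cosine-similar vectors per query."""
    # Normalize rows so that a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ v.T
    # Sort each row by descending similarity and keep the top k indices.
    return np.argsort(-sims, axis=1)[:, :k]


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32)).astype(np.float32)
queries = vectors[:5]  # each query's nearest neighbor is its own row
ground_truth = exact_cosine_neighbors(vectors, queries, k=10)
```

This materializes a full query-by-vector similarity matrix, so for large corpora you would likely process the queries in batches.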
What do I get? / Deliverables
You get benchmarked parameter combinations with build time, search time, memory estimates, and recall metrics to choose production HNSW settings.
- Benchmark result rows per M / ef_construction / ef_search combination
- Recommended HNSW configuration grounded in measured recall and latency
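One possible way to turn benchmark rows into a recommendation is to filter by your recall and latency budgets and take the fastest survivor. The sketch below assumes rows shaped like the template's result dicts (keys `recall@10`, `search_time_ms`, `memory_mb`). `pick_config` is a hypothetical helper, and the sample rows are made-up placeholders, not measurements.

```python
from typing import List, Optional


def pick_config(
    results: List[dict],
    target_recall: float = 0.95,
    max_latency_ms: float = 10.0,
) -> Optional[dict]:
    """Return the fastest row that meets both budgets, or None if none does."""
    eligible = [
        r for r in results
        if r["recall@10"] >= target_recall and r["search_time_ms"] <= max_latency_ms
    ]
    if not eligible:
        return None
    # Prefer lower latency; break ties on the smaller memory estimate.
    return min(eligible, key=lambda r: (r["search_time_ms"], r["memory_mb"]))


# Placeholder rows for illustration only (not real benchmark output).
rows = [
    {"M": 16, "ef_construction": 128, "ef_search": 64,
     "search_time_ms": 0.4, "recall@10": 0.93, "memory_mb": 60.0},
    {"M": 16, "ef_construction": 128, "ef_search": 128,
     "search_time_ms": 0.7, "recall@10": 0.96, "memory_mb": 60.0},
    {"M": 32, "ef_construction": 128, "ef_search": 128,
     "search_time_ms": 1.1, "recall@10": 0.98, "memory_mb": 75.0},
]
best = pick_config(rows)
```

Returning `None` when nothing qualifies makes it obvious that the sweep needs wider parameter ranges or a relaxed budget.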
Journey fit
Vector index tuning happens while you implement retrieval and storage in Build, not as a launch-time marketing task. The Backend subphase covers embedding indexes, ANN graphs, and search latency, which is exactly what this skill's HNSW benchmarking templates target.
How it compares
Use as an offline tuning playbook instead of guessing ef_search from blog posts or provider default presets.
Common Questions / FAQ
Who is vector-index-tuning for?
Solo developers and small teams implementing semantic search or agent memory who own hnswlib-style indexes and need reproducible parameter sweeps.
When should I use vector-index-tuning?
Use it during Build while wiring retrieval—after you have embedding vectors and a labeled eval set, before you freeze infra config for Ship.
Is vector-index-tuning safe to install?
It is template and benchmark code you run locally; review the Security Audits panel on this page and avoid pointing benchmarks at production secrets or PII-heavy corpora without scrubbing.
SKILL.md
# vector-index-tuning: templates and worked examples

## Templates

### Template 1: HNSW Parameter Tuning

```python
import numpy as np
from typing import List, Tuple
import time


def benchmark_hnsw_parameters(
    vectors: np.ndarray,
    queries: np.ndarray,
    ground_truth: np.ndarray,
    m_values: List[int] = [8, 16, 32, 64],
    ef_construction_values: List[int] = [64, 128, 256],
    ef_search_values: List[int] = [32, 64, 128, 256]
) -> List[dict]:
    """Benchmark different HNSW configurations."""
    import hnswlib

    results = []
    dim = vectors.shape[1]
    n = vectors.shape[0]

    for m in m_values:
        for ef_construction in ef_construction_values:
            # Build index
            index = hnswlib.Index(space='cosine', dim=dim)
            index.init_index(max_elements=n, M=m, ef_construction=ef_construction)

            build_start = time.time()
            index.add_items(vectors)
            build_time = time.time() - build_start

            # Estimate memory usage (approximate, not measured)
            memory_bytes = index.element_count * (
                dim * 4 +  # Vector storage (float32)
                m * 2 * 4  # Graph edges (approximate)
            )

            # Reuse the built index for every ef_search value
            for ef_search in ef_search_values:
                index.set_ef(ef_search)

                # Measure search
                search_start = time.time()
                labels, distances = index.knn_query(queries, k=10)
                search_time = time.time() - search_start

                # Calculate recall
                recall = calculate_recall(labels, ground_truth, k=10)

                results.append({
                    "M": m,
                    "ef_construction": ef_construction,
                    "ef_search": ef_search,
                    "build_time_s": build_time,
                    "search_time_ms": search_time * 1000 / len(queries),
                    "recall@10": recall,
                    "memory_mb": memory_bytes / 1024 / 1024
                })

    return results


def calculate_recall(predictions: np.ndarray, ground_truth: np.ndarray, k: int) -> float:
    """Calculate recall@k."""
    correct = 0
    for pred, truth in zip(predictions, ground_truth):
        correct += len(set(pred[:k]) & set(truth[:k]))
    return correct / (len(predictions) * k)


def recommend_hnsw_params(
    num_vectors: int,
    target_recall: float = 0.95,
    max_latency_ms: float = 10,
    available_memory_gb: float = 8
) -> dict:
    """Recommend HNSW parameters based on requirements."""
    # Note: max_latency_ms and available_memory_gb are accepted but not used below.
    # Base recommendations
    if num_vectors < 100_000:
        m = 16
        ef_construction = 100
    elif num_vectors < 1_000_000:
        m = 32
        ef_construction = 200
    else:
        m = 48
        ef_construction = 256

    # Adjust ef_search based on recall target
    if target_recall >= 0.99:
        ef_search = 256
    elif target_recall >= 0.95:
        ef_search = 128
    else:
        ef_search = 64

    return {
        "M": m,
        "ef_construction": ef_construction,
        "ef_search": ef_search,
        "notes": f"Estimated for {num_vectors:,} vectors, {target_recall:.0%} recall"
    }
```

### Template 2: Quantization Strategies

```python
import numpy as np
from typing import Optional, Tuple  # Tuple is needed for the return annotation


class VectorQuantizer:
    """Quantization strategies for vector compression."""

    @staticmethod
    def scalar_quantize_int8(
        vectors: np.ndarray,
        min_val: Optional[float] = None,
        max_val: Optional[float] = None
    ) -> Tuple[np.ndarray, dict]:
        """Scalar quantization to 8 bits (stored as uint8)."""
        if min_val is None:
            min_val = vectors.min()
        if max_val is None:
            max_val = vectors.max()

        # Scale to 0-255 range
        scale = 255.0 / (max_val - min_val)
        quantized = np.clip(
            np.round((vectors - min_val) * scale), 0, 255
        ).astype(np.uint8)

        params = {"min_val": min_val, "max_val": max_val, "scale": scale}
        return quantized, params

    @staticme