Rag Retrieval

Agentic RAG is assembled during build when you wire embeddings, chunking, and retrieval into your product or coding agent. Agent-tooling is the canonical shelf for retrieval pipelines, grading, CRAG web fallback, and Self-RAG control logic.

Also useful

Also useful

Where it fits

Example use

Configure chunk overlap and metadata filters before exposing a codebase Q&A agent to beta users.

Example use

Run grading thresholds and 2–3 query retry limits against a golden set of support questions before release.

Example use

ValidatePrototype & spike

Recalibrate Self-RAG skip-retrieval and web fallback after spike in irrelevant corpus hits.

Example use

Smoke-test a prototype RAG demo for citation enforcement and graceful degradation on empty retrieval.

How it compares

Checker-style skill package for RAG pipelines, not a hosted vector database or MCP retrieval server.

Common Questions / FAQ

Who is rag-retrieval for?

Solo developers and small teams building LLM agents who need a repeatable audit template for retrieval, grading, and grounded generation.

When should I use rag-retrieval?

In Build (agent-tooling) while wiring embeddings and graders; in Ship (testing) before launch to validate top-k, CRAG fallback, and citation enforcement; in Operate when tuning false positives after user-reported hallucinations.

Is rag-retrieval safe to install?

The skill implies network-backed web search in CRAG sections—review permissions and the Security Audits panel on this page before enabling APIs in your environment.

SKILL.md

READMESKILL.md - Rag Retrieval

# RAG Quality Checklist

Quality assurance for agentic RAG implementations.

## Retrieval Quality

- [ ] Semantic search configured with appropriate embedding model
- [ ] Chunk size optimized (512-1024 tokens typical)
- [ ] Chunk overlap configured (10-20% of chunk size)
- [ ] Metadata filtering implemented for scoping
- [ ] Top-k tuned for precision/recall balance

## Document Grading

- [ ] Relevance grading implemented (binary or scored)
- [ ] Grading prompt tested with diverse queries
- [ ] Threshold tuned for false positive/negative balance
- [ ] Fallback behavior defined for low-relevance results

## Query Transformation

- [ ] Query rewriting enabled for failed retrievals
- [ ] Maximum retry count configured (2-3 typical)
- [ ] Query decomposition for multi-concept queries
- [ ] HyDE integration for vocabulary mismatch

## Web Fallback (CRAG)

- [ ] Web search integration configured
- [ ] Rate limiting for web search API
- [ ] Result filtering and quality check
- [ ] Source attribution for web results

## Self-RAG Patterns

- [ ] Adaptive retrieval decision logic implemented
- [ ] Reflection tokens for quality assessment
- [ ] Skip retrieval path for simple queries
- [ ] Confidence thresholds calibrated

## Generation Quality

- [ ] Context formatting optimized
- [ ] Citation/source attribution enforced
- [ ] Hallucination detection enabled
- [ ] Output length appropriate

## Error Handling

- [ ] Graceful degradation on retrieval failure
- [ ] Fallback responses configured
- [ ] Retry logic with exponential backoff
- [ ] Error logging and alerting

## Performance

- [ ] Retrieval latency acceptable (<500ms)
- [ ] Caching for repeated queries
- [ ] Batch embedding for efficiency
- [ ] Async execution where possible

## Monitoring

- [ ] Retrieval metrics tracked (precision, recall)
- [ ] Query success/failure rates logged
- [ ] Web fallback frequency monitored
- [ ] User feedback integration


# PGVector Hybrid Search Implementation Checklist

Use this checklist when implementing semantic + keyword search with PGVector.

## Pre-Implementation

### Index Strategy Planning
- [ ] **Choose vector algorithm** - HNSW (recommended) or IVFFlat
- [ ] **Select embedding model** - OpenAI (1536), Voyage AI (1024), etc.
- [ ] **Determine dimensions** - Match model output dimensions
- [ ] **Plan distance metric** - Cosine (most common) or L2/Inner Product
- [ ] **Set HNSW parameters** - m=16, ef_construction=64 (good defaults)

### Embedding Model Selection
- [ ] **Test embedding quality** - Validate on sample queries
- [ ] **Measure embedding latency** - API call time
- [ ] **Budget embedding costs** - Track usage for bulk ingestion
- [ ] **Plan batch embedding** - Batch API calls for efficiency
- [ ] **Cache embeddings** - Store in database, don't re-compute

### RRF Configuration
- [ ] **Set fetch multiplier** - 3x (retrieve 30 for top-10 results)
- [ ] **Choose RRF constant (k)** - 60 (standard value)
- [ ] **Plan score normalization** - Use rank, not raw scores
- [ ] **Define boosting factors** - Section title (1.5x), path (1.15x), code (1.2x)
- [ ] **Set similarity threshold** - Minimum cosine similarity (e.g., 0.75)

### Schema Design
- [ ] **Define chunks table** - id, content, embedding, metadata
- [ ] **Add tsvector column** - Pre-computed for keyword search
- [ ] **Plan metadata fields** - section_title, section_path, content_type
- [ ] **Add timestamps** - created_at, updated_at
- [ ] **Foreign keys** - Link to documents/artifacts

## Implementation

### Database Schema

```sql
-- 1. Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 2. Create chunks table
CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
    content TEXT NOT NULL,

    -- Vector embedding (match model dimensions)
    embedding vector(1024),  -- Voyage AI 1024 dims

    -- Pre-computed tsvector for full-text search
    content_tsvector tsvector GENERA

What is this skill?

Seven checklist areas: retrieval, document grading, query transformation, web fallback, Self-RAG, generation, and errors

Tuning defaults: 512–1024 token chunks, 10–20% overlap, top-k balance, 2–3 query retries

Document grading with thresholds plus fallback when relevance is low

CRAG web search with rate limits, filtering, and source attribution

Generation guardrails: citations, hallucination checks, adaptive skip-retrieval for simple queries

7 major checklist sections (retrieval through error handling)

512–1024 token typical chunk size with 10–20% overlap

2–3 typical maximum query retry count

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 604 installs on skills.sh; 183 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

Also useful

Where it fits

Example use

Configure chunk overlap and metadata filters before exposing a codebase Q&A agent to beta users.

Example use

Run grading thresholds and 2–3 query retry limits against a golden set of support questions before release.

Example use