Proofrag

unshDee/proofrag · 1 plugin

Install when you ship a RAG or LLM app and need golden sets from your docs, LLM-as-judge scoring, retrieval metrics, shareable scorecards, and CI gates.

Overview

proofrag is a plugin marketplace for the Ship phase that evaluates RAG/LLM apps with golden sets, LLM-as-judge, retrieval metrics, scorecards, and CI gates.

What is this marketplace?

Golden test sets generated from your own documentation corpus
LLM-as-judge evaluation plus retrieval metrics in one skill bundle
Shareable scorecard output for stakeholders and iteration reviews
CI gate support so regressions fail builds (MIT licensed v0.5.2)
Keywords: rag, llm-as-judge, evaluation, retrieval metrics
1 plugin: proofrag version 0.5.2, MIT license
Marketplace metadata describes golden sets, LLM-as-judge, and scorecards
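To make "retrieval metrics" concrete, here is a minimal sketch of two common ones, recall@k and mean reciprocal rank, computed over a small golden set. The listing does not say which metrics proofrag computes or how it stores golden sets, so the `GoldenItem` schema, the function names, and the `runs` mapping are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GoldenItem:
    """One golden question plus the doc IDs a good retriever should return.

    Hypothetical schema, not proofrag's actual format.
    """
    question: str
    relevant_ids: set[str]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    golden: list[GoldenItem], runs: dict[str, list[str]], k: int = 5
) -> dict[str, float]:
    """Average both metrics over the golden set.

    `runs` maps each question to the ranked doc IDs your retriever returned.
    """
    recalls = [recall_at_k(runs.get(g.question, []), g.relevant_ids, k) for g in golden]
    rranks = [reciprocal_rank(runs.get(g.question, []), g.relevant_ids) for g in golden]
    n = len(golden) or 1  # avoid dividing by zero on an empty golden set
    return {f"recall_at_{k}": sum(recalls) / n, "mrr": sum(rranks) / n}
```

Tracking these averages across commits is one way to notice retrieval drift after a chunking or embedder change.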

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Community signal: 1 GitHub star.

What problem does it solve?

RAG demos look fine in chat but nobody measures grounded accuracy, retrieval drift, or regressions before merge.

Who is it for?

Indie builders shipping doc-QA, support bots, or internal copilots who need repeatable evals—not one-off manual spot checks.

Skip if: you run a static site with no LLM or retrieval layer, or your team is unwilling to maintain golden questions tied to its docs.

What do I get? / Deliverables

After install, you get doc-derived golden sets, judged scores, retrieval metrics, a shareable scorecard, and optional CI failure on quality drops.

Golden evaluation set derived from your docs
LLM-as-judge and retrieval metric results
Shareable scorecard and optional CI gate signal
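A CI gate of the kind listed above can be as simple as comparing scorecard values to minimum thresholds and returning a non-zero exit code on failure. The sketch below assumes the scorecard is available as a flat mapping of metric name to score; the metric names, threshold values, and function names are hypothetical and are not taken from proofrag.

```python
from __future__ import annotations

# Hypothetical minimum acceptable scores; tune these for your own app.
THRESHOLDS = {"judge_score": 0.80, "recall_at_5": 0.70}


def gate_failures(scorecard: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return one human-readable message per metric that is missing or too low."""
    failures = []
    for metric, minimum in thresholds.items():
        value = scorecard.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from scorecard")
        elif value < minimum:
            failures.append(f"{metric}: {value:.2f} < {minimum:.2f}")
    return failures


def run_gate(scorecard: dict[str, float], thresholds: dict[str, float] = THRESHOLDS) -> int:
    """Print failures and return a process exit code (0 = pass, 1 = fail)."""
    failures = gate_failures(scorecard, thresholds)
    for message in failures:
        print(f"FAIL {message}")
    return 1 if failures else 0
```

In a CI step you would load the scorecard your evaluation run produced and pass the result of `run_gate(...)` to `sys.exit`, so that a quality drop fails the build.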

Plugins in this marketplace

1 plugin — install individually after you add the marketplace.

Plugin | Description | Version
Proofrag | Evaluate a RAG/LLM app: golden set from your docs + LLM-as-judge + retrieval metrics + shareable scorecard + CI gate. | 0.5.2

Journey fit

Primary fit

RAG quality proof belongs on the ship/testing shelf because you run it in CI and before trusting production answers, not while brainstorming positioning. Golden-set evaluation, judge models, and retrieval metrics are test-harness concerns for LLM apps, which matches ship → testing.
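As an illustration of why a judge model is a test-harness concern, the sketch below treats LLM-as-judge as a plain function: build a grading prompt, call a model, parse a score. The prompt wording, the 1 to 5 scale, and the `call_model` callable are assumptions for this example; the listing does not describe proofrag's judge prompt or scoring scale.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; a real rubric would be more detailed.
JUDGE_TEMPLATE = (
    "You are grading a RAG answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with a single integer from 1 (wrong) to 5 (fully correct and grounded)."
)


def parse_score(reply: str) -> Optional[int]:
    """Extract the first standalone digit 1-5 from the judge's reply, if any."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge_answer(
    question: str,
    reference: str,
    candidate: str,
    call_model: Callable[[str], str],
) -> Optional[int]:
    """Ask the judge model to grade one answer.

    `call_model` is any function that sends a prompt to an LLM and returns text.
    """
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )
    return parse_score(call_model(prompt))
```

Passing the model in as a callable keeps the harness testable offline: a stub such as `lambda prompt: "4"` stands in for the real judge in unit tests.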

How it compares

RAG/LLM evaluation skill marketplace, not a vector DB or embedding provider integration.

Common Questions / FAQ

Who is Proofrag for?

It is for builders running RAG or LLM apps who want golden-set evaluation, LLM-as-judge, and CI gates inside Claude Code.

When should I use Proofrag?

Use it before release and after chunking, embedder, or model changes to catch retrieval and answer-quality regressions.
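One way to catch such regressions is to compare the current run against a stored baseline and flag any metric that dropped by more than a tolerance. This is a minimal sketch under the assumption that both runs are flat metric-to-score mappings; the function name, metric names, and default tolerance are hypothetical.

```python
from __future__ import annotations


def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return {metric: drop} for every metric that fell by more than `tolerance`.

    Metrics absent from `current` are skipped rather than treated as regressions.
    """
    regressions = {}
    for metric, old_value in baseline.items():
        new_value = current.get(metric)
        if new_value is None:
            continue
        drop = old_value - new_value
        if drop > tolerance:
            regressions[metric] = round(drop, 4)
    return regressions
```

For example, a baseline of `{"recall_at_5": 0.80}` against a current run of `{"recall_at_5": 0.70}` reports a drop of 0.1, while a drop of 0.01 stays within the default tolerance and is ignored.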

How do I add Proofrag to my agent?

Add the unshDee/proofrag Claude marketplace, enable the Proofrag plugin (MIT, v0.5.2), and point it at your app and documentation for golden-set generation.