
Recsys Pipeline Architect
Scaffold a six-stage ranking pipeline when you need top-K personalization beyond a single relevance score.
Overview
recsys-pipeline-architect is an agent skill for the Build phase that designs and scaffolds six-stage recommendation and ranking pipelines for top-K item selection.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill recsys-pipeline-architect
What is this skill?
- Six-stage composable pattern: Source → Hydrator → Filter → Scorer → Selector → SideEffect
- Covers feeds, CMS picks, RAG rerankers, task prioritizers, notification triage, and ad ranking
- Scaffolds runnable pipelines in TypeScript, Go, or Python around your scorer
- Supports migrating from one relevance score to multi-action prediction with tunable weights
- Pattern inspired by xAI's open-sourced For You algorithm (Apache 2.0); an independent MIT reimplementation with no copied upstream code
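The six-stage pattern listed above maps onto plain functions chained in a fixed order. Below is a minimal Python sketch assuming an asyncio runtime: sources and hydrators run concurrently, filters and scorers run in sequence, and side effects are scheduled without being awaited. `Candidate`, `run_pipeline`, and the toy stages are hypothetical names for illustration; this is not the scaffold the skill generates.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Candidate:
    # Hypothetical candidate shape: an id, hydrated features, and a running score.
    item_id: str
    features: dict[str, Any] = field(default_factory=dict)
    score: float = 0.0


# Hypothetical stage signatures; every stage receives the request context.
Source = Callable[[dict], Awaitable[list[Candidate]]]
Hydrator = Callable[[dict, list[Candidate]], Awaitable[None]]
Filter = Callable[[dict, list[Candidate]], list[Candidate]]
Scorer = Callable[[dict, list[Candidate]], None]
SideEffect = Callable[[dict, list[Candidate]], Awaitable[None]]

_background: set[asyncio.Task] = set()  # strong refs to in-flight side effects


async def run_pipeline(ctx: dict, sources: list[Source], hydrators: list[Hydrator],
                       filters: list[Filter], scorers: list[Scorer],
                       side_effects: list[SideEffect], k: int) -> list[Candidate]:
    # 1. Source: all origins run concurrently; results are flattened.
    batches = await asyncio.gather(*(src(ctx) for src in sources))
    candidates = [c for batch in batches for c in batch]
    # 2. Hydrator: independent enrichers run concurrently on the same candidates.
    await asyncio.gather(*(h(ctx, candidates) for h in hydrators))
    # 3. Filter: sequential; each filter only sees the survivors of the previous one.
    for f in filters:
        candidates = f(ctx, candidates)
    # 4. Scorer: sequential; later scorers can read scores set by earlier ones.
    for s in scorers:
        s(ctx, candidates)
    # 5. Selector: sort by final score and keep the top K.
    selected = sorted(candidates, key=lambda c: c.score, reverse=True)[:k]
    # 6. SideEffect: scheduled but not awaited, so they never delay the response.
    for effect in side_effects:
        task = asyncio.create_task(effect(ctx, selected))
        _background.add(task)
        task.add_done_callback(_background.discard)
    return selected


# --- Usage with toy stages (hypothetical data) ---
async def in_network(ctx: dict) -> list[Candidate]:
    return [Candidate("a"), Candidate("b"), Candidate("c")]


async def trending(ctx: dict) -> list[Candidate]:
    return [Candidate("d")]


async def add_popularity(ctx: dict, cands: list[Candidate]) -> None:
    popularity = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}
    for c in cands:
        c.features["popularity"] = popularity[c.item_id]


def drop_previously_served(ctx: dict, cands: list[Candidate]) -> list[Candidate]:
    return [c for c in cands if c.item_id not in ctx["served"]]


def popularity_scorer(ctx: dict, cands: list[Candidate]) -> None:
    for c in cands:
        c.score = c.features["popularity"]


async def log_impressions(ctx: dict, selected: list[Candidate]) -> None:
    ctx["impressions"].extend(c.item_id for c in selected)


async def main() -> list[str]:
    ctx = {"served": {"b"}, "impressions": []}
    top = await run_pipeline(ctx, [in_network, trending], [add_popularity],
                             [drop_previously_served], [popularity_scorer],
                             [log_impressions], k=2)
    return [c.item_id for c in top]


top_ids = asyncio.run(main())
print(top_ids)  # ['d', 'c']
```

Because each stage is just a list of callables, adding another source or a diversity scorer means appending to a list rather than editing the runner.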
Adoption & trust: 811 installs on skills.sh; 210k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a scorer or embedding model but no clear, composable pipeline for filters, hydration, selection, and post-rank side effects.
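To illustrate what composable filter plumbing can look like, the sketch below (plain Python; `apply_filters`, `drop_duplicates`, `drop_blocked`, and `expensive_eligibility` are hypothetical names) chains filters so each one only sees the survivors of the previous one, and counts how many items reach the costly check. Putting cheap filters first, as the SKILL.md workflow recommends, shrinks the work the expensive one does.

```python
from typing import Callable

Filter = Callable[[list[dict]], list[dict]]


def drop_duplicates(items: list[dict]) -> list[dict]:
    # Cheap: keep the first occurrence of each id.
    seen: set[str] = set()
    kept = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            kept.append(item)
    return kept


def drop_blocked(blocked: set[str]) -> Filter:
    # Cheap: set membership on the author id.
    return lambda items: [i for i in items if i["author"] not in blocked]


calls = {"eligibility": 0}


def expensive_eligibility(items: list[dict]) -> list[dict]:
    # Stand-in for a costly check (e.g. a remote policy lookup); counts items examined.
    kept = []
    for item in items:
        calls["eligibility"] += 1
        if not item.get("expired", False):
            kept.append(item)
    return kept


def apply_filters(items: list[dict], filters: list[Filter]) -> list[dict]:
    # Sequential: each filter receives only the survivors of the previous one.
    for f in filters:
        items = f(items)
    return items


items = [
    {"id": "1", "author": "ann"},
    {"id": "1", "author": "ann"},
    {"id": "2", "author": "spam"},
    {"id": "3", "author": "bob", "expired": True},
    {"id": "4", "author": "cat"},
]
result = apply_filters(items, [drop_duplicates, drop_blocked({"spam"}), expensive_eligibility])
print([i["id"] for i in result], calls["eligibility"])  # ['1', '4'] 3
```

Here the expensive check examines 3 items instead of 5; with the order reversed it would see all 5.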
Who is it for?
Indie builders shipping feeds, search reranking, or RAG result ordering who want a repeatable pipeline shape instead of one-off ranking scripts.
Skip if: your team only needs a static SQL sort with no personalization, hydration, or multi-stage business rules.
When should I use this skill?
When you are building any system that picks the top K items for a user or context, asking "how should I rank X?", or wrapping an LLM/ML scorer in pipeline plumbing.
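When the wrapped LLM/ML scorer predicts several actions instead of one relevance score, a common approach is a combiner stage that takes a weighted sum of the per-action probabilities, with negative weights for negative feedback. The sketch below assumes that approach; the action names, weights, `multi_action_scorer`, and the `fake_model` probabilities are made up for illustration and are not taken from the skill or from xAI's implementation.

```python
from typing import Callable

# Hypothetical per-action weights: positive actions add, negative feedback subtracts.
WEIGHTS = {"click": 1.0, "like": 2.0, "share": 4.0, "hide": -10.0}


def combine(predictions: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-action probabilities; unpredicted actions count as 0."""
    return sum(w * predictions.get(action, 0.0) for action, w in weights.items())


def multi_action_scorer(
    model: Callable[[str], dict[str, float]], weights: dict[str, float]
) -> Callable[[list[dict]], list[dict]]:
    """Wrap any model that returns per-action probabilities as a Scorer stage."""
    def score(candidates: list[dict]) -> list[dict]:
        for c in candidates:
            c["score"] = combine(model(c["id"]), weights)
        return candidates
    return score


def fake_model(item_id: str) -> dict[str, float]:
    # Stand-in for an ML/LLM model; the probabilities are invented for illustration.
    table = {
        "calm": {"click": 0.3, "like": 0.2, "share": 0.05, "hide": 0.01},
        "bait": {"click": 0.6, "like": 0.1, "share": 0.02, "hide": 0.2},
    }
    return table[item_id]


def rank(weights: dict[str, float]) -> list[str]:
    scored = multi_action_scorer(fake_model, weights)([{"id": "calm"}, {"id": "bait"}])
    return [c["id"] for c in sorted(scored, key=lambda c: c["score"], reverse=True)]


# A click-only weight map reproduces a single relevance score; the full map reorders.
print(rank({"click": 1.0}), rank(WEIGHTS))  # ['bait', 'calm'] ['calm', 'bait']
```

Since a click-only weight map behaves like the original single score, a migration can start there and add actions and weights incrementally.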
What do I get? / Deliverables
You get a spec-backed pipeline layout plus stack scaffolds so you can plug in scoring and ship a tunable top-K service faster.
- Pipeline stage specification for your ranking use case
- Stack-specific scaffold code hooks for scorer, filters, and side effects
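A side-effect hook should neither delay the response nor break it when a downstream sink fails. This asyncio sketch assumes hypothetical `log_impressions` and `bump_counters` sinks and a hypothetical `fire_and_forget` helper: it schedules each sink with `asyncio.create_task`, keeps a reference until the task finishes, and records failures in a done callback instead of propagating them.

```python
import asyncio

served_log: list[str] = []
errors: list[str] = []
_pending: set[asyncio.Task] = set()


async def log_impressions(item_ids: list[str]) -> None:
    # Hypothetical slow sink; the sleep stands in for network latency.
    await asyncio.sleep(0.01)
    served_log.extend(item_ids)


async def bump_counters(item_ids: list[str]) -> None:
    # Hypothetical sink that fails, to show a broken side effect cannot break the response.
    raise RuntimeError("counter store unavailable")


def _on_done(task: asyncio.Task) -> None:
    _pending.discard(task)
    # Record failures instead of leaving them as unretrieved task exceptions.
    if not task.cancelled() and task.exception() is not None:
        errors.append(repr(task.exception()))


def fire_and_forget(coro) -> None:
    # Schedule without awaiting; keep a strong reference until the task finishes.
    task = asyncio.create_task(coro)
    _pending.add(task)
    task.add_done_callback(_on_done)


async def handle_request() -> list[str]:
    selected = ["d", "c"]  # output of the Selector stage
    for effect in (log_impressions, bump_counters):
        fire_and_forget(effect(selected))
    return selected  # returned before either side effect has run


async def main():
    response = await handle_request()
    logged_before = list(served_log)  # still empty at response time
    # A server would not wait here; the demo drains the tasks only to observe them.
    await asyncio.gather(*_pending, return_exceptions=True)
    return response, logged_before, list(served_log), errors


result = asyncio.run(main())
print(result)  # (['d', 'c'], [], ['d', 'c'], ["RuntimeError('counter store unavailable')"])
```

In a long-running server the event loop keeps running after the response is sent, so the tasks complete on their own; the explicit `gather` exists only so the demo can observe them.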
Journey fit
Ranking and feed plumbing is core product/backend work while you are building the system, not a launch or ops-only concern. Pipeline stages (source, hydrate, filter, score, select, side-effects) are backend architecture for serving ranked item lists.
How it compares
Use it for pipeline architecture and scaffolding, not as a hosted recsys product or a raw vector-database integration skill.
Common Questions / FAQ
Who is recsys-pipeline-architect for?
Solo and indie builders implementing personalized feeds, rerankers, or prioritization APIs who need structured stages around their scorer.
When should I use recsys-pipeline-architect?
During Build when you are defining how to rank items for a user or context—feeds, CMS modules, RAG reranking, notifications, or ads—and need filters plus side effects, not just a score function.
Is recsys-pipeline-architect safe to install?
Review the Security Audits panel on this Prism page and the upstream skill repo before trusting generated scaffold code in production.
SKILL.md
# recsys-pipeline-architect

A spec-and-scaffold skill for building composable recommendation, ranking, and feed pipelines. It encodes the **six-stage pattern** (Source → Hydrator → Filter → Scorer → Selector → SideEffect), popularized by xAI's open-sourced [For You algorithm](https://github.com/xai-org/x-algorithm) (Apache 2.0). This skill is an independent reimplementation of the pattern (MIT); no code is copied from the original.

Upstream: <https://github.com/mturac/recsys-pipeline-architect>

## When to Use

- User wants to build any system that picks "the top K items for a user/context"
- User asks "how should I rank X" or describes a feed/personalization problem
- User has a scoring function and needs the pipeline plumbing around it
- User wants to migrate from a single relevance score to multi-action prediction with tunable weights
- User is wrapping an LLM/ML scorer and needs filters, hydrators, side-effects, and a runnable scaffold in their stack (TypeScript / Go / Python)
- Triggers: "recommendation system", "feed algorithm", "ranking pipeline", "for you feed", "candidate pipeline", "content recommender", "pipeline architecture for recsys", "RAG retrieval reranker"

## When NOT to Use

- Model architecture work (transformer design, two-tower retrieval, embedding training): this skill is plumbing *around* the model, not the model itself
- Pure ML training pipelines: the scoring function is the user's responsibility
- Operating a deployed pipeline (monitoring, autoscaling): out of scope

## The six-stage framework

| # | Stage | Job | Parallel? |
|---|---|---|---|
| 1 | **Source** | Fetch candidates from one or more origins | Yes: multiple sources run in parallel |
| 2 | **Hydrator** | Enrich each candidate with metadata needed for filtering and scoring | Yes: independent hydrators run in parallel |
| 3 | **Filter** | Drop candidates that should never be shown (blocked, expired, duplicate, ineligible) | Sequential: each filter sees fewer items |
| 4 | **Scorer** | Assign each surviving candidate one or more scores | Sequential: later scorers see earlier scores |
| 5 | **Selector** | Sort by final score, return top K | Single op |
| 6 | **SideEffect** | Cache served IDs, log impressions, emit events, update counters | Async: must never block the response |

### Why this exact order

- Sources before hydration: know what candidates exist before paying to enrich them
- Hydration before filtering: many filters need metadata the source did not provide
- Filtering before scoring: scoring is the expensive stage; drop the ineligible first
- Scorer chain (not single scorer): real systems compose ML scoring + diversity reranking + business rules
- Selector after scoring: keeps scoring deterministic and cacheable
- SideEffects last and async: side effects must never block the user response

## Workflow when invoked

Walk the user through these eight steps:

1. **Clarify the use case** (one round, three questions): items being ranked? input context? language/runtime?
2. **Identify the candidate sources**: usually in-network (followed/owned/subscribed) + out-of-network (ML retrieval / trending / similar-to-liked)
3. **List required hydrations**: for each filter and scorer, what data does it need that the source did not provide?
4. **List the filters**: duplicate, self, age, block/mute, previously-served, eligibility. Order matters: cheap before expensive.
5. **Design the scorer chain**: primary (ML) → combiner (multi-action with weights) → div