
Recsys Pipeline Architect
Scaffold a six-stage Source→Hydrator→Filter→Scorer→Selector→SideEffect pipeline for any top-K ranking or feed without copying xAI’s For You codebase.
Install
npx skills add https://github.com/wshobson/agents --skill recsys-pipeline-architect
What is this skill?
- Six-stage composable pattern: Source, Hydrator, Filter, Scorer, Selector, SideEffect
- Applies to feeds, search ranking, RAG rerankers, task prioritizers, notification triage, and ad selection
- Independent MIT-style reimplementation inspired by xAI’s open For You algorithm shape—not copied code
- Universal “top K for (user, context)” plumbing separate from the scoring function
- Guidance for wrapping LLM/ML scorers in production-grade pipeline stages
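As a rough illustration of the six-stage shape named in the list above, the sketch below wires the stages together in plain Python. It is not taken from the skill or from xAI's code: `Candidate`, `run_pipeline`, the stage type aliases, and the demo data are all hypothetical names chosen for this example, and side effects run synchronously here only to keep the sketch short (the skill's documentation asks for them to be asynchronous).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Candidate:
    id: str
    meta: dict = field(default_factory=dict)
    score: float = 0.0


# Hypothetical stage signatures, assumed for this sketch only.
Source = Callable[[str], list]       # user_id -> list of Candidate
Hydrator = Callable[[list], None]    # enriches candidates in place
Filter = Callable[[Candidate], bool]  # True means "keep"
Scorer = Callable[[Candidate], float]  # may read the earlier c.score
SideEffect = Callable[[list], None]  # receives the selected top K


def run_pipeline(user_id, sources, hydrators, filters, scorers, side_effects, k):
    # 1. Source: gather candidates, keeping the first copy of each id.
    by_id = {}
    for source in sources:
        for c in source(user_id):
            by_id.setdefault(c.id, c)
    candidates = list(by_id.values())

    # 2. Hydrator: attach the metadata that filters and scorers need.
    for hydrate in hydrators:
        hydrate(candidates)

    # 3. Filter: applied in order, so each filter sees fewer items.
    for keep in filters:
        candidates = [c for c in candidates if keep(c)]

    # 4. Scorer chain: each scorer overwrites c.score and can read the previous value.
    for score in scorers:
        for c in candidates:
            c.score = score(c)

    # 5. Selector: sort descending by final score and take the top K.
    top = sorted(candidates, key=lambda c: c.score, reverse=True)[:k]

    # 6. SideEffect: synchronous here for brevity; a real system would dispatch
    #    these without blocking the response.
    for effect in side_effects:
        effect(top)
    return top


def demo():
    served = []
    ages = {"a": 1, "b": 5, "c": 2}  # made-up item ages in hours

    def add_age(cands):
        for c in cands:
            c.meta["age_hours"] = ages[c.id]

    top = run_pipeline(
        "u1",
        sources=[lambda u: [Candidate("a"), Candidate("b")],
                 lambda u: [Candidate("b"), Candidate("c")]],
        hydrators=[add_age],
        filters=[lambda c: c.meta["age_hours"] < 4],
        scorers=[lambda c: 1.0 / c.meta["age_hours"]],
        side_effects=[lambda selected: served.extend(c.id for c in selected)],
        k=1,
    )
    return [c.id for c in top], served
```

Calling `demo()` runs two overlapping sources, drops the item older than four hours, scores the rest by recency, and records the single selected id as served. Swapping the scorer lambda for an ML or LLM call leaves the rest of the plumbing unchanged, which is the separation the list above describes.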
Adoption & trust: 604 installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Ranking and feed architecture is designed and implemented while building backend and recommendation logic. Composable pipeline specs, interfaces, and scaffolds belong in backend systems design for feeds, search, and RAG rerankers.
Common Questions / FAQ
Is Recsys Pipeline Architect safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md - Recsys Pipeline Architect
# Recsys Pipeline Architect

A spec-and-scaffold skill for building composable recommendation, ranking, and feed pipelines. Encodes the six-stage pattern popularized by xAI's open-sourced [For You algorithm](https://github.com/xai-org/x-algorithm) (Apache 2.0) and applies it to any "top K for (user, context)" problem.

## Overview

Most "recommendation systems" in production aren't exotic ML; they're *pipelines*: fetch candidates from one or more sources, enrich them with metadata, drop the ineligible, score the rest, sort and pick the top K, then fire async side effects. The pattern is universal. The scoring function and the items change; the pipeline shape doesn't.

This skill is an independent reimplementation of the pattern (MIT); no code copied from the original.

## When to Use This Skill

- Building any system that returns "the top K items for a user/context"
- Designing or refactoring a personalized feed (content, search results, notifications)
- Wrapping an LLM/ML scorer in proper pipeline plumbing (sources, hydration, filters, side effects)
- Adding multi-action prediction with tunable weights (instead of a single relevance score)
- Building a RAG retrieval reranker (cheap retrieval → expensive rerank)
- Designing a task prioritizer or alert triage system

## The Six-Stage Framework

| # | Stage | Job | Parallel? |
|---|---|---|---|
| 1 | **Source** | Fetch candidates from one or more origins | Yes: multiple sources run in parallel |
| 2 | **Hydrator** | Enrich candidates with metadata needed for filtering and scoring | Yes: independent hydrators run in parallel |
| 3 | **Filter** | Drop ineligible candidates (blocked, expired, duplicate, ineligible) | Sequential: each filter sees fewer items |
| 4 | **Scorer** | Assign each surviving candidate one or more scores | Sequential: later scorers see earlier scores |
| 5 | **Selector** | Sort by final score, return top K | Single op |
| 6 | **SideEffect** | Cache, log, emit events, update served-history | Async: must never block the response |

### Why this exact order

- Sources before hydration: know what candidates exist before paying to enrich
- Hydration before filtering: many filters need metadata the source didn't provide
- Filtering before scoring: scoring is the expensive stage, so drop the ineligible first
- Scorer chain (not single scorer): real systems compose ML scoring + diversity reranking + business rules
- Selector after scoring: keeps scoring deterministic and cacheable
- SideEffects last and async: side effects must never block the user response

## Workflow When Invoked

Walk the user through eight steps:

1. **Clarify the use case** (one round, three questions only if missing): items being ranked, input context, language/runtime
2. **Identify the candidate sources** (usually in-network + out-of-network, but single-source also valid)
3. **List required hydrations**: for each filter and scorer, what data does it need that the source didn't provide?
4. **List the filters**: cheap before expensive, universal before user-specific (duplicate, self, age, block/mute, previously-served, eligibility)
5. **Design the scorer chain**: primary ML/heuristic → combiner (multi-action with weights) → diversity → business rules
6. **Selector**: sort descending by final score, take top K (or stratified mix)
7. **SideEffects**: cache served IDs, emit impression events, update counters, log analytics; all fire-and-forget
8. **Generate the scaffold** in the user's stack

## Key Trade-offs to Surface

Never default silently on these; they a