
orchestra-research/ai-research-skills
98 skills · 590 installs · 926k stars · GitHub
Install
npx skills add https://github.com/orchestra-research/ai-research-skills
Skills in this repo
1. Ml Paper Writing
ML Paper Writing is a reference skill for solo researchers and small lab teams preparing submissions to major ML and AI conferences. It documents mandatory checklist rules across NeurIPS (including sixteen required items such as claims alignment and a dedicated limitations section), ICML, ICLR, and ACL, and consolidates a universal pre-submission checklist. NeurIPS explicitly places the checklist after references and supplemental material, outside the page limit, and missing it triggers desk rejection. The skill cross-links a separate systems-paper-writing path for systems venues. Use it when an agent is drafting or reviewing LaTeX manuscripts so abstract and introduction claims stay falsifiable, limitations are substantive, and venue-specific boxes are checked before upload. It does not write experiments or figures; it enforces submission compliance and rhetorical discipline aligned with peer review norms.
492 installs

2. Experiment Tracking Swanlab
Experiment Tracking SwanLab is a focused integration skill for solo builders and small research-heavy teams training models in Python who need dependable experiment logs without building a custom dashboard. It walks through SwanLab initialization with project and experiment names, logging training metrics during loops, and finishing runs cleanly, patterns that mirror the public SwanLab documentation. The PyTorch examples cover config blocks for learning rate, batch size, epochs, and hidden size, plus periodic `swanlab.log` calls for loss and step counters. A lightweight callback wrapper shows how to encapsulate tracker setup so agents can drop logging into existing trainers consistently.
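The init, log, and finish pattern described above can be sketched as a small wrapper. This is a minimal sketch, not the skill's own code: `ExperimentTracker`, `InMemoryBackend`, and `train` are hypothetical names, and the wrapper assumes only that the backend exposes `init(project=..., experiment_name=..., config=...)`, `log(data, step=...)`, and `finish()`, which is how the `swanlab` module is commonly used.

```python
from typing import Any, Dict, List, Optional, Tuple


class ExperimentTracker:
    """Hypothetical wrapper around a SwanLab-style backend (init/log/finish)."""

    def __init__(self, project: str, experiment_name: str,
                 config: Dict[str, Any], backend: Optional[Any] = None):
        if backend is None:
            # Assumption: the swanlab package is installed (pip install swanlab).
            import swanlab
            backend = swanlab
        self._backend = backend
        self._backend.init(project=project,
                           experiment_name=experiment_name,
                           config=config)

    def log(self, metrics: Dict[str, float], step: int) -> None:
        self._backend.log(metrics, step=step)

    def finish(self) -> None:
        self._backend.finish()

    # Context-manager support so the run is closed even if training raises.
    def __enter__(self) -> "ExperimentTracker":
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        self.finish()
        return False


class InMemoryBackend:
    """Stand-in backend with the same three calls, for dry runs and tests."""

    def __init__(self) -> None:
        self.run: Optional[Dict[str, Any]] = None
        self.records: List[Tuple[Optional[int], Dict[str, float]]] = []
        self.finished = False

    def init(self, project: str, experiment_name: str,
             config: Dict[str, Any]) -> None:
        self.run = {"project": project,
                    "experiment_name": experiment_name,
                    "config": dict(config)}

    def log(self, data: Dict[str, float], step: Optional[int] = None) -> None:
        self.records.append((step, dict(data)))

    def finish(self) -> None:
        self.finished = True


def train(tracker: ExperimentTracker, epochs: int) -> None:
    """Placeholder loop: the loss formula stands in for a real training step."""
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)
        tracker.log({"loss": loss, "epoch": epoch}, step=epoch)
```

Passing `backend=InMemoryBackend()` exercises the wrapper without a SwanLab account; omitting `backend` hands the same calls to the real module.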
Use it while you are actively coding training jobs in the Build phase, not as a launch or growth analytics replacement for product telemetry.
2 installs

3. Academic Plotting
Academic Plotting is a research-oriented agent skill that encodes how to make machine-learning figures look like they belong in a serious venue, not a default seaborn theme from a tutorial. It gives solo researchers and indie ML builders a single setup block for matplotlib and seaborn with Times-friendly serif sizing, 300 DPI export, restrained grids, and an Ocean Dusk palette meant to read clearly in PDFs. The skill is essentially a procedural style guide plus figure patterns: bar comparisons, line trends, and annotation habits suited to evaluation sections and ablation tables. Use it when you are drafting camera-ready plots for arXiv, a workshop paper, or an internal benchmark report and want the agent to emit consistent, quotable styling code rather than reinventing aesthetics each session. Intermediate complexity; assumes Python plotting stack installed locally or in your notebook environment.
1 install

4. Ara Compiler
ARA Compiler is an agent skill that encodes the ARA directory schema, a layered layout for AI research repositories spanning manifest, logic, source stubs, trace DAGs, and indexed evidence. Solo and indie builders running agent-assisted research use it when they need a consistent, citable structure instead of ad-hoc folders and half-documented experiments. The reference walks field-by-field through problem statements, claims, architecture, algorithms, configs, and raw result tables so both humans and coding agents know where to read and write. It fits early journey work when you are still proving ideas and documenting rigor, and it carries into Build when you formalize docs and execution modules.
Pair it with your actual experiment code and review workflows; it does not run training or collect metrics by itself.
1 install

5. Ara Research Manager
ARA Research Manager is a journey-wide agent skill for builders and researchers who treat an AI coding session as a lab notebook. Whenever conversation or code changes imply a new question, committed decision, benchmark run, abandoned approach, or evidence-driven pivot, the skill tells the agent how to classify the moment and which file to update: exploration_tree.yaml for the branching trace of trials, and logic/ markdown for claims, heuristics, concepts, and constraints. That split keeps ephemeral search paths separate from statements you want to reuse in specs or papers. Solo builders training custom models, probing architectures, or running systematic ablations benefit because dead ends and pivots do not vanish in chat scrollback. The skill is procedural routing metadata, not a hosted experiment platform; your repo must already use or adopt the Orchestra-style trace and logic layout. Use it from early Idea research through Operate iteration whenever you want session observability without manually curating notes after every agent turn.
1 install

6. Ara Rigor Reviewer
ARA Rigor Reviewer is an agent skill for solo builders and small research teams who publish structured research artifacts with explicit claims and experiments. After Level 1 structural validation passes, it runs a Level 2 semantic review across six epistemic dimensions, starting with whether cited experiments substantively support each claim and whether falsifiability statements are meaningful. It is meant for ARA-style documents where references alone are not enough and you need type-appropriate evidence (for example ablations for causal claims or heterogeneous setups for generalization). Use it when a draft looks complete but you still distrust the claim–evidence graph, before sharing externally or folding conclusions into product specs.
The output mindset is scored dimensions plus major/suggestion findings you can fix in place, not a generic rewrite. It pairs well with agentic research workflows where another skill assembled the ARA and you need a disciplined second reader.
1 install

7. Audiocraft Audio Generation
Audiocraft-audio-generation is a procedural skill for solo builders adding AI music or sound generation to a product using Meta's AudioCraft stack, especially MusicGen fine-tuning. It focuses on the unglamorous but failure-prone step: turning a folder of raw audio plus descriptions into a training-ready corpus at 32 kHz mono with consistent filenames and metadata. The guide includes Python using torchaudio for resampling and channel mixing, and explains the expected output_dir layout your training job should consume. Use it when you already chose AudioCraft and need repeatable dataset prep or when an agent keeps generating incompatible sample rates or stereo tensors. It is intermediate complexity; comfort with Python environments and GPU training assumptions helps. It does not replace hosting, licensing review for training data, or production serving; pair it with your inference and Ship-phase testing plans separately.
1 install

8. Autogpt Agents
AutoGPT Agents documents advanced usage for solo builders and small teams extending the AutoGPT platform with custom blocks rather than only chaining stock nodes. It walks through defining MyBlockInput and MyBlockOutput schemas, subclassing Block with id, name, description, and block_type, implementing async execute that yields keyed results, and registering the class in the backend blocks package. The guide also covers blocks that need stored API credentials through the integrations provider layer.
Use it when you are building research or automation agents on AutoGPT and need repeatable patterns for new capabilities (search, enrichment, custom tools) that appear in the visual workflow editor and run server-side with typed I/O.
1 install

9. Autoresearch
autoresearch is an Orchestra Research agent skill that solves the most common failure mode of autonomous research: the agent finishes one experiment cycle and stops. It instructs you to configure a fixed-interval wall-clock loop (documented for Claude Code via /loop every 20 minutes) that re-injects a short continuity prompt independent of how fast individual training or eval jobs run. On every tick the agent reloads research-state.yaml and findings.md, checks whether the current experiment finished, errored, or stalled, and either resumes inline work or steps back to fix the pipeline. Solo builders running overnight literature sweeps, hyperparameter searches, or multi-day agent benchmarks should treat this as journey-wide process knowledge: the same continuity pattern applies while you are still framing questions in Idea, scoping in Validate, or executing pipelines in Build. It is not a task integration; it is procedural glue that pairs with whatever research skills you already use, as long as you maintain the expected state artifacts.
1 install

10. Awq Quantization
AWQ quantization is an advanced agent skill for solo builders shipping LLM-powered products who need smaller, faster models without full retraining. It walks through how Activation-aware Weight Quantization protects salient channels, why it often generalizes better than GPTQ with modest calibration data, and how to select AutoAWQ WQLinear kernels (GEMM vs GEMV) for your workload. Use it when you are compressing weights for local or hosted inference, balancing VRAM, latency, and quality on Claude Code, Codex, or Cursor-driven implementation tasks.
The guide is reference-heavy (formulas, tables, and config snippets), so it pairs well with an existing Hugging Face or vLLM stack rather than teaching ML from zero. Expect intermediate-to-advanced comfort with PyTorch, GPUs, and model serving paths.
1 install

11. Axolotl
axolotl is a research-heavy agent skill that packages Orchestra’s Axolotl API digest so solo builders can fine-tune open models without tab-hopping across dozens of doc pages. It targets the Build phase when you are wiring custom trainers, cloud runners, or hub push behavior on top of Hugging Face ecosystems, and when you need exact class and function names for Modal or AxolotlTrainer overrides. The readme is structured as linked sections with signature snippets, useful for Cursor or Claude Code to ground refactors, config fixes, or new training scripts. It does not run training for you; it shortens the path from “which Axolotl hook do I override?” to the right module. Advanced users iterating custom datasets, tags, and metric storage benefit most; casual prompt-only builders can skip it.
1 install

12. Blip 2 Vision Language
BLIP-2 Vision Language is an agent skill that walks solo builders through advanced BLIP-2 usage: loading Salesforce checkpoints, applying PEFT LoRA on the language head, or isolating trainable parameters to the Q-Former. It targets indie teams shipping multimodal features (product galleries, accessibility alt text, visual search, or agent tools that must describe screenshots) who already chose BLIP-2 over a hosted API. The guide favors practical PyTorch snippets you can paste into a training script, including parameter accounting so you know VRAM and time tradeoffs before a long run. Use it during the build phase when caption quality or domain vocabulary from a generic checkpoint is not good enough and you need a controlled fine-tune instead of prompt-only workarounds.
It pairs naturally with broader Unsloth or distillation skills if you later compress or specialize models, but this skill stays focused on BLIP-2 architecture specifics.
1 install

13. Brainstorming Research Ideas
brainstorming-research-ideas is an agent skill for researchers and builder-researchers who need disciplined creativity instead of unstructured brainstorming. It packages ten complementary ideation frameworks that target different cognitive modes so you can explore a new field, unstick a current thread, stress-test a half-formed hypothesis, or prepare a group session. The skill sits early in the solo journey when the goal is problem discovery and strategy, not implementation. It deliberately stops before methodology design or literature synthesis, pointing those needs to domain skills or literature-review workflows. Multi-phase value shows up when you revisit ideation while validating scope or pivoting after initial results. For Prism audiences shipping AI or R&D-heavy products, it helps find high-leverage research entry points before you invest build time.
1 install

14. Chroma
chroma is an integration-oriented agent skill for solo builders adding memory to LLM apps without standing up a proprietary vector SaaS on day one. It walks through LangChain’s Chroma wrapper for document ingestion, OpenAI embeddings, persisted directories, top-k similarity search, and retriever wiring, plus LlamaIndex’s ChromaVectorStore atop chromadb PersistentClient collections. The skill positions Chroma as an open-source embedding database with metadata filters, full-text and vector search, and a small API surface that grows from local notebooks toward production clusters. Use it when you are implementing RAG, semantic document retrieval, or agent tools that need durable vector stores alongside frameworks you already chose.
Dependencies listed include chromadb and sentence-transformers for local embedding options.
1 install

15. Clip
clip from the ai-research-skills collection gives solo builders concrete PyTorch recipes for OpenAI CLIP beyond paper theory: zero-shot categorization with natural-language labels, offline indexing of image embeddings, and text-to-image retrieval with normalized features. It assumes you will load a standard checkpoint, tokenize prompt sets, run no_grad inference, and rank similarities for search UX or internal tooling. The skill fits indie teams adding visual search, auto-tagging, or moderation assist to SaaS dashboards, creative catalogs, or agent tools that need lightweight multimodal routing without training custom classifiers first. Complexity is advanced because you must manage GPU/CPU torch installs, preprocessing consistency, and prompt engineering; it does not replace MLOps deploy skills for production serving at scale.
1 install

16. Constitutional Ai
Constitutional AI is an Orchestra Research agent skill that explains and operationalizes Anthropic’s method for training harmless language models through self-improvement. Solo builders and small labs experimenting with custom agents or fine-tunes get a structured path: first supervised learning where the model critiques and revises its own answers against a written constitution of principles, then reinforcement learning using RLAIF (reinforcement learning from AI feedback), so harmlessness improves without scaling human labels for every bad output. The SKILL.md orients you with quick-start concepts, example constitutional rules, and workflow snippets that assume a Python ML stack. It is research- and implementation-heavy rather than a one-shot prompt template, so it suits builders who already run training jobs and want alignment literacy comparable to what underpins Claude’s safety story.
Use when you are designing alignment evals, prototyping safer fine-tunes, or teaching an agent the mechanics of critique-and-revise data generation.
1 install

17. Creative Thinking For Research
Creative-thinking-for-research is an agent skill that helps solo builders and small lab teams produce genuinely novel computer-science and AI research ideas by applying eight creativity frameworks grounded in cognitive science rather than waiting for inspiration. Each framework targets a distinct operation (combining concepts, reformulating problems, analogizing across fields, manipulating constraints, inverting assumptions, abstracting structure, probing boundaries, and sustaining productive contradictions) so ideation is systematic instead of ad hoc brainstorming. Use it when you feel trapped in incremental extensions, want to bridge subfields with structural connections, or prepare for a retreat-level ideation session. The skill explicitly tells you not to use it when you need structured project-level brainstorming workflows for shipping features; it optimizes for research direction novelty, not sprint planning artifacts. Advanced readers gain vocabulary tied to Koestler-style bisociation and adjacent-possible exploration while staying in SKILL.md procedural form for Claude Code and similar agents.
1 install

18. Crewai Multi Agent
CrewAI Multi-Agent (Flows guide) is an agent skill for builders who outgrow simple Crew collaboration and need event-driven orchestration with explicit state and conditional paths. It explains when to stay with Crews (for straightforward sequential or hierarchical multi-agent work) and when to adopt Flows for branching, durable state, and hybrid pipelines that still call Crews inside steps. The skill walks through creating a Flow with a Pydantic state model, wiring entry points with @start, chaining work with @listen, and routing decisions with @router plus combinator helpers.
Runnable snippets show kickoff, reading final state, and parallel start methods where appropriate. Solo developers using Python to prototype autonomous research or operations agents install it while scaffolding control flow before adding retries, observability, and deploy wrappers. It assumes comfort with CrewAI’s Python API rather than no-code agent builders, and complements research-oriented stacks documented in the same catalog family.
1 install

19. Deepspeed
The deepspeed skill in ai-research-skills is a compact research digest drawn from official DeepSpeed announcements, chiefly MoE-focused posts from 2021. It gives solo builders and indie ML hackers quotable context on why mixture-of-experts matters for natural language generation, how larger sparse models relate to quality, and what kind of training-cost narratives Microsoft Research published, without embedding install scripts or ZeRO config recipes in the excerpted pages. Reach for it during backend ML planning when you need agent-readable background before choosing DeepSpeed, writing a training budget, or comparing dense versus MoE approaches. It pairs poorly with pure application frontend work; treat it as evidence and terminology support, then follow current DeepSpeed documentation for executable steps.
1 install

20. Distributed Llm Pretraining Torchtitan
Distributed LLM Pretraining TorchTitan (as documented in this skill) is an advanced reference for builders running Meta’s TorchTitan pretraining stack who need fault-tolerant checkpoints without guessing TOML keys. It explains how TorchTitan leans on PyTorch Distributed Checkpoint for interoperable saves, how to shrink artifacts with model-only exports, and how to resume training when you change data loading or schedulers via exclude_from_loading. Pipeline parallelism workflows get explicit coverage through seed checkpoint creation on a single CPU with all parallelism degrees pinned to one, which is critical for reproducible init before scaling out.
Async checkpointing options help solo researchers and small labs cut step-time overhead on long runs. This is niche compared to app-level agent skills: you should already have a CONFIG_FILE, cluster access, and tolerance for distributed training ops. Use it when intervals, resume semantics, or PP seeding block your train job, not when you only need inference hosting or fine-tuning UX.
1 install

21. Dspy
DSPy is a skill-sized pattern library for turning language-model behavior into composable Python modules that solo builders can test, optimize, and ship. It starts with a minimal RAG flow: retrieve top-k passages, join them into context, and run a ChainOfThought signature to produce an answer, then shows how to wire a real vector store through ChromadbRM and global settings. The optimized RAG section introduces BootstrapFewShot with labeled Examples and a correctness metric, which is the bridge from demo prompts to measurable iteration. Additional sections in the source material walk agent systems, classification, data processing, and multi-stage pipelines, useful when your agent product needs more than one LM call in sequence. Reach for this skill when you are past raw API prompts and want signatures, modules, and teleprompters that a coding agent can extend. It assumes comfort with Python and an existing corpus or labels for optimization; it is not a hosted vector DB or a deployment platform by itself.
1 install

22. Evaluating Code Models
Evaluating-code-models is a research-oriented agent skill that walks solo builders through the BigCode Evaluation Harness benchmark catalog for code LLMs. It explains how benchmarks like HumanEval and extended suites use unit tests to score functional correctness with pass@k, and how to launch runs with accelerate, temperature, sample counts, and code execution enabled. The skill fits indie developers who need reproducible evidence when choosing a model for Claude Code–style agents, codegen tools, or internal CLI assistants.
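The pass@k score mentioned above is usually computed with the unbiased estimator introduced alongside HumanEval: generate n samples per problem, count the c samples that pass the unit tests, and estimate the chance that at least one of k randomly drawn samples passes. The sketch below shows that formula on its own; `pass_at_k` is an illustrative name, and whether a given harness version computes it exactly this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: draw budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 3, 1)` is 0.3 up to floating-point rounding, since with k = 1 the estimator reduces to the fraction of passing samples.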
Use it when you are shortlisting models, validating a fine-tune, or sanity-checking claims before wiring a model into build and ship pipelines. It does not replace your own app-level tests, but it standardizes apples-to-apples comparisons across models on shared programming tasks.
1 install

23. Evaluating Cosmos Policy
Evaluating-cosmos-policy is an agent skill that documents how to run Cosmos Policy evaluations on the LIBERO benchmark using NVIDIA’s public run_libero_eval module. It targets solo researchers and indie builders validating vision-language-action policies before they promote a checkpoint to demos or papers. The readme centers on acquiring a GPU (often via an interactive Slurm srun shell), setting CUDA and MuJoCo EGL variables for headless simulation, and invoking uv with the libero dependency group and pinned Python. You can start with a smoke evaluation (few trials, a single task suite) and scale trial counts and suites as confidence grows. Parameters cover checkpoint paths, dataset statistics, T5 embeddings, image augmentation flags, and open-loop control horizons. Use it when you already cloned cosmos-policy and need reproducible eval commands rather than ad-hoc notebook loops.
1 install

24. Evaluating Llms Harness
Evaluating LLMs Harness teaches solo builders how to run standardized benchmarks against API language models using the lm-evaluation-harness ecosystem. Instead of subjective chat comparisons, you wire providers through documented model types (openai-completions, openai-chat-completions, anthropic-chat, and local-compatible endpoints) and choose tasks that match each API’s capabilities. The skill stresses limitations: chat endpoints often lack logprobs, which blocks perplexity and some loglikelihood evaluations. Use it when validating a model swap, comparing GPT and Claude families on the same task list, or tracking vendor updates over time. Intermediate complexity assumes comfort with CLI env vars, batch sizing, and reading harness task names.
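To see why missing logprobs block perplexity, it helps to look at the arithmetic: perplexity is the exponential of the average negative log-probability the model assigned to each reference token, so it cannot be computed from generated text alone. The sketch below is a generic token-level definition, not the harness's own metric code (harness tasks may normalize per word or per byte instead); `perplexity` is an illustrative name.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Token-level perplexity from natural-log probabilities of reference tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)
```

A model that assigns probability 0.5 to every token has perplexity 2 (up to rounding); an endpoint that returns only text gives this function nothing to consume.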
Outputs are reproducible benchmark commands you can rerun in CI or before a release.
1 install

25. Evolving Ai Agents
evolving-ai-agents is a reference skill for Orchestra’s A-Evolve stack: import `agent_evolve as ae`, construct an `Evolver` with an agent seed or custom workspace, attach a named or custom benchmark, and iterate evolution cycles until benchmarks improve. Solo builders shipping autonomous coding agents use it when ad-hoc prompt tweaks stop scaling and they need a structured loop (tasks, scoring, and evolvable layers) grounded in public harnesses like SWE-verified or MCP-Atlas. The doc spells resolution rules for string agent names, working-directory copies, and manifest validation so you do not start evolution on a broken workspace. You can override `workspace_dir`, inject a custom engine, and thread `EvolveConfig` without re-reading the whole Python package. Complexity is advanced because you are orchestrating benchmarks, seeds, and evolution state, not invoking a single API call. Treat this as Build-phase agent infrastructure; pair it with your own eval harness and version control before Ship.
1 install

26. Faiss
faiss is an agent skill that walks solo builders through Facebook AI Similarity Search index families so they stop guessing between exact Flat search, IVF clustering, HNSW graphs, and compressed IVF+PQ. It targets developers shipping semantic search, RAG chunk retrieval, or embedding-based recommendations who need copy-paste Python for IndexFlatL2, cosine-ready IndexFlatIP, and IVF construction patterns. Use it while you are still designing the retrieval layer (not after you have locked a database), because the wrong index wastes GPU RAM, misses recall targets, or makes billion-vector corpora impossible on a single machine. The guide emphasizes when to train IVF quantizers, when Flat is the honest baseline, and how normalization ties inner-product indices to cosine geometry.
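The link between normalization and cosine geometry is easy to check by hand: once vectors are scaled to unit L2 length, their inner product equals their cosine similarity, so an exact inner-product search ranks by angle rather than by vector length. The sketch below uses plain Python lists instead of FAISS so it runs anywhere; with FAISS itself the usual step is to call `faiss.normalize_L2` on the float32 arrays before adding them to an `IndexFlatIP`. All function names here are illustrative.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = l2_norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return inner_product(a, b) / (l2_norm(a) * l2_norm(b))


def top_k_by_inner_product(query: Sequence[float],
                           corpus: Sequence[Sequence[float]],
                           k: int) -> List[int]:
    """Exact (brute-force) search: indices of the k highest inner products."""
    scored = [(inner_product(query, v), i) for i, v in enumerate(corpus)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scored[:k]]


query = [1.0, 1.0]
corpus = [[10.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# Raw inner product rewards the long vector at index 0.
raw_best = top_k_by_inner_product(query, corpus, 1)

# After normalization the same search ranks by angle, so index 1 wins.
unit_corpus = [l2_normalize(v) for v in corpus]
cosine_best = top_k_by_inner_product(l2_normalize(query), unit_corpus, 1)
```

Skipping normalization is a common source of surprising results with inner-product indices, because long vectors dominate regardless of direction.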
Intermediate familiarity with NumPy float32 vectors is assumed; you get operational recipes rather than a full FAISS theory course.
1 install

27. Fine Tuning Openvla Oft
Fine-tuning OpenVLA-OFT is an agent skill for solo builders and small research teams who need a repeatable ALOHA workflow: separate server and client conda environments, preprocess demonstration data, fold variants into RLDS, and register entries across the OpenVLA dataset plumbing before training. It assumes you are comfortable with PyTorch installs, editable package installs, and editing prismatic VLA dataset modules rather than clicking through a hosted fine-tune UI. Use it when you already have raw task demonstrations and want OpenVLA-OFT+ on real hardware with HTTP action serving instead of ad-hoc notebooks. The guide is advanced and hardware-specific; it does not replace general MLOps or safety review for physical robots. Outcomes are configured environments, preprocessed splits, and a documented path to evaluation, not a one-command SaaS deploy.
1 install

28. Fine Tuning Serving Openpi
fine-tuning-serving-openpi is an agent skill that catalogs OpenPI checkpoint and serving mappings so robotics builders stop guessing which config name pairs with which gs://openpi-assets path. It centers on scripts/serve_policy.py: default modes per environment (ALOHA, ALOHA_SIM, DROID, LIBERO), explicit checkpoint overrides for pi05_droid, pi0_fast_droid, pi05_libero, and a local directory template after fine-tuning. Solo builders experimenting with vision-language-action policies get copy-paste uv commands, notes on asset caching under ~/.cache/openpi, and when to switch from default environment serving to explicit checkpoint mode. The skill fits advanced ML robotics workflows, not generic LLM chat apps.
Expect uv, OpenPI scripts, and cloud bucket checkpoints as the happy path; prefetch and LIBERO-specific setup may continue in the full upstream readme beyond this excerpt.
1 install

29. Fine Tuning With Trl
fine-tuning-with-trl is an agent skill that walks solo builders and small teams through Direct Preference Optimization in Hugging Face TRL. It explains how DPO trains on chosen versus rejected completions and catalogs more than ten loss_type options so you can pick sigmoid for a default run, IPO when you want a different optimization framing, hinge for margin objectives, or robust DPO when labels are noisy. Each section pairs the loss with practical DPOConfig snippets (batch sizes, learning rates, beta, and prompt or max length where relevant) so your coding agent can scaffold a training script without re-deriving RLHF trivia. Use it while building agents, APIs, or SaaS features that need a fine-tuned model rather than prompt-only behavior. It assumes you already have preference data and a TRL-compatible stack; it does not replace dataset curation, evaluation harnesses, or production serving.
1 install

30. Gguf Quantization
Gguf-quantization is an agent skill from the AI research skills collection that teaches advanced GGUF and llama.cpp usage after you have already chosen a quantized weight file. Solo builders running local LLMs for coding agents use it to squeeze latency and cost through speculative decoding, batched prompts, and server-side continuous batching rather than one-off CLI calls. The readme spans shell workflows (`llama-speculative`, `llama-server`, `llama-cli` with lookup caches) and Python `llama_cpp` examples with GPU offload and batch sizing. It also points toward custom model conversion paths when vocabulary or architecture needs tweaks before quantization. Install when your bottleneck is inference ergonomics (parallel users, draft-model speedups, or operational flags), not when you still need a primer on picking the first quantization format.
Pair it with hardware-appropriate Q4_K_M (or similar) artifacts you already downloaded or built.
1 install

31. Gptq
The gptq skill is a calibration playbook for post-training quantization: how many tokens to feed the calibrator, which open datasets match your model’s domain, and what happens when you under- or over-sample. Solo builders shipping self-hosted chat or code assistants need this when GPU RAM and latency matter more than full FP16 weights. It walks through Hugging Face dataset snippets for general (C4), code (The Stack with language filters), and conversational (ShareGPT/Alpaca-style) setups, each producing 512-token chunks from a streaming split. The guide ties dataset choice directly to measurable perplexity drift so you know when quantization is production-safe versus when the model will spew nonsense. It is research-grade procedural knowledge, not a one-click deploy, and is best paired with your existing GPTQ tooling and tokenizer. Use it during build when you are locking inference costs before ship, not when you are only calling hosted APIs.
1 install

32. Grpo Rl Training
grpo-rl-training is a procedural library of Group Relative Policy Optimization (GRPO) reward functions for solo builders and small teams fine-tuning language models with verifiable or structured outputs. Instead of inventing reward logic from scratch, you adapt pre-defined correctness rewards (exact and fuzzy match), format penalties, length controls, and style signals that mirror battle-tested training setups. The skill fits when you already have a GRPO trainer and need consistent, weighted objectives for math, Q&A, summarization, or formatted agent responses. It matters because mis-specified rewards silently waste GPU time and produce models that look fluent but fail grading or schema checks.
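A weighted mix of correctness, format, and length rewards can be sketched in plain Python, together with the group-relative normalization that gives GRPO its name (each reward is compared with the mean and standard deviation of the other completions sampled for the same prompt). Everything here is illustrative rather than the skill's own code: the `<answer>` tag convention, the weights, the 200-character budget, and all function names are assumptions, and a real trainer will expect its own reward-function signature.

```python
import re
from difflib import SequenceMatcher
from statistics import mean, pstdev
from typing import List, Optional

# Assumed output convention: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    match = ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None


def exact_match_reward(completion: str, reference: str) -> float:
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def fuzzy_match_reward(completion: str, reference: str) -> float:
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    return SequenceMatcher(None, answer.lower(), reference.strip().lower()).ratio()


def format_penalty(completion: str) -> float:
    """0.0 when the answer tag is present, -1.0 when it is missing."""
    return 0.0 if extract_answer(completion) is not None else -1.0


def length_penalty(completion: str, max_chars: int = 200) -> float:
    """0.0 within budget, sliding down to -1.0 as the overrun grows."""
    overrun = max(0, len(completion) - max_chars)
    return -min(1.0, overrun / max_chars)


def total_reward(completion: str, reference: str) -> float:
    # Illustrative weights: correctness dominates, format and length shape output.
    return (2.0 * exact_match_reward(completion, reference)
            + 1.0 * fuzzy_match_reward(completion, reference)
            + 0.5 * format_penalty(completion)
            + 0.25 * length_penalty(completion))


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within one prompt's group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


completions = ["<answer>42</answer>", "<answer>41</answer>", "42"]
rewards = [total_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
```

Printing the per-component scores for a handful of real completions before a long run is a cheap way to catch a reward that never fires or a weight that swamps the others.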
Treat it as reference code to wire into your training script, not a full training orchestration skill.
1 install

33. Guidance
Guidance is an agent skill that acts as a backend configuration guide for the Guidance library: structured generation for LLMs where you interleave prompts, control flow, and token constraints in Python. Solo builders adopting agentic features often bounce between ad hoc OpenAI calls and half-working local models; this skill centralizes how to instantiate Anthropic and OpenAI clients, which Claude model strings to pass, and how to tune timeout and retry behavior for reliable pipelines. It also points to local inference paths and advanced configuration so you can prototype cheaply on laptop GPUs before promoting the same templates to hosted APIs. The readme is reference documentation rather than a single end-to-end app generator; invoke it when you are wiring lm objects, comparing backends, or optimizing latency for template-heavy agents. It supports builders who want repeatable JSON-or-schema-shaped outputs without rewriting parsers on every model swap.
1 install

34. Hqq Quantization
hqq-quantization is an advanced agent skill for solo builders and small teams who ship local or self-hosted LLM features and need finer control than a one-line quantize script. It walks through selecting HQQ backends based on GPU generation, assigning different kernels to attention versus MLP blocks, and wiring TorchAO int4 options when throughput matters more than simplicity. The audience is developers already comfortable with PyTorch who are in the build phase tightening inference cost before launch. Use it when default quantization leaves you with the wrong backend for your card, when layers need heterogeneous precision, or when you are benchmarking marlin versus aten on real module names. It does not replace fundamental model training docs; it encodes operational recipes your agent can apply while editing quantize pipelines.
Confidence is high for backend configuration content; the readme excerpt is partial, so mixed-precision bullets follow the guide’s stated scope.1installs35Huggingface AccelerateHugging Face Accelerate is an agent skill for solo builders shipping fine-tunes, small foundation-model experiments, or agent backends that need sane distributed training without rewriting PyTorch boilerplate every sprint. The material focuses on Accelerate’s plugin architecture: dataclass-style custom plugins, post-init validation, and passing specialized kwargs handlers—GradScalerKwargs for FP16 scaling is spelled out with init scale, growth factor, backoff, and growth interval fields. You use it when a single GPU laptop workflow stops scaling and you must enable mixed precision or multi-process training while keeping one Accelerator entry point. The readme frames custom plugins as extensions beyond stock DDP, FSDP, and DeepSpeed paths, which helps indie teams document why they chose a given handler set before Ship perf and cost reviews. Intermediate to advanced: expect comfort with Python training loops and Hugging Face ecosystem docs.1installs36Huggingface TokenizersHuggingface-tokenizers is a reference-oriented agent skill that teaches how modern subword tokenizers actually work—BPE merges by frequency, WordPiece likelihood splits, and Unigram sampling—using concrete corpus examples instead of black-box API calls. Solo builders shipping LLM-powered features need this when fine-tuning fails mysteriously, special tokens explode sequence length, or a custom domain vocabulary mis-segments product terms. The skill lives primarily on the Build shelf as documentation agents pull into implementation chats, but the same reasoning applies during Validate prototypes when you judge whether a base model tokenizer fits your domain. It pairs naturally with Hugging Face training stacks and custom dataset work. 
Complexity is intermediate: you should already be writing Python around transformers pipelines. It does not install libraries or run training; it equips your agent to reason about merge rules and vocabulary design before you commit to a tokenizer.json.1installs37Implementing Llms LitgptImplementing LLMs with LitGPT is an agent skill for solo builders and researchers who need clean, single-file Python control over transformer internals instead of opaque Hugging Face monoliths. It explains how LitGPT organizes GPT, Block, CausalSelfAttention, MLP, and normalization layers, and how Config dataclasses (including Llama-, Mistral-, and Phi-style examples) parameterize depth, heads, embeddings, RoPE, and custom flags like alternate attention mechanisms. The intended workflow starts with defining MyModelConfig, implementing or subclassing architectural pieces, then wiring the model into LitGPT’s training path for experimentation on new research ideas, domain-specific adapters, or ablation studies. Use it when you already chose LitGPT as the training stack and need the agent to generate consistent, framework-native code. It is advanced material: you should understand transformer blocks and Python ML tooling. It does not replace data prep pipelines, hyperparameter sweeps, or production serving—those are separate Build and Operate concerns.1installs38InstructorInstructor is an agent skill packaging real-world recipes for the Instructor library so solo builders can return validated Pydantic objects from Claude instead of fragile JSON parsing. The snippets cover data extraction (company facts from prose), classification (sentiment with confidence scores), multi-entity graphs (people, orgs, locations), and richer structured analysis blocks—each using client.messages.create with an explicit response_model. For indie SaaS and agent backends, that means pipelines you can unit test, schema-evolve, and feed into downstream automation without regex repair loops. 
Use it during Build when you are integrating LLM calls into FastAPI jobs, enrichment workers, or agent tools that must honor types and enums in production. It assumes you already run Python with Pydantic models; it accelerates copy-paste-correct patterns rather than replacing your API keys, rate limits, or observability layer.1installs39Knowledge DistillationKnowledge Distillation is an agent skill centered on MiniLLM’s reverse KL divergence approach for compressing large language models into smaller students that still generate usefully diverse text. Solo builders hitting inference cost or latency walls (running agents on laptop-grade GPUs, shipping on-device assistants, or cutting API bills) often reach for naive KL distillation and get flat, overly safe outputs; this skill explains that failure mode in plain language and points to reverse KLD as the mode-seeking alternative from the 2023 MiniLLM line of work. It is research-anchored but written for practitioners who need the loss intuition before wiring a training job. Primary shelf is build/backend ML work, yet the same reasoning applies when you validate whether a smaller model can replace a teacher in production and when you operate tuned serving stacks. Pair it with Unsloth or your trainer of choice once you pick student architecture and data; this skill does not replace a full training framework but clarifies which distillation objective to implement.1installs40Lambda Labs Gpu CloudLambda Labs GPU Cloud is a procedural guide for solo and indie builders who need to run serious PyTorch training on rented GPU fleets instead of a single local card. It walks through initializing torch.distributed with the NCCL backend, binding each process to a local CUDA device, wrapping models in DistributedDataParallel, and saving checkpoints only from rank zero so parallel writers do not clash. 
The launch section shows matching torchrun flags across a master node and worker nodes using private IPs and a fixed master port—exactly the glue people miss when moving from one 8×GPU box to a small multi-node cluster. Use it when you have chosen Lambda Labs for cost or availability and need a copy-paste baseline before tuning batch size, gradient accumulation, or fault tolerance. It does not replace cluster networking design or orchestration platforms; it gives you the minimal correct distributed loop so your agent or scripts can extend dataloaders, metrics, and scheduling.1installs41LangchainLangChain Agents Guide is a procedural reference for solo builders who need to ship language-model agents that actually call code, not just chat. It explains the ReAct loop—reason about the task, act via tools, observe results, and continue until the goal is met—and walks through minimal Python using LangChain’s create_agent with Anthropic or OpenAI chat models. You get concrete patterns for defining tool functions with docstrings, attaching them to an agent, and running invoke with a standard messages payload. The skill fits indie developers wiring assistants into backends, internal ops bots, or SaaS copilots where deterministic tool boundaries matter. It is not a full production hardening guide (auth, rate limits, evals) but a fast on-ramp to composable agents aligned with how Claude Code and similar agents expect structured capabilities. Use it when you have APIs or scripts to expose and need a repeatable agent skeleton instead of one-off prompt spaghetti.1installs42Langsmith ObservabilityLangSmith Observability is an agent skill for solo builders who ship LLM-powered products and need serious evaluation—not just request logs. It documents advanced LangSmith usage: attaching custom Python evaluators to runs, orchestrating batch evaluate() jobs against named datasets, and standing up LLM-as-judge scoring when human labels are too slow for indie pace. 
You reach for it while wiring agent backends or research prototypes where wrong answers have user-visible cost, and again when you harden releases with regression datasets. The snippets assume you already run traces in LangSmith and want repeatable scoring keys, comments, and normalized grades instead of one-off notebook checks. It complements generic testing skills by focusing on production-like run objects, example inputs/outputs, and judge prompts that scale with your agent iterations.1installs43Llama Cppllama-cpp is a performance optimization skill for solo builders running open models through llama.cpp on their own machines. It consolidates practical knobs—thread count, BLAS builds, partial or full GPU offload, batch sizes, and context windows—into actionable command-line patterns so you stop guessing why inference feels slow. The guide explains how to find a stable -ngl value when VRAM is tight, when to shrink context, and how to watch GPU memory during tuning. It includes representative throughput numbers across Apple Silicon, AMD, Intel CPUs, and RTX-class GPU offload so you can sanity-check your setup. Use it when you are shipping a local agent backend, a CLI assistant, or an offline research stack and need predictable latency without jumping to hosted APIs.1installs44Llama Factoryllama-factory is an agent skill that surfaces LLaMA-Factory documentation for solo builders and small teams who need a practical map from install to fine-tuned chat models. The ingested readme spans advanced topics: GPT-OSS three-step LoRA fine-tuning (dependencies, training, weight merge), full fine-tuning scripts, trainer taxonomies from supervised fine-tuning through RLHF and preference optimization (DPO, KTO), plus Ascend NPU inference with vLLM-Ascend where Python 3.10–3.11 and CANN toolchains apply. It also points to the Web UI for loading models in chat mode with vLLM for faster inference versus vanilla Hugging Face paths. 
Expect advanced GPU or NPU planning—single-GPU GPT-OSS notes call for more than 44 GB VRAM unless you scale out. Use this skill when your agent should walk you through LLaMA-Factory commands, config choices, and hardware constraints while you build custom LLM-backed features, not when you only need a hosted API key. Prism catalogs it as documentation-forward ML tooling rather than a one-click hosted trainer.1installs45LlamaguardLlamaGuard is an agent skill package that teaches you how to run Meta’s specialized moderation model to classify chat for policy violations before and after your main LLM generates text. It targets builders shipping agents, copilots, or APIs who need a dedicated safety layer instead of hoping the base model self-censors. The skill documents six hazard categories—from violence and hate through criminal planning—and walks through HuggingFace transformers setup, optional vLLM serving, and enterprise paths such as SageMaker, plus integration notes with NeMo Guardrails. You get concrete Python for applying the chat template, generating a short safety verdict, and interpreting outputs like unsafe with a category code. Use it when compliance, brand risk, or platform rules require systematic filtering at scale, and you want a reproducible deployment recipe rather than ad-hoc keyword blocklists.1installs46LlamaindexLlamaIndex is a Prism-tagged agent skill that walks solo builders through constructing LlamaIndex agents with tools and retrieval. It starts with a minimal FunctionAgent wired to an OpenAI model and a Python callable, then layers QueryEngineTool so the agent can answer from embedded documents instead of hallucinating APIs. The multi-document section shows how to expose separate indexes—each with name and description—so the model picks the right corpus at runtime. This fits indie developers shipping a support bot, internal research assistant, or feature-specific copilot without rebuilding RAG plumbing from scratch. 
Invoke it when your repo already uses or plans LlamaIndex and you need consistent patterns for tool lists, index construction, and chat-based orchestration rather than one-off notebook snippets.1installs47LlavaLLaVA is an agent skill that condenses the official-style training guide for fine-tuning LLaVA vision-language models into steps a solo builder can hand to Claude Code, Cursor, or Codex on a GPU machine. It explains stage one feature alignment—binding a CLIP ViT-L/14 encoder to a Vicuna-7B or LLaMA-2-7B backbone on roughly 558K image-caption pairs—and stage two visual instruction tuning on about 150K multimodal instruction examples with concrete hyperparameters such as learning rate 2e-5, batch 128 across eight GPUs, and single-epoch fine-tuning. The skill includes the expected JSON conversation schema with an image placeholder token and alternating human and assistant turns, plus bash script names for pretrain and finetune flows. Builders use it when prototyping multimodal agents, custom visual QA, or domain-specific VLMs rather than calling a hosted API only. It assumes you can provision multi-GPU time and dataset paths; it does not manage cloud billing or automated eval harnesses unless you add those separately.1installs48Long ContextLong-context is a research-oriented agent skill that distills how modern transformer stacks stretch beyond native training lengths without blindly stretching RoPE. Solo and indie builders shipping Claude Code, Cursor, or Codex workflows with large repos, transcripts, or tool traces need a decision framework—not hype—before they fine-tune, swap models, or redesign chunking. The skill walks through YaRN’s frequency-aware RoPE extension (including attention temperature scaling), ALiBi’s linear attention biases, and classic position interpolation, then contrasts when each method preserves high-frequency detail versus how much extra training data you pay for. 
It is written for builders who own retrieval, summarization, or agent memory design and must justify a 32k versus 128k posture to themselves or a small team. Use it during discovery and again when implementation choices in the build phase affect tokenizer limits, GPU budget, and eval harnesses. It does not run training jobs or patch frameworks; it equips you to read vendor docs and research claims critically.1installs49Mamba ArchitectureMamba Architecture is a research-oriented agent skill that teaches how Mamba’s selective state space model (S6) works compared with classical fixed SSMs and transformer attention. Solo builders and indie ML practitioners use it when deciding whether linear-time sequence modeling fits their agent, speech, or long-context use case, or when a coding agent must implement or debug mamba_ssm configurations without misunderstanding discretization and input-dependent gates. The material walks through state updates, selection on important tokens, and complexity tradeoffs, then surfaces practical hyperparameters such as hidden size, state dimension, and local convolution width. It is not a training pipeline or production monitoring skill; it supplies conceptual and API-level grounding so later Build work on custom models, fine-tuning stacks, or research prototypes starts from accurate mechanics. Pair it with experimentation notebooks or implementation tasks once you have a hypothesis to test.1installs50Miles Rl Trainingmiles-rl-training is a reference-oriented agent skill that teaches how to launch and tune miles, an enterprise reinforcement-learning framework built on slime for large mixture-of-experts models. It is aimed at builders and small research teams who already have GPU clusters and Hugging Face checkpoints and need a structured map of advantage estimators, actor/rollout resource flags, tensor and pipeline parallelism, and MoE expert parallelism. 
The readme emphasizes practical command-line patterns—GRPO with qwen-scale models, colocated actor/rollout layouts, and SGLang speculative algorithms—rather than indie SaaS scaffolding. Use it when you are in the Build phase wiring training scripts, not when you are validating a landing page or operating production support queues. Pair it with slime’s base API reference because miles explicitly inherits slime’s configuration system and Sample dataclass semantics including `rollout_routed_experts` for routing replay.1installs51MlflowMlflow is an agent skill built around Orchestra Research’s deployment guide for MLflow models. Solo builders who already log or register models in MLflow use it when they need to move from a notebook or experiment to something callable in production—whether that is a dev laptop on port 5001, a container, a managed cloud endpoint, or batch scoring jobs. The skill walks through deployment option tradeoffs, concrete CLI commands for serving registered models and run artifacts, HTTP invocation patterns, and higher-complexity paths like Kubernetes and cloud ML services. It also points toward monitoring and production patterns so indie teams do not treat deployment as a one-off script. Install it when you have a model name or run ID and need agent-guided steps that match MLflow’s native serve, Docker, and cloud integration vocabulary instead of guessing flags or endpoint shapes.1installs52Ml Training RecipesML Training Recipes is a reference skill that packages modern transformer implementation patterns for solo builders training or customizing language models. It walks through RMSNorm, rotary position embeddings, grouped-query attention, sliding-window flash attention, value embeddings, activation choices, residual scaling, logit soft capping, assembled transformer blocks, and configuration conventions—each with concise Python-oriented guidance meant to be copied into a real training repo. 
Use it when you are past the idea stage and actively coding a model stack rather than shopping hosted APIs. The content assumes comfort with PyTorch-style modules and transformer training loops. It does not replace experiment design or dataset curation; it accelerates correct, contemporary architecture wiring so you spend fewer cycles debugging norm placement or attention variants.1installs53Modal Serverless GpuModal Serverless GPU is a research-oriented skill package for solo builders and small teams who need serious training capacity without owning hardware. It documents how to define Modal apps, slim Debian images with torch, transformers, accelerate, and deepspeed, and attach explicit GPU shapes from single H100 pods to eight-way A100 fleets. The workflows cover Accelerate-based multi-GPU loops, DeepSpeed-backed Trainer runs with fp16 and gradient accumulation, and the subtle multi-GPU footguns that arise when frameworks re-execute the Python entrypoint; subprocess and ddp_spawn guidance is included for that. Use it while building ML backends, fine-tuning agents, or research prototypes that must scale out temporarily then disappear from your bill. It complements generic DevOps skills by focusing on Modal’s serverless contract rather than raw Kubernetes. Expect intermediate Python and PyTorch familiarity; outputs are runnable function stubs you adapt to your dataset and checkpoint strategy.1installs54Model Mergingmodel-merging is a Data Science & ML research skill for indie builders experimenting with combined LLM checkpoints who need disciplined evaluation before calling a merge successful. Prism shelves it under ship/testing because the content is overwhelmingly about benchmark suites, metrics, and comparison, not about fusing weights in training jobs. 
It walks through the Hugging Face Open LLM Leaderboard task set (six benchmarks spanning reasoning, knowledge, truthfulness, and math), shows how to run lm_eval against a local merged model path, and points to MT-Bench for multi-turn quality via FastChat. Solo operators shipping custom agents or fine-tuned stacks can use the same playbook during build when sanity-checking a merge candidate and again in operate when regression-testing a new artifact against a baseline. Expect Python, GPU-ish batch sizes, and network access to model hubs. It does not replace merge algorithms or legal review of model licenses. Treat it as a checklist-driven evaluation guide so your agent does not hand-wave "the merge looks fine" without numbers.1installs55Model Pruningmodel-pruning documents the Wanda method from the ICLR 2024 paper for solo builders and small teams who need smaller, faster LLMs without launching a full retraining program. The skill explains why pruning by weight magnitude alone mis-ranks parameters and how multiplying by input activation norms surfaces rarely used large weights for removal. It outlines one-shot pruning with calibration activations on Hugging Face causal LMs—useful when GPU memory, token cost, or edge deployment limits bite after you have a working agent. Pair this with measured evals on your own prompts; the guide gives the algorithmic framing and PyTorch-oriented steps, not a managed hosting product.1installs56Moe TrainingMoE Training is a research-oriented agent skill from Orchestra Research that documents major Mixture of Experts LLM architectures rather than a hands-on training notebook. It walks through Mixtral 8x7B sparse blocks, DeepSeek-V3, Google Switch Transformer and GLaM lineages, and consolidates comparison-friendly facts such as total versus active parameters, expert counts, and routing strategies. 
Solo builders evaluating custom agents, fine-tuning vendors, or infra costs can use it to reason about why top-k routing changes latency and memory, and when sparse FFN layers beat dense stacks. The content skews advanced: expect PyTorch-style module sketches and architecture vocabulary, not a full MLOps pipeline. Pair it with your own experiment design and serving benchmarks before production bets.1installs57NanogptNanoGPT is a stripped-down GPT-2 implementation designed to teach transformer architecture in readable, minimal code. Solo builders use it to understand how self-attention, embeddings, and token prediction work without navigating heavyweight frameworks. It matters because learning from clean implementations builds intuition for building and fine-tuning language models, making you more effective when scaling to production LLMs.1installs58Nemo CuratorNemo-curator documents how to run NVIDIA NeMo Curator deduplication modules on text datasets: exact hashes, fuzzy MinHash LSH, and embedding-based semantic matching. It is aimed at indie builders and small teams preparing training or retrieval corpora who need reproducible Python snippets and parameter guidance instead of one-off scripts. Use it when ingestion pulls overlapping crawls, forum dumps, or scraped docs that would bloat token budgets and skew evaluations. The guide contrasts the three strategies so you can trade accuracy for cost—exact for cheap wins, fuzzy for near copies, semantic for paraphrases. GPU execution is emphasized where available because the readme cites large wall-clock reductions on heavy jobs. It does not replace legal licensing review or PII scrubbing; pair it with your own quality filters. 
Expect intermediate Python comfort, a defined id_field and text_field schema, and optional CUDA for semantic batches at scale.1installs59Nemo Evaluator SdkThe nemo-evaluator-sdk skill explains NeMo Evaluator’s adapter and interceptor system—the layer that sits between the evaluation engine and your model HTTP APIs. Solo builders shipping agent or API-backed models use it when benchmark suites need preprocessing, auth headers, response normalization, or logging without forking the core evaluator. The architecture is explicit: requests flow through a chain of interceptors, then an endpoint interceptor performs the Model API call, then responses unwind through interceptors in reverse. That ordering contract is what keeps eval configs portable across environments. The skill also points to how adapters are specified in configuration so your coding agent can extend pipelines incrementally rather than embedding one-off fetch logic in every task script. It pairs naturally with broader ML eval workflows in the same research-skills repo, but stands alone as integration knowledge for anyone standardizing NVIDIA NeMo-style evaluation runs.1installs60Nemo GuardrailsNeMo Guardrails is NVIDIA’s open-source runtime safety framework for LLM applications. Solo builders shipping chatbots, copilots, or API-backed agents install it when policy compliance and abuse resistance must be enforced outside ad-hoc system prompts. The skill walks through RailsConfig, Colang define/flow patterns, and wrapping generate() so blocked topics return fixed refusals instead of model improvisation. Workflows cover jailbreak detection, output validation, and extending rails for domain-specific rules. It fits indie teams who already have a Python stack and need guardrails that run beside the model rather than only in prompt engineering. Expect intermediate-to-advanced setup: you configure flows, test adversarial inputs, and tune rails without replacing your entire inference stack. 
Prism lists it for builders in ship and operate who treat agent safety as infrastructure, not a one-off review.1installs61Nnsight Remote Interpretabilitynnsight Remote Interpretability is a reference-oriented agent skill that teaches coding agents how to use the nnsight LanguageModel API for causal interventions and inspection of internal activations. Solo builders exploring mechanistic interpretability, custom eval hooks, or research prototypes can load standard HuggingFace checkpoints, enter a trace context, capture layer outputs (for example transformer block 5 hidden states), and optionally offload execution remotely without changing the mental model. The readme is API-dense rather than a product workflow: it foregrounds loading patterns, tracer parameters, and the save/deferred-execution model that distinguishes nnsight from one-shot forward passes. Use it when you are building tooling around LLMs—not when you only need inference endpoints. Expect intermediate Python and PyTorch fluency; remote=True assumes you have arranged NDIF or compatible remote infrastructure as your project requires.1installs62Openrlhf TrainingOpenRLHF-training is a reference skill for solo builders and small ML teams shipping custom alignment pipelines with OpenRLHF. It walks through six RL algorithms exposed via --advantage_estimator—GAE-backed PPO, REINFORCE++ variants, GRPO group normalization, Dr. GRPO, and RLOO—so you can match estimator choice to reward complexity, GPU memory, and training stability instead of defaulting to PPO. Each section spells out the loss intuition, whether a critic is required, and practical when-to-use guidance, plus starter hyperparameters such as clip_eps_low/high, actor and critic learning rates, and init_kl_coef. Use it while designing RLHF experiments, debugging unstable policy updates, or documenting why you picked GRPO over full PPO for a given reward model. 
It assumes you are already in the OpenRLHF stack rather than teaching greenfield ML from scratch.1installs63Optimizing Attention FlashOptimizing Attention Flash is a research-backed agent skill that surfaces Flash Attention performance benchmarks so solo and indie builders can reason about long-context cost before they commit to hardware and kernels. The material compares standard attention against Flash Attention 2 and 3 across Ampere and Hopper GPUs, with millisecond forward-pass timings at common sequence lengths and batch/head configurations. Use it when you are prototyping or productionizing transformer workloads and need defensible speedup and memory expectations instead of guessing from blog posts. It matters for AEO-facing ML products, agent context windows, and inference budgets where one wrong attention path can blow GPU spend. The skill is reference-oriented: tables, scaling notes, and version comparisons rather than a full install wizard—pair it with your actual training stack docs when you implement.1installs64OutlinesOutlines is an agent skill from the AI research skills collection that helps solo builders pick and configure structured-generation backends without trial-and-error across incompatible inference stacks. It explains how to load Hugging Face Transformers models, steer devices from CPU through CUDA and MPS, and tune memory with float16 or 8-bit quantization before you attach JSON or schema-guided generators. The same guide branches to local high-throughput paths (vLLM, llama.cpp) and hosted OpenAI APIs, with notes aimed at production deployment rather than notebook demos. For indie agent products, that means fewer silent schema violations and a clearer place to document which backend your coding agent should invoke per environment. 
Use it while standing up agent-tooling, then revisit when you harden inference for ship and operate workloads.1installs65Peft Fine TuningPEFT Fine-Tuning is an agent skill that walks solo builders through advanced Hugging Face PEFT recipes beyond basic LoRA: DoRA for magnitude–direction decomposition, AdaLoRA for importance-based rank allocation, and LoRA+ for asymmetric optimizer settings. It targets indie developers shipping smaller GPUs who still need instruction-tuned or domain adapters without full finetune cost. Use it during build when you already have a base model and task_type CAUSAL_LM (or similar) and need copy-paste configs with tradeoff notes—memory, rank, and target_modules choices. The skill matters because picking the wrong variant wastes training cycles or silently underfits; the guide ties each variant to concrete when-to-use bullets so your coding agent does not guess hyperparameters from generic chat advice.1installs66Phoenix ObservabilityPhoenix observability is an agent skill package for solo builders shipping LLM-powered products who need repeatable evaluation beyond vibe checks. It documents how to wire Arize Phoenix evals with OpenAIModel and llm_classify, including custom prompt templates that judge accuracy, completeness, and clarity against optional reference answers. You define rails, parse labels and explanations, and extend the same pattern into multi-criteria loops when one response must satisfy several rubrics. The guide is advanced Python-oriented procedural knowledge—not a hosted MCP server—so it fits Claude Code, Cursor, or Codex sessions where you already trace runs in Phoenix. Use it while building agent tooling, before promoting prompt changes, and when you want monitoring-friendly eval hooks that align with production observability workflows.1installs67PineconePinecone is an agent skill that teaches solo builders how to adopt Pinecone’s managed vector database for production retrieval workloads. 
It targets builders shipping RAG assistants, recommendation features, or semantic search who do not want to operate vector infra themselves. The skill covers installation via the official Python client, index creation with dimensions that match your embedding model, and operational expectations like hybrid dense/sparse search, metadata filters, and namespaces for tenancy. It foregrounds production metrics—sub-100ms p95 latency and a managed SLA—while honestly pointing to alternatives when you need fully self-hosted or offline similarity search. Use it during integration work once you have embeddings and need a scalable query path agents can call. It is integration documentation packaged as procedural knowledge for Claude Code, Cursor, or Codex rather than a standing review or planning ritual.1installs68Presenting Conference TalksPresenting Conference Talks is a template skill for building professional conference presentations in two formats: Beamer LaTeX for pixel-stable PDFs and python-pptx for slides you can edit in PowerPoint. Solo builders, indie hackers, and researchers use it when they need a credible oral-talk layout—title slide, optional table of contents, figure paths, and per-slide speaker notes—without designing deck architecture from scratch. The Beamer template targets 16:9 at 12pt with Metropolis styling, appendix numbering, and commented options to show or hide notes on a second screen. Color blocks are parameterized so you can align with venue or lab branding quickly. The skill fits anyone translating a paper or product narrative into a timed talk; it is not a substitute for rehearsal or content strategy, but it removes blank-deck friction. 
Pair it with your own figures directory and metadata (title, authors, institute) before generating frames for problem, approach, evaluation, and conclusion sections typical of research oral presentations.
1 installs

69 Prompt Guard
Prompt-guard packages Meta’s Prompt Guard 86M classifier for solo builders shipping LLM-powered SaaS, agents, and RAG APIs. You install transformers and torch, load meta-llama/Prompt-Guard-86M, and return a jailbreak probability score you threshold on user input and on retrieved third-party text before it enters the context window. The skill fits the Ship security lane but also applies while you wire integrations in Build and when you operate production chat or support bots in Operate. It is an integration reference, not a hosted firewall: you own deployment, thresholds, logging, and fallbacks when scores edge near your cutoff. Use it when compliance-minded input validation is missing from your stack; pair with broader appsec review because classifier gates do not replace authorization, output filtering, or secrets hygiene.
1 installs

70 Pytorch Fsdp2
The pytorch-fsdp2 skill distills official PyTorch guidance on Fully Sharded Data Parallel (FSDP2) workflows alongside Distributed Checkpoint (DCP), with emphasis on asynchronous saving. Indie builders training larger models on modest multi-GPU setups feel checkpoint pauses as wasted wall-clock; the skill explains how async_save moves persistence off the hot training path and what tradeoff that introduces: extra CPU memory, because weights are copied to buffers first. It also frames DCP as the parallel, multi-file checkpoint format that supports resharding when you save on one topology and resume on another, plus the in-place load model where storage is allocated before state is poured in. Agents using this skill should treat the compatibility caveat seriously: checkpoint formats may not survive arbitrary PyTorch upgrades.
Pinned-memory tactics from the tutorial recipe are flagged for builders who need every millisecond back. Together, the content steers implementation choices during training code authorship rather than abstract DevOps panels.
1 installs

71 Pytorch Lightning
Pytorch-lightning is a catalog skill from orchestra-research’s AI research bundle aimed at solo builders and indie researchers who want agents to follow PyTorch Lightning conventions when writing training code. Lightning wraps PyTorch with standardized Trainer flows, checkpointing, and distributed hooks so one-person teams can ship reproducible experiments faster than raw PyTorch boilerplate. Prism lists only a minimal SKILL.md body here, so treat the skill as a routing hint and procedural anchor for agent conversations about modules, callbacks, and training configuration rather than a full tutorial. Use it while building model training backends, fine-tuning jobs, or research prototypes where you need consistent project layout and fewer ad-hoc training scripts. Cross-check API details against the Lightning version in your environment because upstream metadata alone does not enumerate every rule or checklist.
1 installs

72 Pyvene Interventions
Pyvene Interventions is a concise API reference skill for the pyvene library, which layers intervention configs on top of PyTorch transformers. It shows how to construct an IntervenableConfig with RepresentationConfig entries (for example block_output at a chosen layer), instantiate IntervenableModel around a pretrained causal LM, and run paired base/source forwards. Position-specific control uses unit_locations; researchers can request original versus intervened outputs side by side. Generation paths document how to sample or greedy-decode under the same intervention graph. Persistence covers local directories and HuggingFace hub workflows so experiments are reusable across runs and teammates.
Advanced solo builders and small research teams use it when Claude Code or similar agents must implement interpretability probes correctly instead of guessing tensor shapes and call signatures.
1 installs

73 Qdrant Vector Search
Qdrant Vector Search is an agent skill that walks solo builders through advanced Qdrant usage: distributed clusters, sharding, and client-side collection configuration for semantic search and RAG backends. It appears when you have outgrown a single Docker container and need Raft-coordinated nodes, explicit bootstrap wiring, and production-minded port and volume layout. The included compose and Python snippets show how to enable clustering, attach storage per node, and define vector parameters and sharding when creating collections. Builders shipping AI features use it while wiring retrieval into APIs and agents; operators revisit it when scaling ingestion or hardening infra. The material assumes comfort with containers, environment-based Qdrant config, and the official qdrant_client SDK, making it intermediate-to-advanced compared with a hello-world embedding demo.
1 installs

74 Quantizing Models Bitsandbytes
Quantizing Models Bitsandbytes is an agent skill that teaches memory optimization for large language models using the bitsandbytes and Hugging Face stack. Solo builders who want Llama-class models on a single consumer or rented GPU learn how quantization, CPU offloading, gradient checkpointing, paged optimizers, and mixed precision combine to shrink VRAM pressure without abandoning training or inference goals. The readme walks through concrete AutoModelForCausalLM.from_pretrained examples with BitsAndBytesConfig, explicit max_memory maps across GPU and CPU, and notes on automatic placement through accelerate. It frames quantization as delivering substantial footprint cuts while calling out realistic latency costs when layers move to host memory.
Use this when Build-phase ML work blocks on OOM errors or when you are choosing between smaller models and aggressive memory tactics. It does not replace full training runbooks, evaluation harnesses, or production serving SLO planning.
1 installs

75 Ray Data
Ray-data gives solo ML builders a concise integration map between Ray Data and the trainers they already use, chiefly Ray Train with optional PyTorch and TensorFlow batch adapters. The skill walks through reading parquet from object storage, registering train and validation datasets on a TorchTrainer, and consuming per-worker shards inside train_func via ray.train.get_dataset_shard. It documents the recommended to_torch path with label columns and drop_last, plus iter_torch_batches and TensorFlow to_tf alternatives when frameworks differ. ScalingConfig examples show how to scale num_workers and GPUs without rewriting dataloader code. Use it when your agent is implementing or debugging distributed fine-tuning where data loading must stay consistent across workers, and revisit the same patterns in Operate when you refresh production training pipelines on a schedule.
1 installs

76 Ray Train
Ray Train Multi-Node Setup is an agent skill for solo builders and small teams who need reproducible distributed training on a Ray cluster instead of ad-hoc single-GPU scripts. It walks through starting a head node, attaching workers, connecting with `ray.init(address='auto')`, and configuring TorchTrainer with ScalingConfig for worker count, GPUs, and placement strategy. The skill emphasizes verifying cluster capacity with `ray status` and the Ray dashboard so you do not oversubscribe nodes. Use it while you are building ML pipelines or agent research stacks that will later need horizontal scale. It pairs naturally with experiment tracking and evaluation skills once checkpoints exist.
Expect shell access on every node and comfort with Python training loops wrapped for Ray Train.
1 installs

77 Rwkv Architecture
RWKV Architecture is a research-oriented agent skill that encodes how RWKV replaces quadratic attention with linear-time WKV time-mixing paired with channel-mixing blocks. Solo builders evaluating efficient LLM backends, edge inference, or custom PyTorch modules can invoke it when they need precise vocabulary (receptance, time decay, time-mix parameters) and code-shaped explanations rather than hand-wavy summaries. It is phase-specific to early research: you are not deploying monitoring or writing marketing copy; you are deciding whether RWKV fits your latency, memory, and training constraints. The readme emphasizes the core loop over timesteps, parameter roles, and module structure so an agent can scaffold implementations or review diffs against canonical patterns. Use it alongside experimentation skills when moving from architecture understanding to prototypes, but treat outputs as technical reference that still needs benchmarking on your hardware and dataset.
1 installs

78 Segment Anything Model
Segment Anything Model is a research-oriented agent skill that documents advanced usage of Meta’s Segment Anything family: image SAM, SAM 2 for video with streaming memory and cross-frame tracking, and Grounded SAM for text-prompted masks via Grounding DINO. Solo builders shipping annotation tools, agent vision features, or ML-backed SaaS use it during build when they must choose architectures (ViT vs Hiera), install the right GitHub packages, and wire predictors into Python services. The readme emphasizes operational snippets (initializing video state, adding frame prompts, propagating masks, and contrasting SAM 2 capabilities against classic SAM) rather than training theory alone.
It is phase-specific to build integrations because execution assumes a repo, Python environment, and model weights paths ready to connect to your pipeline.
1 installs

79 Sentencepiece
The sentencepiece skill is procedural guidance on subword tokenization algorithms, chiefly Byte-Pair Encoding (BPE) and Unigram, as used by Google SentencePiece. Solo builders fine-tuning open models or training smaller domain LMs need this when raw character splits explode sequence length or when word-level vocabularies miss morphology. The skill walks through BPE’s iterative pair merges with a frequency table example, then contrasts Unigram’s large initial vocabulary, loss-based pruning, and highest-probability segmentation paths (e.g., lowest vs widest). It calls out BPE’s simplicity and speed against Unigram’s ability to represent multiple segmentations for regularization at train time. A minimal trainer invocation shows how to produce a model file from corpus.txt at a target vocabulary size. For Prism’s audience, this is the checkpoint before committing tokenizer assets that downstream training scripts, eval harnesses, and export pipelines will assume, reducing surprise OOV behavior and vocab mismatch between pretrain and finetune.
1 installs

80 Sentence Transformers
Sentence Transformers Models Guide is a reference skill for solo and indie builders who need embeddings for semantic search, RAG, or clustering without running a full benchmark sprint. It summarizes practical defaults (MiniLM for prototyping, mpnet for production RAG, roberta-large when quality dominates), multilingual options, and niche models for science, legal, and code. Use it when you are integrating Hugging Face–style encoders into an agent or API and must balance vector dimension, throughput, and retrieval quality.
The matrix format makes it easy to paste the right model name into your stack and justify the choice to future you when inference bills or recall complaints show up.
1 installs

81 Serving Llms Vllm
Serving-llms-vllm is an Orchestra Research agent skill that packages operational knowledge for hosting large language models with vLLM. It explains PagedAttention block allocation, continuous batching that keeps GPUs busy across variable sequence lengths, prefix caching for shared prompts, and speculative decoding for latency wins. Solo builders running private inference APIs (side projects, agent backends, or research sandboxes) use it when naive serve defaults OOM or under-utilize GPUs. The readme cites concrete tuning knobs such as --block-size and --gpu-memory-utilization alongside qualitative throughput multiples. It complements build-time API design, but its primary value is operate-phase infra when you have already picked weights and need stable, high-util serving.
1 installs

82 Sglang
This agent skill is a production deployment playbook for SGLang, the fast structured-generation and LLM serving runtime solo builders use when self-hosting models instead of only calling hosted APIs. It walks through launching `python -m sglang.launch_server` with static memory fractions, binding on `0.0.0.0`, and sizing tensor-parallel groups for Llama-scale weights. You get copy-paste commands for FP8 on H100-class hardware, AWQ and GPTQ INT4 checkpoints, and a minimal NVIDIA CUDA Dockerfile that installs FlashInfer and starts the server as the container entrypoint. It fits indie teams shipping private inference, agent backends, or cost-controlled APIs who already have GPU machines or cloud GPU instances and need repeatable ops commands rather than notebook experiments.
Use it when moving from a Hugging Face path to an always-on endpoint, when VRAM forces TP or quantization, or when packaging serving into CI-built images for staging and production.
1 installs

83 Simpo Training
Simpo-training is a procedural guide for solo and indie builders who fine-tune language models with Simple Preference Optimization instead of heavier RLHF stacks. It walks through the exact JSON record shape agents must emit (prompt, chosen, and rejected) with alternative field names the training stack can auto-map. The skill surfaces battle-tested public datasets such as UltraFeedback and cleaned Argilla variants, including approximate pair counts, annotation quality, and when each domain mix fits general instruction following. You get concrete HuggingFace-style dataset_mixer and dataset_splits snippets so your agent can wire train_prefs and test_prefs without guessing filenames. Use it when you already chose SimPO and need trustworthy preference data formatting and sourcing before launching a training job in Ray, Axolotl, or similar runners.
1 installs

84 Skypilot Multi Cloud Orchestration
skypilot-multi-cloud-orchestration documents advanced SkyPilot usage for solo ML builders who cannot afford single-cloud lock-in or idle GPUs. The skill encodes YAML recipes for ordered cloud preferences, wildcard regions, Kubernetes-plus-cloud fallback, disk and network tiers, and accelerator-specific instance types such as p4d.24xlarge or H100:8 fleets. It covers production managed jobs with spot instances, FAILOVER spot recovery, restart limits, and controller resource bumps when launching hundreds of jobs. Credential sections point toward IAM roles or long-lived keys so controllers do not expire mid-queue.
Invoke it while you are still wiring training in Build, but catalog it on Operate/infra because the payoff is reliable cross-cloud execution and failover, not one-off notebook experiments.
1 installs

85 Slime Rl Training
Slime RL training is an agent skill that translates the slime framework’s API reference into actionable setup for solo builders and small teams running reinforcement learning on language-model agents. It explains how Ray coordinates a data buffer for prompts, filtering, and rollout sample storage with a Megatron-LM training path for the actor (and optional critic) and an SGLang-based rollout path for generation and verification. The skill centers on the Sample object contract (prompts as strings or chat dicts, token lists, responses, and indexing for batched groups) so your coding agent can scaffold configs, trace data flow, and avoid mismatches between training and inference workers. Use it when you are building custom RLHF or verifier-driven loops rather than one-shot fine-tunes, and when you need a map of module boundaries before editing slime source. It does not replace reading upstream slime docs for cluster sizing or hyperparameters, but it gives Prism users a dense orientation to architecture and types for agentic training jobs.
1 installs

86 Sparse Autoencoder Training
Sparse Autoencoder Training is an agent skill that functions as a SAELens API reference for solo builders and small research teams working on mechanistic interpretability. It walks through loading pretrained sparse autoencoders from release bundles or HuggingFace, inspecting weights and biases (W_enc, W_dec, b_enc, b_dec), and running encode/decode/forward passes on activation tensors with explicit batch, position, and width dimensions. The material supports training and evaluation workflows where you need sparse features over residual stream hooks rather than end-user product UI.
Invoke it when your agent is implementing SAE training scripts, debugging reconstruction error, or wiring feature dashboards on top of transformer activations. Complexity is intermediate to advanced: you should already have PyTorch-style tooling, GPU access when using cuda device strings, and a clear hook naming scheme such as blocks.8.hook_resid_pre. It is phase-specific to Build agent-tooling, with optional overlap into Validate when you are proving whether SAE features explain behaviors before productizing insights.
1 installs

87 Speculative Decoding
Speculative-decoding (documented here as lookahead decoding via Jacobi iteration) is a research-oriented agent skill for solo builders and small teams who ship LLM-powered products and need faster token generation without training auxiliary draft models. It walks through how autoregressive decoding can be reframed as equation solving, why exact parallel decoding is impossible, and how disjoint n-grams can still be produced in parallel for acceptance into the final sequence. The material ties to the LMSYS lookahead decoding write-up and ICML 2024 work, with pointers to the Hao AI Lab reference implementation. You use it when you are designing inference stacks, comparing speedup techniques, or explaining tradeoffs to collaborators before committing to speculative or lookahead paths in production agents. It is advanced prose and pseudocode, not a turnkey installer, so it pairs best with your own benchmarking and serving setup.
1 installs

88 Stable Diffusion Image Generation
Stable Diffusion Image Generation is an agent skill that documents advanced Hugging Face diffusers usage for solo builders shipping AI-powered visuals. It walks through loading Stable Diffusion v1.5 subfolders individually, constructing a StableDiffusionPipeline from UNet, VAE, text encoder, tokenizer, and scheduler, and optionally bypassing the default safety stack when your product policy allows it.
A second thread shows how to run a custom denoising path with DDIM scheduling and explicit guidance for pixel dimensions and step counts. Use it when you are past prototyping prompts in a UI and need reproducible Python you can drop into a backend job, CLI, or coding agent session. The material assumes comfort with torch and diffusers imports rather than one-click hosted APIs. It matters for indies who want ownership of inference code, scheduler experiments, and pipeline composition without paying for opaque black-box wrappers.
1 installs

89 Systems Paper Writing
systems-paper-writing is a research-oriented agent skill that packages a comprehensive pre-submission checklist for systems conference papers. Indie researchers and small labs shipping serious systems work (not blog posts) use it when a draft nears submission to OSDI, SOSP, ASPLOS, NSDI, or EuroSys and they need a disciplined self-review instead of vague “read it once more” advice. The skill walks through structural completeness: a testable thesis repeated across abstract, introduction, and conclusion; three to five numbered contributions tied to sections and evaluation claims; mandatory section presence from background through related work; and explicit page budgets so design and evaluation stay balanced. It treats evaluation as first-class (end-to-end results, ablations, and scalability) and related work as differentiated grouping rather than a bibliography dump. Tag it multi-phase because ideation and writing start earlier, but the skill’s natural invoke point is Ship review immediately before upload. Pair it with your plotting, benchmarking, and citation skills; it does not replace human peer review or venue-specific formatting tools.
1 installs

90 Tensorboard
Tensorboard is an agent skill that teaches solo builders how to integrate TensorBoard with the ML frameworks they already use.
It walks through creating a SummaryWriter, logging scalars at batch and epoch granularity, capturing weight histograms, and exporting computation graphs so training behavior is inspectable instead of opaque console prints. The readme structures integrations by ecosystem (PyTorch and torchvision first, then TensorFlow/Keras, Lightning, HuggingFace, Fast.ai, JAX, and scikit-learn) so you can copy patterns that match your repo rather than guessing APIs. For indie ML products and research spikes, that means faster debugging of loss curves, sane experiment folders, and a repeatable habit before you ship models or hand runs to teammates. Use it when you are implementing or refactoring training code and need observability without bolting on a separate experiment platform on day one.
1 installs

91 Tensorrt Llm
TensorRT-LLM is an agent skill for solo builders and small teams who self-host large language models and need a structured map of NVIDIA TensorRT-LLM parallelism options. It explains when tensor parallelism keeps latency flat on a single node, when pipeline parallelism stacks layers across machines for 175B+ checkpoints, and how combined TP and PP settings affect throughput versus communication cost. The guide walks through concrete Python `LLM(...)` configurations, dtype choices such as fp16 and fp8, and performance expectations so you can match hardware (A100, H100) to model size without guessing shard counts. Use it in Operate when tuning production inference, and in Build when you are wiring the serving layer that will later run under load.
1 installs

92 Torchforge Rl Training
torchforge-rl-training is an API-reference style agent skill for solo builders and small research teams shipping reinforcement-learning stacks on PyTorch. It explains how torchforge composes Monarch for distributed coordination, TorchTitan for large-model FSDP training, and vLLM for high-throughput rollouts, with your code sitting above ForgeActor and service interfaces.
The material helps you reason about where reward models, custom losses, and sampling logic plug in before you commit cluster topology or GPU budgets. It assumes you are past toy scripts and need production-shaped boundaries between trainer, generator, and reference policies. Use it while designing backend training jobs for agentic products, eval harnesses, or research forks, not for one-off fine-tunes in a notebook. Expect advanced familiarity with PyTorch distributed patterns and RL objectives; the skill is reference-heavy rather than a single-command generator.
1 installs

93 Training Llms Megatron
Training LLMs with Megatron is a reference skill for builders who need grounded parallelism and throughput numbers before committing cluster budget, not a tutorial that installs Megatron from zero. It centers on Model FLOP Utilization on H100 fleets, explaining why larger models often show higher MFU thanks to heavier GEMMs. Concrete configuration rows cover GPT-3 175B at tensor parallel 4 and pipeline parallel 8, LLaMA-3 variants from 8B through 405B with context parallel for long sequences, and Mixtral-style MoE layouts with expert parallelism. The LLaMA-3.1 405B notes include thousand-GPU scale, average TFlops per GPU, uptime expectations, and efficiency comparisons versus prior generation training. Solo founders use it to sanity-check vendor proposals and to give coding agents realistic TP/PP/CP/EP tuples when drafting launch scripts. Pair it with your own cost model and networking plan; the skill documents benchmarks and tables rather than replacing Megatron-Core source docs or cluster provisioning.
1 installs

94 Transformer Lens Interpretability
TransformerLens interpretability is an agent skill for builders who need mechanistic insight (not just benchmarks) when working with GPT-style and gated LLMs.
It centers on HookedTransformer: loading official model names, optional LayerNorm folding and weight centering, precision and device placement, and the tensor shapes behind embeddings, attention, and unembedding. Solo researchers and indie AI product makers use it to trace activations via hooks, compare small and medium checkpoints, and sanity-check behavior before trusting an agent pipeline. The material reads as an API reference slice rather than a full interpretability course; pair it with notebooks and GPU access. It spans Idea-side research questions and Build-side debugging of prompts, safety, and failure modes, without replacing production monitoring stacks.
1 installs

95 Unsloth
Unsloth is an agent skill that maps the official Unsloth documentation into a structured index so solo builders can fine-tune and reinforce open LLMs without drowning in scattered install notes. It foregrounds practical entry points (requirements, pip and Docker setup, Windows paths, updating versions) and the philosophical FAQ on when fine-tuning beats RAG, which matters before you commit weeks of GPU time. For indie agent and SaaS makers, Unsloth is often the fastest path from “prototype works in chat” to “my weights behave on my data” when VRAM is tight. Shelf placement is build/backend because training and RL workflows live there, but invoke it during validate when deciding custom models versus retrieval-only stacks, and during operate when you maintain local training environments. The skill is doc-navigation and procedure knowledge, not a hosted runtime; your agent uses it to pull the right official page or notebook topic before executing commands on your machine.
1 installs

96 Verl Rl Training
verl-rl-training is a compact API reference for the VERL reinforcement-learning stack aimed at builders who already chose VERL and Ray for LLM post-training.
It surfaces the core control objects (RayPPOTrainer as the training-loop controller, ResourcePoolManager for GPU allocation with placement groups, and RayWorkerGroup for spawning workers and routing remote calls) so you spend less time hunting class names across the repo. It also covers ActorRolloutRefWorker’s hybrid training versus generation modes and RolloutReplica backend selection (vLLM, SGLang, TensorRT-LLM, HuggingFace) typically driven by YAML rather than hand-written constructors. Use it when your agent is editing configs, debugging worker topology, or explaining how actor, critic, and rollout resources should be partitioned on a cluster. It is reference material, not a full training runbook: you still need datasets, reward design, and cluster ops. Intermediate-to-advanced ML engineers shipping custom agent models benefit most; app-only indie builders can skip it unless they own RL fine-tuning.
1 installs

97 Weights And Biases
weights-and-biases is a research-oriented agent skill that teaches solo and indie ML builders how to use Weights & Biases Artifacts and the Model Registry as the system of record for datasets, checkpoints, preprocessing outputs, and evaluation bundles. Instead of losing track of which CSV or torch.save file belonged to which experiment, you log versioned artifacts with descriptions and metadata, attach files or cloud references, and rely on W&B’s deduplicated storage and lineage graph to see which runs consumed or produced each artifact. The guide walks through wandb.init, constructing dataset and model artifacts, logging them from training loops, and applying aliases for production promotion workflows common in small teams shipping fine-tuned models or custom eval harnesses. Prism places it on Build integrations because the first value is wiring the SDK into your codebase, but the same patterns support Ship reproducibility checks and Operate model rollbacks.
Agents load it when you ask for artifact versioning, registry aliases, or reproducible ML delivery without rebuilding tribal knowledge from scattered notebooks.
1 installs

98 Whisper
Whisper is a reference agent skill from the AI research skills collection that explains OpenAI Whisper’s multilingual speech recognition coverage so solo builders do not guess which locales are production-ready. It groups languages by word-error-rate bands: a top tier under ten percent for widely used European and Asian languages, and a good tier between ten and twenty percent for Arabic, Turkish, Vietnamese, Nordic languages, Thai, Hebrew, and others. The readme enumerates all ninety-nine supported languages alphabetically, which helps when designing subtitle pipelines, voice notes, meeting bots, or agent tools that must declare `language` parameters correctly. Use it during Build when picking STT for a global MVP, during Validate when demoing voice in non-English markets, and during Grow when support or content workflows add transcription. It is documentation-heavy rather than a training recipe, optimized for fast agent answers about coverage limits.
1 installs
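As a rough illustration of the Whisper entry above, the sketch below shows one way an agent tool might normalize a `language` value and look up the word-error-rate band before transcribing. It is a minimal sketch, not part of the skill itself: `GOOD_TIER`, `wer_band`, and `transcribe_kwargs` are hypothetical names, the two-letter codes are assumed to be the ones Whisper expects, and only languages the description explicitly places in the ten-to-twenty percent band are listed.

```python
# Hypothetical lookup table: only the languages the description names
# for the 10-20% word-error-rate band. Codes are assumed two-letter codes.
GOOD_TIER = {
    "ar": "Arabic",
    "tr": "Turkish",
    "vi": "Vietnamese",
    "th": "Thai",
    "he": "Hebrew",
}


def wer_band(language_code: str) -> str:
    """Return the band for a language code, or 'unlisted' if it is not in the table."""
    code = language_code.strip().lower()
    if code in GOOD_TIER:
        return "good (10-20% WER)"
    # Do not guess a band for languages the description does not name.
    return "unlisted"


def transcribe_kwargs(language_code: str) -> dict:
    """Build keyword arguments that carry an explicit, normalized `language` value."""
    return {"language": language_code.strip().lower()}


if __name__ == "__main__":
    for raw in ("TR", " he ", "xx"):
        print(raw.strip(), wer_band(raw), transcribe_kwargs(raw))
```

With the openai-whisper package installed, the resulting dictionary could be passed along as `model.transcribe(path, **transcribe_kwargs("tr"))`; that call is not exercised here.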