
Torchforge RL Training
Stand up asynchronous distributed RL training that couples TorchTitan FSDP, vLLM generation, and Monarch-coordinated Forge actors.
Overview
torchforge-rl-training is an agent skill for the Build phase that documents how to orchestrate asynchronous RL with torchforge, Monarch, TorchTitan, and vLLM.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill torchforge-rl-training
What is this skill?
- Maps torchforge stack: Monarch coordination, TorchTitan training, vLLM inference
- Documents ForgeActor base class and async Forge API service layer
- Covers TitanTrainer, Generator, and frozen ReferenceModel for KL baselines
- Oriented to fully asynchronous RL application code (rewards, loss, sampling)
- Architecture diagram ties application layer to distributed Monarch services
- Three core distributed services: TitanTrainer, Generator, ReferenceModel
- Stack layers: Application, Forge API, Monarch distributed services
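The separation between generation and training listed above can be sketched with plain asyncio. This is a toy illustration, not torchforge code: `Rollout`, `StubGenerator`, `StubTrainer`, and `run_async_rl` are hypothetical stand-ins, and the real services run as Monarch-coordinated actors across processes. It only shows the shape of the loop, assuming Python 3.9+: rollouts flow through a queue while the trainer publishes a new weight version after each step.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    text: str
    policy_version: int  # weight version the generator held when sampling


class StubGenerator:
    """Hypothetical stand-in for a generation service (not the torchforge Generator)."""

    def __init__(self) -> None:
        self.version = 0

    async def generate(self, prompt: str) -> Rollout:
        await asyncio.sleep(0)  # yield control, as a real inference call would
        return Rollout(prompt, f"answer to {prompt}", self.version)

    async def update_weights(self, version: int) -> None:
        self.version = version


class StubTrainer:
    """Hypothetical stand-in for a trainer service (not TitanTrainer)."""

    def __init__(self) -> None:
        self.step = 0

    async def train_step(self, batch: list[Rollout]) -> int:
        await asyncio.sleep(0)  # placeholder for forward/backward/optimizer work
        self.step += 1
        return self.step


async def run_async_rl(prompts: list[str], batch_size: int) -> tuple[int, list[int]]:
    queue: asyncio.Queue = asyncio.Queue()
    generator, trainer = StubGenerator(), StubTrainer()
    seen_versions: list[int] = []

    async def rollout_loop() -> None:
        # Produces rollouts without waiting for the trainer.
        for prompt in prompts:
            await queue.put(await generator.generate(prompt))

    async def training_loop() -> None:
        # Consumes batches, trains, then publishes the new weight version.
        for _ in range(len(prompts) // batch_size):
            batch = [await queue.get() for _ in range(batch_size)]
            seen_versions.extend(r.policy_version for r in batch)
            new_version = await trainer.train_step(batch)
            await generator.update_weights(new_version)

    await asyncio.gather(rollout_loop(), training_loop())
    return trainer.step, seen_versions


if __name__ == "__main__":
    steps, versions = asyncio.run(
        run_async_rl([f"p{i}" for i in range(8)], batch_size=4)
    )
    print(steps, versions)
```

Recording `policy_version` on each rollout is one way to see how stale the sampled data is relative to the trainer, which is the main trade-off of running the two loops independently.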
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You know you want RL on an LLM but lack a clear map of which torchforge actors and services own training versus generation versus KL reference.
Who is it for?
Indie ML builders prototyping GRPO/PPO-style LLM RL with Meta-style training plus vLLM rollouts on a small cluster or serious workstation setup.
Skip if: you are a beginner who only needs LoRA fine-tuning in a notebook, or your team does not run distributed PyTorch and inference services.
When should I use this skill?
User is implementing or debugging torchforge-based asynchronous RL with ForgeActor, TorchTitan training, or vLLM generation services.
What do I get? / Deliverables
You can place custom reward and loss code on the application layer and wire it to ForgeActor-backed Trainer, Generator, and ReferenceModel services.
- ForgeActor-based application modules for rewards and sampling
- Service topology plan for Trainer, Generator, and ReferenceModel
- Implementation notes aligned to forge.controller.actor APIs
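As a rough illustration of what application-layer reward code might look like, here is a framework-free sketch of a reward function and a GRPO-style group-relative advantage. The function names (`exact_match_reward`, `group_relative_advantages`) and the exact-match rule are assumptions made for this example; they are not part of torchforge, and a real pipeline would score the completions returned by the generation service.

```python
from statistics import mean, pstdev


def exact_match_reward(response: str, target: str) -> float:
    """Toy reward: 1.0 if the stripped response equals the target, else 0.0."""
    return 1.0 if response.strip() == target.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to a single prompt whose target answer is "4".
responses = ["4", "5", " 4 ", "four"]
rewards = [exact_match_reward(r, "4") for r in responses]
advantages = group_relative_advantages(rewards)
```

Normalizing within the group means a response is only rewarded relative to its siblings for the same prompt, so a group where every sample scores the same contributes advantages of zero.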
Journey fit
Training and orchestrating RL loops is backend ML infrastructure work you do while building an agent or model product, not a launch or growth task. The backend category fits because the skill documents ForgeActor and the distributed Trainer, Generator, and ReferenceModel services, not UI or distribution.
How it compares
Reference skill for torchforge architecture, not a one-shot Hugging Face Trainer recipe or an MCP tool integration.
Common Questions / FAQ
Who is torchforge-rl-training for?
Developers building custom RL training pipelines on LLMs who need torchforge, Monarch, TorchTitan, and vLLM roles spelled out for agent-assisted implementation.
When should I use torchforge-rl-training?
In Build backend planning when you are wiring async RL services, and again in Ship testing when you validate trainer–generator separation before long GPU jobs.
Is torchforge-rl-training safe to install?
It is documentation-first. Any training code you add will need GPUs, network access, and secrets, so check the Security Audits panel on this page before running cluster jobs from an agent.
SKILL.md
# torchforge API Reference

## Architecture Overview

torchforge implements a fully asynchronous RL system built on:

- **Monarch**: PyTorch-native distributed coordination framework
- **TorchTitan**: Meta's production LLM training platform
- **vLLM**: High-throughput inference engine

```
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Your Code)                           │
│ - Define reward models, loss functions, sampling        │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Forge API Layer                                         │
│ - ForgeActor, Service                                   │
│ - Async service interfaces                              │
└─────────────────────┬───────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────┐
│ Distributed Services (Monarch)                          │
│ ├── TitanTrainer (TorchTitan FSDP)                      │
│ ├── Generator (vLLM inference)                          │
│ └── ReferenceModel (frozen KL baseline)                 │
└─────────────────────────────────────────────────────────┘
```

## Core Classes

### ForgeActor

Base class for Forge actors with configurable resource attributes.

**Location**: `forge.controller.actor.ForgeActor`

```python
from forge.controller.actor import ForgeActor

class MyActor(ForgeActor):
    procs = 1          # Number of processes
    hosts = None       # Host distribution
    with_gpus = True   # GPU allocation flag
    num_replicas = 1   # Service replica count
    mesh_name = None   # Process mesh identifier
```

**Class Methods**:

- `as_actor(*args, **actor_kwargs)` → Spawns single actor using .options() configuration
- `launch(*args, **kwargs)` → Provisions and deploys new actor replica
- `options(*, procs=1, hosts=None, with_gpus=False, num_replicas=1, mesh_name=None, **kwargs)` → Pre-configures actor class
- `shutdown(actor)` → Terminates actor instance

### TitanTrainer

Generic trainer actor built on TorchTitan's training engine.

**Location**: `forge.actors.trainer.TitanTrainer`

**Key Methods**:

- `forward_backward(batch)` → Forward and backward pass
- `train_step()` → Complete training step
- `setup()` / `cleanup()` → Lifecycle methods
- `clear_gradients()` → Reset gradients
- `save()` / `load()` → Checkpoint operations
- `push_weights()` → Sync weights to inference
- `get_config()` / `get_status()` → Introspection

**Properties**: `job`, `model`, `optimizer`, `lr_scheduler`, `training`, `parallelism`, `checkpoint`, `activation_checkpoint`, `compile`, `quantize`, `comm`, `memory_estimation`, `state_dict_key`

### Generator

vLLM-based generator for inference.

**Location**: `forge.actors.generator.Generator`

```python
from forge.actors.generator import Generator

# Signature as documented; <factory> marks a field filled by a default
# factory and is a placeholder, not literal Python.
generator = Generator(
    engine_args=<factory>,
    sampling_params=<factory>,
    prefetch_weights_to_shm=True,
    n_fetcher_procs=8
)
```

**Key Methods**:

- `generate()` → Generate completions
- `run()` → Async generation loop
- `update_weights()` → Receive new weights from trainer
- `get_version()` / `get_vllm_config()` → Introspection

**Returns**: `Completion` dataclass with fields: `prompt`, `text`, `token_ids`, `logprobs`

### ReferenceModel

Frozen policy copy for computing KL divergence.

**Location**: `forge.actors.reference_model.ReferenceModel`

Maintains a frozen copy of the policy for computing advantages without gradient computation.

**Key Methods**:

- `forward()` → Inference without gradients
- `setup()` → Initialize from checkpoint

### Service

Actor-less service implementation for managing replicas.

**Location**: `forge.controller.service.service.Service`

```python
Service(cfg, actor_def, actor_args, actor_kwargs)
```

**Methods**:

- `call_all(function, *args, **kwargs)` → Call function on all healthy replicas
- `get_metrics()` → Returns ServiceMetrics object