
Slime RL Training
Configure Ray-orchestrated RL training loops with Megatron-LM actors, SGLang rollouts, and slime Sample buffers when fine-tuning agent policies.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill slime-rl-training
What is this skill?
- Documents slime’s three-module Ray layout: data buffer, Megatron-LM training, and SGLang rollout with router
- Defines the core Sample dataclass fields (prompt, tokens, response, group_index) from slime.utils.types
- Covers actor training, optional critic, and weight sync from training into rollout workers
- Explains rollout generation, reward/verifier outputs, and multi-turn response support
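The Sample fields named above can be pictured with a small stand-in dataclass. This is only a sketch: `MiniSample` and `group_by_index` are hypothetical names that do not exist in slime, and it assumes that samples sharing a `group_index` (for example, several responses to the same prompt) are meant to be handled together.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MiniSample:
    """Hypothetical stand-in for slime's Sample; not the library class."""
    group_index: Optional[int] = None
    prompt: str = ""
    tokens: list[int] = field(default_factory=list)
    response: str = ""


def group_by_index(samples):
    """Bucket samples by group_index (assumed grouping semantics)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample.group_index].append(sample)
    return dict(groups)


samples = [
    MiniSample(group_index=0, prompt="2+2=", response="4"),
    MiniSample(group_index=0, prompt="2+2=", response="5"),
    MiniSample(group_index=1, prompt="3+3=", response="6"),
]
groups = group_by_index(samples)
```

Here `groups[0]` holds the two responses to the first prompt and `groups[1]` holds the single response to the second. The real `Sample` class carries many more fields, listed in the SKILL.md reference further down.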
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Reinforcement-learning stack wiring sits in Build because you implement training pipelines and agent rollouts before production ship gates. Agent-tooling is the canonical shelf for slime’s three-module Ray architecture (data buffer, training, rollout) and Sample-type contracts.
Common Questions / FAQ
Is Slime RL Training safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# slime API Reference

## Architecture Overview

slime operates with a three-module architecture orchestrated by Ray:

```
┌─────────────────────────────────────────────────────────┐
│                       Data Buffer                       │
│ - Prompt initialization and management                  │
│ - Custom data generation and filtering                  │
│ - Rollout sample storage                                │
└─────────────┬───────────────────────────┬───────────────┘
              │                           │
┌─────────────▼───────────┐ ┌─────────────▼───────────────┐
│ Training (Megatron-LM)  │ │ Rollout (SGLang + Router)   │
│ - Actor model training  │ │ - Response generation       │
│ - Critic (optional)     │ │ - Reward/verifier output    │
│ - Weight sync to rollout│ │ - Multi-turn support        │
└─────────────────────────┘ └─────────────────────────────┘
```

## Core Data Structures

### Sample Object

The `Sample` object is the core data structure defined in `slime/utils/types.py`:

```python
# Import path: from slime.utils.types import Sample
# Definition shown for reference; Status is the enum listed in the next section.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Sample:
    # Core fields
    group_index: Optional[int]                        # Group index for batching
    index: Optional[int]                              # Sample index
    prompt: str | list[dict] = ""                     # Input prompt or chat history
    tokens: list[int] = field(default_factory=list)   # Token IDs
    response: str = ""                                # Generated response
    response_length: int = 0                          # Response length in tokens
    label: Optional[str] = None                       # Ground truth label
    reward: Optional[float | dict] = None             # RL reward signal
    loss_mask: Optional[list[int]] = None             # 1=compute loss, 0=mask
    status: Status = Status.PENDING                   # Sample status
    metadata: dict = field(default_factory=dict)      # Custom data

    # Multimodal support
    multimodal_inputs: Optional[Any] = None           # Raw multimodal data (images, videos)
    multimodal_train_inputs: Optional[Any] = None     # Processed multimodal data (pixel_values)

    # Rollout tracking
    weight_versions: list[str] = field(default_factory=list)
    rollout_log_probs: Optional[list[float]] = None   # Log probs from SGLang
    rollout_routed_experts: Optional[list[list[int]]] = None  # Expert routing (MoE)

    # Control fields
    remove_sample: bool = False
    generate_function_path: Optional[str] = None
    train_metadata: Optional[dict] = None
    non_generation_time: float = 0.0

    # Speculative decoding info (nested dataclass)
    @dataclass
    class SpecInfo:
        spec_accept_token_num: int = 0
        spec_draft_token_num: int = 0
        spec_verify_ct: int = 0
        completion_token_num: int = 0
```

### Status Enum

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # Not yet processed
    COMPLETED = "completed"  # Successfully generated
    TRUNCATED = "truncated"  # Hit max length
    ABORTED = "aborted"      # Generation aborted
    FAILED = "failed"        # Generation failed
```

## Configuration System

slime uses three categories of command-line arguments:

### 1. Megatron Arguments

All Megatron-LM arguments are supported directly:

```bash
--tensor-model-parallel-size 2
--pipeline-model-parallel-size 1
--num-layers 32
--hidden-size 4096
--num-attention-heads 32
--seq-length 4096
--micro-batch-size 1
--global-batch-size 256
```

### 2. SGLang Arguments

SGLang arguments are prefixed with `--sglang-`:

```bash
--sglang-mem-fraction-static 0.8   # Fraction of GPU memory for static allocation (weights + KV cache)
--sglang-context-length 8192       # Maximum context length
--sglang-log-level INFO            # Logging verbosity
--sglang-tp-size 2                 # Tensor parallelism
--sglang-disable-cuda-graph        # Disable CUDA graphs
```

### 3. slime-Specific Arguments

Defined in `slime/utils/arguments.py`:

```bash
# Resource Allocation
--actor-num-nodes 1                # Training nodes
--actor-num-gpus-per-node 8        # GPUs per training node
--rollout-