
Simpo Training
Prepare and configure preference datasets in the correct JSON schema before running SimPO fine-tuning on instruction-following models.
Overview
Simpo-training is an agent skill for the Build phase that documents preference dataset formats and curated sources for SimPO LLM fine-tuning.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill simpo-training
What is this skill?
- Documents required prompt/chosen/rejected fields plus auto-detected alias column names
- Compares UltraFeedback, cleaned Argilla, and other popular preference corpora with sizes and quality notes
- Ready-to-paste dataset_mixer YAML for HuggingFaceH4/ultrafeedback_binarized
- Explains what makes a strong chosen vs rejected pair for preference optimization
- 60K preference pairs in UltraFeedback binarized
- 50K filtered pairs in cleaned Argilla UltraFeedback
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want SimPO training but lack a validated schema and vetted preference corpora, so runs fail or learn from noisy pairs.
Who is it for?
Indie builders shipping a custom assistant who have already chosen SimPO and need Hugging Face-ready preference datasets quickly.
Skip if: you are still choosing between DPO, PPO, and SimPO at the architecture stage; decide on the algorithm elsewhere first.
When should I use this skill?
You are configuring SimPO or similar preference optimization training and need dataset schema plus recommended corpora.
What do I get? / Deliverables
After the skill runs, you have field mappings, example records, and dataset_mixer configs ready to plug into your SimPO training pipeline.
- Validated preference JSON examples
- dataset_mixer configuration block
- Dataset shortlist with tradeoffs
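As a rough illustration of the field mappings mentioned above, the sketch below maps alias column names onto the canonical `prompt`/`chosen`/`rejected` fields and rejects incomplete records. The alias lists are the ones given in SKILL.md; the `ALIASES` table and the `normalize_record` helper are hypothetical names introduced here, not part of any SimPO tooling, and the actual auto-detection logic in your training code may differ.

```python
# Alias names as listed in SKILL.md; the table and helper are hypothetical.
ALIASES = {
    "prompt": ("prompt", "question", "instruction", "input"),
    "chosen": ("chosen", "response_chosen", "winner", "preferred"),
    "rejected": ("rejected", "response_rejected", "loser"),
}


def normalize_record(record):
    """Return a dict with exactly prompt/chosen/rejected, or raise ValueError."""
    normalized = {}
    for canonical, candidates in ALIASES.items():
        # Use the first candidate column that is present in the record.
        value = next((record[name] for name in candidates if name in record), None)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {canonical}")
        normalized[canonical] = value
    # A pair with identical responses carries no preference signal.
    if normalized["chosen"] == normalized["rejected"]:
        raise ValueError("chosen and rejected are identical")
    return normalized


# Example: a record that uses alias column names.
raw = {"question": "What is Python?", "winner": "A programming language.", "loser": "A snake."}
print(normalize_record(raw))
```

Running a check like this over a JSONL file before training surfaces schema problems early, rather than partway through a job.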
Journey fit
SimPO training is core product work once you are building or customizing an LLM, not idea validation or launch distribution. Dataset mixing, Hugging Face configs, and train/test splits sit with backend ML pipelines rather than UI or docs.
How it compares
Dataset curation guide for SimPO, not a generic HuggingFace Datasets browser skill.
Common Questions / FAQ
Who is simpo-training for?
Solo builders and small teams fine-tuning LLMs with SimPO who need correct preference JSON and sensible default datasets.
When should I use simpo-training?
During Build when configuring train_prefs splits, mixing UltraFeedback-style corpora, or validating chosen/rejected quality before a training job.
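To make the mixing step concrete, here is a minimal pure-Python sketch of how a `dataset_mixer`-style mapping could be applied to in-memory lists of records. It assumes each weight is the fraction of that dataset's rows to keep, which is only one possible interpretation; the `mix_datasets` function and its arguments are hypothetical, so check how your training framework actually interprets these weights before relying on them.

```python
import random


def mix_datasets(datasets, mixer, seed=42):
    """Subsample each dataset by its fraction, then shuffle the combined rows.

    ASSUMPTION: each weight is the fraction of that dataset's rows to keep.
    Your training framework may define the weights differently.
    """
    rng = random.Random(seed)
    mixed = []
    for name, fraction in mixer.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {name} must be in [0, 1], got {fraction}")
        rows = datasets[name]
        keep = int(len(rows) * fraction)  # number of rows taken from this dataset
        mixed.extend(rows[:keep])
    rng.shuffle(mixed)  # interleave the sources deterministically for a given seed
    return mixed


# Toy data standing in for two preference corpora.
general = [{"prompt": f"g{i}", "chosen": "a", "rejected": "b"} for i in range(10)]
math = [{"prompt": f"m{i}", "chosen": "a", "rejected": "b"} for i in range(20)]
mixed = mix_datasets({"general": general, "math": math}, {"general": 0.5, "math": 0.25})
print(len(mixed))
```

Under this assumption the resulting share of each source depends on the dataset sizes as well as the weights, so a 0.8/0.2 configuration does not necessarily produce an 80/20 split of the final training set.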
Is simpo-training safe to install?
It is documentation-only procedural knowledge; review the Security Audits panel on this Prism page before trusting the parent skill repo in your agent.
SKILL.md
# Datasets

Complete guide to preference datasets for SimPO training.

## Dataset Format

### Required Fields

Preference datasets must contain:

```json
{
  "prompt": "User question or instruction",
  "chosen": "Better/preferred response",
  "rejected": "Worse/rejected response"
}
```

**Alternative field names** (auto-detected):
- `prompt` → `question`, `instruction`, `input`
- `chosen` → `response_chosen`, `winner`, `preferred`
- `rejected` → `response_rejected`, `loser`

### Example Entry

```json
{
  "prompt": "Explain quantum computing in simple terms.",
  "chosen": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously through superposition. This allows quantum computers to process many possibilities at once, making them potentially much faster than classical computers for specific tasks like cryptography and optimization.",
  "rejected": "It's like regular computing but quantum."
}
```

## Popular Datasets

### 1. UltraFeedback (Recommended)

**HuggingFaceH4/ultrafeedback_binarized**:
- **Size**: 60K preference pairs
- **Quality**: High (GPT-4 annotations)
- **Domain**: General instruction following
- **Format**: Clean, ready-to-use

**Config**:
```yaml
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
  - train_prefs
  - test_prefs
```

### 2. Argilla UltraFeedback (Cleaned)

**argilla/ultrafeedback-binarized-preferences-cleaned**:
- **Size**: 50K pairs (filtered)
- **Quality**: Very high (deduped, cleaned)
- **Domain**: General
- **Format**: Clean

**Config**:
```yaml
dataset_mixer:
  argilla/ultrafeedback-binarized-preferences-cleaned: 1.0
```

### 3. Distilabel Math

**argilla/distilabel-math-preference-dpo**:
- **Size**: 30K pairs
- **Quality**: High (GSM8K, MATH)
- **Domain**: Math reasoning
- **Format**: Math-specific

**Config**:
```yaml
dataset_mixer:
  argilla/distilabel-math-preference-dpo: 1.0
```

### 4. HelpSteer

**nvidia/HelpSteer**:
- **Size**: 38K samples
- **Quality**: High (human ratings)
- **Domain**: Helpfulness alignment
- **Format**: Multi-attribute ratings

**Config**:
```yaml
dataset_mixer:
  nvidia/HelpSteer: 1.0
```

### 5. Anthropic HH-RLHF

**Anthropic/hh-rlhf**:
- **Size**: 161K samples
- **Quality**: High (human preferences)
- **Domain**: Harmless + helpful
- **Format**: Conversational

**Config**:
```yaml
dataset_mixer:
  Anthropic/hh-rlhf: 1.0
```

## Dataset Mixing

### Multiple Datasets

**Equal mix**:
```yaml
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 0.5
  Anthropic/hh-rlhf: 0.5
```

**Weighted mix**:
```yaml
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 0.7
  argilla/distilabel-math-preference-dpo: 0.2
  nvidia/HelpSteer: 0.1
```

**Domain-specific emphasis**:
```yaml
# 80% general + 20% math
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 0.8
  argilla/distilabel-math-preference-dpo: 0.2
```

## Data Quality

### Quality Indicators

**Good preference data**:
- ✅ Clear quality difference between chosen/rejected
- ✅ Diverse prompts
- ✅ Minimal noise/annotation errors
- ✅ Appropriate difficulty level

**Poor preference data**:
- ❌ Ambiguous preferences
- ❌ Repetitive prompts
- ❌ Annotation noise
- ❌ Too easy/hard prompts

### Quality Filtering

**Filter by length difference**:
```python
def filter_by_length(example):
    chosen_len = len(example['chosen'].split())
    rejected_len = len(example['rejected'].split())
    # Reject if chosen is much shorter (potential low-effort)
    return chosen_len >= rejected_len * 0.5

dataset = dataset.filter(filter_by_length)
```

**Filter by diversity**:
```python
seen_prompts = set()

def filter_duplicates(example):
    prompt = example['prompt']
    if prompt in seen_prompts:
        return False
    seen_prompts.add(prompt)
    return True

dataset = dataset.filter(filter_duplicates)
```

## Custom Dataset Creation

### Format 1: JSON Lines

**File** (`preferences.jsonl`):
```jsonl
{"prompt": "What is Python?", "chosen": "Python is a high-level pro