
Skill Creator
After blind A/B skill runs, unblind transcripts and SKILL.md files to learn why the winner won and how to improve the loser.
Overview
Skill Creator is an agent skill most often used in Build agent-tooling (also Ship review) that analyzes blind skill comparison results to explain winners and propose loser improvements.
Install
npx skills add https://github.com/bytedance/deer-flow --skill skill-creator
What is this skill?
- Consumes blind comparator JSON with winner side, reasoning, and scores
- Reads winner and loser SKILL.md plus key referenced files for structural diffs
- Compares instruction clarity, scripts, examples, and edge-case handling
- Uses execution transcripts for both sides to explain behavioral differences
- Writes actionable improvement suggestions to a specified analysis output path
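- 8-step process: read the comparison JSON, both skills, and both transcripts; analyze instruction following; identify winner strengths and loser weaknesses; then generate and write improvement suggestions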
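The reading steps (comparison JSON, both skills, both transcripts) can be sketched in Python as below. This is a minimal illustration, not code shipped with the skill: every function name is hypothetical, the comparator keys `winner`, `reasoning`, and `scores` are assumed from the description above rather than taken from a published schema, and comparing headings is only a coarse stand-in for the structural diff the skill asks the agent to make.

```python
import json
import re
from pathlib import Path

# Matches markdown ATX headings such as "## Inputs" (one heading per line).
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+)$", re.MULTILINE)


def parse_comparison(raw):
    """Step 1: pull the winner, reasoning, and scores out of comparator JSON.

    The keys "winner", "reasoning", and "scores" are assumptions; adjust them
    to whatever your blind comparator actually emits.
    """
    data = json.loads(raw)
    winner = data.get("winner")
    if winner not in ("A", "B"):
        raise ValueError(f"expected winner 'A' or 'B', got {winner!r}")
    return {
        "winner": winner,
        "reasoning": data.get("reasoning", ""),
        "scores": data.get("scores", {}),
    }


def load_comparison(comparison_result_path):
    """Read the comparator output file from disk and parse it."""
    return parse_comparison(Path(comparison_result_path).read_text(encoding="utf-8"))


def read_skill_md(skill_path):
    """Step 2: load SKILL.md from a skill folder (referenced files not included)."""
    return (Path(skill_path) / "SKILL.md").read_text(encoding="utf-8")


def markdown_headings(text):
    """Return heading titles in document order."""
    return [m.group(2).strip() for m in HEADING_RE.finditer(text)]


def heading_diff(winner_md, loser_md):
    """Coarse structural diff: headings present in only one of the two SKILL.md files."""
    winner = set(markdown_headings(winner_md))
    loser = set(markdown_headings(loser_md))
    return {
        "only_in_winner": sorted(winner - loser),
        "only_in_loser": sorted(loser - winner),
    }


def read_transcript(transcript_path):
    """Step 3: load an execution transcript as plain text."""
    return Path(transcript_path).read_text(encoding="utf-8")
```

In practice, `heading_diff(read_skill_md(winner_skill_path), read_skill_md(loser_skill_path))` gives a quick list of sections to inspect first; judging instruction clarity and edge-case handling still requires reading both files closely.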
Adoption & trust: 1.4k installs on skills.sh; 70.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You ran blind skill comparisons but only know which side won—not which instructions or tooling choices actually drove the outcome.
Who is it for?
Solo builders iterating on agent skills who already have blind comparison artifacts and want evidence-backed edit lists.
Skip if: First-time skill authors with no baseline SKILL.md or teams not running paired skill evaluations.
When should I use this skill?
After the blind comparator returns winner A or B and you have paths to the winner and loser skills, both transcripts, the comparison result (comparison_result_path), and the analysis destination (output_path).
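A hypothetical way to bundle and sanity-check these parameters before starting an analysis is sketched below. The field names match the inputs listed in SKILL.md; the `AnalyzerInputs` class and its `missing_inputs` check are illustrative assumptions, not part of skill-creator.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class AnalyzerInputs:
    """Hypothetical container for the post-hoc analyzer's prompt parameters.

    Field names mirror the parameters listed in SKILL.md; the class itself
    is illustrative and not part of the skill.
    """

    winner: str
    winner_skill_path: str
    winner_transcript_path: str
    loser_skill_path: str
    loser_transcript_path: str
    comparison_result_path: str
    output_path: str

    def __post_init__(self):
        # The blind comparator only ever picks side A or side B.
        if self.winner not in ("A", "B"):
            raise ValueError(f"winner must be 'A' or 'B', got {self.winner!r}")

    def missing_inputs(self):
        """Names of input paths that do not exist (output_path is excluded)."""
        return [
            f.name
            for f in fields(self)
            if f.name.endswith("_path")
            and f.name != "output_path"
            and not Path(getattr(self, f.name)).exists()
        ]
```

Once every artifact is in place, `missing_inputs()` returns an empty list. `output_path` is skipped because the analyzer creates that file rather than reading it.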
What do I get? / Deliverables
You get a structured post-hoc analysis linking comparator reasoning to skill diffs and transcript behavior, saved to your output path for the next SKILL.md revision.
- Post-hoc analysis JSON at output_path
- Actionable improvement suggestions for the losing skill
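SKILL.md asks for a JSON file at output_path. The sketch below assembles and writes the fields its Output Format example shows (`comparison_summary`, `winner_strengths`, `loser_weaknesses`). The helper names are hypothetical, and the remaining fields of that schema, such as `instruction_following`, are deliberately left out.

```python
import json
from pathlib import Path


def build_analysis(winner, winner_skill, loser_skill, comparator_reasoning,
                   winner_strengths, loser_weaknesses):
    """Assemble the subset of the post-hoc analysis shown in SKILL.md.

    Instruction-following scores and any other fields are omitted here.
    """
    return {
        "comparison_summary": {
            "winner": winner,
            "winner_skill": winner_skill,
            "loser_skill": loser_skill,
            "comparator_reasoning": comparator_reasoning,
        },
        "winner_strengths": list(winner_strengths),
        "loser_weaknesses": list(loser_weaknesses),
    }


def write_analysis(analysis, output_path):
    """Save the analysis as pretty-printed JSON, creating parent folders if needed."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(analysis, indent=2), encoding="utf-8")
    return path
```

Keeping assembly separate from writing makes it easy to check the structure before anything touches disk.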
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Skill authoring and evaluation live under Build agent-tooling, where you craft and benchmark reusable agent capabilities. Post-hoc comparison analysis is meta-work on the skills themselves, not application feature code.
Where it fits
Unblind a Deer Flow comparison and draft concrete edits to the losing brainstorming skill.
Before tagging a skill release, verify regression reasons from the latest blind run.
When production agent behavior drifts, compare archived skill versions using stored comparator outputs.
How it compares
Complements blind A/B harnesses: it analyzes their results and is not a substitute for writing or executing skills from scratch.
Common Questions / FAQ
Who is skill-creator for?
Indie agent-skill maintainers who benchmark variants and need post-hoc explanations tied to SKILL.md structure and run transcripts.
When should I use skill-creator?
After Build agent-tooling eval runs complete, or during Ship review when hardening skills before publishing to a catalog or repo.
Is skill-creator safe to install?
It reads local skill folders, transcripts, and comparison JSON you provide; review the Security Audits panel on this Prism page and avoid pointing it at secrets in transcripts.
SKILL.md
# Post-hoc Analyzer Agent

Analyze blind comparison results to understand WHY the winner won and generate improvement suggestions.

## Role

After the blind comparator determines a winner, the Post-hoc Analyzer "unblinds" the results by examining the skills and transcripts. The goal is to extract actionable insights: what made the winner better, and how can the loser be improved?

## Inputs

You receive these parameters in your prompt:

- **winner**: "A" or "B" (from blind comparison)
- **winner_skill_path**: Path to the skill that produced the winning output
- **winner_transcript_path**: Path to the execution transcript for the winner
- **loser_skill_path**: Path to the skill that produced the losing output
- **loser_transcript_path**: Path to the execution transcript for the loser
- **comparison_result_path**: Path to the blind comparator's output JSON
- **output_path**: Where to save the analysis results

## Process

### Step 1: Read Comparison Result

1. Read the blind comparator's output at comparison_result_path
2. Note the winning side (A or B), the reasoning, and any scores
3. Understand what the comparator valued in the winning output

### Step 2: Read Both Skills

1. Read the winner skill's SKILL.md and key referenced files
2. Read the loser skill's SKILL.md and key referenced files
3. Identify structural differences:
   - Instruction clarity and specificity
   - Script/tool usage patterns
   - Example coverage
   - Edge case handling

### Step 3: Read Both Transcripts

1. Read the winner's transcript
2. Read the loser's transcript
3. Compare execution patterns:
   - How closely did each follow its skill's instructions?
   - What tools were used differently?
   - Where did the loser diverge from optimal behavior?
   - Did either encounter errors or make recovery attempts?

### Step 4: Analyze Instruction Following

For each transcript, evaluate:

- Did the agent follow the skill's explicit instructions?
- Did the agent use the skill's provided tools/scripts?
- Were there missed opportunities to leverage skill content?
- Did the agent add unnecessary steps not in the skill?

Score instruction following 1-10 and note specific issues.

### Step 5: Identify Winner Strengths

Determine what made the winner better:

- Clearer instructions that led to better behavior?
- Better scripts/tools that produced better output?
- More comprehensive examples that guided edge cases?
- Better error handling guidance?

Be specific. Quote from skills/transcripts where relevant.

### Step 6: Identify Loser Weaknesses

Determine what held the loser back:

- Ambiguous instructions that led to suboptimal choices?
- Missing tools/scripts that forced workarounds?
- Gaps in edge case coverage?
- Poor error handling that caused failures?

### Step 7: Generate Improvement Suggestions

Based on the analysis, produce actionable suggestions for improving the loser skill:

- Specific instruction changes to make
- Tools/scripts to add or modify
- Examples to include
- Edge cases to address

Prioritize by impact. Focus on changes that would have changed the outcome.

### Step 8: Write Analysis Results

Save structured analysis to `{output_path}`.

## Output Format

Write a JSON file with this structure:

```json
{
  "comparison_summary": {
    "winner": "A",
    "winner_skill": "path/to/winner/skill",
    "loser_skill": "path/to/loser/skill",
    "comparator_reasoning": "Brief summary of why comparator chose winner"
  },
  "winner_strengths": [
    "Clear step-by-step instructions for handling multi-page documents",
    "Included validation script that caught formatting errors",
    "Explicit guidance on fallback behavior when OCR fails"
  ],
  "loser_weaknesses": [
    "Vague instruction 'process the document appropriately' led to inconsistent behavior",
    "No script for validation, agent had to improvise and made errors",
    "No guidance on OCR failure, agent gave up instead of trying alternatives"
  ],
  "instruction_following": {
    "winner": {