
Skill Creator
Evaluate and improve custom agent skills by analyzing blind A/B runs and extracting why the winner performed better.
Overview
Skill-creator is an agent skill most often used in Build (also Ship review, Operate iterate) that analyzes blind skill comparisons and recommends how to improve the losing SKILL.md.
Install
npx skills add https://github.com/daymade/claude-code-skills --skill skill-creator
What is this skill?
- Post-hoc Analyzer unblinds comparator results to explain why skill A beat skill B
- Reads winner and loser SKILL.md plus execution transcripts for structural diffs
- Produces actionable improvement suggestions for the losing skill package
- Designed for blind comparison pipelines: takes explicit winner/loser skill paths plus the comparator's JSON output (comparison_result_path) as inputs
- Fits skill authors running systematic evals rather than one-off prompt tweaks
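The bullets above say the analyzer needs explicit winner/loser paths plus the comparator's JSON. Below is a minimal Python sketch of assembling those inputs once a blind comparison finishes. It assumes the comparator JSON has a top-level "winner" field set to "A" or "B". That field name is an assumption, so check your comparator's actual schema. The helper name `build_analyzer_inputs` and the `runs` mapping are hypothetical. Only the seven output keys come from the Inputs list in SKILL.md.

```python
import json
from pathlib import Path


def build_analyzer_inputs(comparison_result_path, runs, output_path):
    """Map a finished blind comparison onto the Post-hoc Analyzer's inputs.

    Hypothetical helper. `runs` maps each blind label ("A" or "B") to a
    (skill_path, transcript_path) pair recorded when the runs were made.
    """
    # Assumption: the comparator's JSON has a top-level "winner" key.
    result = json.loads(Path(comparison_result_path).read_text())
    winner = result["winner"]
    if winner not in ("A", "B"):
        raise ValueError(f"unexpected winner label: {winner!r}")
    loser = "B" if winner == "A" else "A"

    # Unblind: look up which skill and transcript sat behind each label.
    winner_skill, winner_transcript = runs[winner]
    loser_skill, loser_transcript = runs[loser]

    # Keys mirror the parameter names listed under "Inputs" in SKILL.md.
    return {
        "winner": winner,
        "winner_skill_path": str(winner_skill),
        "winner_transcript_path": str(winner_transcript),
        "loser_skill_path": str(loser_skill),
        "loser_transcript_path": str(loser_transcript),
        "comparison_result_path": str(comparison_result_path),
        "output_path": str(output_path),
    }
```

The label-to-skill mapping is recorded when the runs are made and kept away from the comparator. This is what keeps the comparison blind while still allowing the analyzer to unblind it afterwards.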
Adoption & trust: 593 installs on skills.sh; 1.2k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You ran two skill variants in a blind test but only know which won—not what to change in the loser’s instructions.
Who is it for?
Solo skill authors iterating on SKILL.md packages with A/B runs and JSON comparator outputs.
Skip if: Builders who only want end-user app features and never author or version agent skills.
When should I use this skill?
A blind comparator has already chosen winner A or B and you need actionable reasons and fixes for the losing skill.
What do I get? / Deliverables
You get a structured post-hoc analysis tying comparator reasoning to skill differences and concrete improvement suggestions saved to your output path.
- Post-hoc analysis JSON at output_path
- Improvement suggestions for the losing skill
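You may want to sanity-check the JSON deliverable before feeding it into the next skill version. Below is a minimal Python sketch that checks only the fields visible in the SKILL.md Output Format excerpt: comparison_summary, winner_strengths and loser_weaknesses. The function names are hypothetical, and the check is deliberately partial. Any other fields in the schema are not validated, including wherever the improvement suggestions are stored.

```python
import json
from pathlib import Path

# Only the fields visible in the SKILL.md Output Format excerpt.
REQUIRED_SUMMARY_KEYS = ("winner", "winner_skill", "loser_skill", "comparator_reasoning")
REQUIRED_LIST_KEYS = ("winner_strengths", "loser_weaknesses")


def check_analysis(data):
    """Return a list of problems found in a loaded post-hoc analysis dict."""
    problems = []
    summary = data.get("comparison_summary")
    if not isinstance(summary, dict):
        problems.append("comparison_summary missing or not an object")
    else:
        for key in REQUIRED_SUMMARY_KEYS:
            if key not in summary:
                problems.append(f"comparison_summary.{key} missing")
        # Only validate the label when it is present, to avoid double-reporting.
        if "winner" in summary and summary["winner"] not in ("A", "B"):
            problems.append("comparison_summary.winner must be 'A' or 'B'")
    for key in REQUIRED_LIST_KEYS:
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
            problems.append(f"{key} must be a list of strings")
    return problems


def check_analysis_file(output_path):
    """Load the analyzer's JSON from output_path and check it (hypothetical helper)."""
    return check_analysis(json.loads(Path(output_path).read_text()))
```

The function returns a list of problems instead of raising on the first one. This lets a single run report every gap in the analysis file.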
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
For builders authoring skills, the primary shelf is Build → agent-tooling, which holds meta tooling that extends Claude Code-style workflows. Post-hoc analysis and skill iteration are core agent-tooling work, not generic app frontend work or ship-time QA.
Where it fits
After comparing two onboarding skills, unblind results and edit the loser’s SKILL.md checklist.
Before publishing a skill update, verify the winning variant’s instructions match what the comparator rewarded.
Prototype two skill drafts for the same task and use analysis output to pick which to productize.
Monthly skill maintenance: rerun comparisons when model behavior drifts and feed analyzer notes into the next version.
How it compares
Meta eval workflow for skill packages—not a general code reviewer or a skills.sh install counter.
Common Questions / FAQ
Who is skill-creator for?
Indie builders and maintainers of Claude Code skills who run blind comparisons and need systematic post-hoc diagnosis, not guesswork.
When should I use skill-creator?
Use it in Build → agent-tooling after a blind comparator finishes; also when refining skills discovered in Validate → prototype or when hardening skills before Ship → review.
Is skill-creator safe to install?
Check the Security Audits panel on this page; the analyzer reads local skill paths and transcripts you supply—avoid pointing it at secrets you do not want in logs.
SKILL.md
# Post-hoc Analyzer Agent

Analyze blind comparison results to understand WHY the winner won and generate improvement suggestions.

## Role

After the blind comparator determines a winner, the Post-hoc Analyzer "unblinds" the results by examining the skills and transcripts. The goal is to extract actionable insights: what made the winner better, and how can the loser be improved?

## Inputs

You receive these parameters in your prompt:

- **winner**: "A" or "B" (from blind comparison)
- **winner_skill_path**: Path to the skill that produced the winning output
- **winner_transcript_path**: Path to the execution transcript for the winner
- **loser_skill_path**: Path to the skill that produced the losing output
- **loser_transcript_path**: Path to the execution transcript for the loser
- **comparison_result_path**: Path to the blind comparator's output JSON
- **output_path**: Where to save the analysis results

## Process

### Step 1: Read Comparison Result

1. Read the blind comparator's output at comparison_result_path
2. Note the winning side (A or B), the reasoning, and any scores
3. Understand what the comparator valued in the winning output

### Step 2: Read Both Skills

1. Read the winner skill's SKILL.md and key referenced files
2. Read the loser skill's SKILL.md and key referenced files
3. Identify structural differences:
   - Instruction clarity and specificity
   - Script/tool usage patterns
   - Example coverage
   - Edge case handling

### Step 3: Read Both Transcripts

1. Read the winner's transcript
2. Read the loser's transcript
3. Compare execution patterns:
   - How closely did each follow their skill's instructions?
   - What tools were used differently?
   - Where did the loser diverge from optimal behavior?
   - Did either encounter errors or make recovery attempts?

### Step 4: Analyze Instruction Following

For each transcript, evaluate:

- Did the agent follow the skill's explicit instructions?
- Did the agent use the skill's provided tools/scripts?
- Were there missed opportunities to leverage skill content?
- Did the agent add unnecessary steps not in the skill?

Score instruction following 1-10 and note specific issues.

### Step 5: Identify Winner Strengths

Determine what made the winner better:

- Clearer instructions that led to better behavior?
- Better scripts/tools that produced better output?
- More comprehensive examples that guided edge cases?
- Better error handling guidance?

Be specific. Quote from skills/transcripts where relevant.

### Step 6: Identify Loser Weaknesses

Determine what held the loser back:

- Ambiguous instructions that led to suboptimal choices?
- Missing tools/scripts that forced workarounds?
- Gaps in edge case coverage?
- Poor error handling that caused failures?

### Step 7: Generate Improvement Suggestions

Based on the analysis, produce actionable suggestions for improving the loser skill:

- Specific instruction changes to make
- Tools/scripts to add or modify
- Examples to include
- Edge cases to address

Prioritize by impact. Focus on changes that would have changed the outcome.

### Step 8: Write Analysis Results

Save structured analysis to `{output_path}`.

## Output Format

Write a JSON file with this structure:

```json
{
  "comparison_summary": {
    "winner": "A",
    "winner_skill": "path/to/winner/skill",
    "loser_skill": "path/to/loser/skill",
    "comparator_reasoning": "Brief summary of why comparator chose winner"
  },
  "winner_strengths": [
    "Clear step-by-step instructions for handling multi-page documents",
    "Included validation script that caught formatting errors",
    "Explicit guidance on fallback behavior when OCR fails"
  ],
  "loser_weaknesses": [
    "Vague instruction 'process the document appropriately' led to inconsistent behavior",
    "No script for validation, agent had to improvise and made errors",
    "No guidance on OCR failure, agent gave up