Skill Creator

Skill authoring belongs in Build → agent-tooling, the primary shelf for meta tooling that extends Claude Code-style workflows. Post-hoc analysis and skill iteration are core agent-tooling work, not generic app frontend work or ship-time QA.

Where it fits

Example use

After comparing two onboarding skills, unblind results and edit the loser’s SKILL.md checklist.

Example use

Validate → Prototype & spike

Before publishing a skill update, verify the winning variant’s instructions match what the comparator rewarded.

Example use

Prototype two skill drafts for the same task and use analysis output to pick which to productize.

Example use

Monthly skill maintenance: rerun comparisons when model behavior drifts and feed analyzer notes into the next version.

How it compares

A meta-evaluation workflow for skill packages, not a general code reviewer or a skills.sh install counter.

Common Questions / FAQ

Who is skill-creator for?

Indie builders and maintainers of Claude Code skills who run blind comparisons and need systematic post-hoc diagnosis, not guesswork.

When should I use skill-creator?

Use it in Build → agent-tooling after a blind comparator finishes; also when refining skills discovered in Validate → prototype or when hardening skills before Ship → review.

Is skill-creator safe to install?

Check the Security Audits panel on this page; the analyzer reads local skill paths and transcripts you supply—avoid pointing it at secrets you do not want in logs.

SKILL.md


# Security scan marker file (generated by security_scan.py)
.security-scan-passed


# Post-hoc Analyzer Agent

Analyze blind comparison results to understand WHY the winner won and generate improvement suggestions.

## Role

After the blind comparator determines a winner, the Post-hoc Analyzer "unblinds" the results by examining the skills and transcripts. The goal is to extract actionable insights: what made the winner better, and how can the loser be improved?

## Inputs

You receive these parameters in your prompt:

- **winner**: "A" or "B" (from blind comparison)
- **winner_skill_path**: Path to the skill that produced the winning output
- **winner_transcript_path**: Path to the execution transcript for the winner
- **loser_skill_path**: Path to the skill that produced the losing output
- **loser_transcript_path**: Path to the execution transcript for the loser
- **comparison_result_path**: Path to the blind comparator's output JSON
- **output_path**: Where to save the analysis results
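
As a rough sketch, the seven parameters above could be gathered into one record and sanity-checked before any analysis starts. The `AnalyzerInputs` class and its `problems` method are hypothetical names introduced here for illustration; they are not part of the skill.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AnalyzerInputs:
    """Hypothetical container for the seven prompt parameters."""
    winner: str                    # "A" or "B"
    winner_skill_path: Path
    winner_transcript_path: Path
    loser_skill_path: Path
    loser_transcript_path: Path
    comparison_result_path: Path
    output_path: Path              # may not exist yet, so it is not checked

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the inputs look usable."""
        found = []
        if self.winner not in ("A", "B"):
            found.append(f"winner must be 'A' or 'B', got {self.winner!r}")
        for name in ("winner_skill_path", "winner_transcript_path",
                     "loser_skill_path", "loser_transcript_path",
                     "comparison_result_path"):
            if not getattr(self, name).exists():
                found.append(f"{name} does not exist: {getattr(self, name)}")
        return found
```

Checking the inputs up front means a mistyped path surfaces as a clear message rather than as a half-finished analysis.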

## Process

### Step 1: Read Comparison Result

1. Read the blind comparator's output at comparison_result_path
2. Note the winning side (A or B), the reasoning, and any scores
3. Understand what the comparator valued in the winning output
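
A minimal sketch of Step 1 follows. The comparator's JSON schema is not shown here, so the keys `winner`, `reasoning`, and `scores` are assumptions, and `load_comparison` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_comparison(comparison_result_path: str, expected_winner: str) -> dict:
    """Load the comparator JSON and pull out the fields Step 1 asks about.

    The keys "winner", "reasoning", and "scores" are assumptions; adjust them
    to whatever the comparator actually emits.
    """
    result = json.loads(Path(comparison_result_path).read_text(encoding="utf-8"))
    winner = result.get("winner")
    # Fail loudly if the file disagrees with the winner passed in the prompt.
    if winner is not None and winner != expected_winner:
        raise ValueError(f"prompt says winner={expected_winner!r}, file says {winner!r}")
    return {
        "winner": expected_winner,
        "reasoning": result.get("reasoning", ""),
        "scores": result.get("scores", {}),
    }
```

Cross-checking the file against the `winner` parameter guards against analyzing the wrong side as the winner.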

### Step 2: Read Both Skills

1. Read the winner skill's SKILL.md and key referenced files
2. Read the loser skill's SKILL.md and key referenced files
3. Identify structural differences:
   - Instructions clarity and specificity
   - Script/tool usage patterns
   - Example coverage
   - Edge case handling
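
One way to get a first, coarse view of structural differences is to count a few surface features of each SKILL.md. This is only a heuristic sketch with hypothetical function names: it says nothing about instruction quality, and lines starting with `#` inside code fences are counted as headings too.

```python
import re


def structure_profile(skill_md: str) -> dict:
    """Count a few coarse structural features of a SKILL.md text."""
    lines = skill_md.splitlines()
    fence_lines = sum(1 for line in lines if line.lstrip().startswith("```"))
    return {
        "headings": sum(1 for line in lines if re.match(r"#{1,6}\s", line)),
        "numbered_steps": sum(1 for line in lines if re.match(r"\s*\d+\.\s", line)),
        "code_blocks": fence_lines // 2,  # each block has an opening and a closing fence
        "words": len(skill_md.split()),
    }


def profile_diff(winner_md: str, loser_md: str) -> dict:
    """Return winner-minus-loser counts for each feature."""
    w, l = structure_profile(winner_md), structure_profile(loser_md)
    return {key: w[key] - l[key] for key in w}
```

A large positive difference in `numbered_steps` or `code_blocks` is a hint about where to read more closely, not a conclusion in itself.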

### Step 3: Read Both Transcripts

1. Read the winner's transcript
2. Read the loser's transcript
3. Compare execution patterns:
   - How closely did each agent follow its skill's instructions?
   - What tools were used differently?
   - Where did the loser diverge from optimal behavior?
   - Did either encounter errors or make recovery attempts?
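
The question of which tools were used differently can be sketched as a count comparison. The transcript format is not specified here, so this assumes the tool names have already been extracted from each transcript into a list; `tool_usage_diff` is a hypothetical helper.

```python
from collections import Counter


def tool_usage_diff(winner_tools: list[str], loser_tools: list[str]) -> dict[str, int]:
    """Return winner-minus-loser call counts for every tool whose usage differs."""
    w, l = Counter(winner_tools), Counter(loser_tools)
    return {
        tool: w[tool] - l[tool]
        for tool in sorted(set(w) | set(l))
        if w[tool] != l[tool]
    }


# Illustrative tool names only; positive means the winner used the tool more often.
print(tool_usage_diff(["Read", "Bash", "Bash"], ["Read", "Write"]))
# prints {'Bash': 2, 'Write': -1}
```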

### Step 4: Analyze Instruction Following

For each transcript, evaluate:
- Did the agent follow the skill's explicit instructions?
- Did the agent use the skill's provided tools/scripts?
- Were there missed opportunities to leverage skill content?
- Did the agent add unnecessary steps not in the skill?

Score instruction following 1-10 and note specific issues.
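
The score and its notes could be held in a small validated record, sketched below. `InstructionFollowing` is a hypothetical name, and reading 1 as "largely ignored the skill" and 10 as "followed it closely" is an assumption about the scale's direction.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionFollowing:
    """Hypothetical record for one transcript's Step 4 evaluation."""
    score: int  # assumed scale: 1 = largely ignored the skill, 10 = followed it closely
    issues: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject out-of-range scores as soon as the record is created.
        if not 1 <= self.score <= 10:
            raise ValueError(f"score must be between 1 and 10, got {self.score}")
```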

### Step 5: Identify Winner Strengths

Determine what made the winner better:
- Clearer instructions that led to better behavior?
- Better scripts/tools that produced better output?
- More comprehensive examples that guided edge cases?
- Better error handling guidance?

Be specific. Quote from skills/transcripts where relevant.

### Step 6: Identify Loser Weaknesses

Determine what held the loser back:
- Ambiguous instructions that led to suboptimal choices?
- Missing tools/scripts that forced workarounds?
- Gaps in edge case coverage?
- Poor error handling that caused failures?

### Step 7: Generate Improvement Suggestions

Based on the analysis, produce actionable suggestions for improving the loser skill:
- Specific instruction changes to make
- Tools/scripts to add or modify
- Examples to include
- Edge cases to address

Prioritize by impact. Focus on changes that would have changed the outcome.
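
Prioritizing by impact can be sketched as a sort. The `priority` and `would_change_outcome` fields below are hypothetical, chosen only to illustrate the ordering; they are not taken from the skill's output schema.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def prioritize(suggestions: list[dict]) -> list[dict]:
    """Sort suggestions so outcome-changing items come first, then by priority."""
    return sorted(
        suggestions,
        key=lambda s: (
            not s.get("would_change_outcome", False),          # False sorts before True
            PRIORITY_ORDER.get(s.get("priority", "low"), 2),   # unknown priority counts as low
        ),
    )
```

Because `sorted` is stable, suggestions that tie on both keys keep the order in which they were written.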

### Step 8: Write Analysis Results

Save structured analysis to `{output_path}`.
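
Saving the result can be as small as the sketch below, which assumes the analysis is already a plain dictionary. `write_analysis` is a hypothetical helper name.

```python
import json
from pathlib import Path


def write_analysis(analysis: dict, output_path: str) -> Path:
    """Serialize the analysis as indented UTF-8 JSON, creating parent folders if needed."""
    target = Path(output_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(
        json.dumps(analysis, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return target
```

Indented output keeps the file readable when a skill author reviews it by hand.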

## Output Format

Write a JSON file with this structure:

```json
{
  "comparison_summary": {
    "winner": "A",
    "winner_skill": "path/to/winner/skill",
    "loser_skill": "path/to/loser/skill",
    "comparator_reasoning": "Brief summary of why comparator chose winner"
  },
  "winner_strengths": [
    "Clear step-by-step instructions for handling multi-page documents",
    "Included validation script that caught formatting errors",
    "Explicit guidance on fallback behavior when OCR fails"
  ],
  "loser_weaknesses": [
    "Vague instruction 'process the document appropriately' led to inconsistent behavior",
    "No script for validation, agent had to improvise and made errors",
    "No guidance on OCR failure, agent gave up
```

What is this skill?

Post-hoc Analyzer unblinds comparator results to explain why skill A beat skill B

Reads winner and loser SKILL.md plus execution transcripts for structural diffs

Produces actionable improvement suggestions for the losing skill package

Designed for blind comparison pipelines with explicit winner/loser paths and JSON comparator output

Fits skill authors running systematic evals rather than one-off prompt tweaks

Post-hoc workflow uses explicit winner/loser skill paths and comparison_result_path JSON inputs

Compatible agents: Claude Code, Cursor, Codex

Adoption & trust: 593 installs on skills.sh; 1.2k GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases: Build → agent-tooling is the primary shelf, with alternate fits in Validate → prototype and Ship → review.
