Skill Creator

Name: Skill Creator
Author: cognitedata

cognitedata/builder-skills

Run structured blind A/B skill evaluations and post-hoc winner/loser analysis so you can iteratively improve agent SKILL.md packages with evidence.

Overview

Skill Creator is an agent skill most often used in Build (also Ship) that analyzes blind skill comparison results to explain winners and generate concrete improvements for losing skills.

Install

npx skills add https://github.com/cognitedata/builder-skills --skill skill-creator

What is this skill?

Post-hoc analyzer unblinds comparator results to explain why the winning output beat the losing one
Structured inputs: winner/loser paths, transcripts, and comparator JSON for repeatable audits
Compares SKILL.md structure—clarity, scripts, examples, and edge-case coverage
Produces improvement suggestions actionable for the losing skill author
Fits eval/benchmark loops for skill-creator style quality gates

Compatible agents: Claude Code, Codex, Cursor, any compatible agent

Adoption & trust: 1.4k installs on skills.sh; 4 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You ran two skill variants and know which output won, but not which instructions or gaps caused the loss.

Who is it for?

Teams or solo authors maintaining a skills repo who already run blind comparisons and want systematic post-hoc reviews after each tournament.

Skip if: you are doing greenfield feature coding with no SKILL.md eval artifacts, or you only need a single ad-hoc prompt with no comparison pipeline.

When should I use this skill?

A blind comparator has chosen winner A or B and you have paths to both skills, transcripts, and comparison JSON for post-hoc analysis.

What do I get? / Deliverables

You get an unblinded analysis JSON with reasons tied to SKILL.md and transcripts, plus prioritized edits for the losing skill before the next eval run.

Post-hoc analysis JSON saved to the specified output_path
Actionable improvement list for the losing skill's instructions and examples

Recommended Skills

Find Skills (vercel-labs/skills)

Find Skills is a meta agent skill from the Vercel Labs skills package that helps solo builders discover and install modu… (2M installs · 21.7k stars)

Skill Creator (anthropics/skills)

Skill-creator is an Anthropic-originated meta skill aimed at solo and indie builders who want durable agent capabilities… (258k installs · 148k stars)

Lark Skill Maker (larksuite/cli)

Meta-skill for packaging Feishu/Lark API operations into installable lark-cli Skills. (207k installs · 13.7k stars)

Skills Cli (xixu-me/skills)

skills-cli is a procedural agent skill that teaches assistants how to operate the open Agent Skills CLI, the package mana… (200k installs · 61 stars)

Write A Skill (mattpocock/skills)

End-to-end guide for authoring new agent skills with proper metadata, folder layout, progressive disclosure, and user va… (181k installs · 121k stars)

Using Superpowers (obra/superpowers)

Using Superpowers is a journey-wide meta skill for solo and indie builders who run Claude Code, Codex, Cursor, or simila… (134k installs · 221k stars)

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Build: Agent skills & templates

Skill authoring and evaluation are core agent-tooling work in Build—the first place teams formalize how agents behave. Agent-tooling is the canonical shelf for meta workflows that create, compare, and refine skills rather than application features.

Also useful

Ship: Code review

Where it fits

Example use

Build: Agent skills & templates

After two planning skills compete blind, unblind results to merge the clearer checklist from the winner into the loser.

Example use

Build: Docs & content

Turn comparator reasoning into changelog bullets for SKILL.md and referenced helper files.

Example use

Ship: Code review

Gate a skill version bump by documenting why the new SKILL.md beat the previous tag in a structured analysis file.

Example use

Operate: Iteration & experiments

When users report bad agent behavior, compare last-known-good vs current skill runs using stored transcripts.

How it compares

Evidence-based skill QA workflow—not a general code generator or an MCP integration server.

Common Questions / FAQ

Who is skill-creator for?

Agent skill authors and platform builders who iterate on SKILL.md quality using blind tests, transcripts, and comparator outputs—not app feature developers by default.

When should I use skill-creator?

After a blind comparison in Build while revising skills; during Ship review when checking regressions between skill versions; during Operate-style iteration when production agent behavior traces back to weak skill instructions.

Is skill-creator safe to install?

It expects local paths to skills and transcripts—review what folders you point the agent at. Confirm pass/fail details on the Security Audits panel on this Prism page before enabling automated eval scripts.

SKILL.md


# Post-hoc Analyzer Agent

Analyze blind comparison results to understand WHY the winner won and generate improvement suggestions.

## Role

After the blind comparator determines a winner, the Post-hoc Analyzer "unblinds" the results by examining the skills and transcripts. The goal is to extract actionable insights: what made the winner better, and how can the loser be improved?

## Inputs

You receive these parameters in your prompt:

- **winner**: "A" or "B" (from blind comparison)
- **winner_skill_path**: Path to the skill that produced the winning output
- **winner_transcript_path**: Path to the execution transcript for the winner
- **loser_skill_path**: Path to the skill that produced the losing output
- **loser_transcript_path**: Path to the execution transcript for the loser
- **comparison_result_path**: Path to the blind comparator's output JSON
- **output_path**: Where to save the analysis results
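A minimal sketch of how an eval harness might bundle these parameters before invoking the analyzer, assuming a Python harness. The `AnalyzerInputs` class, its validation rules, and the `to_prompt` rendering are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AnalyzerInputs:
    """Hypothetical container for the analyzer's prompt parameters."""
    winner: str  # "A" or "B", copied from the blind comparison
    winner_skill_path: Path
    winner_transcript_path: Path
    loser_skill_path: Path
    loser_transcript_path: Path
    comparison_result_path: Path
    output_path: Path

    def __post_init__(self) -> None:
        # The comparator only ever picks one of two blinded sides.
        if self.winner not in ("A", "B"):
            raise ValueError(f"winner must be 'A' or 'B', got {self.winner!r}")
        # Comparing a skill against itself would make the analysis meaningless.
        if Path(self.winner_skill_path) == Path(self.loser_skill_path):
            raise ValueError("winner and loser skill paths must differ")

    def to_prompt(self) -> str:
        """Render the parameters as 'name: value' lines, in field order."""
        return "\n".join(f"{name}: {value}" for name, value in vars(self).items())
```

Validating up front means a mislabeled winner or a copy-paste error in the paths fails before the analyzer spends a run on it.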

## Process

### Step 1: Read Comparison Result

1. Read the blind comparator's output at `comparison_result_path`
2. Note the winning side (A or B), the reasoning, and any scores
3. Understand what the comparator valued in the winning output
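Step 1 could be sketched in Python as below. The comparator's JSON schema is not specified here, so the keys `winner`, `reasoning`, and `scores` are assumptions; rename them to match your comparator's actual output.

```python
import json
from pathlib import Path
from typing import Any


def summarize_comparison(data: dict[str, Any], expected_winner: str) -> dict[str, Any]:
    """Extract the Step 1 facts from parsed comparator output.

    The keys "winner", "reasoning" and "scores" are assumed, not documented.
    """
    winner = data.get("winner")
    # The prompt's winner and the comparator file should agree; a mismatch
    # usually means the wrong files were paired.
    if winner != expected_winner:
        raise ValueError(f"comparator says {winner!r}, prompt says {expected_winner!r}")
    return {
        "winner": winner,
        "reasoning": data.get("reasoning", ""),
        "scores": data.get("scores", {}),
    }


def read_comparison(comparison_result_path: str, expected_winner: str) -> dict[str, Any]:
    """Load the comparator JSON from disk and summarize it."""
    text = Path(comparison_result_path).read_text(encoding="utf-8")
    return summarize_comparison(json.loads(text), expected_winner)
```

Keeping parsing separate from file I/O makes the consistency check easy to test without touching disk.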

### Step 2: Read Both Skills

1. Read the winner skill's SKILL.md and key referenced files
2. Read the loser skill's SKILL.md and key referenced files
3. Identify structural differences:
   - Instruction clarity and specificity
   - Script/tool usage patterns
   - Example coverage
   - Edge case handling
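Reading both skills is still a judgment task, but a rough structural diff can point at where to look first. This Python sketch treats Markdown headings and fenced code blocks as proxies for section coverage and example coverage; that heuristic and the helper names are assumptions, not something the skill prescribes.

```python
import re

FENCE = "`" * 3  # a Markdown code fence marker
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def skill_outline(skill_md: str) -> dict:
    """Collect headings (lower-cased) and count fenced code blocks in a SKILL.md."""
    headings: list[str] = []
    code_blocks = 0
    in_fence = False
    for line in skill_md.splitlines():
        if line.lstrip().startswith(FENCE):
            # Count only opening fences so each block is counted once.
            if not in_fence:
                code_blocks += 1
            in_fence = not in_fence
            continue
        match = HEADING.match(line)
        # Lines starting with "#" inside a code block are comments, not headings.
        if match and not in_fence:
            headings.append(match.group(1).lower())
    return {"headings": headings, "code_blocks": code_blocks}


def structural_diff(winner_md: str, loser_md: str) -> dict:
    """Sections only the winner has, plus code-block counts for both skills."""
    win, lose = skill_outline(winner_md), skill_outline(loser_md)
    return {
        "sections_missing_in_loser": [h for h in win["headings"] if h not in lose["headings"]],
        "code_blocks": {"winner": win["code_blocks"], "loser": lose["code_blocks"]},
    }
```

A missing "Edge cases" section or zero code blocks in the loser is a lead to verify against the transcripts, not a conclusion on its own.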

### Step 3: Read Both Transcripts

1. Read the winner's transcript
2. Read the loser's transcript
3. Compare execution patterns:
   - How closely did each follow their skill's instructions?
   - What tools were used differently?
   - Where did the loser diverge from optimal behavior?
   - Did either encounter errors or make recovery attempts?
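To compare tool usage and error counts across the two transcripts, a Python sketch follows. It assumes a JSONL transcript with one event per line, where tool calls carry a `tool` key and failures carry an `error` key. Real transcript formats differ between agents, so treat this format as hypothetical and adapt the parsing.

```python
import json
from collections import Counter


def tool_counts(transcript_jsonl: str) -> Counter:
    """Count tool calls per tool name (assumed "tool" key on each call event)."""
    counts: Counter = Counter()
    for line in transcript_jsonl.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if "tool" in event:
            counts[event["tool"]] += 1
    return counts


def tool_usage_delta(winner_log: str, loser_log: str) -> dict:
    """Per-tool difference (winner minus loser), non-zero entries only."""
    win, lose = tool_counts(winner_log), tool_counts(loser_log)
    tools = sorted(set(win) | set(lose))
    return {t: win[t] - lose[t] for t in tools if win[t] != lose[t]}


def error_count(transcript_jsonl: str) -> int:
    """Count events flagged with an "error" key (same assumed format)."""
    return sum(
        1 for line in transcript_jsonl.splitlines()
        if line.strip() and "error" in json.loads(line)
    )
```

A positive delta for a validation script, for example, suggests the winner's skill steered the agent toward a tool the loser never used.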

### Step 4: Analyze Instruction Following

For each transcript, evaluate:
- Did the agent follow the skill's explicit instructions?
- Did the agent use the skill's provided tools/scripts?
- Were there missed opportunities to leverage skill content?
- Did the agent add unnecessary steps not in the skill?

Score instruction following 1-10 and note specific issues.
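One way to record each transcript's assessment while enforcing the 1-10 scale is sketched below in Python. The field names `score` and `issues` are assumptions about how the `instruction_following` entries are shaped.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionFollowing:
    """One transcript's instruction-following assessment (hypothetical shape)."""
    score: int
    issues: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Step 4 asks for a score on a 1-10 scale; reject anything outside it.
        if not 1 <= self.score <= 10:
            raise ValueError(f"score must be between 1 and 10, got {self.score}")
```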

### Step 5: Identify Winner Strengths

Determine what made the winner better:
- Clearer instructions that led to better behavior?
- Better scripts/tools that produced better output?
- More comprehensive examples that guided edge cases?
- Better error handling guidance?

Be specific. Quote from skills/transcripts where relevant.
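To keep quotes precise, a small Python helper can locate candidate lines with their 1-based line numbers so they can be cited in the analysis. The `find_quotes` name and its plain case-insensitive substring matching are illustrative choices.

```python
def find_quotes(text: str, keyword: str, limit: int = 3) -> list[tuple[int, str]]:
    """Return up to `limit` (line_number, line) pairs containing `keyword`.

    Line numbers are 1-based; matching is a case-insensitive substring search.
    """
    needle = keyword.lower()
    hits: list[tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        if needle in line.lower():
            hits.append((number, line.strip()))
            if len(hits) == limit:
                break
    return hits
```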

### Step 6: Identify Loser Weaknesses

Determine what held the loser back:
- Ambiguous instructions that led to suboptimal choices?
- Missing tools/scripts that forced workarounds?
- Gaps in edge case coverage?
- Poor error handling that caused failures?
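Ambiguous instructions often contain hedge words such as "appropriately", which matches the kind of weakness this step looks for. The Python sketch below flags such lines in the loser's SKILL.md. `VAGUE_PHRASES` is a hypothetical starter list, so tune it for your own skills and confirm each hit against the transcript before calling it a weakness.

```python
# Hypothetical starter list of phrases that often signal an ambiguous instruction.
VAGUE_PHRASES = ("appropriately", "as needed", "if necessary", "etc.", "and so on")


def flag_vague_instructions(skill_md: str) -> list[tuple[int, str, str]]:
    """Return (line_number, phrase, line) for each vague phrase found."""
    flagged: list[tuple[int, str, str]] = []
    for number, line in enumerate(skill_md.splitlines(), start=1):
        lowered = line.lower()
        for phrase in VAGUE_PHRASES:
            if phrase in lowered:
                flagged.append((number, phrase, line.strip()))
    return flagged
```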

### Step 7: Generate Improvement Suggestions

Based on the analysis, produce actionable suggestions for improving the loser skill:
- Specific instruction changes to make
- Tools/scripts to add or modify
- Examples to include
- Edge cases to address

Prioritize by impact. Focus on changes that would have changed the outcome.
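Ordering suggestions by impact can be as simple as a stable sort on a priority rank. In this Python sketch the `Suggestion` fields and the high/medium/low levels are assumptions about how suggestions might be recorded.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Suggestion:
    """One improvement for the losing skill (field names are illustrative)."""
    priority: str    # "high", "medium" or "low"
    category: str    # e.g. "instructions", "tools", "examples", "edge_cases"
    suggestion: str


def prioritize(suggestions: list[Suggestion]) -> list[Suggestion]:
    """Highest impact first; sorted() is stable, so ties keep their input order."""
    return sorted(suggestions, key=lambda s: PRIORITY_ORDER[s.priority])
```

Relying on a stable sort lets the analyst list equally ranked items in the order they were discovered.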

### Step 8: Write Analysis Results

Save structured analysis to `{output_path}`.
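A minimal Python sketch of this step checks the top-level keys shown in the Output Format section, creates the parent directory if needed, and writes indented JSON. The `write_analysis` helper is hypothetical, and the key check covers only those four top-level fields.

```python
import json
from pathlib import Path

# Top-level keys that appear in the Output Format section.
REQUIRED_KEYS = ("comparison_summary", "winner_strengths", "loser_weaknesses", "instruction_following")


def write_analysis(analysis: dict, output_path: str) -> Path:
    """Check the required keys, then save the analysis as indented JSON."""
    missing = [key for key in REQUIRED_KEYS if key not in analysis]
    if missing:
        raise ValueError(f"analysis is missing keys: {missing}")
    path = Path(output_path)
    # Create parent directories so a fresh eval-run directory works.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(analysis, indent=2) + "\n", encoding="utf-8")
    return path
```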

## Output Format

Write a JSON file with this structure:

```json
{
  "comparison_summary": {
    "winner": "A",
    "winner_skill": "path/to/winner/skill",
    "loser_skill": "path/to/loser/skill",
    "comparator_reasoning": "Brief summary of why comparator chose winner"
  },
  "winner_strengths": [
    "Clear step-by-step instructions for handling multi-page documents",
    "Included validation script that caught formatting errors",
    "Explicit guidance on fallback behavior when OCR fails"
  ],
  "loser_weaknesses": [
    "Vague instruction 'process the document appropriately' led to inconsistent behavior",
    "No script for validation, agent had to improvise and made errors",
    "No guidance on OCR failure, agent gave up instead of trying alternatives"
  ],
  "instruction_following": {
    "winner": {
      …
```