Improve Skill

Name: Improve Skill
Author: gupsammy

gupsammy/claudest

Run a structured effectiveness analysis on an existing agent skill to find stuck points, divergences, and dead ends—then improve SKILL.md quality.

Install

npx skills add https://github.com/gupsammy/claudest --skill improve-skill

What is this skill?

Multi-phase analysis format: purpose statement, mental simulation, stuck/divergence/dead-end findings
Calibrates depth using sample runs (e.g., create-skill) so reviews stay consistent
Surfaces undefined scoring criteria and ambiguous subagent doc-fetch behavior
Identifies missing post-delivery paths such as repair-skill after create-skill
Eval-oriented: compares expected agent behavior against representative user requests

Adoption & trust: 1 installs on skills.sh; 253 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Recommended Skills

Find Skillsvercel-labs/skills

Find Skills is a meta agent skill from the Vercel Labs skills package that helps solo builders discover and install modu…2M installs·21.7k stars

Skill Creatoranthropics/skills

Skill-creator is an Anthropic-originated meta skill aimed at solo and indie builders who want durable agent capabilities…258k installs·148k stars

Lark Skill Makerlarksuite/cli

Meta-skill for packaging Feishu/Lark API operations into installable lark-cli Skills.207k installs·13.7k stars

Skills Clixixu-me/skills

skills-cli is a procedural agent skill that teaches assistants how to operate the open Agent Skills CLI—the package mana…200k installs·61 stars

Write A Skillmattpocock/skills

End-to-end guide for authoring new agent skills with proper metadata, folder layout, progressive disclosure, and user va…181k installs·121k stars

Using Superpowersobra/superpowers

Using Superpowers is a journey-wide meta skill for solo and indie builders who run Claude Code, Codex, Cursor, or simila…134k installs·221k stars

Journey fit

Primary fit

BuildAgent skills & templates

Skill authoring and refinement live under build/agent-tooling as the canonical shelf, even though you can run analysis whenever a skill underperforms. Agent-tooling covers procedural SKILL.md quality, evaluation rubrics, and handoffs to repair or create workflows—not app feature code.

Common Questions / FAQ

Is Improve Skill safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

READMESKILL.md - Improve Skill

# Sample Analysis: create-skill

This is a complete example of an improve-skill effectiveness analysis run against
`create-skill`. Use it to calibrate the expected depth and format.

---

## Skill Purpose (stated in Phase 1)

`create-skill` generates new Claude Code skills or commands from scratch. A user provides
requirements and receives a complete SKILL.md, ready to deliver.

---

## Phase 2 Findings

### 2a — Mental Simulation

Representative request: "Create a skill that reviews pull requests for security issues."

**Stuck point — Phase 0 (fetch docs):** The phase launches a Task with
`subagent_type=claude-code-guide` to check for new frontmatter options. If the subagent
fails or returns nothing new, the phase continues. But there is no instruction for what to
do if the subagent returns conflicting information (e.g., a field has changed its valid
values). Claude has to guess.

**Divergence point — Phase 4 (self-evaluation):** The skill instructs Claude to score the
generated skill on five dimensions against a "9.0/10.0" target. Two different Claudes will
score the same skill differently — the criteria (Clarity, Precision, Efficiency) are named
but not defined. One Claude might score Efficiency as 8/10, another as 7/10, and arrive at
different decisions about whether to refine.

**Dead end — post-delivery:** After the skill is delivered, the user has a SKILL.md but no
path to quality assurance. `repair-skill` exists for this, but `create-skill` never
mentions it. The user who doesn't know about `repair-skill` misses a natural next step.

### 2b — Live Doc Validation

All frontmatter field names verified against current docs: ✓

No drift found in this example.

### 2c — Feature Adjacency Scan

**Adjacent (high value):** After generating a skill, suggest running `repair-skill` for a
structural audit. Users who generate skills rarely know to do this separately, and repair
often catches things the generation phase missed. This is a one-sentence addition at the
end of Phase 3. Implementation effort: minimal.

**Complementary (medium value):** Before generating, help the user decide whether they need
a skill vs a command vs a custom agent. The skill currently assumes the user knows the right
artifact type. A one-question disambiguation in Phase 1 ("Is this something you'd trigger
automatically, or invoke manually with /?") would prevent the wrong artifact being generated.
Implementation effort: one AskUserQuestion in Phase 1.

### 2d — UX Flow Review

**Friction:** The Phase 1 interview asks five open-ended questions (Primary objective,
Trigger scenarios, Inputs/outputs, Complexity, Execution needs). A user who doesn't know
skill development best practices doesn't know what a "good" answer looks like. The
interview gathers information but doesn't guide it — a user who says "complexity: high"
because their problem feels hard might not realize how this maps to frontmatter choices.

**Consequential silent decision:** In Phase 2 Step 5 (delegation check), the skill
instructs Claude to scan for existing resources before finalizing. But if an existing skill
partially covers the requirement, Claude decides silently whether to extend it or create a
new one. This is a significant decision the user would want to weigh in on.

---

## Phase 3 Report

```
SKILL EFFECTIVENESS REPORT: create-skill

NEW FEATURES (capabilities the skill should have but doesn't)
──────────────────────────────────────────────────────────────
HIGH VALUE
  [2c] No post-delivery handoff to repair-skill — users who don't know it exists miss a
       natural quality-assurance step. The skill creates but never validates its output.
       Fix: Add one sentence at the end of Phase 3: "After delivery, suggest running
       repair-skill on the generated skill for a structural quality check."

MEDIUM VALUE
  [2c] No artifact-type disambiguation — skill assumes the user wants a skill, not a
       command or custom agent. Wrong artifact type requires starting over.

What is this skill?

Multi-phase analysis format: purpose statement, mental simulation, stuck/divergence/dead-end findings

Calibrates depth using sample runs (e.g., create-skill) so reviews stay consistent

Surfaces undefined scoring criteria and ambiguous subagent doc-fetch behavior

Identifies missing post-delivery paths such as repair-skill after create-skill

Eval-oriented: compares expected agent behavior against representative user requests

Adoption & trust: 1 installs on skills.sh; 253 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Primary fit

BuildAgent skills & templates

SKILL.md

READMESKILL.md - Improve Skill

# Sample Analysis: create-skill

This is a complete example of an improve-skill effectiveness analysis run against
`create-skill`. Use it to calibrate the expected depth and format.

---

## Skill Purpose (stated in Phase 1)

`create-skill` generates new Claude Code skills or commands from scratch. A user provides
requirements and receives a complete SKILL.md, ready to deliver.

---

## Phase 2 Findings

### 2a — Mental Simulation

Representative request: "Create a skill that reviews pull requests for security issues."

**Stuck point — Phase 0 (fetch docs):** The phase launches a Task with
`subagent_type=claude-code-guide` to check for new frontmatter options. If the subagent
fails or returns nothing new, the phase continues. But there is no instruction for what to
do if the subagent returns conflicting information (e.g., a field has changed its valid
values). Claude has to guess.

**Divergence point — Phase 4 (self-evaluation):** The skill instructs Claude to score the
generated skill on five dimensions against a "9.0/10.0" target. Two different Claudes will
score the same skill differently — the criteria (Clarity, Precision, Efficiency) are named
but not defined. One Claude might score Efficiency as 8/10, another as 7/10, and arrive at
different decisions about whether to refine.

**Dead end — post-delivery:** After the skill is delivered, the user has a SKILL.md but no
path to quality assurance. `repair-skill` exists for this, but `create-skill` never
mentions it. The user who doesn't know about `repair-skill` misses a natural next step.

### 2b — Live Doc Validation

All frontmatter field names verified against current docs: ✓

No drift found in this example.

### 2c — Feature Adjacency Scan

**Adjacent (high value):** After generating a skill, suggest running `repair-skill` for a
structural audit. Users who generate skills rarely know to do this separately, and repair
often catches things the generation phase missed. This is a one-sentence addition at the
end of Phase 3. Implementation effort: minimal.

**Complementary (medium value):** Before generating, help the user decide whether they need
a skill vs a command vs a custom agent. The skill currently assumes the user knows the right
artifact type. A one-question disambiguation in Phase 1 ("Is this something you'd trigger
automatically, or invoke manually with /?") would prevent the wrong artifact being generated.
Implementation effort: one AskUserQuestion in Phase 1.

### 2d — UX Flow Review

**Friction:** The Phase 1 interview asks five open-ended questions (Primary objective,
Trigger scenarios, Inputs/outputs, Complexity, Execution needs). A user who doesn't know
skill development best practices doesn't know what a "good" answer looks like. The
interview gathers information but doesn't guide it — a user who says "complexity: high"
because their problem feels hard might not realize how this maps to frontmatter choices.

**Consequential silent decision:** In Phase 2 Step 5 (delegation check), the skill
instructs Claude to scan for existing resources before finalizing. But if an existing skill
partially covers the requirement, Claude decides silently whether to extend it or create a
new one. This is a significant decision the user would want to weigh in on.

---

## Phase 3 Report

```
SKILL EFFECTIVENESS REPORT: create-skill

NEW FEATURES (capabilities the skill should have but doesn't)
──────────────────────────────────────────────────────────────
HIGH VALUE
  [2c] No post-delivery handoff to repair-skill — users who don't know it exists miss a
       natural quality-assurance step. The skill creates but never validates its output.
       Fix: Add one sentence at the end of Phase 3: "After delivery, suggest running
       repair-skill on the generated skill for a structural quality check."

MEDIUM VALUE
  [2c] No artifact-type disambiguation — skill assumes the user wants a skill, not a
       command or custom agent. Wrong artifact type requires starting over.

Install

What is this skill?

Recommended Skills

Journey fit

Is Improve Skill safe to install?

SKILL.md

This week for builders

Install

What is this skill?

Recommended Skills

Journey fit

Is Improve Skill safe to install?

SKILL.md