
Skill Judge
Score a SKILL.md or skill package for knowledge delta, structure, and activation so you publish skills that add expert value—not redundant tutorials.
Overview
Skill Judge is an agent skill used mainly in the Build phase (and also in Validate and Launch). It scores SKILL.md quality and knowledge delta, and returns actionable fixes before you publish or list a skill.
Install
npx skills add https://github.com/softaworks/agent-toolkit --skill skill-judge
What is this skill?
- Multi-dimensional scoring against official skill specs and best practices
- Knowledge-delta lens: expert-only content minus what the model already knows
- Flags token-wasting redundancy and structural activation blockers
- Actionable improvement suggestions for SKILL.md and skill packages
- Supports compare-and-audit workflows before publishing to a catalog
- Core formula: Good Skill = Expert-only Knowledge minus What Claude Already Knows
Adoption & trust: 3.8k installs on skills.sh; 2k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You cannot tell whether your SKILL.md adds real expert value or just repeats common knowledge while hurting activation and token budget.
Who is it for?
Authors reviewing a skill before publishing, auditing downloaded skills.sh packages, or comparing two skills with the same stated purpose.
Skip if: you need application security pentests or diagnosis of production runtime bugs. This skill evaluates skill documentation and package design only.
When should I use this skill?
Use it when you want to evaluate a skill, review your SKILL.md, audit or score a skill, compare skills, or get actionable improvements before publishing.
What do I get? / Deliverables
You receive structured scores and specific edits so the skill keeps high-signal expert content, clearer triggers, and packaging that agents can activate reliably.
- Multi-dimensional quality scores
- Actionable improvement list for SKILL.md and package structure
- Knowledge-delta assessment (value vs redundancy)
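To make the scores concrete, here is a minimal Python sketch of how eight dimension scores could roll up into a total and a letter grade. The dimension maxima and grade cut-offs come from the SKILL.md reproduced further down this page. The function name `grade_skill`, the dictionary layout, and the example scores are hypothetical; this is an illustration only, not something taken from the skill package.

```python
# Maximum points per dimension, as listed in the SKILL.md on this page.
MAX_POINTS = {
    "D1": 20, "D2": 15, "D3": 15, "D4": 15,
    "D5": 15, "D6": 15, "D7": 10, "D8": 15,
}

# Minimum total (out of 120) for each grade, from the grading scale.
GRADE_FLOORS = [(108, "A"), (96, "B"), (84, "C"), (72, "D")]


def grade_skill(scores):
    """Return (total, percentage, grade) for a dict of per-dimension scores.

    Hypothetical helper: validates each score against its maximum,
    sums the points, and maps the total onto the A-F scale.
    """
    if set(scores) != set(MAX_POINTS):
        raise ValueError("expected exactly one score per dimension D1..D8")
    for dim, pts in scores.items():
        if not 0 <= pts <= MAX_POINTS[dim]:
            raise ValueError(f"{dim} must be between 0 and {MAX_POINTS[dim]}")
    total = sum(scores.values())
    percentage = 100 * total / sum(MAX_POINTS.values())
    # First floor the total reaches wins; anything below 72 is an F.
    grade = next((g for floor, g in GRADE_FLOORS if total >= floor), "F")
    return total, percentage, grade


# Made-up scores for illustration only.
example = {"D1": 16, "D2": 12, "D3": 10, "D4": 13,
           "D5": 12, "D6": 12, "D7": 8, "D8": 13}
total, pct, grade = grade_skill(example)
print(f"{total}/120 ({pct:.0f}%) -> grade {grade}")  # 96/120 (80%) -> grade B
```

Comparing the total against point floors rather than rounded percentages keeps the boundaries unambiguous: 96 points is exactly the bottom of the B band.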
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Primary shelf is Build while authoring or curating agent skills; the same audit applies when polishing skills before Ship or sharing them at Launch. Agent-tooling covers SKILL.md quality, packaging, and evaluation—not running production monitoring or growth experiments.
Where it fits
Score a draft SKILL.md for trigger clarity and cut tutorial paragraphs the model already knows.
Decide whether to adopt a marketplace skill by measuring expert-only content versus token waste.
Polish listing-ready skills with consistent rubric scores before publishing to a public catalog.
How it compares
A SKILL.md quality auditor with scoring rubrics, not a code review or MCP server capability check.
Common Questions / FAQ
Who is skill-judge for?
Solo builders and indie teams who write, fork, or catalog agent skills and want consistent quality review before shipping them to teammates or directories like Prism.
When should I use skill-judge?
Use it in Build while drafting SKILL.md, in Validate when deciding whether to adopt a third-party skill, and at Launch before listing a skill publicly—on phrases like “evaluate this skill,” “audit this skill,” or “how can I improve this skill.”
Is skill-judge safe to install?
It reads and critiques skill files in your workspace; it does not assert install safety—review the Security Audits panel on this Prism page for the originating package.
SKILL.md
# Skill Judge

A comprehensive evaluation framework for assessing Agent Skill quality against official specifications and best practices. This skill provides multi-dimensional scoring and actionable improvement suggestions for SKILL.md files and skill packages.

## Purpose

Skill Judge exists to solve a critical problem: **most Skills waste tokens on knowledge Claude already has**.

The skill helps you evaluate whether a Skill actually adds value by measuring its "knowledge delta" - the gap between what the Skill provides and what Claude already knows. A good Skill should be a compressed expert brain, not a tutorial.

### The Core Formula

> **Good Skill = Expert-only Knowledge - What Claude Already Knows**

This skill helps you identify:

- Token-wasting redundant content (things Claude already knows)
- Genuine expert knowledge that adds value
- Structural issues that prevent Skills from being activated or used effectively

## When to Use

Use Skill Judge when you need to:

- **Review a Skill before publishing**: Evaluate quality and identify improvements
- **Audit existing Skills**: Systematic assessment against best practices
- **Improve a SKILL.md file**: Get specific, actionable suggestions
- **Learn Skill design patterns**: Understand what makes a great Skill
- **Compare Skills**: Assess relative quality using consistent criteria

**Trigger phrases**:

- "Evaluate this skill"
- "Review my SKILL.md"
- "Audit this skill"
- "Score this skill"
- "How can I improve this skill?"
- "Is this skill well-designed?"

## How It Works

### Evaluation Protocol

1. **First Pass - Knowledge Delta Scan**: Read the SKILL.md and categorize each section as:
   - **[E] Expert**: Claude genuinely doesn't know this (value-add)
   - **[A] Activation**: Claude knows but brief reminder is useful (acceptable)
   - **[R] Redundant**: Claude definitely knows this (should delete)
2. **Structure Analysis**: Check frontmatter validity, line count, reference files, design pattern, and loading triggers
3. **Score Each Dimension**: Evaluate against 8 dimensions with specific evidence and justifications
4. **Calculate Total and Grade**: Sum scores (max 120 points) and assign grade
5. **Generate Report**: Produce structured report with scores, critical issues, and improvements

### The 8 Evaluation Dimensions (120 points total)

| Dimension | Max Points | What It Measures |
|-----------|------------|------------------|
| **D1: Knowledge Delta** | 20 | Does the Skill add genuine expert knowledge? (THE CORE DIMENSION) |
| **D2: Mindset + Procedures** | 15 | Does it transfer expert thinking patterns and domain-specific workflows? |
| **D3: Anti-Pattern Quality** | 15 | Does it have effective NEVER lists with specific reasons? |
| **D4: Specification Compliance** | 15 | Is the frontmatter valid? Is the description comprehensive? |
| **D5: Progressive Disclosure** | 15 | Is content properly layered for on-demand loading? |
| **D6: Freedom Calibration** | 15 | Is specificity appropriate for task fragility? |
| **D7: Pattern Recognition** | 10 | Does it follow an established official pattern? |
| **D8: Practical Usability** | 15 | Can an Agent actually use this Skill effectively? |

### Grading Scale

| Grade | Percentage | Meaning |
|-------|------------|---------|
| A | 90%+ (108+) | Excellent - production-ready expert Skill |
| B | 80-89% (96-107) | Good - minor improvements needed |
| C | 70-79% (84-95) | Adequate - clear improvement path |
| D | 60-69% (72-83) | Below Average - significant issues |
| F | <60% (<72) | Poor - needs fundamental redesign |

## Key Features

### Knowledge Classification System

The skill teaches you to recognize three types of content:

| Type | Definition | Treatment |
|------|------------|-----------|
| **Expert** | Claude genuinely doesn't know this | Must keep - this is the Skill's value |
| **Activation** | Claude knows but may not think of | Keep if brief - serves as reminder |
| **Redundant** | Claude definitely knows thi