
Skills Eval
Benchmark how well an agent skill loads, discovers tools, and preserves context before you ship it to users.
Install
npx skills add https://github.com/athola/claude-night-market --skill skills-eval
What is this skill?
- tool-performance-analyzer script with --focus discovery, --focus programmatic-calling, and --parallel-analysis options
- discovery-optimizer benchmarks against MCP standards for loading patterns
- Tracks loading efficiency, keyword matching, contextual tool loading, and tool cache behavior
- Assesses sequential vs parallel multi-step tool workflows and error recovery
- Context preservation analysis for context window utilization
Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Recommended Skills
- Find Skills (vercel-labs/skills)
- Skill Creator (anthropics/skills)
- Lark Skill Maker (larksuite/cli)
- Skills Cli (xixu-me/skills)
- Write A Skill (mattpocock/skills)
- Using Superpowers (obra/superpowers)
Common Questions / FAQ
Is Skills Eval safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md - Skills Eval
# Advanced Tool Use Analysis

## Dynamic Discovery Evaluation

### Tool Discovery Patterns

```bash
# Analyze tool discovery patterns and efficiency
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --focus discovery

# Benchmark against optimal loading patterns
skills/skills-eval/scripts/discovery-optimizer --skill-path skill.md --benchmark mcp-standards
```

### Discovery Optimization Targets

- **Loading Efficiency**: Minimize tool discovery latency
- **Pattern Recognition**: Optimize keyword matching and categorization
- **Contextual Loading**: Load tools based on relevance to current context
- **Memory Management**: Efficient tool caching and retrieval

## Programmatic Calling Assessment

### Multi-Step Workflow Analysis

```bash
# Evaluate multi-step workflow optimization opportunities
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --focus programmatic-calling

# Identify parallel execution opportunities
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --parallel-analysis
```

### Calling Optimization Metrics

- **Sequential Efficiency**: Optimize ordered tool execution
- **Parallel Processing**: Identify concurrent tool opportunities
- **Context Preservation**: Minimize context loss between calls
- **Error Recovery**: Production-grade error handling and retry mechanisms

## Context Preservation Analysis

### Context Window Utilization

```bash
# Measure context window utilization efficiency
skills/skills-eval/scripts/token-usage-tracker --skill-path skill.md --context-analysis

# Identify pollution reduction opportunities
skills/skills-eval/scripts/token-usage-tracker --skill-path skill.md --pollution-analysis
```

### Optimization Strategies

- **Efficient Token Usage**: Maximize information density
- **Pollution Reduction**: Minimize irrelevant context accumulation
- **Window Management**: Strategic context window allocation
- **Compression Techniques**: Intelligent content summarization

## Performance Benchmarking

### Evaluation Criteria

- **MCP Compliance**: Validation against Model Context Protocol standards
- **Accuracy Metrics**: Tool discovery and execution accuracy improvements
- **Token Efficiency**: Usage patterns and optimization opportunities
- **Latency Analysis**: Multi-step workflow performance bottlenecks

### Target Improvements

- **Token Usage Reduction**: Aim for 37% reduction through programmatic calling optimization
- **Accuracy Improvements**: Target 25% improvement in tool discovery and execution
- **Context Optimization**: Maintain 95% context window preservation
- **Latency Reduction**: Eliminate multiple inference passes in complex workflows

## Advanced Analysis Techniques

### Comparative Analysis

```bash
# Benchmark against best-in-class examples
skills/skills-eval/scripts/performance-comparator --skill-path skill.md --baseline industry-standards

# Trend tracking over time
skills/skills-eval/scripts/performance-tracker --skill-path skill.md --metrics discovery,calling,context
```

### Optimization Recommendations

1. **Tool Grouping**: Related tools should be discoverable together
2. **Progressive Loading**: Load essential tools first, advanced tools later
3. **Context Caching**: Preserve relevant context between tool calls
4. **Error Patterns**: Analyze and optimize common error scenarios

# Skill Authoring Checklist

Quick-reference validation checklist for skill authors.

## Pre-Development

- [ ] Identified repeated task (done 5+ times, will do 10+ more)
- [ ] Confirmed no existing skill covers this
- [ ] Defined skill type (Technique, Pattern, or Reference)
- [ ] Chosen descriptive gerund-form name

## Frontmatter Validation

- [ ] `name`: ≤64 characters
- [ ] `name`: lowercase letters, numbers, hyphens only
- [ ] `name`: no reserved words (anthropic, claude)
- [ ] `description`: non-empty
- [ ] `description`: ≤1024 characters
- [ ] `description`: third person voice
- [ ] `description`: includes WHAT and WHEN
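
The mechanical items in the frontmatter checklist above can be checked automatically. The following is a minimal Python sketch, not part of the skills-eval scripts: the function name `validate_frontmatter` and the constants are hypothetical, and treating a reserved word as forbidden anywhere inside `name` (a substring match) is an assumption about how that rule is meant. The voice and WHAT/WHEN items need human review and are not covered.

```python
import re

# Limits come from the checklist; all names below are hypothetical.
MAX_NAME_LEN = 64
MAX_DESCRIPTION_LEN = 1024
RESERVED_WORDS = ("anthropic", "claude")
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def validate_frontmatter(name: str, description: str) -> list:
    """Return a list of checklist violations; an empty list means every mechanical check passed."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use only lowercase letters, numbers, and hyphens")
    for word in RESERVED_WORDS:
        # Assumption: a reserved word is rejected anywhere in the name.
        if word in name.lower():
            problems.append(f"name contains reserved word '{word}'")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > MAX_DESCRIPTION_LEN:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LEN} characters")
    return problems


if __name__ == "__main__":
    # A name and description that satisfy the mechanical checks.
    print(validate_frontmatter("skills-eval", "Benchmarks how a skill loads tools. Use before shipping."))
    # Uppercase, an underscore, a reserved word, and an empty description.
    print(validate_frontmatter("Claude_Helper", ""))
```

Returning a list of messages rather than raising on the first failure lets an author see every violation in a single pass.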