Skills Eval

Name: Skills Eval
Author: athola

athola/claude-night-market

Benchmark how well an agent skill loads, discovers tools, and preserves context before you ship it to users.

Install

npx skills add https://github.com/athola/claude-night-market --skill skills-eval

What is this skill?

- Provides a tool-performance-analyzer script with --focus discovery, --focus programmatic-calling, and --parallel-analysis options
- Benchmarks loading patterns against MCP standards with discovery-optimizer
- Tracks loading efficiency, keyword matching, contextual tool loading, and tool cache behavior
- Assesses sequential versus parallel multi-step tool workflows and error recovery
- Analyzes context preservation and context window utilization

Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Recommended Skills

- Find Skills (vercel-labs/skills): Find Skills is a meta agent skill from the Vercel Labs skills package that helps solo builders discover and install modu… (2M installs · 21.7k stars)
- Skill Creator (anthropics/skills): Skill-creator is an Anthropic-originated meta skill aimed at solo and indie builders who want durable agent capabilities… (258k installs · 148k stars)
- Lark Skill Maker (larksuite/cli): Meta-skill for packaging Feishu/Lark API operations into installable lark-cli Skills. (207k installs · 13.7k stars)
- Skills Cli (xixu-me/skills): skills-cli is a procedural agent skill that teaches assistants how to operate the open Agent Skills CLI, the package mana… (200k installs · 61 stars)
- Write A Skill (mattpocock/skills): End-to-end guide for authoring new agent skills with proper metadata, folder layout, progressive disclosure, and user va… (181k installs · 121k stars)
- Using Superpowers (obra/superpowers): Using Superpowers is a journey-wide meta skill for solo and indie builders who run Claude Code, Codex, Cursor, or simila… (134k installs · 221k stars)

Journey fit

Primary fit

Skill quality gates belong on the ship shelf, in the review subphase, where you validate agent workflows before publication. That subphase suits eval scripts that score discoverability, programmatic calling, and context window use.

Common Questions / FAQ

Is Skills Eval safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md


# Advanced Tool Use Analysis

## Dynamic Discovery Evaluation

### Tool Discovery Patterns
```bash
# Analyze tool discovery patterns and efficiency
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --focus discovery

# Benchmark against optimal loading patterns
skills/skills-eval/scripts/discovery-optimizer --skill-path skill.md --benchmark mcp-standards
```

### Discovery Optimization Targets
- **Loading Efficiency**: Minimize tool discovery latency
- **Pattern Recognition**: Optimize keyword matching and categorization
- **Contextual Loading**: Load tools based on relevance to current context
- **Memory Management**: Efficient tool caching and retrieval
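
The targets above (keyword matching, contextual loading, and caching) can be illustrated with a minimal sketch. It is not the skill's implementation: the `TOOL_KEYWORDS` registry, the `relevant_tools` function, and the scoring rule (count of keyword hits) are all hypothetical, chosen only to show the idea.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical tool registry: tool name -> keywords that describe it.
TOOL_KEYWORDS = {
    "git-log": ("git", "commit", "history"),
    "run-tests": ("test", "pytest", "coverage"),
    "http-get": ("http", "url", "fetch"),
}


@lru_cache(maxsize=128)  # cache repeated lookups for the same context string
def relevant_tools(context: str, limit: int = 2) -> Tuple[str, ...]:
    """Rank tools by how many of their keywords appear in the context."""
    words = set(context.lower().split())
    scored = [
        (sum(kw in words for kw in keywords), name)
        for name, keywords in TOOL_KEYWORDS.items()
    ]
    # Keep tools with at least one hit; best score first, name as tie-break.
    ranked = sorted(
        ((score, name) for score, name in scored if score > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return tuple(name for _, name in ranked[:limit])
```

Calling `relevant_tools("fetch the url and run a test")` loads only the two matching tools, and a second call with the same context is served from the cache, which `relevant_tools.cache_info()` makes visible.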

## Programmatic Calling Assessment

### Multi-Step Workflow Analysis
```bash
# Evaluate multi-step workflow optimization opportunities
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --focus programmatic-calling

# Identify parallel execution opportunities
skills/skills-eval/scripts/tool-performance-analyzer --skill-path skill.md --parallel-analysis
```

### Calling Optimization Metrics
- **Sequential Efficiency**: Optimize ordered tool execution
- **Parallel Processing**: Identify concurrent tool opportunities
- **Context Preservation**: Minimize context loss between calls
- **Error Recovery**: Provide production-grade error handling and retry mechanisms
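
To make the sequential versus parallel distinction and the retry idea concrete, here is a small sketch using Python's `asyncio`. The `call_tool` coroutine is a stand-in that only sleeps, and `with_retry` with its backoff values is an illustrative assumption, not something the skill's scripts are known to provide.

```python
import asyncio
import time


async def call_tool(name: str, delay: float) -> str:
    """Stand-in for a tool call; sleeps instead of doing real work."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_sequential(calls):
    # Each call waits for the previous one to finish.
    return [await call_tool(name, delay) for name, delay in calls]


async def run_parallel(calls):
    # Independent calls run concurrently; results keep the input order.
    return list(await asyncio.gather(*(call_tool(n, d) for n, d in calls)))


async def with_retry(make_call, attempts: int = 3, base_delay: float = 0.01):
    """Retry an async call with exponential backoff; re-raise on final failure."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)


def timed(coro_fn, calls):
    """Run a coroutine function and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(calls))
    return results, time.perf_counter() - start
```

With three independent calls of equal duration, `timed(run_parallel, calls)` should take roughly one call's duration while `timed(run_sequential, calls)` takes about three. Calls that depend on each other's output still have to run in order.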

## Context Preservation Analysis

### Context Window Utilization
```bash
# Measure context window utilization efficiency
skills/skills-eval/scripts/token-usage-tracker --skill-path skill.md --context-analysis

# Identify pollution reduction opportunities
skills/skills-eval/scripts/token-usage-tracker --skill-path skill.md --pollution-analysis
```

### Optimization Strategies
- **Efficient Token Usage**: Maximize information density
- **Pollution Reduction**: Minimize irrelevant context accumulation
- **Window Management**: Strategic context window allocation
- **Compression Techniques**: Intelligent content summarization
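
One way to put numbers on the strategies above is to measure how much of the window is in use and how much of that is irrelevant. The sketch below is an assumption-heavy illustration: `estimate_tokens` counts whitespace-separated words rather than real tokens, and the relevance flag on each segment is supplied by hand. Neither reflects how `token-usage-tracker` works internally.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def context_report(segments, window_size: int) -> dict:
    """Summarize a context made of (text, is_relevant) pairs."""
    total = sum(estimate_tokens(text) for text, _ in segments)
    relevant = sum(estimate_tokens(text) for text, rel in segments if rel)
    return {
        # Share of the window currently occupied.
        "utilization": total / window_size,
        # Share of occupied context that is not relevant to the task.
        "pollution": (total - relevant) / total if total else 0.0,
    }
```

A high pollution value with low utilization suggests trimming stale content; high utilization with low pollution suggests summarizing or compressing instead.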

## Performance Benchmarking

### Evaluation Criteria
- **MCP Compliance**: Validation against Model Context Protocol standards
- **Accuracy Metrics**: Tool discovery and execution accuracy improvements
- **Token Efficiency**: Usage patterns and optimization opportunities
- **Latency Analysis**: Multi-step workflow performance bottlenecks

### Target Improvements
- **Token Usage Reduction**: Aim for 37% reduction through programmatic calling optimization
- **Accuracy Improvements**: Target 25% improvement in tool discovery and execution
- **Context Optimization**: Maintain 95% context window preservation
- **Latency Reduction**: Eliminate multiple inference passes in complex workflows
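
The numeric targets above can be checked with simple arithmetic. This sketch assumes that the 37% and 25% figures are relative changes against a baseline and that context preservation is reported as a single measured percentage; the source does not define these precisely, so the field names and interpretations here are hypothetical.

```python
# Thresholds taken from the list above.
TARGETS = {
    "token_reduction": 37.0,
    "accuracy_improvement": 25.0,
    "context_preservation": 95.0,
}


def pct_reduction(before: float, after: float) -> float:
    """Relative reduction in percent, e.g. 10000 -> 6000 tokens is 40%."""
    return (before - after) * 100 / before


def pct_improvement(before: float, after: float) -> float:
    """Relative improvement in percent (assumed to be relative, not points)."""
    return (after - before) * 100 / before


def check_targets(baseline: dict, optimized: dict) -> dict:
    """Return a pass/fail flag per target."""
    return {
        "token_reduction": pct_reduction(
            baseline["tokens"], optimized["tokens"]
        ) >= TARGETS["token_reduction"],
        "accuracy_improvement": pct_improvement(
            baseline["accuracy"], optimized["accuracy"]
        ) >= TARGETS["accuracy_improvement"],
        # Assumed to be measured elsewhere and passed in as a percentage.
        "context_preservation": optimized["context_preserved_pct"]
        >= TARGETS["context_preservation"],
    }
```

For example, a workflow that drops from 10,000 to 6,000 tokens is a 40% reduction and clears the 37% target, while a drop to 7,000 tokens (30%) does not.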

## Advanced Analysis Techniques

### Comparative Analysis
```bash
# Benchmark against best-in-class examples
skills/skills-eval/scripts/performance-comparator --skill-path skill.md --baseline industry-standards

# Trend tracking over time
skills/skills-eval/scripts/performance-tracker --skill-path skill.md --metrics discovery,calling,context
```

### Optimization Recommendations
1. **Tool Grouping**: Related tools should be discoverable together
2. **Progressive Loading**: Load essential tools first, advanced tools later
3. **Context Caching**: Preserve relevant context between tool calls
4. **Error Patterns**: Analyze and optimize common error scenarios


# Skill Authoring Checklist

Quick-reference validation checklist for skill authors.

## Pre-Development

- [ ] Identified repeated task (done 5+ times, will do 10+ more)
- [ ] Confirmed no existing skill covers this
- [ ] Defined skill type (Technique, Pattern, or Reference)
- [ ] Chosen descriptive gerund-form name

## Frontmatter Validation

- [ ] `name`: ≤64 characters
- [ ] `name`: lowercase letters, numbers, hyphens only
- [ ] `name`: no reserved words (anthropic, claude)
- [ ] `description`: non-empty
- [ ] `description`: ≤1024 characters
- [ ] `description`: third person voice
- [ ] `description`: includes WHAT and WHEN
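
The mechanical items in this checklist can be automated. The following is a minimal sketch, assuming the frontmatter has already been parsed into `name` and `description` strings; `validate_frontmatter` is a hypothetical helper, and treating reserved words as case-insensitive substrings is an assumption. Third-person voice and the WHAT/WHEN requirement still need human review.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9-]+")
RESERVED_WORDS = ("anthropic", "claude")


def validate_frontmatter(name: str, description: str) -> list:
    """Return a list of checklist violations; an empty list means all passed."""
    errors = []
    if len(name) > 64:
        errors.append("name: longer than 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name: use lowercase letters, numbers, and hyphens only")
    # Assumption: a reserved word anywhere in the name counts as a violation.
    if any(word in name.lower() for word in RESERVED_WORDS):
        errors.append("name: contains a reserved word")
    if not description.strip():
        errors.append("description: empty")
    if len(description) > 1024:
        errors.append("description: longer than 1024 characters")
    return errors
```

A name such as `evaluating-skills` with a short description that says what the skill does and when to use it returns an empty list, while `Claude_Helper` with an empty description returns three violations.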
