Skill Test

Name: Skill Test
Author: microsoft

microsoft/skills-for-fabric

Manage the skills-for-fabric eval framework—add eval plans, list results, generate datasets, check coverage, and route runs to the tests/ folder.

Overview

skill-test is an agent skill that operates the skills-for-fabric evaluation framework, from parsing the user's intent to running tests in the tests/ folder. It is most often used in the Ship phase and also fits Build/agent-tooling.

Install

npx skills add https://github.com/microsoft/skills-for-fabric --skill skill-test

What is this skill?

Intent routing table maps phrases (add evals, list tests, run tests) to dedicated workflows
Supports eval plans for new or existing skills in the skills-for-fabric repo
Coverage workflow to find skills missing tests
Directs test execution to the repository tests/ folder
Dataset generation and metrics review for ongoing skill quality

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 26 installs on skills.sh; 427 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You maintain Fabric agent skills but lack a consistent way to add evals, see results, or know which skills have zero test coverage.

Who is it for?

skills-for-fabric contributors shipping or updating Fabric skills who want eval-driven quality gates.

Skip if: you only invoke Fabric runtime skills in production, or your repo is outside skills-for-fabric and lacks the eval layout.

When should I use this skill?

Triggers include add tests, add evals, list tests, show eval results, run tests, generate eval data, eval metrics, test coverage, missing tests, show tests.

What do I get? / Deliverables

Eval plans, listings, datasets, and coverage reports align with repo conventions, and test runs land in tests/ instead of ad-hoc agent scripts.

Eval plan for a skill
Test listing or results summary
Coverage gap report

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

The canonical shelf is Ship/testing because the skill's primary job is evaluation, metrics, and executing the test suite for Fabric skills. The Testing subphase matches the triggers, which explicitly name add evals, list tests, run tests, eval metrics, and coverage gaps.

Also useful

Build: Agent skills & templates

Where it fits

Build: Agent skills & templates
Example use: After authoring a new Fabric skill, route add evals to scaffold an eval plan before merge.

Ship: Testing & QA
Example use: Before release, run tests via the skill so agents execute the suite under tests/.

Operate: Iteration & experiments
Example use: Review eval metrics when production-like prompts regress after a skill wording change.

How it compares

Repo-local eval orchestration for skill authors—not a substitute for unit testing application code in your product.

Common Questions / FAQ

Who is skill-test for?

Solo and indie maintainers contributing to Microsoft skills-for-fabric who need to add, list, run, or measure skill evaluations.

When should I use skill-test?

Use it in Ship/testing when adding evals or checking results; in Build/agent-tooling when scaffolding tests for a skill you are authoring; when someone says run tests or test coverage.

Is skill-test safe to install?

It may invoke local test runners and read repo files—review the Security Audits panel on this page before granting shell and filesystem access.

SKILL.md

# Skill Test — skills-for-fabric Evaluation Framework

Manage the end-to-end evaluation framework for skills-for-fabric. This skill routes requests to the correct workflow based on user intent — adding tests, listing tests, running tests, viewing results, generating data, or checking coverage.

## When to Use

- When a contributor wants to add evaluation test cases for a new or existing skill
- When someone asks to see what tests exist or what results look like
- When a user wants to run the test suite
- When reviewing eval metrics or checking which skills lack test coverage

## Intent Routing

Parse the user request and route to the appropriate workflow:

| User Intent | Trigger Phrases | Action |
|-------------|----------------|--------|
| **Add evals** | "add tests", "add evals", "add evals for missing skills", "create eval plan" | → [Workflow: Add Evals](#workflow-add-evals) |
| **List tests** | "list tests", "list evals", "show me the list of tests", "what tests exist", "show eval plans" | → [Workflow: List Tests](#workflow-list-tests) |
| **Run tests** | "run tests", "run evals", "execute tests", "run the eval suite" | → [Workflow: Run Tests](#workflow-run-tests) |
| **View results** | "show eval results", "test results", "eval results", "executive summary" | → [Workflow: View Results](#workflow-view-results) |
| **Generate data** | "generate eval data", "generate test data", "create eval datasets" | → [Workflow: Generate Data](#workflow-generate-data) |
| **View metrics** | "eval metrics", "test metrics", "what metrics", "how are tests scored" | → [Workflow: View Metrics](#workflow-view-metrics) |
| **Check coverage** | "test coverage", "which skills have tests", "missing tests", "skills without evals" | → [Workflow: Check Coverage](#workflow-check-coverage) |
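
The routing table can be read as a phrase lookup. The following is a minimal sketch, not the skill's actual implementation: the trigger phrases are copied from the table, while the workflow keys (`add-evals`, `list-tests`, and so on) and the `route_intent` function are hypothetical names chosen for illustration. It assumes simple case-insensitive substring matching, with the longest matching phrase winning.

```python
from typing import Optional

# Trigger phrases are taken from the routing table above.
# The workflow keys are hypothetical labels, not identifiers from the repo.
ROUTES = {
    "add-evals": ["add tests", "add evals", "add evals for missing skills", "create eval plan"],
    "list-tests": ["list tests", "list evals", "show me the list of tests", "what tests exist", "show eval plans"],
    "run-tests": ["run tests", "run evals", "execute tests", "run the eval suite"],
    "view-results": ["show eval results", "test results", "eval results", "executive summary"],
    "generate-data": ["generate eval data", "generate test data", "create eval datasets"],
    "view-metrics": ["eval metrics", "test metrics", "what metrics", "how are tests scored"],
    "check-coverage": ["test coverage", "which skills have tests", "missing tests", "skills without evals"],
}


def route_intent(request: str) -> Optional[str]:
    """Return the workflow whose longest trigger phrase occurs in the request.

    Returns None when no trigger phrase matches.
    """
    text = request.lower()
    best_len = 0
    best_workflow = None
    for workflow, phrases in ROUTES.items():
        for phrase in phrases:
            # Prefer the longest phrase so specific triggers beat generic ones.
            if phrase in text and len(phrase) > best_len:
                best_len = len(phrase)
                best_workflow = workflow
    return best_workflow
```

For example, `route_intent("Please run the eval suite")` selects the hypothetical `run-tests` key, and an unrelated request returns `None`, at which point an agent would likely ask the user to clarify.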

---

## Workflow: Add Evals

Follow the instructions in `tests/full-eval-tests/README.md` § "Adding Evals for New Skills".

### Automated Path (Recommended)

Give the agent the prompt:

```
Add evals for the missing skills
```

The agent will:
1. Detect missing skills by comparing installed skills against existing eval plans in `tests/full-eval-tests/plan/03-individual-skills/`
2. Generate individual eval plans (`plan/03-individual-skills/eval-<skill-name>.md`) with 10–12 test cases
3. Generate combined eval plans (`plan/04-combined-skills/eval-<skill>-authoring-plus-consumption.md`)
4. Create golden data in `tests/full-eval-tests/evalsets/expected-results/`
5. Update tracking files: `plan/00-overview.md`, `README.md`, `plan/04-combined-skills/eval-full-pipeline.md`
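
Step 1 above amounts to a set difference between installed skills and existing plan files. The sketch below illustrates that idea under stated assumptions: the plan directory and the `eval-<skill-name>.md` naming come from the steps above, but how installed skills are enumerated is not specified here, so the caller supplies that list. All function names are hypothetical.

```python
from pathlib import Path

# Directory named in step 1; plan files follow eval-<skill-name>.md.
PLAN_DIR = Path("tests/full-eval-tests/plan/03-individual-skills")


def plan_skill_names(filenames):
    """Extract skill names from filenames shaped like eval-<skill-name>.md."""
    names = set()
    for name in filenames:
        if name.startswith("eval-") and name.endswith(".md"):
            names.add(name[len("eval-"):-len(".md")])
    return names


def list_plan_files(plan_dir=PLAN_DIR):
    """List plan filenames on disk (empty if the directory does not exist)."""
    return [p.name for p in plan_dir.glob("eval-*.md")]


def missing_skills(installed, plan_filenames):
    """Return installed skills that have no individual eval plan, sorted."""
    return sorted(set(installed) - plan_skill_names(plan_filenames))
```

A typical call would be `missing_skills(installed, list_plan_files())`, where `installed` is whatever list of skill names the agent has discovered.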

### Manual Path

To add evals for a specific skill `<new-skill>`:

1. Create `tests/full-eval-tests/plan/03-individual-skills/eval-<new-skill>.md` using the template in the README
2. Each test case needs a Case ID (unique prefix), Prompt, Expected result, and Pass criteria; the plan as a whole needs at least one negative/ambiguous test
3. If the skill has an authoring+consumption pair, create `tests/full-eval-tests/plan/04-combined-skills/eval-<new-skill>-authoring-plus-consumption.md`
4. Add golden data to `tests/full-eval-tests/evalsets/expected-results/`
5. Update `plan/00-overview.md`, `README.md` directory tree, and `plan/04-combined-skills/eval-full-pipeline.md`
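
As a checklist aid, the files touched by the manual path can be derived from the skill name. This is only a sketch: `manual_path_files` is a hypothetical helper, and it assumes that the relative paths in step 5 (`plan/00-overview.md`, `README.md`) resolve under `tests/full-eval-tests/`, which the steps above imply but do not state outright.

```python
from pathlib import PurePosixPath

# Assumed root for all relative paths mentioned in the manual steps.
ROOT = PurePosixPath("tests/full-eval-tests")


def manual_path_files(skill: str, has_pair: bool) -> dict:
    """Return the files to create and update when adding evals for one skill."""
    create = [ROOT / "plan/03-individual-skills" / f"eval-{skill}.md"]
    if has_pair:
        # Only skills with an authoring+consumption pair get a combined plan.
        create.append(
            ROOT / "plan/04-combined-skills" / f"eval-{skill}-authoring-plus-consumption.md"
        )
    update = [
        ROOT / "plan/00-overview.md",
        ROOT / "README.md",
        ROOT / "plan/04-combined-skills/eval-full-pipeline.md",
    ]
    return {
        "create": [str(p) for p in create],
        "update": [str(p) for p in update],
        # Golden data goes in this directory; file names are not specified here.
        "golden_data_dir": str(ROOT / "evalsets/expected-results"),
    }
```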

### Eval Plan Template

Use the template from `tests/full-eval-tests/README.md` § "Eval Plan Template". Every eval plan must include:
- Skill overview (name, category, R/W, purpose)
- Pre-requisites
- Numbered test cases (XX-01 through XX-10+) with Prompt / Expected / Pass criteria
- At least one negative/ambiguous test
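
The test-case requirements listed above can be checked mechanically. The sketch below is illustrative only: the `EvalCase` structure, the `negative` flag, and the ID pattern are assumptions inferred from the `XX-01` notation, not the format defined in the README template. It treats "XX-01 through XX-10+" as a minimum of ten cases.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalCase:
    case_id: str            # e.g. "XX-01"
    prompt: str
    expected: str
    pass_criteria: str
    negative: bool = False  # hypothetical marker for negative/ambiguous cases


# Assumed ID shape: uppercase prefix, hyphen, two digits.
CASE_ID = re.compile(r"^[A-Z]{2,}-\d{2}$")


def plan_problems(cases):
    """Return a list of human-readable problems; empty means the plan passes."""
    problems = []
    if len(cases) < 10:
        problems.append("fewer than 10 test cases")
    ids = [c.case_id for c in cases]
    if len(set(ids)) != len(ids):
        problems.append("duplicate case IDs")
    for c in cases:
        if not CASE_ID.match(c.case_id):
            problems.append(f"{c.case_id}: unexpected ID format")
        if not (c.prompt and c.expected and c.pass_criteria):
            problems.append(f"{c.case_id}: missing prompt, expected result, or pass criteria")
    if not any(c.negative for c in cases):
        problems.append("no negative/ambiguous test case")
    return problems
```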
