
Skill Test
Manage the skills-for-fabric eval framework—add eval plans, list results, generate datasets, check coverage, and route runs to the tests/ folder.
Overview
skill-test is an agent skill, most often used in Ship (also Build/agent-tooling), that operates the skills-for-fabric evaluation framework, from routing a request by intent to executing the suite in tests/.
Install
npx skills add https://github.com/microsoft/skills-for-fabric --skill skill-test
What is this skill?
- Intent routing table maps phrases (add evals, list tests, run tests) to dedicated workflows
- Supports eval plans for new or existing skills in the skills-for-fabric repo
- Coverage workflow to find skills missing tests
- Directs test execution to the repository tests/ folder
- Dataset generation and metrics review for ongoing skill quality
Adoption & trust: 26 installs on skills.sh; 427 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You maintain Fabric agent skills but lack a consistent way to add evals, see results, or know which skills have zero test coverage.
Who is it for?
skills-for-fabric contributors shipping or updating Fabric skills who want eval-driven quality gates.
Skip if: you only invoke Fabric runtime skills in production, or your repo is outside skills-for-fabric and lacks the eval layout.
When should I use this skill?
Triggers include add tests, add evals, list tests, show eval results, run tests, generate eval data, eval metrics, test coverage, missing tests, show tests.
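As a rough illustration of how trigger phrases like these could map to workflows, here is a minimal Python sketch. The phrases and workflow names are copied from the Intent Routing table in SKILL.md; the `ROUTES` dictionary, the `route_intent` function, and the matching rule (case-insensitive substring match, longest phrase wins) are assumptions made for this example, not the skill's documented behavior.

```python
from typing import Optional

# Trigger phrases copied from the Intent Routing table in SKILL.md.
# ROUTES and route_intent are hypothetical names used only in this sketch.
ROUTES = {
    "Add Evals": ["add tests", "add evals", "add evals for missing skills", "create eval plan"],
    "List Tests": ["list tests", "list evals", "show me the list of tests",
                   "what tests exist", "show eval plans"],
    "Run Tests": ["run tests", "run evals", "execute tests", "run the eval suite"],
    "View Results": ["show eval results", "test results", "eval results", "executive summary"],
    "Generate Data": ["generate eval data", "generate test data", "create eval datasets"],
    "View Metrics": ["eval metrics", "test metrics", "what metrics", "how are tests scored"],
    "Check Coverage": ["test coverage", "which skills have tests", "missing tests",
                       "skills without evals"],
}


def route_intent(request: str) -> Optional[str]:
    """Return the workflow whose longest trigger phrase appears in the request.

    Assumed rule: case-insensitive substring match; returns None when no phrase matches.
    """
    text = request.lower()
    best = None  # (matched phrase length, workflow name)
    for workflow, phrases in ROUTES.items():
        for phrase in phrases:
            if phrase in text and (best is None or len(phrase) > best[0]):
                best = (len(phrase), workflow)
    return best[1] if best else None
```

In practice the agent interprets intent from natural language, so a literal phrase match like this is only an approximation of the routing the skill describes.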
What do I get? / Deliverables
Eval plans, listings, datasets, and coverage reports align with repo conventions, and runs land in tests/ rather than in ad-hoc agent scripts.
- Eval plan for a skill
- Test listing or results summary
- Coverage gap report
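A coverage gap report of the kind listed above could be computed along these lines. This is a sketch under stated assumptions: the plan directory and the `eval-<skill-name>.md` naming come from SKILL.md, while the function names and the idea that the caller supplies the list of installed skill names are hypothetical, since the page does not say how installed skills are discovered.

```python
from pathlib import Path
from typing import Iterable, List, Set

# Directory and file naming taken from SKILL.md: one plan per skill,
# named eval-<skill-name>.md.
PLAN_DIR = Path("tests/full-eval-tests/plan/03-individual-skills")
PLAN_PREFIX = "eval-"


def skills_with_plans(plan_dir: Path = PLAN_DIR) -> Set[str]:
    """Collect skill names that already have an individual eval plan."""
    return {p.stem[len(PLAN_PREFIX):] for p in plan_dir.glob(f"{PLAN_PREFIX}*.md")}


def missing_coverage(skill_names: Iterable[str], plan_dir: Path = PLAN_DIR) -> List[str]:
    """Return, in sorted order, the skills that have no individual eval plan.

    skill_names is supplied by the caller (an assumption of this sketch).
    """
    return sorted(set(skill_names) - skills_with_plans(plan_dir))
```

Calling `missing_coverage(["skill-a", "skill-b"])` from the repository root would return whichever of those placeholder names lacks a matching plan file.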
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Ship/testing because the skill's primary job is evaluation, metrics, and executing the test suite for Fabric skills. The Testing subphase matches the triggers named explicitly: add evals, list tests, run tests, eval metrics, and coverage gaps.
Where it fits
After authoring a new Fabric skill, route add evals to scaffold an eval plan before merge.
Before release, run tests via the skill so agents execute the suite under tests/.
Review eval metrics when production-like prompts regress after a skill wording change.
How it compares
Repo-local eval orchestration for skill authors—not a substitute for unit testing application code in your product.
Common Questions / FAQ
Who is skill-test for?
Solo and indie maintainers contributing to Microsoft skills-for-fabric who need to add, list, run, or measure skill evaluations.
When should I use skill-test?
Use it in Ship/testing when adding evals or checking results, in Build/agent-tooling when scaffolding tests for a skill you are authoring, and whenever someone says "run tests" or "test coverage".
Is skill-test safe to install?
It may invoke local test runners and read repo files—review the Security Audits panel on this page before granting shell and filesystem access.
SKILL.md
# Skill Test: skills-for-fabric Evaluation Framework

Manage the end-to-end evaluation framework for skills-for-fabric. This skill routes requests to the correct workflow based on user intent: adding tests, listing tests, running tests, viewing results, generating data, or checking coverage.

## When to Use

- When a contributor wants to add evaluation test cases for a new or existing skill
- When someone asks to see what tests exist or what results look like
- When a user wants to run the test suite
- When reviewing eval metrics or checking which skills lack test coverage

## Intent Routing

Parse the user request and route to the appropriate workflow:

| User Intent | Trigger Phrases | Action |
|-------------|----------------|--------|
| **Add evals** | "add tests", "add evals", "add evals for missing skills", "create eval plan" | → [Workflow: Add Evals](#workflow-add-evals) |
| **List tests** | "list tests", "list evals", "show me the list of tests", "what tests exist", "show eval plans" | → [Workflow: List Tests](#workflow-list-tests) |
| **Run tests** | "run tests", "run evals", "execute tests", "run the eval suite" | → [Workflow: Run Tests](#workflow-run-tests) |
| **View results** | "show eval results", "test results", "eval results", "executive summary" | → [Workflow: View Results](#workflow-view-results) |
| **Generate data** | "generate eval data", "generate test data", "create eval datasets" | → [Workflow: Generate Data](#workflow-generate-data) |
| **View metrics** | "eval metrics", "test metrics", "what metrics", "how are tests scored" | → [Workflow: View Metrics](#workflow-view-metrics) |
| **Check coverage** | "test coverage", "which skills have tests", "missing tests", "skills without evals" | → [Workflow: Check Coverage](#workflow-check-coverage) |

---

## Workflow: Add Evals

Follow the instructions in `tests/full-eval-tests/README.md` § "Adding Evals for New Skills".

### Automated Path (Recommended)

Give the agent the prompt:

```
Add evals for the missing skills
```

The agent will:

1. Detect missing skills by comparing installed skills against existing eval plans in `tests/full-eval-tests/plan/03-individual-skills/`
2. Generate individual eval plans (`plan/03-individual-skills/eval-<skill-name>.md`) with 10–12 test cases
3. Generate combined eval plans (`plan/04-combined-skills/eval-<skill>-authoring-plus-consumption.md`)
4. Create golden data in `tests/full-eval-tests/evalsets/expected-results/`
5. Update tracking files: `plan/00-overview.md`, `README.md`, `plan/04-combined-skills/eval-full-pipeline.md`

### Manual Path

To add evals for a specific skill `<new-skill>`:

1. Create `tests/full-eval-tests/plan/03-individual-skills/eval-<new-skill>.md` using the template in the README
2. Each test case needs: Case ID (unique prefix), Prompt, Expected result, Pass criteria, at least one negative/ambiguous test
3. If the skill has an authoring+consumption pair, create `tests/full-eval-tests/plan/04-combined-skills/eval-<new-skill>-authoring-plus-consumption.md`
4. Add golden data to `tests/full-eval-tests/evalsets/expected-results/`
5. Update `plan/00-overview.md`, `README.md` directory tree, and `plan/04-combined-skills/eval-full-pipeline.md`

### Eval Plan Template

Use the template from `tests/full-eval-tests/README.md` § "Eval Plan Template". Every eval plan must include:

- Skill overview (name, category, R/W, purpose)
- Pre-requisites
- Numbered test cases (XX-01 through XX-10+) with Prompt / Expected / Pass criteria
- At least one negative/ambig