Hsb App

Name: Hsb App
Author: nvidia

nvidia/skills

Validate an NVIDIA `hsb-app` agent skill against NVSkills-Eval before you publish it to a broader workflow or catalog.

Overview

`hsb-app` is an agent skill for the Ship phase. It provides an `hsb-app` workflow, benchmarked with NVSkills-Eval, that agents can run once the skill has passed security, correctness, and discoverability checks.

Install

npx skills add https://github.com/nvidia/skills --skill hsb-app

What is this skill?

NVSkills-Eval `external` profile run in a `local` environment: 3 evaluation tasks, 2 attempts per task, 50% pass threshold
Overall verdict: PASS
Benchmark dimensions: Security, Correctness, Discoverability, Effectiveness, and Efficiency (tokens and redundant work)
Validated agents: Claude Code and Codex (`claude-code`, `codex`)
Signals cover unsafe ops, skill load fidelity, and with-skill vs without-skill performance
Pre-publication verdict suitable for trusting agent-assisted `hsb-app` workflows

Compatible agents: Claude Code, Codex

Adoption & trust: 1 install on skills.sh; 1.1k GitHub stars; trending (+100% hot-view momentum).

What problem does it solve?

You want to adopt an NVIDIA agent skill but cannot tell from marketing copy whether it is safe, discoverable, and actually better than ad-hoc prompting.

Who is it for?

Indie builders publishing or consuming NVIDIA agent skills who require eval-backed confidence before catalog or team rollout.

Skip if: You need the full SKILL.md procedure in-repo. This Prism entry is centered on the evaluation report, not a tutorial app scaffold.

When should I use this skill?

You are adopting or publishing the NVIDIA `hsb-app` skill and need evaluation-backed assurance on security, correctness, discoverability, effectiveness, and efficiency.

What do I get? / Deliverables

You get a documented PASS benchmark across security, correctness, discoverability, effectiveness, and efficiency so you can ship the skill into Claude Code or Codex workflows with clearer risk posture.

Evaluation-informed decision to enable or skip the skill in agent workflows
Traceable PASS summary across five benchmark dimensions

Journey fit

Primary fit

Publication readiness and benchmark gates sit in Ship, after Build, when you prove safety, correctness, and discoverability before release. NVSkills-Eval’s 3-tier runs map directly to pre-release testing and quality assurance for agent skills.

Also useful

Build: Agent skills & templates

How it compares

Use as an eval-certified skill package rather than guessing quality from install rank alone.

Common Questions / FAQ

Who is hsb-app for?

Solo builders and small teams using Claude Code or Codex who want NVIDIA `hsb-app` skills vetted through NVSkills-Eval before they rely on them in daily agent workflows.

When should I use hsb-app?

During Ship/testing when you are choosing which NVIDIA skills to enable, comparing agent behavior with the skill loaded versus baseline, or documenting publication readiness for a skill you maintain.

Is hsb-app safe to install?

The bundled evaluation run reported a PASS on security among other dimensions; still review the Security Audits panel on this Prism page and your org policies before granting shell, network, or secrets access.

SKILL.md

# Evaluation Report

Evaluation of the `hsb-app` skill before publication through NVSkills-Eval.

This report summarizes the results of the NVSkills-Eval 3-tier evaluation of the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

## Evaluation Summary

- Skill: `hsb-app`
- Evaluation date: 2026-05-30
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 3 evaluation tasks
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: PASS
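
The report does not say how attempts are aggregated into the overall verdict. As a rough sketch, assuming the pass rate is the fraction of passing attempts across all tasks and that the verdict is PASS when that rate meets the threshold, the check could look like this. The `verdict` function and the attempt outcomes below are hypothetical and are not taken from the actual run.

```python
def verdict(attempt_results, threshold=0.5):
    """Return (label, pass_rate) for a list of per-task attempt outcomes.

    Assumption: every attempt counts equally and the verdict is PASS when
    the overall pass rate is at least the threshold. NVSkills-Eval may
    aggregate differently.
    """
    attempts = [ok for task in attempt_results for ok in task]
    pass_rate = sum(attempts) / len(attempts)
    label = "PASS" if pass_rate >= threshold else "FAIL"
    return label, pass_rate


# Made-up outcomes: 3 tasks with 2 attempts each, matching the run's shape only.
results = [[True, False], [True, True], [False, True]]
label, rate = verdict(results)
print(label, round(rate, 2))
```

Under this reading, a 50% threshold with 6 attempts means at least 3 attempts must pass.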

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contained 3 evaluation tasks:

- Positive tasks: 3 tasks where the skill was expected to activate.
- Negative tasks: 0 tasks where no skill was expected.
- Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.

Task composition is derived from the evaluation dataset when possible. Entries with `expected_skill` set are treated as positive skill-activation cases, while entries with `expected_skill: null` are treated as negative activation cases.
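
The labeling rule above can be sketched as a small classifier. This is a minimal illustration, not the NVSkills-Eval implementation: the `id` field and the treatment of a missing `expected_skill` key as "unlabeled" are assumptions.

```python
from collections import Counter

_MISSING = object()  # sentinel to tell "key absent" apart from an explicit null


def classify(entry):
    """Label a dataset entry as positive, negative, or unlabeled."""
    value = entry.get("expected_skill", _MISSING)
    if value is _MISSING:
        return "unlabeled"  # assumption: no key means intent cannot be inferred
    if value is None:
        return "negative"   # `expected_skill: null` means no skill should activate
    return "positive"       # a skill name is set, so activation is expected


# Hypothetical entries shaped like this run: three positive tasks.
dataset = [
    {"id": "task-1", "expected_skill": "hsb-app"},
    {"id": "task-2", "expected_skill": "hsb-app"},
    {"id": "task-3", "expected_skill": "hsb-app"},
]
counts = Counter(classify(entry) for entry in dataset)
print(counts["positive"], counts["negative"], counts["unlabeled"])
```

With no negative tasks in the dataset, the "avoids using it when irrelevant" half of Discoverability has no dedicated cases in this run.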

## Results

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 6 | 100% (+17%) | 100% (+17%) |
| Correctness | 6 | 95% (+0%) | 84% (+41%) |
| Discoverability | 6 | 73% (-1%) | 69% (+16%) |
| Effectiveness | 6 | 88% (+4%) | 76% (+66%) |
| Efficiency | 6 | 59% (+0%) | 60% (+22%) |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
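
To read the table as "with skill" versus "without skill", the no-skill baseline can be recovered from each cell. The sketch below assumes the uplift is expressed in percentage points (baseline = score - uplift); the report does not state whether uplift is absolute or relative, so treat the derived baselines as illustrative. `parse_cell` is a hypothetical helper.

```python
import re

# Matches cells such as "100% (+17%)" or "73% (-1%)".
CELL = re.compile(r"(\d+)% \(([+-]\d+)%\)")


def parse_cell(cell):
    """Return (score, uplift, baseline), assuming uplift is in percentage points."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    score, uplift = int(match.group(1)), int(match.group(2))
    return score, uplift, score - uplift


# Cells copied from the results table: (claude-code, codex).
rows = {
    "Security": ("100% (+17%)", "100% (+17%)"),
    "Correctness": ("95% (+0%)", "84% (+41%)"),
    "Discoverability": ("73% (-1%)", "69% (+16%)"),
    "Effectiveness": ("88% (+4%)", "76% (+66%)"),
    "Efficiency": ("59% (+0%)", "60% (+22%)"),
}
baselines = {
    dim: (parse_cell(claude)[2], parse_cell(codex)[2])
    for dim, (claude, codex) in rows.items()
}
for dim, (claude_base, codex_base) in baselines.items():
    print(f"{dim}: claude-code baseline {claude_base}%, codex baseline {codex_base}%")
```

Under that assumption, the skill changes `claude-code` scores only slightly on most dimensions, while `codex` starts from much lower baselines and gains more.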

## Tier 1: Static Validation Summary

Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 7 total findings.

Top findings:

- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md`)
- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md`)
- LOW QUALITY/quality_discoverability: Description very long (267 chars, recommend 50-150) (`team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md`)
- LOW QUALITY/quality_discoverability: No '## Purpose' section (`team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md`)
- LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md`)
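
The finding lines follow a regular shape (severity, category/rule, message, file path), so they can be tallied mechanically. The pattern below is inferred from the five lines listed here, not from an NVSkills-Eval specification, and may not cover other finding formats.

```python
import re
from collections import Counter

# Inferred shape: "<SEVERITY> <CATEGORY>/<rule>: <message> (`<path>`)"
FINDING = re.compile(
    r"(?P<severity>[A-Z]+) (?P<category>[A-Z]+)/(?P<rule>\w+): "
    r"(?P<message>.*) \(`(?P<path>[^`]+)`\)"
)

PATH = "team-skills/holoscan/holoscan-sensor-bridge/hsb-app/SKILL.md"
lines = [
    f"MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`{PATH}`)",
    f"MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`{PATH}`)",
    f"LOW QUALITY/quality_discoverability: Description very long (267 chars, recommend 50-150) (`{PATH}`)",
    f"LOW QUALITY/quality_discoverability: No '## Purpose' section (`{PATH}`)",
    f"LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`{PATH}`)",
]

parsed = [FINDING.fullmatch(line).groupdict() for line in lines]
by_severity = Counter(finding["severity"] for finding in parsed)
for severity, count in by_severity.items():
    print(severity, count)
```

Only the five top findings are listed in the report, so the tally covers 5 of the 7 findings Tier 1 reported.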

## Tier 2: Deduplication Summary

Tier 2 validation passed. NVSkills-Eval ran 2 checks and found 0 total findings.

Notable observations:

- Context Deduplication: Collected 2 file(s)
- Inter-Skill Deduplication: Parsed s
