Nemo Mbridge Mlm Bridge Training

Name: Nemo Mbridge Mlm Bridge Training
Author: nvidia

nvidia/skills

Run NVIDIA NeMo mBridge MLM bridge training workflows with an agent-guided procedure validated for correctness and safety before you publish or reuse the skill.

Overview

nemo-mbridge-mlm-bridge-training is an agent skill for the Build phase. It guides NVIDIA NeMo mBridge MLM bridge training workflows, and its agent behavior has been evaluated with NVSkills-Eval.

Install

npx skills add https://github.com/nvidia/skills --skill nemo-mbridge-mlm-bridge-training

What is this skill?

NVSkills-Eval external profile benchmark with PASS verdict documented
3-Tier evaluation across security, correctness, discoverability, effectiveness, and efficiency
Benchmarked with claude-code and codex agents (2 attempts per task, 50% pass threshold)
Designed for agents that must load the skill only when MLM bridge training is relevant

Compatible agents: Claude Code, Codex

Adoption & trust: 1 install on skills.sh; 1.1k GitHub stars; trending (+100% hot-view momentum).

What problem does it solve?

You need to run NeMo mBridge MLM bridge training but lack a repeatable, agent-safe procedure that agents can discover and follow without hallucinating flags or skipping validation steps.

Who is it for?

Indie ML builders or small teams already on NVIDIA NeMo who delegate training setup to Claude Code or Codex and want an eval-backed skill.

Skip if: you have no GPU/NeMo environment and only need generic LLM API integration in a web app.

When should I use this skill?

When setting up or running NVIDIA NeMo mBridge MLM bridge training and the agent should follow the published NVSkills skill workflow.

What do I get? / Deliverables

After the skill runs, the agent follows the evaluated training workflow with documented pass criteria for security and correctness rather than improvised shell commands.

Configured bridge training run per skill workflow
Training artifacts or logs from the guided procedure

Recommended Skills

Paper Context Resolver (lllllllama/ai-paper-reproduction-skill)

Optional helper-tier skill that supplements README-guided deep learning reproduction by resolving specific paper details… (140k installs · 412 stars)

Repo Intake And Plan (lllllllama/ai-paper-reproduction-skill)

Rigor Intake scans repository docs and layout to classify documented commands and propose a minimal reproduction plan fo… (140k installs · 412 stars)

Env And Assets Bootstrap (lllllllama/ai-paper-reproduction-skill)

Rigor Setup establishes conservative environment and asset assumptions aligned with README and config evidence before ex… (140k installs · 412 stars)

Minimal Run And Audit (lllllllama/ai-paper-reproduction-skill)

RigorPilot executes the selected minimal reproduction command and produces normalized, auditable run evidence for paper … (140k installs · 412 stars)

Analyze Project (lllllllama/rigorpilot-skills)

analyze-project is a read-only agent skill from the RigorPilot family aimed at solo builders and small teams inheriting … (32.3k installs · 412 stars)

Ai Research Reproduction (lllllllama/rigorpilot-skills)

ai-research-reproduction is the RigorPilot Reproduce orchestrator for solo builders and small teams who need to rerun a … (32.3k installs · 412 stars)

Journey fit

Primary fit

Build: Backend, data & payments

Training and bridge configuration is core product/backend ML work during build, not distribution or production monitoring. mBridge MLM bridge training targets model training pipelines and bridge setup rather than UI or go-to-market assets.

How it compares

A specialized NeMo training skill package, not a generic Hugging Face fine-tuning tutorial or an MCP server.

Common Questions / FAQ

Who is nemo-mbridge-mlm-bridge-training for?

Solo builders and small teams training with NVIDIA NeMo mBridge who use coding agents for repeatable MLM bridge training setup.

When should I use nemo-mbridge-mlm-bridge-training?

Use it in the build phase when you are configuring or launching mBridge MLM bridge training and want the agent to load NVIDIA’s procedural skill instead of improvising.

Is nemo-mbridge-mlm-bridge-training safe to install?

Evaluation docs report security checks in NVSkills-Eval; still review the Security Audits panel on this Prism page and your cluster secrets policy before agent-driven training runs.

SKILL.md


# Evaluation Report

Evaluation of the `nemo-mbridge-mlm-bridge-training` skill before publication through NVSkills-Eval.

This report summarizes the 3-Tier Evaluation results that NVSkills-Eval produced for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

## Evaluation Summary

- Skill: `nemo-mbridge-mlm-bridge-training`
- Evaluation date: 2026-06-02
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 1 evaluation task
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: PASS
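
The report does not say how the 50% pass threshold is applied to the two attempts per task. As a rough illustration only, the sketch below assumes each attempt yields a score between 0 and 1 and that a task passes when the mean score reaches the threshold. The function name `meets_threshold` and this aggregation rule are assumptions, not the NVSkills-Eval implementation.

```python
from statistics import mean


def meets_threshold(attempt_scores, threshold=0.5):
    """Return True when the mean attempt score reaches the threshold.

    Assumption: scores are fractions in [0, 1] and are averaged per task.
    NVSkills-Eval may aggregate attempts differently.
    """
    if not attempt_scores:
        raise ValueError("at least one attempt score is required")
    return mean(attempt_scores) >= threshold


# Two attempts per task, as in the summary above.
print(meets_threshold([1.0, 0.0]))  # one full success out of two attempts
print(meets_threshold([0.0, 0.0]))  # no successful attempt
```

Under this reading, a single successful attempt out of two is exactly enough to meet a 50% threshold.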

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contained 1 evaluation task:

- Positive tasks: 1 task where the skill was expected to activate.
- Negative tasks: 0 tasks where no skill was expected.
- Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.

Task composition is derived from the evaluation dataset when possible. Entries with `expected_skill` set are treated as positive skill-activation cases, while entries with `expected_skill: null` are treated as negative activation cases.
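
The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming each dataset entry is a dictionary and that an entry with no `expected_skill` key counts as unlabeled. The entry layout, the `task` field, and the helper names are hypothetical and not taken from NVSkills-Eval.

```python
from collections import Counter


def classify_task(entry):
    """Label one dataset entry by its `expected_skill` field."""
    if "expected_skill" not in entry:
        return "unlabeled"  # assumption: a missing key means intent cannot be inferred
    if entry["expected_skill"] is None:
        return "negative"   # `expected_skill: null` means no skill should activate
    return "positive"       # a skill name is set, so the skill is expected to activate


def summarize(dataset):
    """Count positive, negative, and unlabeled tasks in a dataset."""
    counts = Counter(classify_task(entry) for entry in dataset)
    return {label: counts.get(label, 0) for label in ("positive", "negative", "unlabeled")}


# Hypothetical single-task dataset mirroring the composition reported above.
dataset = [
    {"task": "hypothetical prompt", "expected_skill": "nemo-mbridge-mlm-bridge-training"},
]
print(summarize(dataset))  # {'positive': 1, 'negative': 0, 'unlabeled': 0}
```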

## Results

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 2 | 100% (+0%) | 100% (+0%) |
| Correctness | 2 | 100% (+0%) | 88% (+0%) |
| Discoverability | 2 | 100% (+0%) | 62% (+0%) |
| Effectiveness | 2 | 100% (+0%) | 100% (+0%) |
| Efficiency | 2 | 93% (-0%) | 60% (-0%) |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
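
When post-processing the table, each cell can be split into its skill-assisted score and its uplift. The sketch below assumes cells always follow the `NN% (+N%)` or `NN% (-N%)` shape shown above. The `parse_cell` helper is hypothetical and not part of NVSkills-Eval.

```python
import re

# Matches cells such as "88% (+0%)" or "93% (-0%)".
CELL_PATTERN = re.compile(r"^\s*(\d+)%\s*\(([+-]?\d+)%\)\s*$")


def parse_cell(cell):
    """Split a results cell into (score_percent, uplift_percent) integers."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized results cell: {cell!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_cell("88% (+0%)"))  # (88, 0)
print(parse_cell("93% (-0%)"))  # (93, 0)
```

Converting to integers drops the sign of a zero uplift, so `+0%` and `-0%` both become `0`.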

## Tier 1: Static Validation Summary

Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 12 total findings.

Top findings:

- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.author' (`skills/nemo-mbridge-mlm-bridge-training/SKILL.md`)
- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.tags' (`skills/nemo-mbridge-mlm-bridge-training/SKILL.md`)
- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/nemo-mbridge-mlm-bridge-training/SKILL.md`)
- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`skills/nemo-mbridge-mlm-bridge-training/SKILL.md`)
- MEDIUM SCHEMA/author_missing: Author not specified in metadata (`skills/nemo-mbridge-mlm-bridge-training/SKILL.md`)
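
A check for the recommended body sections flagged above could look roughly like the following. This is a simplified sketch, not the NVSkills-Eval validator: it only compares heading lines as plain text, and the `missing_sections` helper and sample document are hypothetical.

```python
# Section headings named in the Tier 1 findings above.
RECOMMENDED_SECTIONS = ("## Instructions", "## Examples")


def missing_sections(markdown_text, required=RECOMMENDED_SECTIONS):
    """Return the recommended headings that do not appear in the Markdown text."""
    headings = {
        line.strip()
        for line in markdown_text.splitlines()
        if line.startswith("#")
    }
    return [section for section in required if section not in headings]


# Hypothetical SKILL.md body that has instructions but no examples.
sample = "# My Skill\n\n## Instructions\n\nFollow the workflow.\n"
print(missing_sections(sample))  # ['## Examples']
```

Running a check like this before submitting a skill would surface the two `body_recommended_section` findings early. The metadata findings would need a separate check of the file's metadata block.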

## Tier 2: Deduplication Summary

Tier 2 validation passed. NVSkills-Eval ran 2 checks and found 0 total findings.

Notable observations:

- Context Deduplication: Collected 1 file(s)
- Inter-Skill Deduplication
