
Nemo Evaluator Sdk
Wire custom request/response interceptors into NeMo Evaluator so benchmark runs hit your model endpoints with the right shaping and HTTP plumbing.
Overview
nemo-evaluator-sdk is an agent skill most often used in Ship (also Build integrations) that documents NeMo Evaluator’s interceptor pipeline between the eval engine and model HTTP endpoints.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill nemo-evaluator-sdkWhat is this skill?
- Documents the full adapter pipeline: request interceptors → endpoint HTTP call → response interceptors in reverse order
- Explains built-in interceptors from the nemo-evaluator core for common eval hooks
- Shows how adapter configuration is declared in eval job/config so agents do not hand-roll HTTP wrappers
- Clarifies separation between evaluation engine logic and per-endpoint interception
- Useful when swapping local vs hosted inference without rewriting the benchmark harness
Adoption & trust: 1 installs on skills.sh; 9.4k GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your benchmark harness calls model APIs directly and breaks whenever you need shared request shaping, logging, or response parsing across many eval tasks.
Who is it for?
Indie ML builders wiring NeMo Evaluator to custom or hosted inference with repeatable interceptor chains.
Skip if: Teams who only need a single curl to one endpoint with no shared eval pipeline—skip until you adopt NeMo Evaluator or similar structured eval runs.
When should I use this skill?
You are configuring NeMo Evaluator jobs, extending eval HTTP behavior, or debugging request/response handling between the engine and model APIs.
What do I get? / Deliverables
You configure ordered adapters and endpoint interceptors so evaluation runs stay consistent and you can swap endpoints without rewriting the eval core.
- Adapter/interceptor configuration for eval runs
- Documented request→endpoint→response pipeline for your endpoint
- Reusable eval integration without one-off API clients
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Model evaluation is the canonical shelf because this skill documents how the eval engine talks to endpoints—most builders reach for it when they are ready to measure quality, not when ideating features. Testing is where adapter pipelines matter: you need ordered interceptors on the way to the Model API and reverse order on responses before scores land.
Where it fits
Design a shared interceptor stack before your agent writes per-benchmark HTTP clients.
Run NeMo Evaluator suites against staging inference with auth and response normalization in the pipeline.
Add logging interceptors on production-like endpoints during regression evals without touching scorer logic.
How it compares
Integration documentation for eval HTTP plumbing—not a standalone MCP server or a generic unit-test skill.
Common Questions / FAQ
Who is nemo-evaluator-sdk for?
Solo and indie builders running structured model evaluations with NeMo Evaluator who need adapter/interceptor patterns instead of ad-hoc API clients in every script.
When should I use nemo-evaluator-sdk?
Use it in Ship when standing up benchmark pipelines and debugging eval-to-API mismatches; use it in Build when designing integration layers that multiple agents or jobs will reuse for inference calls.
Is nemo-evaluator-sdk safe to install?
Treat it like any third-party skill: review the Security Audits panel on this Prism page and inspect SKILL.md before granting network or secrets access in your agent environment.
SKILL.md
READMESKILL.md - Nemo Evaluator Sdk
# Adapter and Interceptor System NeMo Evaluator uses an adapter system to process requests and responses between the evaluation engine and model endpoints. The `nemo-evaluator` core library provides built-in interceptors for common use cases. ## Architecture Overview ``` ┌───────────────────────────────────────────────────────────────┐ │ Adapter Pipeline │ │ │ │ Request ───► [Interceptor 1] ───► [Interceptor 2] ───► │ │ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────┐ │ │ │ Endpoint Interceptor │ │ │ │ (HTTP call to Model API) │ │ │ └───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ Response ◄─── [Interceptor 3] ◄─── [Interceptor 4] ◄─── │ │ │ └───────────────────────────────────────────────────────────────┘ ``` Interceptors execute in order for requests, and in reverse order for responses. ## Configuring Adapters The adapter configuration is specified in the `target.api_endpoint.adapter_config` section: ```yaml target: api_endpoint: model_id: meta/llama-3.1-8b-instruct url: https://integrate.api.nvidia.com/v1/chat/completions api_key_name: NGC_API_KEY adapter_config: interceptors: - name: system_message config: system_message: "You are a helpful assistant." - name: caching config: cache_dir: "./cache" - name: endpoint - name: reasoning config: start_reasoning_token: "<think>" end_reasoning_token: "</think>" ``` ## Available Interceptors ### System Message Interceptor Injects a system prompt into chat requests. ```yaml - name: system_message config: system_message: "You are a helpful AI assistant. Think step by step." ``` **Effect**: Prepends a system message to the messages array. ### Request Logging Interceptor Logs outbound API requests for debugging and analysis. ```yaml - name: request_logging config: max_requests: 1000 ``` ### Caching Interceptor Caches responses to avoid repeated API calls for identical requests. ```yaml - name: caching config: cache_dir: "./evaluation_cache" reuse_cached_responses: true save_requests: true save_responses: true max_saved_requests: 1000 max_saved_responses: 1000 ``` ### Endpoint Interceptor Performs the actual HTTP communication with the model endpoint. This is typically added automatically and has no configuration parameters. ```yaml - name: endpoint ``` ### Reasoning Interceptor Extracts and removes reasoning tokens (e.g., `<think>` tags) from model responses. ```yaml - name: reasoning config: start_reasoning_token: "<think>" end_reasoning_token: "</think>" enable_reasoning_tracking: true ``` **Effect**: Strips reasoning content from the response and tracks it separately. ### Response Logging Interceptor Logs API responses. ```yaml - name: response_logging config: max_responses: 1000 ``` ### Progress Tracking Interceptor Reports evaluation progress to an external URL. ```yaml - name: progress_tracking config: progress_tracking_url: "http://localhost:3828/progress" progress_tracking_interval: 10 ``` ### Additional Interceptors Other available interceptors include: - `payload_modifier`: Transforms request parameters - `response_stats`: Collects aggregated statistics from responses - `raise_client_errors`: Handles and raises exceptions for client e