Model Evaluator

Name: Model Evaluator
Author: jmsktm

Repository: jmsktm/claude-settings

Compare candidate LLM or ML models on accuracy, latency, cost, and failure modes before committing to one model in a prototype or production architecture.
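The minimal Python sketch below roughly illustrates the comparison described above. It is not the skill's actual implementation. It assumes each candidate model has already been run on the same eval set, and that each response is stored as a hypothetical `EvalRecord` holding a correctness flag, latency, cost, and an optional failure-mode label. `EvalRecord` and `summarize` are illustrative names, not part of the skill.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class EvalRecord:
    """One model response on one eval item (hypothetical schema)."""
    model: str
    correct: bool
    latency_s: float
    cost_usd: float
    error: Optional[str] = None  # failure-mode label such as "timeout", or None on success


def summarize(records: List[EvalRecord]) -> Dict[str, dict]:
    """Aggregate per-model accuracy, median latency, total cost, and failure counts."""
    by_model: Dict[str, List[EvalRecord]] = {}
    for r in records:
        by_model.setdefault(r.model, []).append(r)

    summary = {}
    for model, rows in by_model.items():
        # Count how often each failure mode occurred for this model.
        failures = Counter(r.error for r in rows if r.error is not None)
        summary[model] = {
            "accuracy": sum(r.correct for r in rows) / len(rows),
            "median_latency_s": median(r.latency_s for r in rows),
            "total_cost_usd": sum(r.cost_usd for r in rows),
            "failures": dict(failures),
        }
    return summary
```

Putting the per-model dictionaries side by side makes the trade-offs visible. For example, one model might have higher accuracy but a recurring `"timeout"` failure mode. The sketch uses median latency because latency distributions are often skewed. A tail percentile such as p95 could be added the same way.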

npx skills add https://github.com/jmsktm/claude-settings --skill model-evaluator

Installs: 172

Related skills

Paper Context Resolver: Resolve narrow paper-backed gaps (splits, preprocessing, eval protocol, checkpoints, runtime) when README and repo files are insufficient for faithful ML paper reproduction.

Repo Intake And Plan: Scan an ML research repository, extract README commands, and recommend the smallest trustworthy inference or evaluation reproduction target.

Minimal Run And Audit: Run documented inference, evaluation, or smoke commands and normalize evidence into standardized repro_outputs with patch notes.

Env And Assets Bootstrap: Bootstrap conda-first environments plus checkpoints, datasets, and cache paths before a documented reproduction run.

Analyze Project: Map a deep learning repo read-only (training, inference, and eval entrypoints, configs, and suspicious patterns) before you let an agent patch model code.

Ai Research Reproduction: Orchestrate README-first, minimal-trustworthy reproduction of a deep learning repository with auditable repro_outputs evidence.

Tags: Data Science & ML, llm, research
