Darwin Skill

Name: Darwin Skill
Author: alchaincyf

alchaincyf/darwin-skill

9.2k installs
5.1k repo stars
Updated July 27, 2026

darwin-skill is an agent skill: Darwin Skill 2.0 (达尔文.skill 2.0), an autonomous skill optimizer. v2.0 integrates the Microsoft Research SkillLens (arXiv 2605.23899) 9-dimension rubric and the SkillOpt (arXiv 2605.23904) validation-gated design.

About

name: darwin-skill

description: Darwin Skill 2.0 (达尔文.skill 2.0) is an autonomous skill optimizer. v2.0 integrates the Microsoft Research SkillLens (arXiv 2605.23899) 9-dimension rubric, the SkillOpt (arXiv 2605.23904) validation-gated design, and human-in-the-loop checkpoints. It evaluates SKILL.md files using a 9-dimension rubric (structure, effectiveness, meta-skill blacklists), runs hill-climbing with git version control, spawns independent judge agents for blind evaluation, validates improvements through test prompts with auto-break on diminishing returns, and generates visual result cards. Use when the user mentions any of the trigger phrases: 优化skill skill评分自动优化 auto optimize skill质量检查达尔文 darwin 帮我改改skill skill怎么样提升skill质量 skill review skill打分.

Darwin Skill 2.0, v2.0, 2026-05-28: adopts the 9-dimension scoring prescription from Microsoft Research SkillLens (arXiv 2605.23899) and the validation-gated mechanism from SkillOpt (arXiv 2605.23904).

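**Single editable asset** - only one SKILL.md is changed at a time
**Dual evaluation** - structural scoring (static analysis) plus effect validation (run tests and inspect the output)
**Ratchet mechanism** - keep only improvements; regressions are rolled back automatically
**Independent scoring** - scoring is done by a sub-agent, avoiding the bias of grading your own edits
**Human in the loop** - pause after each skill is optimized and continue only once the user confirms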

Darwin Skill by the numbers

9,185 all-time installs (skills.sh)
+278 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #111 of 2,184 Testing & QA skills by installs in the Skillselion catalog
Security screen: CRITICAL risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

darwin-skill capabilities & compatibility

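Capabilities: **Single editable asset** - only one SKILL.md is changed at a time · **Dual evaluation** - structural scoring (static analysis) plus effect validation (run tests and inspect the output) · **Ratchet mechanism** - keep only improvements; regressions are rolled back automatically · **Independent scoring** - scoring is done by a sub-agent, avoiding the bias of grading your own edits · **Human in the loop** - pause after each skill is optimized and continue only once the user confirms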
Use cases: documentation

From the docs

What darwin-skill says it does

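Design 2-3 **typical user prompts** for each skill (the most common usage scenarios, not edge cases).

SKILL.md

Run with sub-agents: one with the skill, one without it (baseline).

SKILL.md

Confirm the optimization scope: all skills → scan .claude/skills/*/SKILL.md; specified skills → the user-provided list.

SKILL.md

Read the existing results.tsv to review the optimization history. Phase 0.5 (test prompt design): before evaluation, design test prompts for each skill. This step is critical: without test prompts, the "live test performance" dimension cannot be scored.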

SKILL.md

npx skills add https://github.com/alchaincyf/darwin-skill --skill darwin-skill

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/alchaincyf/darwin-skill.svg)](https://skillselion.com/skills/alchaincyf/darwin-skill)

Installs	9.2k
repo stars	★ 5.1k
Security audit	1 / 3 scanners passed
Last updated	July 27, 2026
Repository	alchaincyf/darwin-skill ↗

What problem does darwin-skill solve for developers using this skill?

Darwin Skill 2.0 (达尔文.skill 2.0) is an autonomous skill optimizer. v2.0 integrates the Microsoft Research SkillLens (arXiv 2605.23899) 9-dimension rubric, the SkillOpt (arXiv 2605.23904) validation-gated design, and human-in-the-loop checkpoints.

Who is it for?

Developers who need darwin-skill patterns described in the cached skill documentation.

Skip if: the docs are empty or the task is outside the skill's documented scope.

When should I use this skill?

What you get

Actionable workflows and conventions from SKILL.md for darwin-skill.

Optimized SKILL.md
Visual result cards
Git-tracked iteration history

By the numbers

Uses 9-dimension SkillLens rubric (arXiv 2605.23899)
Integrates SkillOpt validation-gated design (arXiv 2605.23904)
Darwin Skill version: 2.0

Files

SKILL.md · Markdown · GitHub ↗

Darwin Skill 2.0

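v2.0 · 2026-05-28: adopts the 9-dimension scoring prescription from Microsoft Research SkillLens (arXiv 2605.23899), the validation-gated mechanism from SkillOpt (arXiv 2605.23904), and human in the loop, as three layers of gatekeeping.

Borrows the autonomous experiment loop from Karpathy's autoresearch to optimize skills continuously.

Core idea: evaluate → improve → validate by live test → human confirmation → keep or roll back → generate result card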

GitHub: https://github.com/alchaincyf/darwin-skill

---

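Design philosophy

The essence of autoresearch:

1. Single editable asset: only one SKILL.md is changed at a time
2. Dual evaluation: structural scoring (static analysis) plus effect validation (run tests and inspect the output)
3. Ratchet mechanism: keep only improvements; regressions are rolled back automatically
4. Independent scoring: scoring is done by a sub-agent, avoiding the bias of grading your own edits
5. Human in the loop: pause after each skill is optimized and continue only once the user confirms

How this differs from a purely structural review: it looks not only at whether SKILL.md is well written, but at whether the skill actually produces better results once changed.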

---

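Evaluation Rubric (9 dimensions, 100 points total)

Design basis: the SkillLens paper (arXiv 2605.23899) found empirically that LLM-as-judge assessment of skill quality is only 46.4% accurate (close to random), rising to 73.8% once three meta-skill dimensions are added. This rubric tightens the scoring criteria for dim3 and dim5, adds dim9 "Counterexamples and blacklist", and rebalances the weights to 100. The goal is to make scores more sensitive to real quality and to reduce the optimism bias of LLM judges.

Structure dimensions (59 points): static analysis

#	Dimension	Weight	Scoring criteria
1	Frontmatter quality	7	name follows conventions; description covers what it does, when to use it, and trigger words; ≤1024 characters; no empty trailing filler such as "灵活应用/根据情况判断" ("apply flexibly / judge by the situation")
2	Workflow clarity	12	Steps are explicit and executable, numbered, each with clear input/output
3	Failure-mode encoding	12	Failure modes must be encoded explicitly (written as clear "if X fails → Y" branches); fallback paths and error recovery are present; writing only the happy path with no failure branches costs ≥3 points (SkillLens meta-skill dimension)
4	Checkpoint design	6	User confirmation before key decisions, preventing autonomous runaway; checkpoints must be visibly marked (🔴/STOP/CHECKPOINT); "if ... we suggest ..." wording alone does not count
5	Actionable specificity	17	Not vague; concrete parameters/formats/examples; directly executable; softening phrases such as "建议/可以考虑/根据情况/灵活把握/视情况而定" ("suggest / could consider / depending on the situation / handle flexibly / as the case may be") are banned, and ≥3 occurrences cost ≥3 points (SkillLens actionable specificity dimension)
6	Resource integration	4	references/scripts/assets are cited correctly and their paths are reachable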

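Effect dimensions (35 points): require live testing

#	Dimension	Weight	Scoring criteria
7	Overall architecture	12	Clear structural hierarchy, nothing redundant or missing, consistent with the Huashu (花叔) ecosystem; each redundant or AI-sounding filler passage (Huashu's banned phrases such as 说白了/换句话说/首先其次综上, roughly "put plainly / in other words / first, second, in summary") costs 1 point
8	Live test performance	23	Run the test prompts once and check whether output quality matches the capabilities the skill claims

Meta-skill dimension (6 points): counterexamples and blacklist

#	Dimension	Weight	Scoring criteria
9	Counterexamples and blacklist	6	The skill must include a "what not to do" list of counterexamples; writing only "do X" with no "do not do Y" costs ≥3 points; red lights, dangerous actions, and anti-patterns should be listed in their own section (SkillLens risk-action blacklist dimension)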

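Scoring rules

Dimensions 1-7 and 9: score each dimension 1-10, then multiply by its weight to get that dimension's points
Dimension 8 (live test performance): run 2-3 test prompts and score output quality 1-10
Total = Σ(dimension score × weight) / 10, maximum 100
A change is kept only if the total after it is strictly higher than the total before it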
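
The scoring rules above can be sketched as a small function. This is an illustrative sketch, not the skill's own implementation: the function names `weighted_total` and `should_keep` are hypothetical, and the weights are copied exactly as printed in the three rubric tables. As printed, those weights sum to 99 rather than 100 (the structure group sums to 58 against the stated 59), so a perfect score under this sketch is 99.0. Rounding to one decimal place follows the rule given later in the exceptions table.

```python
# Weights keyed by dimension number, copied as printed in the rubric tables.
# Note: as printed they sum to 99, not 100.
WEIGHTS = {1: 7, 2: 12, 3: 12, 4: 6, 5: 17, 6: 4, 7: 12, 8: 23, 9: 6}


def weighted_total(scores: dict[int, float]) -> float:
    """Return sum(score * weight) / 10, kept to 1 decimal place."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover dimensions 1-9 exactly")
    for dim, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"dimension {dim}: score {score} is outside 1-10")
    return round(sum(scores[d] * WEIGHTS[d] for d in WEIGHTS) / 10, 1)


def should_keep(old_total: float, new_total: float) -> bool:
    """Ratchet rule: keep a change only if the total strictly increases."""
    return new_total > old_total
```

A tie is treated as a regression on purpose: `should_keep(78.0, 78.0)` is false, which matches the "strictly higher" wording of the rule.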

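Empirical basis of the rubric

The rubric design draws on the SkillLens paper (arXiv 2605.23899) plus a local controlled study:

SkillLens found LLM-as-judge accuracy of only 46.4% (close to random), rising to 73.8% after adding three meta-skill dimensions
Locally, 4 types of degradation were applied to huashu-research → 5 independent judges in blind tests unanimously rated V1>V2, mean Δ +46.5 (5/5 high confidence)

Conclusion: the rubric can detect gross degradation, but fine-grained quality differences remain unreliable, so important decisions must be reviewed by a human.

→ Detailed paper evidence, the full data from the 5 judges, and HL field-case figures are in references/skilllens-evidence.md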

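About the "live test performance" dimension

This is the biggest difference from purely structural scoring. How it is scored:

1. Design 2-3 typical user prompts for each skill (the most common usage scenarios, not edge cases)
2. Run with sub-agents: one with the skill, one without it (baseline)
3. Compare output quality and score from these angles:

Did the output accomplish the user's intent?
Is the quality clearly better than the baseline without the skill?
Did the skill introduce any negative effects (excess verbosity, drifting off-topic, odd formatting)?

If sub-agents are unavailable (timeout/resource limits), fall back to "dry-run validation": after reading the skill, simulate the execution reasoning for one typical prompt and judge whether the flow is sound. This must be marked dry_run in results.tsv. A dry_run ratio > 30% → evaluation-invalid warning (from the local controlled study: dim8, the live-test dimension, carries 23% of the weight, so scores are not trustworthy without full_test validation).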

---

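Runtime compatibility review (a gate item, independent of the 9-dimension score)

A skill should work across the 50+ skills-compatible runtimes such as Claude Code / Codex / Cursor / OpenClaw / Hermes / Gemini CLI / OpenCode. Otherwise, other agents parsing it will read wording such as "在 Claude Code 里" ("in Claude Code") or "Claude Code skill" as "not meant for me" and refuse to install it (real case: nuwa-skill was rejected by the Marvis agent for this reason).

During the Phase 1 baseline evaluation, always run one red-light scan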

grep -nE "(在 Claude Code|Claude Code skill|Claude Code 用户|Cursor only|Codex 中|^\[!\[Claude Code|~/\.claude/skills/[a-z]|/plugin install\b)" SKILL.md README.md 2>/dev/null

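Non-empty output = red-light hit → the first round of Phase 2 must be set to the P0 "runtime drift fix" (write runtime_warn=N to the note column of results.tsv).

Exceptions (permitted "Claude Code traces")

Frontmatter trigger words, references to skill names inside the Huashu ecosystem, sections explicitly labeled runtime-specific, and commit messages: these are legitimate occurrences and do not count as red lights.

→ The full red-light/green-light comparison table, detailed exception rules, and the review timing for Phases 1/2/3 are in references/runtime-neutrality.md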
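
For environments where `grep` is unavailable, the same scan can be approximated in Python. This is a sketch under assumptions: the function name `scan_red_lights` is hypothetical, the pattern is a direct transcription of the grep expression above, and treating N in `runtime_warn=N` as the number of matching lines is an interpretation rather than something the source states.

```python
import re

# Direct transcription of the grep -E pattern; applied line by line so that
# "^" anchors at the start of each line, as grep does.
RED_LIGHT = re.compile(
    r"(在 Claude Code|Claude Code skill|Claude Code 用户|Cursor only|Codex 中"
    r"|^\[!\[Claude Code|~/\.claude/skills/[a-z]|/plugin install\b)"
)


def scan_red_lights(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that contain red-light wording."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if RED_LIGHT.search(line):
            hits.append((lineno, line))
    return hits


sample = (
    "Works in any skills-compatible runtime.\n"
    "This is a Claude Code skill.\n"
    "Run /plugin install darwin\n"
)
hits = scan_red_lights(sample)
runtime_warn = len(hits)  # assumed meaning of N in runtime_warn=N
```

The exceptions listed after the scan (frontmatter trigger words, runtime-specific sections, and so on) are not encoded here, so hits still need a manual pass before being counted.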

---

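Autonomous optimization loop

Phase 0: Initialization

1. Confirm the optimization scope:
   - All skills → scan .claude/skills/*/SKILL.md
   - Specified skills → the user-provided list
2. Create a git branch: auto-optimize/YYYYMMDD-HHMM
3. Initialize results.tsv (if it does not exist)
4. Read the existing results.tsv to review the optimization history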
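
The branch naming in step 2 can be illustrated with a short helper. This is a sketch only: `branch_name` is a hypothetical function, and the `-2` / `-3` suffix on retries is taken from the "branch already exists" fallback in the exceptions table.

```python
from datetime import datetime


def branch_name(now: datetime, attempt: int = 1) -> str:
    """Build auto-optimize/YYYYMMDD-HHMM, with -2 / -3 appended on retries."""
    base = "auto-optimize/" + now.strftime("%Y%m%d-%H%M")
    return base if attempt == 1 else f"{base}-{attempt}"


first_try = branch_name(datetime(2026, 3, 31, 10, 0))
second_try = branch_name(datetime(2026, 3, 31, 10, 0), attempt=2)
```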

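Phase 0.5: Test prompt design

Before evaluation, design test prompts for each skill. This step is critical: without test prompts, the "live test performance" dimension cannot be scored.

for each skill:
  1. Read SKILL.md and understand what it does
  2. Design 2-3 test prompts covering:
     - The most typical usage scenario (happy path)
     - One slightly more complex or ambiguous scenario
  3. Save to {skill directory}/test-prompts.json:
     [
       {"id": 1, "prompt": "what the user would say", "expected": "short description of the expected output"},
       {"id": 2, "prompt": "...", "expected": "..."}
     ]

Show all test prompts to the user and move on to evaluation only after they confirm. The quality of the test prompts determines whether optimization heads in the right direction.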
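
A loader that checks the test-prompts.json shape shown above might look like the following. It is a sketch under assumptions: `load_test_prompts` is a hypothetical name, and requiring unique `id` values is an added sanity check that the source does not state. The 2-3 prompt count and the three keys come from the text.

```python
import json

REQUIRED_KEYS = {"id", "prompt", "expected"}


def load_test_prompts(raw: str) -> list[dict]:
    """Parse test-prompts.json text and check it matches the documented shape."""
    prompts = json.loads(raw)
    if not isinstance(prompts, list) or not 2 <= len(prompts) <= 3:
        raise ValueError("expected a JSON array of 2-3 test prompts")
    seen_ids = set()
    for entry in prompts:
        if not isinstance(entry, dict):
            raise ValueError("each test prompt must be a JSON object")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"test prompt is missing keys: {sorted(missing)}")
        if entry["id"] in seen_ids:  # uniqueness is an assumption of this sketch
            raise ValueError(f"duplicate id: {entry['id']}")
        seen_ids.add(entry["id"])
    return prompts


example = json.dumps([
    {"id": 1, "prompt": "typical request", "expected": "short description"},
    {"id": 2, "prompt": "ambiguous request", "expected": "short description"},
])
loaded = load_test_prompts(example)
```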

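Phase 1: Baseline evaluation

for each skill in optimization scope:

  # Structural scoring (the main agent can do this)
  1. Read the full SKILL.md
  2. Score dimensions 1-7 and 9 one by one (with a short rationale)

  # Effect scoring (done by sub-agents, independent of the main agent)
  3. For each test prompt, spawn sub-agents:
     - with_skill: run the test prompt with SKILL.md
     - baseline: run the same prompt without the skill
  4. Compare the two sets of outputs and score dimension 8

  # Summary
  5. Compute the weighted total
  6. Record in results.tsv

If sub-agents are unavailable (timeouts, environment limits), score dimension 8 by dry-run validation and mark it dry_run. Do not skip this dimension just because tests cannot run: even a simulated walkthrough is better than not looking at effect at all.

After the baseline evaluation, show the scorecard: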

┌──────────────────────────┬───────┬─────────────────────────┬──────────────────────┐
│ Skill                    │ Score │ Structural gap          │ Effect gap           │
├──────────────────────────┼───────┼─────────────────────────┼──────────────────────┤
│ huashu-proofreading      │ 78    │ Boundary conditions     │ Test prompt 2        │
│ huashu-slides            │ 72    │ Instruction specificity │ On par with baseline │
├──────────────────────────┼───────┼─────────────────────────┼──────────────────────┤
│ Average                  │ 75    │                         │                      │
└──────────────────────────┴───────┴─────────────────────────┴──────────────────────┘

🔴 CHECKPOINT · 🛑 STOP: pause and wait for user confirmation before entering the optimization loop.

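Phase 2: Optimization loop

After the user confirms, sort by baseline score from low to high and optimize the weakest first.

for each skill:
  round = 0
  while round < MAX_ROUNDS (default 3):
    round += 1

    # Step 1: Diagnose
    Find the lowest-scoring dimension (structure or effect both count)
    # HL-3 warning: dim2/dim3/dim4 form a correlated cluster; fixing one often lifts the other two
    # → do not fix dim3 in isolation just because it is lowest; look at the gaps across the whole cluster before deciding whether to change them together

    # Step 2: Propose an improvement
    For the lowest dimension, generate 1 concrete improvement proposal:
      - What to change (specific paragraph/line)
      - Why (which rubric item it maps to)
      - Expected score gain

    # Step 3: Apply the improvement
    Edit SKILL.md
    git add + commit (message: "optimize {skill}: {improvement summary}")

    # Step 4: Re-evaluate
    - Structure dimensions: the main agent rescores
    - Effect dimensions: spawn an independent sub-agent to rerun the test prompts (critical: never grade your own work)

    # Step 5: Decide
    if new_total > old_total:
      status = "keep", update old_total
      # HL-4 quit while ahead: Δ < 2 points for 2 consecutive rounds → break into Phase 3
      if last_delta < 2.0 and this_delta < 2.0:
        print("Plateau signal: marginal gain < 2 points for 2 consecutive rounds; stopping to avoid over-tuning")
        break
    else:
      status = "revert"
      git revert HEAD (creates a new commit to roll back; do not use reset --hard)
      Record the failed attempt in results.tsv
      break  # this skill has hit its bottleneck; move on to the next

    # Step 6: Log
    Append a row to results.tsv

  # === 🔴 CHECKPOINT · mandatory human review after each skill is optimized ===
  Show a summary of the changes to this skill:
    - git diff (before vs after)
    - Score changes (which dimensions rose or fell)
    - Test prompt output comparison (if tests were run)
  Wait for the user to confirm OK before continuing to the next skill.
  If the user says "not good", roll back to the skill's pre-optimization version.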
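
The Step 5 decision logic can be exercised in isolation with a small simulation. This is a sketch, not the skill's code: `run_rounds` is a hypothetical name, the candidate totals stand in for real re-evaluation results, and the git and logging steps are reduced to status strings. One assumption is made explicit: in the first round there is no previous delta, so the plateau check cannot fire until the second kept round.

```python
MAX_ROUNDS = 3        # default from the loop above
PLATEAU_DELTA = 2.0   # HL-4 threshold


def run_rounds(baseline: float, candidate_totals: list[float]) -> tuple[float, list[str]]:
    """Apply the keep / revert / plateau rules to a sequence of re-evaluated totals."""
    best = baseline
    last_delta = None  # assumption: no previous delta exists in round 1
    log = []
    for new_total in candidate_totals[:MAX_ROUNDS]:
        if new_total > best:
            this_delta = new_total - best
            best = new_total
            log.append("keep")
            if (last_delta is not None
                    and last_delta < PLATEAU_DELTA
                    and this_delta < PLATEAU_DELTA):
                log.append("plateau")  # two consecutive small gains: stop
                break
            last_delta = this_delta
        else:
            log.append("revert")  # ratchet: a non-improving change is rolled back
            break
    return best, log


best_a, log_a = run_rounds(78.0, [84.0, 82.0])
best_b, log_b = run_rounds(80.0, [81.0, 82.5, 90.0])
```

In the first call the second round scores below the kept 84, so it is reverted and the loop ends. In the second call two consecutive gains under 2 points trigger the plateau break before the third candidate is ever tried.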

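Phase 2.5: Exploratory rewrite (triggered on demand)

When hill-climbing breaks at round 1 for 2 consecutive skills (no further gains), propose one "exploratory rewrite":

1. Pick a bottlenecked skill
2. git stash to save the current best version
3. Rewrite SKILL.md from scratch (not a tweak; reorganize the structure and the wording)
4. Re-evaluate
5. if rewrite > stash version: adopt the rewrite
   else: git stash pop to restore

This addresses hill-climbing's local-optimum problem: sometimes you need to "tear down before rebuilding" to break through a bottleneck. 🔴 CHECKPOINT · 🛑 STOP: run this only with the user's explicit consent.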

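Phase 3: Summary report

## Optimization report

### Overview
- Skills optimized: N
- Total experiments: M
- Improvements kept: X (Y%)
- Reverts: Z
- Live validation: A full tests / B dry runs

### Score changes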
┌──────────────────────────┬────────┬────────┬────────┐
│ Skill                    │ Before │ After  │ Δ      │
├──────────────────────────┼────────┼────────┼────────┤
│ huashu-proofreading      │ 78     │ 87     │ +9     │
│ huashu-slides            │ 72     │ 83     │ +11    │
├──────────────────────────┼────────┼────────┼────────┤
│ Average                  │ 75     │ 85     │ +10    │
└──────────────────────────┴────────┴────────┴────────┘

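### Key improvements
1. [skill-A] Added boundary-condition handling; test output quality improved noticeably
2. [skill-B] Reorganized the workflow structure; the advantage over baseline widened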

---

results.tsv format

timestamp	commit	skill	old_score	new_score	status	dimension	note	eval_mode
2026-03-31T10:00	baseline	huashu-proofreading	-	78	baseline	-	initial evaluation	full_test
2026-03-31T10:05	a1b2c3d	huashu-proofreading	78	84	keep	boundary conditions	added fallback	full_test
2026-03-31T10:10	b2c3d4e	huashu-proofreading	84	82	revert	instruction specificity	over-specified	dry_run

The eval_mode column is new: full_test (sub-agent tests were run) or dry_run (simulated walkthrough). File location: .claude/skills/darwin-skill/results.tsv
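
The 30% dry_run threshold mentioned in the scoring section can be checked directly against this format. The following is a sketch under assumptions: `dry_run_ratio` is a hypothetical helper, and counting every row (including the baseline row) in the denominator is one possible reading, since the source does not say which rows count.

```python
import csv
import io

COLUMNS = ["timestamp", "commit", "skill", "old_score", "new_score",
           "status", "dimension", "note", "eval_mode"]
DRY_RUN_LIMIT = 0.30


def dry_run_ratio(tsv_text: str) -> float:
    """Return the share of rows whose eval_mode is dry_run."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames != COLUMNS:
        raise ValueError("unexpected results.tsv header")
    rows = list(reader)
    if not rows:
        return 0.0
    dry = sum(1 for row in rows if row["eval_mode"] == "dry_run")
    return dry / len(rows)


sample = "\n".join([
    "\t".join(COLUMNS),
    "2026-03-31T10:00\tbaseline\thuashu-proofreading\t-\t78\tbaseline\t-\tinitial evaluation\tfull_test",
    "2026-03-31T10:05\ta1b2c3d\thuashu-proofreading\t78\t84\tkeep\tboundary conditions\tadded fallback\tfull_test",
    "2026-03-31T10:10\tb2c3d4e\thuashu-proofreading\t84\t82\trevert\tinstruction specificity\tover-specified\tdry_run",
])
ratio = dry_run_ratio(sample)
evaluation_invalid = ratio > DRY_RUN_LIMIT
```

The header check doubles as the "column count mismatch" test from the exceptions table: a file with a different header is rejected before any rows are counted.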

---

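Field-tested high-leverage operations (quick reference)

4 items validated in practice (huashu-gpt-image +10.85 / huashu-weread-advisor +14.9 / claude-design +16.5). Detailed case data is in the "HL 实战案例" (HL field cases) section of references/skilllens-evidence.md.

HL-1 (dim4) Visible visual markers are the lever: add 🔴 CHECKPOINT / 🛑 STOP. Relying on "must" wording does not work, because LLMs scan for visual markers when parsing. A 4-line change lifted dim4 by 3 points
HL-2 (dim3) Three-part if-then fallback table: upgrade the two-column "symptom / fix" table to a three-part "trigger condition / first-line fix / fallback if it still fails". This puts the SkillLens failure-mechanism encoding dimension into practice
HL-3 (Phase 2 diagnosis) Correlated-cluster warning: dim2/3/4 form a correlated cluster, so dim2 often rises when dim3 is fixed. When "finding the lowest dimension", also look at gaps across the cluster before deciding whether to change them together
HL-4 (Phase 2 exit) Automatic plateau break: Δ < 2 points for 2 consecutive rounds → break into Phase 3. A +0.15 gain is a signal to stop, not to continue; forcing all MAX_ROUNDS=3 rounds introduces over-engineering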

---

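Optimization strategy library

In priority order; each round does only the single highest-priority item:

P0: Runtime compatibility issues (gate item hit → must be fixed first)

README/SKILL.md contains red-light wording (e.g. "在 Claude Code 里", "Claude Code skill") → replace with runtime-neutral wording
Badge pinned to a single runtime → switch to three neutral badges: Agent Skills Standard + skills.sh + Multi-Runtime
Install section gives the path for only one runtime → restructure into three layers: "one-line command (auto-detect) + manual path table + as reference material"
Workflow hard-codes runtime-specific tools with no fallback → provide a generic alternative or mark it "only available in a given runtime"
Exception: skills whose name explicitly declares a single runtime (e.g. xxx-codex) may skip this item

P0: Effect issues (found by live testing)

Test output drifts from user intent → check whether the skill contains misleading instructions
Worse with the skill than without → the skill may be over-constraining; consider trimming it
Output format not as expected → add an explicit output template

P1: Structural issues

Frontmatter lacks trigger words → add trigger words in Chinese and English
No Phase/Step structure → reorganize into a linear flow
No user-confirmation checkpoints → insert them at key decision points

P2: Specificity issues

Vague steps ("process the image") → rewrite as concrete operations and parameters
Missing input/output specs → add formats, paths, and examples
Missing exception handling → add "if X fails, then Y"

P3: Readability issues

Overlong paragraphs → split them and use tables
Repeated descriptions → merge and deduplicate
No quick reference → add a TL;DR or decision tree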

---

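Exceptions and edge cases

The workflow assumes an ideal environment, but exceptions are common in practice. The fallbacks below are predefined so that optimization never stalls the moment it starts.

Scenario	Trigger condition	Action
Not in a git repository	`git rev-parse` fails	Ask the user: run `git init` or fall back to file backups; if the user picks the latter, use `cp SKILL.md SKILL.md.bak.YYYYMMDD-HHMM` instead of revert
results.tsv missing	File does not exist	Create it and write the header row (9 columns, including eval_mode)
results.tsv corrupted	Column count mismatch / not TSV	Back up as `.bak.YYYYMMDD-HHMM`, rebuild, and tell the user
Branch already exists	`git checkout -b` fails	Append `-2` / `-3` to the branch name; on the 3rd failure, switch back to the existing branch and ask whether to continue or start fresh
`git revert` fails	Conflict / dirty working tree	Run `git stash` first, then retry; if it still fails, read SKILL.md from the previous commit and overwrite the current file to restore it manually
MAX_ROUNDS reached (default 3)	3 rounds run and gaps remain	Do not force a break; show the current weakest dimension and ask the user "add 1 more round / enter Phase 2.5 / wrap up"
Over 150% size after optimization	New file > original × 1.5	Refuse to commit; go back to the improvement step and trim (remove redundancy, merge duplicates), then re-evaluate
test-prompts.json already exists	File already in the skill directory	Reuse and show it by default; ask the user to choose "reuse / rewrite / append"
SKILL.md not found	Directory exists but has no SKILL.md	Abort this skill, record `status=error` in results.tsv, and continue to the next
Score calculation rule	Floating-point drift	Keep totals to 1 decimal place; an improvement must be strictly > the old score (no reliance on rounding)

Principle: tell the user about the exception first, then handle it by the rules; never skip or fail silently.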
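
The 150% size guard from the table is simple enough to state as code. This is a minimal sketch: `exceeds_size_budget` is a hypothetical name, and measuring size in UTF-8 bytes (rather than characters or lines) is an assumption, since the source only says "size".

```python
def exceeds_size_budget(original: str, optimized: str, factor: float = 1.5) -> bool:
    """True when the optimized text is larger than factor times the original."""
    original_size = len(original.encode("utf-8"))    # size in bytes: an assumption
    optimized_size = len(optimized.encode("utf-8"))
    return optimized_size > factor * original_size


at_limit = exceeds_size_budget("a" * 100, "a" * 150)
over_limit = exceeds_size_budget("a" * 100, "a" * 151)
```

The comparison is strict, so a file at exactly 1.5 times the original still passes, matching the "New file > original × 1.5" trigger.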

---

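darwin operation anti-pattern blacklist (dim9 applied: things darwin itself must not do when optimizing)

Lessons from the early local results.tsv (40 runs, 0 reverts) plus anti-patterns exposed by the self-referential evaluations of Judges G/H. Every item is a mistake that was actually made.

#	Anti-pattern	Why not	What to do instead
1	Self-grading in the same context	Scoring in the same Claude session right after editing carries a "what I just changed must be better" optimism bias (SkillLens shows LLM-as-judge accuracy of only 46.4%)	Always spawn an independent sub-agent to score, and trust a result only when at least 2 judges agree
2	Using `git reset --hard` as rollback	Loses uncommitted working-tree changes; breaks CI history	Use `git revert HEAD` to create a reverse commit and keep a traceable chain
3	Padding for points	Forcing more edits after the plateau usually means adding filler or extra paragraphs so the LLM thinks the skill is more detailed, with no real change in quality	Plateau signal (Δ<2 points for 2 consecutive rounds) → break into Phase 3 and quit while ahead
4	Scoring without test-prompts	dim8 without test-prompts is scored out of thin air; at 23% of the weight, that amounts to fabrication	Phase 0.5 must design 2-3 prompts; if the user provides none, draft 3 by default and show them for confirmation
5	Changing several dimensions in one round	With multiple variables changing at once, score movement cannot be attributed to a specific change	1 dimension per round; when changing one of the correlated cluster (dim2/3/4), watch whether the other two rise with it
6	dry_run ratio > 30%	dim8 becomes an empty shell and scores are inflated (the early 40 records were 67% dry_run with 0 reverts)	Require at least 1 real full_test; flag dry_run-heavy optimizations explicitly with ⚠️ in results.tsv
7	Silently skipping exceptions	Continuing silently on git/tsv exceptions breaks the integrity of the ratchet	The 10 fallbacks in the exceptions table must be announced to the user before they are handled
8	Optimizing in isolation, ignoring dimension correlation	dim2/3/4 form a correlated cluster; optimizing dim2 alone often reveals it was already pushed to the ceiling by an earlier dim3 fix	When finding the lowest dimension, also look at gaps across the cluster and decide whether to change them together

When to apply: check against this table once before each Phase 2 change. Any anti-pattern hit → rewrite the proposal.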

---

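Constraints

1. Do not change a skill's core function or purpose: optimize only "how it is written" and "how it executes", not "what it does"
2. Do not introduce new dependencies: do not add scripts or references files the skill did not already have
3. Change only one dimension per round, so that every change stays attributable
4. Keep file size reasonable: the optimized SKILL.md should not exceed 150% of the original size
5. Respect the Huashu style: primarily Chinese, concise above all
6. Reversible: all changes live on a git branch; use git revert rather than reset --hard
7. Scoring independence: effect dimensions must use sub-agents or at least dry-run validation; never "edit and grade immediately" in the same context
8. Runtime neutrality: the skill must work in any skills-compatible runtime such as Claude Code, Codex, Cursor, OpenClaw, or Hermes. Unless the skill name is explicitly bound to a single runtime (e.g. xxx-codex, huashu-slides-codex), any "在 Claude Code 里", "Claude Code skill", badge pinned to a single runtime, or install command that gives only the .claude/skills/ path counts as failing the gate and must be fixed first under P0 (see the "Runtime compatibility review" section)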

---

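Usage

Full optimization (recommended for first use)

User: "Optimize all skills"
→ Full Phase 0-3 flow
→ Default: baseline evaluation first, then optimize the lowest 5-10 in ascending score order

Single skill

User: "Optimize the huashu-slides skill"
→ Run Phase 0.5-2 on the specified skill only

Evaluate only

User: "Evaluate the quality of all skills"
→ Run only Phase 0.5-1 (test prompt design + baseline evaluation); do not enter the optimization loop

View history

User: "Show the skill optimization history"
→ Read and display results.tsv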

---

Design inspiration

"You write the goals and constraints in program.md; let an agent generate and test code deltas indefinitely; keep only what measurably improves the objective."

— Karpathy, autoresearch

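How this skill maps onto it:

program.md → this file (evaluation rubric and constraints)
train.py → each SKILL.md
val_bpb → the 9-dimension weighted total (including live test performance and the meta-skill counterexample blacklist)
git ratchet → keep only commits that improve the score
test set → each skill's test-prompts.json

Differences: this skill adds human in the loop (autoresearch is fully autonomous, whereas skill optimization needs human judgment) and a dual evaluation mechanism (structure + effect), because whether a skill is "good" is subtler than a loss value.

Academic basis & Credits

SkillLens (arXiv 2605.23899): the empirical source of the 9-dimension rubric (LLM self-assessment 46.4% → 73.8% after adding three meta-skill dimensions).
SkillOpt (arXiv 2605.23904): a formal framework for validation-gated edits. Code at github.com/microsoft/SkillOpt (pip install skillopt), project page at microsoft.github.io/SkillOpt. 🤝 As of 2026-06-03 the official Microsoft repository lists darwin-skill among its integrations.
autoresearch: github.com/karpathy/autoresearch, the original inspiration for version 1.0 of this skill.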

---

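Result card generation (Result Card)

After each skill is optimized (or after the full summary), automatically generate a visual result card and save a screenshot of it as PNG.

Card template

Template location: templates/result-card.html

3 styles, one chosen at random each time:

Style	CSS class	URL hash	Visual characteristics
Warm Swiss	`.theme-swiss`	`#swiss`	Warm white background + terracotta orange, Inter font, clean grid
Dark Terminal	`.theme-terminal`	`#terminal`	Near-black background + fluorescent green, monospace font, scanlines
Newspaper	`.theme-newspaper`	`#newspaper`	Warm white paper + deep red, serif font, two-column editorial look

Generation flow

1. Copy templates/result-card.html to a temporary working file
2. Use sed or an editing tool to replace the placeholder data:
   - data-field="skill-name" → actual skill name
   - data-field="score-before/after/delta" → actual scores
   - dim-bar-before/after width for the 9 dimensions → actual percentages (if the template still has the old 8-dimension layout, add a row for dim9, the counterexample blacklist)
   - data-field="improvement-1/2/3" → actual improvement summaries
   - data-field="date" → current date
3. Pick a style at random: set the hash to one of swiss/terminal/newspaper
4. Take a screenshot with scripts/screenshot.mjs (2x high resolution, captures only the .card element, opens the image automatically):
   node .claude/skills/darwin-skill/scripts/screenshot.mjs \
     /abs/path/to/card.html /abs/path/to/output.png
   # Fallback (if the script fails):
   npx playwright screenshot "file:///path/to/card.html#[theme]" \
     output.png --viewport-size=960,1280 --wait-for-timeout=2000
5. Prompt the user to view the result card PNG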
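
Step 3 of the flow above (random style selection through the URL hash) might be sketched as follows. The helper name `card_url` is hypothetical, and the example path is a placeholder; only the three hash values come from the style table.

```python
import random
from pathlib import Path

THEMES = ("swiss", "terminal", "newspaper")  # URL hash values from the style table


def card_url(card_path: str, rng=random) -> str:
    """Build a file:// URL for the card with a randomly chosen theme hash."""
    theme = rng.choice(THEMES)
    # resolve() makes the path absolute, which as_uri() requires
    return Path(card_path).resolve().as_uri() + "#" + theme


url = card_url("/tmp/card.html", rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the choice reproducible in tests while leaving the default behaviour random.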

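### Resource file quick reference

| Path | Purpose |
|---|---|
| `templates/result-card.html` | Main template with 3 styles (swiss/terminal/newspaper, switched by hash) |
| `templates/result-card-dark.html` / `-white.html` | Single-style alternative templates (use when a style must be locked) |
| `scripts/screenshot.mjs` | 2x high-resolution screenshot, captures only .card, opens automatically |
| `results.tsv` | Log of all optimization runs (9 columns including eval_mode) |
| `{skill directory}/test-prompts.json` | Test prompt set for each skill (used for dimension 8 live testing) |

### When to generate

- **Single-skill card**: after each skill is optimized, showing that skill's score change
- **Overview card**: after all optimization is done (Phase 3), showing overall results

### Brand elements

- Top: Darwin.skill brand mark + date
- Bottom: "Train your Skills like you train your models" + github.com/alchaincyf/darwin-skill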

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>darwin.skill - Core Loop</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }
  html, body {
    width: 1200px;
    height: 500px;
    background: #111111;
    overflow: hidden;
    font-family: 'Inter', sans-serif;
  }
</style>
</head>
<body>
<svg width="1200" height="500" xmlns="http://www.w3.org/2000/svg">
  <defs>
    <!-- Arrow markers -->
    <marker id="arr-orange" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#D4532B"/>
    </marker>
    <marker id="arr-green" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#2B8A3E"/>
    </marker>
    <marker id="arr-red" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#C92A2A"/>
    </marker>
    <marker id="arr-orange-up" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#D4532B"/>
    </marker>
  </defs>

  <!-- Background -->
  <rect width="1200" height="500" fill="#111111"/>

  <!-- CORE LOOP label -->
  <text x="48" y="50" fill="#D4532B" font-family="Inter, sans-serif" font-size="11" font-weight="700" letter-spacing="3">CORE LOOP</text>

  <!-- B1: EVALUATE -->
  <rect x="48" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="110" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">EVALUATE</text>
  <text x="110" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">Current Skill</text>

  <!-- Arrow B1→B2 -->
  <line x1="172" y1="200" x2="204" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B2: IMPROVE -->
  <rect x="210" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="272" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">GENERATE</text>
  <text x="272" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">Improvement</text>

  <!-- Arrow B2→B3 -->
  <line x1="334" y1="200" x2="366" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B3: VALIDATE -->
  <rect x="372" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="434" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">VALIDATE</text>
  <text x="434" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">via Testing</text>

  <!-- Arrow B3→B4 -->
  <line x1="496" y1="200" x2="528" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B4: CONFIRM -->
  <rect x="534" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="596" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">HUMAN</text>
  <text x="596" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">CONFIRM</text>

  <!-- Arrow B4→Diamond -->
  <line x1="658" y1="200" x2="698" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- DECISION DIAMOND -->
  <polygon points="790,152 838,200 790,248 742,200" fill="#D4532B"/>
  <text x="790" y="196" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="12" font-weight="800" text-anchor="middle">SCORE</text>
  <text x="790" y="212" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="12" font-weight="800" text-anchor="middle">UP?</text>

  <!-- YES PATH -->
  <path d="M790,152 L790,120 L954,120"
        stroke="#2B8A3E" stroke-width="2" fill="none" marker-end="url(#arr-green)"/>

  <!-- YES label -->
  <text x="860" y="113" fill="#2B8A3E" font-family="Inter, sans-serif" font-size="10" font-weight="700" letter-spacing="1" text-anchor="middle">YES</text>

  <!-- YES result block: KEEP / git commit -->
  <rect x="960" y="85" width="130" height="70" fill="#2B8A3E"/>
  <text x="1025" y="115" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">KEEP</text>
  <text x="1025" y="133" fill="rgba(255,255,255,0.75)" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">git commit</text>

  <!-- NO PATH -->
  <path d="M790,248 L790,280 L954,280"
        stroke="#C92A2A" stroke-width="2" fill="none" marker-end="url(#arr-red)"/>

  <!-- NO label -->
  <text x="860" y="298" fill="#C92A2A" font-family="Inter, sans-serif" font-size="10" font-weight="700" letter-spacing="1" text-anchor="middle">NO</text>

  <!-- NO result block: REVERT / git revert -->
  <rect x="960" y="245" width="130" height="70" fill="#C92A2A"/>
  <text x="1025" y="275" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">REVERT</text>
  <text x="1025" y="293" fill="rgba(255,255,255,0.75)" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">git revert</text>

  <!-- LOOP BACK ARROW -->
  <path d="M1090,120 L1155,120 L1155,420 L48,420 L48,235"
        stroke="#D4532B" stroke-width="2" fill="none" stroke-dasharray="6,4"/>

  <path d="M1090,280 L1155,280"
        stroke="#D4532B" stroke-width="2" fill="none" stroke-dasharray="6,4"/>

  <!-- Arrow head pointing up at B1 left -->
  <polygon points="42,236 54,236 48,222" fill="#D4532B"/>

  <!-- LOOP BACK label along bottom -->
  <text x="590" y="448" fill="#444444" font-family="Inter, sans-serif" font-size="11" font-weight="600" text-anchor="middle" letter-spacing="3">LOOP BACK</text>

  <!-- STEP NUMBERS -->
  <text x="48" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">01</text>
  <text x="210" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">02</text>
  <text x="372" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">03</text>
  <text x="534" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">04</text>
  <text x="762" y="145" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700" opacity="0.9">05</text>

  <!-- SUBTITLE TEXT -->
  <text x="48" y="476" fill="#333333" font-family="Inter, sans-serif" font-size="10" font-weight="600" letter-spacing="1">darwin.skill</text>
  <text x="1152" y="476" fill="#333333" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="end" letter-spacing="1">CONTINUOUS IMPROVEMENT ENGINE</text>

</svg>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>达尔文.skill - Core Loop</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }
  html, body {
    width: 1200px;
    height: 500px;
    background: #111111;
    overflow: hidden;
    font-family: 'Inter', sans-serif;
  }
</style>
</head>
<body>
<svg width="1200" height="500" xmlns="http://www.w3.org/2000/svg">
  <defs>
    <!-- Arrow markers -->
    <marker id="arr-orange" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#D4532B"/>
    </marker>
    <marker id="arr-green" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#2B8A3E"/>
    </marker>
    <marker id="arr-red" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#C92A2A"/>
    </marker>
    <marker id="arr-orange-up" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#D4532B"/>
    </marker>
  </defs>

  <!-- Background -->
  <rect width="1200" height="500" fill="#111111"/>

  <!-- CORE LOOP label -->
  <text x="48" y="50" fill="#D4532B" font-family="Inter, sans-serif" font-size="11" font-weight="700" letter-spacing="3">CORE LOOP</text>

  <!--
    Layout plan (y-center = 200):
    Block width = 124, height = 70
    Gap between blocks = 38 (arrow)
    Decision diamond: center at x=790, size 96×96

    Blocks horizontal positions (left edge):
    B1 EVALUATE:  x=48,  center=110
    B2 IMPROVE:   x=210, center=272
    B3 VALIDATE:  x=372, center=434
    B4 CONFIRM:   x=534, center=596
    Diamond:      center=790
    YES block:    x=960, center=1025  (top, y=85)
    NO block:     x=960, center=1025  (bottom, y=245)

    Main blocks vertical center: y=200
    YES block center y = 120
    NO block center y = 280
  -->

  <!-- ======================== -->
  <!-- MAIN PROCESS BLOCKS      -->
  <!-- ======================== -->

  <!-- B1: EVALUATE -->
  <rect x="48" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="110" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">EVALUATE</text>
  <text x="110" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">评估当前技能</text>

  <!-- Arrow B1→B2 -->
  <line x1="172" y1="200" x2="204" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B2: IMPROVE -->
  <rect x="210" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="272" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">IMPROVE</text>
  <text x="272" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">生成改进方案</text>

  <!-- Arrow B2→B3 -->
  <line x1="334" y1="200" x2="366" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B3: VALIDATE -->
  <rect x="372" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="434" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">VALIDATE</text>
  <text x="434" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">实测验证效果</text>

  <!-- Arrow B3→B4 -->
  <line x1="496" y1="200" x2="528" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- B4: CONFIRM -->
  <rect x="534" y="165" width="124" height="70" fill="#FFFFFF"/>
  <text x="596" y="196" fill="#111111" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">CONFIRM</text>
  <text x="596" y="213" fill="#555555" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">人类确认结果</text>

  <!-- Arrow B4→Diamond -->
  <line x1="658" y1="200" x2="698" y2="200" stroke="#D4532B" stroke-width="2" marker-end="url(#arr-orange)"/>

  <!-- ======================== -->
  <!-- DECISION DIAMOND         -->
  <!-- center: 790, 200         -->
  <!-- size: 96×96              -->
  <!-- ======================== -->
  <polygon points="790,152 838,200 790,248 742,200" fill="#D4532B"/>
  <text x="790" y="196" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="12" font-weight="800" text-anchor="middle">SCORE</text>
  <text x="790" y="212" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="12" font-weight="800" text-anchor="middle">UP?</text>

  <!-- ======================== -->
  <!-- YES PATH (top branch)    -->
  <!-- ======================== -->

  <!-- YES PATH: diamond top → up → right to YES block -->
  <!-- Diamond top point: 790, 152 -->
  <path d="M790,152 L790,120 L954,120"
        stroke="#2B8A3E" stroke-width="2" fill="none" marker-end="url(#arr-green)"/>

  <!-- YES label -->
  <text x="860" y="113" fill="#2B8A3E" font-family="Inter, sans-serif" font-size="10" font-weight="700" letter-spacing="1" text-anchor="middle">YES</text>

  <!-- YES result block: KEEP / git commit -->
  <rect x="960" y="85" width="130" height="70" fill="#2B8A3E"/>
  <text x="1025" y="115" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">KEEP</text>
  <text x="1025" y="133" fill="rgba(255,255,255,0.75)" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">git commit</text>

  <!-- ======================== -->
  <!-- NO PATH (bottom branch)  -->
  <!-- ======================== -->

  <!-- NO PATH: diamond bottom → down → right to NO block -->
  <!-- Diamond bottom point: 790, 248 -->
  <path d="M790,248 L790,280 L954,280"
        stroke="#C92A2A" stroke-width="2" fill="none" marker-end="url(#arr-red)"/>

  <!-- NO label -->
  <text x="860" y="298" fill="#C92A2A" font-family="Inter, sans-serif" font-size="10" font-weight="700" letter-spacing="1" text-anchor="middle">NO</text>

  <!-- NO result block: REVERT / git revert -->
  <rect x="960" y="245" width="130" height="70" fill="#C92A2A"/>
  <text x="1025" y="275" fill="#FFFFFF" font-family="Inter, sans-serif" font-size="13" font-weight="800" text-anchor="middle">REVERT</text>
  <text x="1025" y="293" fill="rgba(255,255,255,0.75)" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="middle">git revert</text>

  <!-- ======================== -->
  <!-- LOOP BACK ARROW          -->
  <!-- from KEEP/REVERT → back to EVALUATE -->
  <!-- ======================== -->

  <!-- From right side of YES block (1090, 120) → right to 1155 → down to 420 → left to 48 → up to 235 (B1 bottom-left corner) -->
  <path d="M1090,120 L1155,120 L1155,420 L48,420 L48,235"
        stroke="#D4532B" stroke-width="2" fill="none" stroke-dasharray="6,4"/>

  <!-- Also from NO block right → join same vertical -->
  <path d="M1090,280 L1155,280"
        stroke="#D4532B" stroke-width="2" fill="none" stroke-dasharray="6,4"/>

  <!-- Arrow head pointing up at B1 left -->
  <polygon points="42,236 54,236 48,222" fill="#D4532B"/>

  <!-- LOOP BACK label along bottom -->
  <text x="590" y="448" fill="#444444" font-family="Inter, sans-serif" font-size="11" font-weight="600" text-anchor="middle" letter-spacing="3">LOOP BACK</text>

  <!-- ======================== -->
  <!-- STEP NUMBERS             -->
  <!-- ======================== -->
  <text x="48" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">01</text>
  <text x="210" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">02</text>
  <text x="372" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">03</text>
  <text x="534" y="158" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700">04</text>
  <text x="762" y="145" fill="#D4532B" font-family="Inter, sans-serif" font-size="10" font-weight="700" opacity="0.9">05</text>

  <!-- ======================== -->
  <!-- SUBTITLE TEXT            -->
  <!-- ======================== -->
  <text x="48" y="476" fill="#333333" font-family="Inter, sans-serif" font-size="10" font-weight="600" letter-spacing="1">达尔文.skill</text>
  <text x="1152" y="476" fill="#333333" font-family="Inter, sans-serif" font-size="10" font-weight="600" text-anchor="end" letter-spacing="1">CONTINUOUS IMPROVEMENT ENGINE</text>

</svg>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Optimization Lifecycle</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 400px;
    background: #111111;
    font-family: 'Inter', -apple-system, sans-serif;
    overflow: hidden;
    position: relative;
  }

  .top-label {
    position: absolute;
    top: 28px;
    left: 44px;
    font-size: 11px;
    font-weight: 800;
    letter-spacing: 0.18em;
    color: #D4532B;
    text-transform: uppercase;
  }

  /* Center row: phases + arrows */
  .row {
    position: absolute;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%);
    display: flex;
    align-items: center;
    gap: 0;
  }

  /* === Phase boxes === */
  .phase {
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
    border-radius: 8px;
    flex-shrink: 0;
  }

  .phase-num {
    font-weight: 900;
    line-height: 1;
  }

  .phase-name {
    font-weight: 600;
    text-align: center;
    line-height: 1.3;
  }

  /* Regular dark */
  .p-dark {
    background: #1D1D1D;
    border: 1px solid #2D2D2D;
    width: 130px;
    height: 130px;
  }
  .p-dark .phase-num {
    font-size: 38px;
    color: #FFFFFF;
    margin-bottom: 10px;
  }
  .p-dark .phase-name {
    font-size: 13px;
    color: #888888;
  }

  /* White */
  .p-white {
    background: #FFFFFF;
    border: 1px solid #DDDDDD;
    width: 130px;
    height: 130px;
  }
  .p-white .phase-num {
    font-size: 38px;
    color: #111111;
    margin-bottom: 10px;
  }
  .p-white .phase-name {
    font-size: 13px;
    color: #555555;
  }

  /* Core - Phase 2 */
  .p-core {
    background: #D4532B;
    border: 2px solid #E06035;
    width: 196px;
    height: 196px;
    box-shadow: 0 0 50px rgba(212, 83, 43, 0.4), 0 0 16px rgba(212, 83, 43, 0.25);
  }
  .p-core .phase-num {
    font-size: 56px;
    color: #FFFFFF;
    margin-bottom: 12px;
  }
  .p-core .phase-name {
    font-size: 15px;
    font-weight: 700;
    color: rgba(255,255,255,0.93);
  }

  /* === Connector === */
  .connector {
    display: flex;
    flex-direction: column;
    align-items: center;
    width: 72px;
    flex-shrink: 0;
    position: relative;
  }

  .conn-label {
    font-size: 9.5px;
    color: #505050;
    font-weight: 600;
    letter-spacing: 0.04em;
    white-space: nowrap;
    margin-bottom: 9px;
  }

  .arrow {
    display: flex;
    align-items: center;
    width: 100%;
  }

  .arrow-line {
    flex: 1;
    height: 2px;
    background: #D4532B;
  }

  .arrow-head {
    width: 0;
    height: 0;
    border-top: 5px solid transparent;
    border-bottom: 5px solid transparent;
    border-left: 9px solid #D4532B;
  }
</style>
</head>
<body>

  <div class="top-label">OPTIMIZATION LIFECYCLE</div>

  <div class="row">

    <div class="phase p-dark">
      <div class="phase-num">0</div>
      <div class="phase-name">Initialize</div>
    </div>

    <div class="connector">
      <div class="conn-label">Human Confirm</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-dark">
      <div class="phase-num">0.5</div>
      <div class="phase-name">Test Design</div>
    </div>

    <div class="connector">
      <div class="conn-label">Human Confirm</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-white">
      <div class="phase-num">1</div>
      <div class="phase-name">Baseline</div>
    </div>

    <div class="connector">
      <div class="conn-label">Human Confirm</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-core">
      <div class="phase-num">2</div>
      <div class="phase-name">Optimize</div>
    </div>

    <div class="connector">
      <div class="conn-label">Human Confirm</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-dark">
      <div class="phase-num">3</div>
      <div class="phase-name">Report</div>
    </div>

  </div>

</body>
</html>

<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<title>Optimization Lifecycle</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 400px;
    background: #111111;
    font-family: 'Inter', -apple-system, sans-serif;
    overflow: hidden;
    position: relative;
  }

  .top-label {
    position: absolute;
    top: 28px;
    left: 44px;
    font-size: 11px;
    font-weight: 800;
    letter-spacing: 0.18em;
    color: #D4532B;
    text-transform: uppercase;
  }

  /* Center row: phases + arrows */
  .row {
    position: absolute;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%);
    display: flex;
    align-items: center;
    gap: 0;
  }

  /* === Phase boxes === */
  .phase {
    display: flex;
    flex-direction: column;
    align-items: center;
    justify-content: center;
    border-radius: 8px;
    flex-shrink: 0;
  }

  .phase-num {
    font-weight: 900;
    line-height: 1;
  }

  .phase-name {
    font-weight: 600;
    text-align: center;
    line-height: 1.3;
  }

  /* Regular dark */
  .p-dark {
    background: #1D1D1D;
    border: 1px solid #2D2D2D;
    width: 130px;
    height: 130px;
  }
  .p-dark .phase-num {
    font-size: 38px;
    color: #FFFFFF;
    margin-bottom: 10px;
  }
  .p-dark .phase-name {
    font-size: 13px;
    color: #888888;
  }

  /* White */
  .p-white {
    background: #FFFFFF;
    border: 1px solid #DDDDDD;
    width: 130px;
    height: 130px;
  }
  .p-white .phase-num {
    font-size: 38px;
    color: #111111;
    margin-bottom: 10px;
  }
  .p-white .phase-name {
    font-size: 13px;
    color: #555555;
  }

  /* Core - Phase 2 */
  .p-core {
    background: #D4532B;
    border: 2px solid #E06035;
    width: 196px;
    height: 196px;
    box-shadow: 0 0 50px rgba(212, 83, 43, 0.4), 0 0 16px rgba(212, 83, 43, 0.25);
  }
  .p-core .phase-num {
    font-size: 56px;
    color: #FFFFFF;
    margin-bottom: 12px;
  }
  .p-core .phase-name {
    font-size: 15px;
    font-weight: 700;
    color: rgba(255,255,255,0.93);
  }

  /* === Connector === */
  .connector {
    display: flex;
    flex-direction: column;
    align-items: center;
    width: 72px;
    flex-shrink: 0;
    position: relative;
  }

  .conn-label {
    font-size: 9.5px;
    color: #505050;
    font-weight: 600;
    letter-spacing: 0.04em;
    white-space: nowrap;
    margin-bottom: 9px;
  }

  .arrow {
    display: flex;
    align-items: center;
    width: 100%;
  }

  .arrow-line {
    flex: 1;
    height: 2px;
    background: #D4532B;
  }

  .arrow-head {
    width: 0;
    height: 0;
    border-top: 5px solid transparent;
    border-bottom: 5px solid transparent;
    border-left: 9px solid #D4532B;
  }
</style>
</head>
<body>

  <div class="top-label">OPTIMIZATION LIFECYCLE</div>

  <div class="row">

    <div class="phase p-dark">
      <div class="phase-num">0</div>
      <div class="phase-name">初始化</div>
    </div>

    <div class="connector">
      <div class="conn-label">人类确认</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-dark">
      <div class="phase-num">0.5</div>
      <div class="phase-name">测试设计</div>
    </div>

    <div class="connector">
      <div class="conn-label">人类确认</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-white">
      <div class="phase-num">1</div>
      <div class="phase-name">基线评估</div>
    </div>

    <div class="connector">
      <div class="conn-label">人类确认</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-core">
      <div class="phase-num">2</div>
      <div class="phase-name">优化循环</div>
    </div>

    <div class="connector">
      <div class="conn-label">人类确认</div>
      <div class="arrow">
        <div class="arrow-line"></div>
        <div class="arrow-head"></div>
      </div>
    </div>

    <div class="phase p-dark">
      <div class="phase-num">3</div>
      <div class="phase-name">汇总报告</div>
    </div>

  </div>

</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ratchet Mechanism</title>
<style>
  @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800&display=swap');

  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 450px;
    background: #111111;
    font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
    display: flex;
    flex-direction: column;
    overflow: hidden;
    position: relative;
  }

  .header {
    padding: 28px 60px 0;
    display: flex;
    align-items: baseline;
    gap: 16px;
  }

  .label {
    color: #D4532B;
    font-size: 13px;
    font-weight: 800;
    letter-spacing: 2px;
    text-transform: uppercase;
  }

  .subtitle {
    color: #555;
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 1px;
  }

  .chart-area {
    flex: 1;
    display: flex;
    align-items: flex-end;
    justify-content: center;
    padding: 0 60px 60px;
    gap: 0;
    position: relative;
  }

  .bars-wrapper {
    display: flex;
    align-items: flex-end;
    gap: 40px;
    position: relative;
    width: 100%;
    justify-content: center;
  }

  .bar-group {
    display: flex;
    flex-direction: column;
    align-items: center;
    position: relative;
    width: 80px;
  }

  .score {
    font-size: 36px;
    font-weight: 800;
    color: #ffffff;
    margin-bottom: 10px;
    line-height: 1;
  }

  .score.rollback {
    color: #C92A2A;
    text-decoration: line-through;
    text-decoration-thickness: 3px;
  }

  .bar {
    width: 80px;
    border-radius: 4px 4px 0 0;
    position: relative;
  }

  .bar.baseline {
    background: #444444;
  }

  .bar.retained {
    background: #ffffff;
  }

  .bar.rollback-bar {
    background: transparent;
    border: 2px dashed #C92A2A;
    border-bottom: none;
  }

  .bar.highlight {
    background: #D4532B;
  }

  .round-label {
    margin-top: 12px;
    color: #666666;
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 0.5px;
    white-space: nowrap;
  }

  /* SVG overlay for arrows and ratchet line */
  .svg-overlay {
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    pointer-events: none;
  }
</style>
</head>
<body>

<div class="header">
  <div class="label">RATCHET MECHANISM</div>
  <div class="subtitle">— effective baseline only moves up</div>
</div>

<div class="chart-area" id="chartArea">
  <div class="bars-wrapper" id="barsWrapper">
    <!-- bars will be injected by JS -->
  </div>
  <svg class="svg-overlay" id="svgOverlay"></svg>
</div>

<script>
  const scores = [72, 78, 75, 84, 87];
  const types  = ['baseline', 'retained', 'rollback', 'retained', 'highlight'];
  const rounds = ['Round 0', 'Round 1', 'Round 2', 'Round 3', 'Round 4'];

  // Effective baseline sequence (ratchet): 72, 78, 78, 84, 87
  const effectiveBaseline = [72, 78, 78, 84, 87];

  const maxScore = 90;
  const minScore = 60;
  const chartHeight = 270; // px available for bars

  function barHeight(score) {
    return Math.round((score - minScore) / (maxScore - minScore) * chartHeight);
  }

  const wrapper = document.getElementById('barsWrapper');

  scores.forEach((score, i) => {
    const group = document.createElement('div');
    group.className = 'bar-group';
    group.id = `group-${i}`;

    const scoreEl = document.createElement('div');
    scoreEl.className = 'score' + (types[i] === 'rollback' ? ' rollback' : '');
    scoreEl.textContent = score;

    const bar = document.createElement('div');
    bar.className = 'bar ' + (types[i] === 'rollback' ? 'rollback-bar' : types[i]);
    const h = barHeight(score);
    bar.style.height = h + 'px';

    const label = document.createElement('div');
    label.className = 'round-label';
    label.textContent = rounds[i];

    group.appendChild(scoreEl);
    group.appendChild(bar);
    group.appendChild(label);
    wrapper.appendChild(group);
  });

  // Draw arrows and ratchet line after layout
  requestAnimationFrame(() => {
    requestAnimationFrame(() => {
      const svg = document.getElementById('svgOverlay');
      const chartArea = document.getElementById('chartArea');
      const chartRect = chartArea.getBoundingClientRect();

      // Collect bar group positions
      const groups = [];
      for (let i = 0; i < scores.length; i++) {
        const g = document.getElementById(`group-${i}`);
        const rect = g.getBoundingClientRect();
        // top of the bar (not the score label)
        const bar = g.querySelector('.bar');
        const barRect = bar.getBoundingClientRect();
        groups.push({
          cx: rect.left - chartRect.left + rect.width / 2,
          barTop: barRect.top - chartRect.top,
          barBottom: barRect.bottom - chartRect.top,
        });
      }

      // Arrows run between every pair of consecutive bars, including the rolled-back round
      const arrowColor = '#D4532B';
      const arrowGap = 8;

      let svgContent = `
        <defs>
          <marker id="arrow" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
            <path d="M0,0 L0,6 L8,3 z" fill="${arrowColor}" />
          </marker>
        </defs>
      `;

      // Draw a horizontal arrow between each pair of consecutive bars, 18px above the taller bar's top
      for (let i = 0; i < groups.length - 1; i++) {
        // 40 = half the 80px bar width; the extra 8 leaves room for the arrow head
        const fromX = groups[i].cx + 40 + arrowGap;
        const toX = groups[i+1].cx - 40 - arrowGap - 8;

        // Smaller y is higher on screen; clamp so the arrow never sits above y = 40
        const arrowY = Math.min(groups[i].barTop, groups[i+1].barTop) - 18;
        const clampedY = Math.max(arrowY, 40);

        svgContent += `<line x1="${fromX}" y1="${clampedY}" x2="${toX}" y2="${clampedY}"
          stroke="${arrowColor}" stroke-width="2" marker-end="url(#arrow)" opacity="0.7"/>`;
      }

      // Ratchet line: connects effective baseline tops
      const effectiveHeights = effectiveBaseline.map(s => barHeight(s));
      const baselineBottom = groups[0].barBottom; // all bars share same bottom

      // Points for ratchet line
      const points = groups.map((g, i) => {
        const effH = effectiveHeights[i];
        const y = baselineBottom - effH;
        return { x: g.cx, y };
      });

      // Draw dashed orange line through effective baseline tops
      let pathD = `M ${points[0].x} ${points[0].y}`;
      for (let i = 1; i < points.length; i++) {
        pathD += ` L ${points[i].x} ${points[i].y}`;
      }

      svgContent += `<path d="${pathD}" fill="none" stroke="${arrowColor}" stroke-width="2"
        stroke-dasharray="6,4" opacity="0.9"/>`;

      // Dots at each effective baseline point
      points.forEach((p) => {
        svgContent += `<circle cx="${p.x}" cy="${p.y}" r="4" fill="${arrowColor}" opacity="0.9"/>`;
      });

      // Label the ratchet line at its last point
      const lastPoint = points[points.length - 1];
      svgContent += `<text x="${lastPoint.x + 10}" y="${lastPoint.y - 6}" fill="${arrowColor}"
        font-family="Inter, sans-serif" font-size="11" font-weight="700" letter-spacing="0.5">EFFECTIVE FLOOR</text>`;

      svg.innerHTML = svgContent;
    });
  });
</script>
</body>
</html>
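
The chart above hard-codes both `types` and `effectiveBaseline`. As a minimal sketch, both can be derived from the raw scores with a running maximum. The helper names `effectiveFloor` and `classifyRounds` are hypothetical and do not appear in the page script. The rule that a round must strictly beat the best score so far to be kept is also an assumption, since the diagram only shows the outcome.

```javascript
// Hypothetical helpers: derive the ratchet series instead of hard-coding it.
// Assumption: a round is kept only when it strictly beats the best score so far.

// Returns the effective floor after each round (a running maximum).
function effectiveFloor(scores) {
  const floorSeries = [];
  let best = -Infinity;
  for (const score of scores) {
    if (score > best) best = score; // kept: the floor moves up
    floorSeries.push(best);         // rolled back: the floor stays where it was
  }
  return floorSeries;
}

// Labels each round the way the chart does, except that the final round is
// labelled 'retained' here; the chart's 'highlight' is only a styling choice.
function classifyRounds(scores) {
  let best = -Infinity;
  return scores.map((score, i) => {
    const kept = score > best;
    if (kept) best = score;
    if (i === 0) return 'baseline';
    return kept ? 'retained' : 'rollback';
  });
}

const demoScores = [72, 78, 75, 84, 87];
console.log(effectiveFloor(demoScores)); // [ 72, 78, 78, 84, 87 ]
console.log(classifyRounds(demoScores)); // [ 'baseline', 'retained', 'rollback', 'retained', 'retained' ]
```

Round 2 scores 75, below the floor of 78, so it is rolled back and the floor holds at 78 until round 3 raises it to 84.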

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ratchet Mechanism</title>
<style>
  @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;800&display=swap');

  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 450px;
    background: #111111;
    font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
    display: flex;
    flex-direction: column;
    overflow: hidden;
    position: relative;
  }

  .header {
    padding: 28px 60px 0;
    display: flex;
    align-items: baseline;
    gap: 16px;
  }

  .label {
    color: #D4532B;
    font-size: 13px;
    font-weight: 800;
    letter-spacing: 2px;
    text-transform: uppercase;
  }

  .subtitle {
    color: #555;
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 1px;
  }

  .chart-area {
    flex: 1;
    display: flex;
    align-items: flex-end;
    justify-content: center;
    padding: 0 60px 60px;
    gap: 0;
    position: relative;
  }

  .bars-wrapper {
    display: flex;
    align-items: flex-end;
    gap: 40px;
    position: relative;
    width: 100%;
    justify-content: center;
  }

  .bar-group {
    display: flex;
    flex-direction: column;
    align-items: center;
    position: relative;
    width: 80px;
  }

  .score {
    font-size: 36px;
    font-weight: 800;
    color: #ffffff;
    margin-bottom: 10px;
    line-height: 1;
  }

  .score.rollback {
    color: #C92A2A;
    text-decoration: line-through;
    text-decoration-thickness: 3px;
  }

  .bar {
    width: 80px;
    border-radius: 4px 4px 0 0;
    position: relative;
  }

  .bar.baseline {
    background: #444444;
  }

  .bar.retained {
    background: #ffffff;
  }

  .bar.rollback-bar {
    background: transparent;
    border: 2px dashed #C92A2A;
    border-bottom: none;
  }

  .bar.highlight {
    background: #D4532B;
  }

  .round-label {
    margin-top: 12px;
    color: #666666;
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 0.5px;
    white-space: nowrap;
  }

  /* SVG overlay for arrows and ratchet line */
  .svg-overlay {
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    pointer-events: none;
  }
</style>
</head>
<body>

<div class="header">
  <div class="label">RATCHET MECHANISM</div>
  <div class="subtitle">— effective baseline only moves up</div>
</div>

<div class="chart-area" id="chartArea">
  <div class="bars-wrapper" id="barsWrapper">
    <!-- bars will be injected by JS -->
  </div>
  <svg class="svg-overlay" id="svgOverlay"></svg>
</div>

<script>
  const scores = [72, 78, 75, 84, 87];
  const types  = ['baseline', 'retained', 'rollback', 'retained', 'highlight'];
  const rounds = ['轮次 0', '轮次 1', '轮次 2', '轮次 3', '轮次 4'];

  // Effective baseline sequence (ratchet): 72, 78, 78, 84, 87
  const effectiveBaseline = [72, 78, 78, 84, 87];

  const maxScore = 90;
  const minScore = 60;
  const chartHeight = 270; // px available for bars

  function barHeight(score) {
    return Math.round((score - minScore) / (maxScore - minScore) * chartHeight);
  }

  const wrapper = document.getElementById('barsWrapper');

  scores.forEach((score, i) => {
    const group = document.createElement('div');
    group.className = 'bar-group';
    group.id = `group-${i}`;

    const scoreEl = document.createElement('div');
    scoreEl.className = 'score' + (types[i] === 'rollback' ? ' rollback' : '');
    scoreEl.textContent = score;

    const bar = document.createElement('div');
    bar.className = 'bar ' + (types[i] === 'rollback' ? 'rollback-bar' : types[i]);
    const h = barHeight(score);
    bar.style.height = h + 'px';

    const label = document.createElement('div');
    label.className = 'round-label';
    label.textContent = rounds[i];

    group.appendChild(scoreEl);
    group.appendChild(bar);
    group.appendChild(label);
    wrapper.appendChild(group);
  });

  // Draw arrows and ratchet line after layout
  requestAnimationFrame(() => {
    requestAnimationFrame(() => {
      const svg = document.getElementById('svgOverlay');
      const chartArea = document.getElementById('chartArea');
      const chartRect = chartArea.getBoundingClientRect();

      // Collect bar group positions
      const groups = [];
      for (let i = 0; i < scores.length; i++) {
        const g = document.getElementById(`group-${i}`);
        const rect = g.getBoundingClientRect();
        // top of the bar (not the score label)
        const bar = g.querySelector('.bar');
        const barRect = bar.getBoundingClientRect();
        groups.push({
          cx: rect.left - chartRect.left + rect.width / 2,
          barTop: barRect.top - chartRect.top,
          barBottom: barRect.bottom - chartRect.top,
        });
      }

      // Arrows run between every pair of consecutive bars, including the rolled-back round
      const arrowColor = '#D4532B';
      const arrowGap = 8;

      let svgContent = `
        <defs>
          <marker id="arrow" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
            <path d="M0,0 L0,6 L8,3 z" fill="${arrowColor}" />
          </marker>
        </defs>
      `;

      // Draw a horizontal arrow between each pair of consecutive bars, 18px above the taller bar's top
      for (let i = 0; i < groups.length - 1; i++) {
        // 40 = half the 80px bar width; the extra 8 leaves room for the arrow head
        const fromX = groups[i].cx + 40 + arrowGap;
        const toX = groups[i+1].cx - 40 - arrowGap - 8;

        // Smaller y is higher on screen; clamp so the arrow never sits above y = 40
        const arrowY = Math.min(groups[i].barTop, groups[i+1].barTop) - 18;
        const clampedY = Math.max(arrowY, 40);

        svgContent += `<line x1="${fromX}" y1="${clampedY}" x2="${toX}" y2="${clampedY}"
          stroke="${arrowColor}" stroke-width="2" marker-end="url(#arrow)" opacity="0.7"/>`;
      }

      // Ratchet line: connects effective baseline tops
      // Effective baseline scores: 72, 78, 78, 84, 87
      // For round 2 (rollback), use effective = 78, not 75
      const effectiveHeights = effectiveBaseline.map(s => barHeight(s));
      const baselineBottom = groups[0].barBottom; // all bars share same bottom

      // Points for ratchet line
      const points = groups.map((g, i) => {
        const effH = effectiveHeights[i];
        const y = baselineBottom - effH;
        return { x: g.cx, y };
      });

      // Draw dashed orange line through effective baseline tops
      let pathD = `M ${points[0].x} ${points[0].y}`;
      for (let i = 1; i < points.length; i++) {
        pathD += ` L ${points[i].x} ${points[i].y}`;
      }

      svgContent += `<path d="${pathD}" fill="none" stroke="${arrowColor}" stroke-width="2"
        stroke-dasharray="6,4" opacity="0.9"/>`;

      // Dots at each effective baseline point
      points.forEach((p) => {
        svgContent += `<circle cx="${p.x}" cy="${p.y}" r="4" fill="${arrowColor}" opacity="0.9"/>`;
      });

      // Label the ratchet line at its last point
      const lastPoint = points[points.length - 1];
      svgContent += `<text x="${lastPoint.x + 10}" y="${lastPoint.y - 6}" fill="${arrowColor}"
        font-family="Inter, sans-serif" font-size="11" font-weight="700" letter-spacing="0.5">EFFECTIVE FLOOR</text>`;

      svg.innerHTML = svgContent;
    });
  });
</script>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=1200">
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 600px;
    overflow: hidden;
    background: #111111;
    font-family: 'Inter', system-ui, -apple-system, sans-serif;
    color: #FFFFFF;
  }

  .container {
    width: 1200px;
    height: 600px;
    display: flex;
    flex-direction: column;
    padding: 44px 56px 36px 56px;
    position: relative;
  }

  /* Header */
  .header {
    display: flex;
    align-items: flex-start;
    justify-content: space-between;
    margin-bottom: 36px;
  }

  .rubric-label {
    color: #D4532B;
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 3px;
    text-transform: uppercase;
  }

  .title {
    font-size: 26px;
    font-weight: 900;
    color: #FFFFFF;
    margin-top: 8px;
    letter-spacing: -0.5px;
  }

  .subtitle {
    font-size: 13px;
    color: #666666;
    margin-top: 4px;
    font-weight: 400;
  }

  /* Main content: two columns + divider */
  .main {
    flex: 1;
    display: flex;
    gap: 0;
    align-items: stretch;
  }

  /* Left panel */
  .panel {
    flex: 1;
    display: flex;
    flex-direction: column;
  }

  .panel-header {
    display: flex;
    align-items: baseline;
    gap: 10px;
    margin-bottom: 22px;
  }

  .panel-title {
    font-size: 13px;
    font-weight: 700;
    color: #666666;
    text-transform: uppercase;
    letter-spacing: 2px;
  }

  .panel-score {
    font-size: 28px;
    font-weight: 900;
    color: #FFFFFF;
  }

  .panel-score.orange {
    color: #D4532B;
  }

  /* Divider */
  .divider {
    width: 1px;
    background: #D4532B;
    margin: 0 44px;
    flex-shrink: 0;
  }

  /* Bar items */
  .bar-list {
    display: flex;
    flex-direction: column;
    gap: 13px;
    flex: 1;
  }

  .bar-item {
    display: flex;
    flex-direction: column;
    gap: 5px;
  }

  .bar-meta {
    display: flex;
    align-items: baseline;
    justify-content: space-between;
  }

  .bar-label {
    font-size: 12px;
    font-weight: 600;
    color: #AAAAAA;
    letter-spacing: 0.3px;
  }

  .bar-weight {
    font-size: 22px;
    font-weight: 900;
    color: #FFFFFF;
    line-height: 1;
  }

  .bar-weight.orange {
    color: #D4532B;
    font-size: 28px;
  }

  .bar-track {
    width: 100%;
    height: 6px;
    background: #222222;
    position: relative;
  }

  .bar-fill {
    height: 100%;
    background: #CCCCCC;
    transition: none;
  }

  .bar-fill.orange {
    background: linear-gradient(90deg, #D4532B 0%, #FF7A4D 100%);
  }

  /* Right panel: larger items */
  .bar-item.large .bar-label {
    font-size: 13px;
    color: #AAAAAA;
  }

  .bar-item.large .bar-track {
    height: 8px;
  }

  /* Bottom total */
  .bottom {
    margin-top: 28px;
    display: flex;
    align-items: center;
    justify-content: center;
    gap: 16px;
    border-top: 1px solid #222222;
    padding-top: 18px;
  }

  .total-label {
    font-size: 13px;
    font-weight: 700;
    color: #666666;
    letter-spacing: 3px;
    text-transform: uppercase;
  }

  .total-score {
    font-size: 38px;
    font-weight: 900;
    color: #FFFFFF;
    line-height: 1;
  }

  .total-unit {
    font-size: 13px;
    color: #666666;
    font-weight: 600;
    letter-spacing: 2px;
  }

  .dot {
    width: 4px;
    height: 4px;
    background: #D4532B;
    display: inline-block;
    margin: 0 6px 2px 6px;
    vertical-align: middle;
  }
</style>
</head>
<body>
<div class="container">

  <!-- Header -->
  <div class="header">
    <div>
      <div class="rubric-label">EVALUATION RUBRIC</div>
      <div class="title">darwin.skill — 8-Dimension Evaluation Rubric</div>
      <div class="subtitle">Automated quality scoring framework for Claude skill optimization</div>
    </div>
  </div>

  <!-- Main -->
  <div class="main">

    <!-- Left: Structure -->
    <div class="panel">
      <div class="panel-header">
        <div class="panel-title">Structure</div>
        <div class="panel-score">60 pts</div>
      </div>
      <div class="bar-list">

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Frontmatter Quality</span>
            <span class="bar-weight">8</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(8/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Workflow Clarity</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: 100%"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Edge Case Coverage</span>
            <span class="bar-weight">10</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(10/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Checkpoint Design</span>
            <span class="bar-weight">7</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(7/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Instruction Specificity</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: 100%"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Resource Integration</span>
            <span class="bar-weight">5</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(5/15*100%)"></div>
          </div>
        </div>

      </div>
    </div>

    <!-- Vertical divider -->
    <div class="divider"></div>

    <!-- Right: Effect -->
    <div class="panel">
      <div class="panel-header">
        <div class="panel-title">Effectiveness</div>
        <div class="panel-score orange">40 pts</div>
      </div>
      <div class="bar-list">

        <div class="bar-item large">
          <div class="bar-meta">
            <span class="bar-label">Overall Architecture</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(15/25*100%)"></div>
          </div>
        </div>

        <div class="bar-item large" style="margin-top: 12px;">
          <div class="bar-meta" style="margin-bottom: 2px;">
            <span class="bar-label" style="color:#D4532B; font-size:14px; font-weight:700; letter-spacing:0.5px;">Live Test Performance</span>
            <span class="bar-weight orange">25</span>
          </div>
          <!-- Accent line above bar -->
          <div style="width:100%; height:1px; background:#D4532B; opacity:0.25; margin-bottom:6px;"></div>
          <div class="bar-track" style="height:14px; background:#1A0E0A;">
            <div class="bar-fill orange" style="width:100%; height:100%;"></div>
          </div>
          <div style="font-size:11px; color:#D4532B; margin-top:5px; font-weight:600; letter-spacing:1px;">HIGHEST WEIGHT</div>
        </div>

      </div>
    </div>

  </div>

  <!-- Bottom total -->
  <div class="bottom">
    <span class="total-label">TOTAL</span>
    <span class="dot"></span>
    <span class="total-score">100</span>
    <span class="total-unit">PTS</span>
  </div>

</div>
</body>
</html>
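
The inline `width: calc(w/15*100%)` and `calc(15/25*100%)` styles in the card above appear to follow one rule: each bar's fill is its weight divided by the largest weight in the same panel. A minimal Python sketch of that rule, assuming the panel grouping shown on the card (the `bar_widths` helper and the dictionary names are hypothetical, not part of the template):

```python
# Weights as printed on the rubric card; grouping follows the two panels.
STRUCTURE = {
    "Frontmatter Quality": 8,
    "Workflow Clarity": 15,
    "Edge Case Coverage": 10,
    "Checkpoint Design": 7,
    "Instruction Specificity": 15,
    "Resource Integration": 5,
}
EFFECTIVENESS = {
    "Overall Architecture": 15,
    "Live Test Performance": 25,
}


def bar_widths(panel: dict[str, int]) -> dict[str, float]:
    """Return each bar's fill width as a percentage of the panel's largest weight."""
    top = max(panel.values())
    return {name: round(weight / top * 100, 1) for name, weight in panel.items()}


if __name__ == "__main__":
    for panel in (STRUCTURE, EFFECTIVENESS):
        for name, width in bar_widths(panel).items():
            print(f"{name}: {width}%")
```

Scaling per panel rather than against the 100-point total keeps the heaviest dimension in each column at full width, which is why a 15-point bar is full on the left but only 60% on the right.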

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=1200">
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700;900&display=swap" rel="stylesheet">
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    width: 1200px;
    height: 600px;
    overflow: hidden;
    background: #111111;
    font-family: 'Inter', system-ui, -apple-system, sans-serif;
    color: #FFFFFF;
  }

  .container {
    width: 1200px;
    height: 600px;
    display: flex;
    flex-direction: column;
    padding: 44px 56px 36px 56px;
    position: relative;
  }

  /* Header */
  .header {
    display: flex;
    align-items: flex-start;
    justify-content: space-between;
    margin-bottom: 36px;
  }

  .rubric-label {
    color: #D4532B;
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 3px;
    text-transform: uppercase;
  }

  .title {
    font-size: 26px;
    font-weight: 900;
    color: #FFFFFF;
    margin-top: 8px;
    letter-spacing: -0.5px;
  }

  .subtitle {
    font-size: 13px;
    color: #666666;
    margin-top: 4px;
    font-weight: 400;
  }

  /* Main content: two columns + divider */
  .main {
    flex: 1;
    display: flex;
    gap: 0;
    align-items: stretch;
  }

  /* Left panel */
  .panel {
    flex: 1;
    display: flex;
    flex-direction: column;
  }

  .panel-header {
    display: flex;
    align-items: baseline;
    gap: 10px;
    margin-bottom: 22px;
  }

  .panel-title {
    font-size: 13px;
    font-weight: 700;
    color: #666666;
    text-transform: uppercase;
    letter-spacing: 2px;
  }

  .panel-score {
    font-size: 28px;
    font-weight: 900;
    color: #FFFFFF;
  }

  .panel-score.orange {
    color: #D4532B;
  }

  /* Divider */
  .divider {
    width: 1px;
    background: #D4532B;
    margin: 0 44px;
    flex-shrink: 0;
  }

  /* Bar items */
  .bar-list {
    display: flex;
    flex-direction: column;
    gap: 13px;
    flex: 1;
  }

  .bar-item {
    display: flex;
    flex-direction: column;
    gap: 5px;
  }

  .bar-meta {
    display: flex;
    align-items: baseline;
    justify-content: space-between;
  }

  .bar-label {
    font-size: 12px;
    font-weight: 600;
    color: #AAAAAA;
    letter-spacing: 0.3px;
  }

  .bar-weight {
    font-size: 22px;
    font-weight: 900;
    color: #FFFFFF;
    line-height: 1;
  }

  .bar-weight.orange {
    color: #D4532B;
    font-size: 28px;
  }

  .bar-track {
    width: 100%;
    height: 6px;
    background: #222222;
    position: relative;
  }

  .bar-fill {
    height: 100%;
    background: #CCCCCC;
    transition: none;
  }

  .bar-fill.orange {
    background: linear-gradient(90deg, #D4532B 0%, #FF7A4D 100%);
  }

  /* Right panel: larger items */
  .bar-item.large .bar-label {
    font-size: 13px;
    color: #AAAAAA;
  }

  .bar-item.large .bar-track {
    height: 8px;
  }

  /* Bottom total */
  .bottom {
    margin-top: 28px;
    display: flex;
    align-items: center;
    justify-content: center;
    gap: 16px;
    border-top: 1px solid #222222;
    padding-top: 18px;
  }

  .total-label {
    font-size: 13px;
    font-weight: 700;
    color: #666666;
    letter-spacing: 3px;
    text-transform: uppercase;
  }

  .total-score {
    font-size: 38px;
    font-weight: 900;
    color: #FFFFFF;
    line-height: 1;
  }

  .total-unit {
    font-size: 13px;
    color: #666666;
    font-weight: 600;
    letter-spacing: 2px;
  }

  .dot {
    width: 4px;
    height: 4px;
    background: #D4532B;
    display: inline-block;
    margin: 0 6px 2px 6px;
    vertical-align: middle;
  }
</style>
</head>
<body>
<div class="container">

  <!-- Header -->
  <div class="header">
    <div>
      <div class="rubric-label">EVALUATION RUBRIC</div>
      <div class="title">Darwin.skill: 8-Dimension Evaluation Rubric</div>
      <div class="subtitle">Automated quality scoring framework for Claude skill optimization</div>
    </div>
  </div>

  <!-- Main -->
  <div class="main">

    <!-- Left: Structure -->
    <div class="panel">
      <div class="panel-header">
        <div class="panel-title">Structure</div>
        <div class="panel-score">60 pts</div>
      </div>
      <div class="bar-list">

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Frontmatter Quality</span>
            <span class="bar-weight">8</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(8/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Workflow Clarity</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: 100%"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Edge Case Coverage</span>
            <span class="bar-weight">10</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(10/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Checkpoint Design</span>
            <span class="bar-weight">7</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(7/15*100%)"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Instruction Specificity</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: 100%"></div>
          </div>
        </div>

        <div class="bar-item">
          <div class="bar-meta">
            <span class="bar-label">Resource Integration</span>
            <span class="bar-weight">5</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(5/15*100%)"></div>
          </div>
        </div>

      </div>
    </div>

    <!-- Vertical divider -->
    <div class="divider"></div>

    <!-- Right: Effect -->
    <div class="panel">
      <div class="panel-header">
        <div class="panel-title">Effectiveness</div>
        <div class="panel-score orange">40 pts</div>
      </div>
      <div class="bar-list">

        <div class="bar-item large">
          <div class="bar-meta">
            <span class="bar-label">Overall Architecture</span>
            <span class="bar-weight">15</span>
          </div>
          <div class="bar-track">
            <div class="bar-fill" style="width: calc(15/25*100%)"></div>
          </div>
        </div>

        <div class="bar-item large" style="margin-top: 12px;">
          <div class="bar-meta" style="margin-bottom: 2px;">
            <span class="bar-label" style="color:#D4532B; font-size:14px; font-weight:700; letter-spacing:0.5px;">Live Test Performance</span>
            <span class="bar-weight orange">25</span>
          </div>
          <!-- Accent line above bar -->
          <div style="width:100%; height:1px; background:#D4532B; opacity:0.25; margin-bottom:6px;"></div>
          <div class="bar-track" style="height:14px; background:#1A0E0A;">
            <div class="bar-fill orange" style="width:100%; height:100%;"></div>
          </div>
          <div style="font-size:11px; color:#D4532B; margin-top:5px; font-weight:600; letter-spacing:1px;">HIGHEST WEIGHT</div>
        </div>

      </div>
    </div>

  </div>

  <!-- Bottom total -->
  <div class="bottom">
    <span class="total-label">TOTAL</span>
    <span class="dot"></span>
    <span class="total-score">100</span>
    <span class="total-unit">PTS</span>
  </div>

</div>
</body>
</html>
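
The eight weights on the card sum to 100 (60 for structure, 40 for effectiveness). A minimal Python sketch of how a weighted total could be computed from them, assuming each dimension is first rated on a 0-10 scale and then scaled by its weight. The 0-10 scale, the `total_score` helper, and the English dimension keys are assumptions for illustration; the card itself only specifies the weights.

```python
# Weights copied from the rubric card (structure: first six, effectiveness: last two).
WEIGHTS = {
    "Frontmatter Quality": 8,
    "Workflow Clarity": 15,
    "Edge Case Coverage": 10,
    "Checkpoint Design": 7,
    "Instruction Specificity": 15,
    "Resource Integration": 5,
    "Overall Architecture": 15,
    "Live Test Performance": 25,
}


def total_score(ratings: dict[str, float]) -> float:
    """Combine per-dimension ratings (assumed 0-10) into a 0-100 weighted total."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return round(sum(WEIGHTS[d] * ratings[d] / 10 for d in WEIGHTS), 1)


if __name__ == "__main__":
    example = {d: 7 for d in WEIGHTS}
    example["Live Test Performance"] = 9  # strongest dimension in this made-up example
    print(total_score(example))
```

Because Live Test Performance carries 25 of the 100 points, a two-point gain there moves the total by five points, more than the same gain on any structural dimension.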

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Autonomous Skill Optimization System</title>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800;900&display=swap" rel="stylesheet">
<style>
  :root {
    --accent: #D4532B;
    --black: #111111;
    --dark: #1a1a1a;
    --mid: #666666;
    --light: #999999;
    --border: #d0d0d0;
    --bg: #fafafa;
    --white: #ffffff;
    --col: calc((100% - 11 * 24px) / 12);
  }

  * { margin: 0; padding: 0; box-sizing: border-box; }

  body {
    font-family: 'Inter', -apple-system, sans-serif;
    background: var(--bg);
    color: var(--black);
    font-size: 15px;
    line-height: 1.6;
    -webkit-font-smoothing: antialiased;
  }

  .container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 0 48px;
  }

  /* ═══════ HERO ═══════ */
  .hero {
    padding: 120px 0 80px;
    border-bottom: 1px solid var(--black);
  }

  .hero-label {
    font-size: 11px;
    font-weight: 600;
    letter-spacing: 3px;
    text-transform: uppercase;
    color: var(--accent);
    margin-bottom: 32px;
  }

  .hero h1 {
    font-size: 88px;
    font-weight: 900;
    line-height: 0.95;
    letter-spacing: -3px;
    margin-bottom: 40px;
    max-width: 900px;
  }

  .hero-subtitle {
    font-size: 20px;
    font-weight: 400;
    color: var(--mid);
    line-height: 1.5;
    max-width: 640px;
    margin-bottom: 56px;
  }

  .hero-subtitle strong {
    color: var(--black);
    font-weight: 600;
  }

  .hero-quote {
    border-left: 3px solid var(--accent);
    padding: 20px 0 20px 24px;
    max-width: 600px;
  }

  .hero-quote p {
    font-size: 16px;
    font-weight: 400;
    font-style: italic;
    color: var(--dark);
    line-height: 1.7;
  }

  .hero-quote cite {
    display: block;
    margin-top: 12px;
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 1px;
    text-transform: uppercase;
    font-style: normal;
    color: var(--light);
  }

  /* ═══════ SECTION HEADERS ═══════ */
  .section {
    padding: 80px 0;
    border-bottom: 1px solid var(--border);
  }

  .section:last-child {
    border-bottom: none;
  }

  .section-num {
    font-size: 12px;
    font-weight: 700;
    letter-spacing: 2px;
    color: var(--accent);
    margin-bottom: 16px;
    font-variant-numeric: tabular-nums;
  }

  .section-title {
    font-size: 48px;
    font-weight: 800;
    line-height: 1.05;
    letter-spacing: -1.5px;
    margin-bottom: 16px;
  }

  .section-lead {
    font-size: 17px;
    color: var(--mid);
    max-width: 560px;
    line-height: 1.6;
    margin-bottom: 48px;
  }

  /* ═══════ PRINCIPLES ═══════ */
  .principles-grid {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 0;
  }

  .principle {
    padding: 32px 32px 32px 0;
    border-top: 1px solid var(--border);
  }

  .principle:nth-child(even) {
    padding-left: 32px;
    border-left: 1px solid var(--border);
  }

  .principle:nth-child(1),
  .principle:nth-child(2) {
    border-top: 1px solid var(--black);
  }

  .principle-num {
    font-size: 36px;
    font-weight: 800;
    color: var(--accent);
    margin-bottom: 12px;
    line-height: 1;
  }

  .principle h3 {
    font-size: 18px;
    font-weight: 700;
    margin-bottom: 8px;
    letter-spacing: -0.3px;
  }

  .principle p {
    font-size: 14px;
    color: var(--mid);
    line-height: 1.6;
  }

  .principle--full {
    grid-column: 1 / -1;
    padding-left: 0;
    border-left: none;
  }

  /* ═══════ RUBRIC ═══════ */
  .rubric-header {
    display: flex;
    gap: 48px;
    margin-bottom: 48px;
  }

  .rubric-stat {
    display: flex;
    align-items: baseline;
    gap: 12px;
  }

  .rubric-stat-num {
    font-size: 64px;
    font-weight: 900;
    line-height: 1;
    letter-spacing: -2px;
  }

  .rubric-stat-num--accent {
    color: var(--accent);
  }

  .rubric-stat-label {
    font-size: 13px;
    font-weight: 600;
    text-transform: uppercase;
    letter-spacing: 1.5px;
    color: var(--mid);
  }

  .rubric-table {
    width: 100%;
    border-collapse: collapse;
    margin-bottom: 40px;
  }

  .rubric-table caption {
    text-align: left;
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 2.5px;
    text-transform: uppercase;
    color: var(--light);
    padding-bottom: 16px;
  }

  .rubric-table th {
    text-align: left;
    font-size: 11px;
    font-weight: 600;
    letter-spacing: 1.5px;
    text-transform: uppercase;
    color: var(--light);
    padding: 12px 16px 12px 0;
    border-bottom: 2px solid var(--black);
  }

  .rubric-table td {
    padding: 14px 16px 14px 0;
    border-bottom: 1px solid var(--border);
    font-size: 14px;
    vertical-align: top;
  }

  .rubric-table tr:last-child td {
    border-bottom: none;
  }

  .rubric-table .dim-num {
    font-weight: 700;
    color: var(--accent);
    font-variant-numeric: tabular-nums;
    width: 36px;
  }

  .rubric-table .dim-name {
    font-weight: 600;
    white-space: nowrap;
  }

  .rubric-table .dim-weight {
    font-weight: 800;
    font-size: 20px;
    font-variant-numeric: tabular-nums;
    text-align: center;
    width: 60px;
    color: var(--dark);
  }

  .rubric-table .dim-desc {
    color: var(--mid);
    line-height: 1.5;
  }

  /* ═══════ PHASES ═══════ */
  .phases {
    display: flex;
    flex-direction: column;
    gap: 0;
  }

  .phase {
    display: grid;
    grid-template-columns: 160px 1fr;
    gap: 40px;
    padding: 40px 0;
    border-top: 1px solid var(--border);
  }

  .phase:first-child {
    border-top: 1px solid var(--black);
  }

  .phase-id {
    font-size: 48px;
    font-weight: 900;
    color: var(--accent);
    line-height: 1;
    letter-spacing: -1px;
  }

  .phase-id span {
    display: block;
    font-size: 11px;
    font-weight: 600;
    letter-spacing: 2px;
    text-transform: uppercase;
    color: var(--light);
    margin-top: 8px;
  }

  .phase-body h3 {
    font-size: 22px;
    font-weight: 700;
    margin-bottom: 12px;
    letter-spacing: -0.3px;
  }

  .phase-body p {
    font-size: 14px;
    color: var(--mid);
    line-height: 1.6;
    margin-bottom: 16px;
    max-width: 560px;
  }

  .phase-steps {
    list-style: none;
    counter-reset: step;
  }

  .phase-steps li {
    counter-increment: step;
    padding: 8px 0 8px 32px;
    position: relative;
    font-size: 14px;
    line-height: 1.5;
    color: var(--dark);
  }

  .phase-steps li::before {
    content: counter(step);
    position: absolute;
    left: 0;
    font-size: 11px;
    font-weight: 700;
    color: var(--accent);
    width: 20px;
    height: 20px;
    display: flex;
    align-items: center;
    justify-content: center;
    top: 9px;
  }

  /* ═══════ RATCHET ═══════ */
  .ratchet-viz {
    display: flex;
    align-items: flex-end;
    gap: 0;
    padding: 48px 0;
    position: relative;
  }

  .ratchet-viz::before {
    content: '';
    position: absolute;
    bottom: 48px;
    left: 0;
    right: 0;
    height: 1px;
    background: var(--border);
  }

  .ratchet-step {
    flex: 1;
    display: flex;
    flex-direction: column;
    align-items: center;
    position: relative;
  }

  .ratchet-bar {
    width: 80px;
    background: var(--black);
    position: relative;
    z-index: 1;
  }

  .ratchet-bar--revert {
    background: none;
    border: 2px solid var(--border);
  }

  .ratchet-score {
    font-size: 36px;
    font-weight: 900;
    margin-bottom: 8px;
    letter-spacing: -1px;
    line-height: 1;
  }

  .ratchet-score--revert {
    color: var(--light);
    text-decoration: line-through;
    text-decoration-color: var(--accent);
    text-decoration-thickness: 2px;
  }

  .ratchet-label {
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 1.5px;
    text-transform: uppercase;
    margin-top: 12px;
    padding: 4px 10px;
  }

  .ratchet-label--keep {
    background: var(--black);
    color: var(--white);
  }

  .ratchet-label--revert {
    background: none;
    border: 1px solid var(--accent);
    color: var(--accent);
  }

  .ratchet-label--baseline {
    background: var(--accent);
    color: var(--white);
  }

  .ratchet-arrow {
    position: absolute;
    top: 50%;
    right: -12px;
    width: 24px;
    height: 2px;
    background: var(--border);
    z-index: 2;
  }

  .ratchet-arrow::after {
    content: '';
    position: absolute;
    right: -1px;
    top: -4px;
    border: solid var(--border);
    border-width: 0 2px 2px 0;
    padding: 3px;
    transform: rotate(-45deg);
  }

  .ratchet-round {
    font-size: 12px;
    color: var(--light);
    margin-top: 8px;
    font-weight: 500;
  }

  /* ═══════ COMPARISON ═══════ */
  .comparison {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 0;
  }

  .comparison-col {
    padding: 40px;
    border: 1px solid var(--border);
  }

  .comparison-col:first-child {
    border-right: none;
  }

  .comparison-col--highlight {
    background: var(--black);
    color: var(--white);
    border-color: var(--black);
  }

  .comparison-tag {
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 2px;
    text-transform: uppercase;
    margin-bottom: 16px;
  }

  .comparison-col:first-child .comparison-tag {
    color: var(--light);
  }

  .comparison-col--highlight .comparison-tag {
    color: var(--accent);
  }

  .comparison-col h3 {
    font-size: 24px;
    font-weight: 800;
    margin-bottom: 20px;
    letter-spacing: -0.5px;
  }

  .comparison-list {
    list-style: none;
  }

  .comparison-list li {
    padding: 10px 0;
    font-size: 14px;
    line-height: 1.5;
    border-bottom: 1px solid;
  }

  .comparison-col:first-child .comparison-list li {
    border-color: var(--border);
    color: var(--mid);
  }

  .comparison-col--highlight .comparison-list li {
    border-color: #333;
    color: #ccc;
  }

  .comparison-list li:last-child {
    border-bottom: none;
  }

  .comparison-list li strong {
    color: var(--black);
  }

  .comparison-col--highlight .comparison-list li strong {
    color: var(--white);
  }

  .check-icon {
    display: inline-block;
    width: 16px;
    height: 16px;
    margin-right: 8px;
    vertical-align: middle;
    position: relative;
    top: -1px;
  }

  /* ═══════ MAPPING TABLE ═══════ */
  .mapping-table {
    width: 100%;
    border-collapse: collapse;
  }

  .mapping-table th {
    text-align: left;
    font-size: 11px;
    font-weight: 700;
    letter-spacing: 2px;
    text-transform: uppercase;
    padding: 16px 24px 16px 0;
    border-bottom: 2px solid var(--black);
  }

  .mapping-table th:first-child {
    color: var(--light);
  }

  .mapping-table th:nth-child(2) {
    color: var(--accent);
  }

  .mapping-table th:last-child {
    color: var(--light);
  }

  .mapping-table td {
    padding: 16px 24px 16px 0;
    border-bottom: 1px solid var(--border);
    font-size: 14px;
    vertical-align: top;
  }

  .mapping-table td:first-child {
    font-weight: 600;
    color: var(--dark);
    white-space: nowrap;
  }

  .mapping-table td:nth-child(2) {
    font-weight: 600;
    color: var(--black);
  }

  .mapping-table td:last-child {
    color: var(--mid);
    line-height: 1.5;
  }

  .mapping-arrow {
    display: inline-block;
    color: var(--accent);
    font-weight: 400;
    margin: 0 4px;
  }

  /* ═══════ FOOTER ═══════ */
  .footer {
    padding: 48px 0;
    border-top: 1px solid var(--black);
    display: flex;
    justify-content: space-between;
    align-items: center;
  }

  .footer-left {
    font-size: 12px;
    font-weight: 600;
    letter-spacing: 1px;
    text-transform: uppercase;
    color: var(--light);
  }

  .footer-right {
    font-size: 12px;
    color: var(--light);
  }

  /* ═══════ RESPONSIVE ═══════ */
  @media (max-width: 768px) {
    .container { padding: 0 24px; }
    .hero { padding: 64px 0 48px; }
    .hero h1 { font-size: 48px; letter-spacing: -1.5px; }
    .hero-subtitle { font-size: 17px; }
    .section { padding: 48px 0; }
    .section-title { font-size: 32px; }
    .principles-grid { grid-template-columns: 1fr; }
    .principle:nth-child(even) { padding-left: 0; border-left: none; }
    .principle:nth-child(2) { border-top: 1px solid var(--border); }
    .phase { grid-template-columns: 1fr; gap: 16px; }
    .comparison { grid-template-columns: 1fr; }
    .comparison-col:first-child { border-right: 1px solid var(--border); border-bottom: none; }
    .ratchet-viz { flex-wrap: wrap; gap: 24px; }
    .ratchet-step { flex: none; width: calc(33% - 16px); }
    .rubric-stat-num { font-size: 48px; }
    .mapping-table td:first-child { white-space: normal; }
  }
</style>
</head>
<body>

<!-- ═══════════════════════════ HERO ═══════════════════════════ -->
<div class="container">
  <section class="hero">
    <div class="hero-label">Autonomous Skill Optimization System</div>
    <h1>Auto Skill<br>Optimizer</h1>
    <p class="hero-subtitle">
      <strong>Evaluate</strong> &rarr; <strong>Improve</strong> &rarr; <strong>Validate with live tests</strong> &rarr; <strong>Human confirmation</strong> &rarr; <strong>Keep or revert</strong>
    </p>
    <div class="hero-quote">
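      <p>"The core idea of autoresearch is simple: let the system run experiments autonomously, evaluate the results, and keep only the improvements that work. A ratchet that only turns forward."</p>
      <cite>Andrej Karpathy, on autonomous experiment loops</cite>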
    </div>
  </section>
</div>

<!-- ═══════════════════════════ 01 PRINCIPLES ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">01</div>
    <h2 class="section-title">Core Principles</h2>
    <p class="section-lead">Five rules that keep the optimizer from drifting off course, gaming its own score, or introducing regressions.</p>

    <div class="principles-grid">
      <div class="principle">
        <div class="principle-num">01</div>
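        <h3>Single Editable Asset</h3>
        <p>Each optimization round targets exactly one SKILL.md file. One change, one measurement, one decision. No cross-file edits, so attribution stays unambiguous.</p>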
      </div>
      <div class="principle">
        <div class="principle-num">02</div>
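        <h3>Dual Evaluation</h3>
        <p>Static structural analysis catches formatting and completeness problems. Live execution catches behavioral regressions. Neither is sufficient on its own.</p>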
      </div>
      <div class="principle">
        <div class="principle-num">03</div>
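        <h3>Ratchet Mechanism</h3>
        <p>Changes that raise the total score are committed. Changes that fail to raise it are reverted automatically. The score can rise or hold steady, but it never drops.</p>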
      </div>
      <div class="principle">
        <div class="principle-num">04</div>
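        <h3>Independent Scoring</h3>
        <p>The agent that edits a Skill never scores its own work. An independent sub-agent evaluates output quality, which prevents self-congratulatory bias.</p>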
      </div>
      <div class="principle principle--full">
        <div class="principle-num">05</div>
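        <h3>Human in the Loop</h3>
        <p>After each Skill's optimization loop finishes, the system pauses and shows the human a diff summary, the score change, and a comparison of test outputs. No change takes effect without explicit confirmation.</p>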
      </div>
    </div>
  </section>
</div>

<!-- ═══════════════════════════ 02 RUBRIC ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">02</div>
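    <h2 class="section-title">8-Dimension<br>Rubric</h2>
    <p class="section-lead">A 100-point rubric. Structure dimensions catch problems you can see by reading the file; effectiveness dimensions catch problems that only surface at runtime.</p>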

    <div class="rubric-header">
      <div class="rubric-stat">
        <div class="rubric-stat-num">60</div>
        <div class="rubric-stat-label">Structure<br>points</div>
      </div>
      <div class="rubric-stat">
        <div class="rubric-stat-num rubric-stat-num--accent">40</div>
        <div class="rubric-stat-label">Effectiveness<br>points</div>
      </div>
    </div>

    <table class="rubric-table">
      <caption>Structure dimensions: static analysis</caption>
      <thead>
        <tr>
          <th style="width:36px">#</th>
          <th style="width:180px">Dimension</th>
          <th style="width:60px">Weight</th>
          <th>Scoring criteria</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="dim-num">1</td>
          <td class="dim-name">Frontmatter Quality</td>
          <td class="dim-weight">8</td>
          <td class="dim-desc">Name is correct; description covers function, trigger conditions, and use cases in no more than 1024 characters</td>
        </tr>
        <tr>
          <td class="dim-num">2</td>
          <td class="dim-name">Workflow Clarity</td>
          <td class="dim-weight">15</td>
          <td class="dim-desc">Steps are numbered and executable, each with explicit inputs and outputs</td>
        </tr>
        <tr>
          <td class="dim-num">3</td>
          <td class="dim-name">Edge Case Coverage</td>
          <td class="dim-weight">10</td>
          <td class="dim-desc">Error handling, fallback paths, and recovery from common failures</td>
        </tr>
        <tr>
          <td class="dim-num">4</td>
          <td class="dim-name">Checkpoint Design</td>
          <td class="dim-weight">7</td>
          <td class="dim-desc">User confirmation before key decisions, to prevent runaway autonomy</td>
        </tr>
        <tr>
          <td class="dim-num">5</td>
          <td class="dim-name">Instruction Specificity</td>
          <td class="dim-weight">15</td>
          <td class="dim-desc">Unambiguous, with concrete parameters, formats, and examples; directly executable</td>
        </tr>
        <tr>
          <td class="dim-num">6</td>
          <td class="dim-name">Resource Integration</td>
          <td class="dim-weight">5</td>
          <td class="dim-desc">Every referenced script or asset path exists and is accessible</td>
        </tr>
      </tbody>
    </table>

    <table class="rubric-table">
      <caption>Effectiveness dimensions: require live testing</caption>
      <thead>
        <tr>
          <th style="width:36px">#</th>
          <th style="width:180px">Dimension</th>
          <th style="width:60px">Weight</th>
          <th>Scoring criteria</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="dim-num">7</td>
          <td class="dim-name">Overall Architecture</td>
          <td class="dim-weight">15</td>
          <td class="dim-desc">Clear hierarchy, no redundancy or gaps, consistent with ecosystem conventions</td>
        </tr>
        <tr>
          <td class="dim-num">8</td>
          <td class="dim-name">Live Test Performance</td>
          <td class="dim-weight">25</td>
          <td class="dim-desc">Run 2-3 test prompts and compare output quality with the Skill enabled against the baseline without it</td>
        </tr>
      </tbody>
    </table>
  </section>
</div>

<!-- ═══════════════════════════ 03 PHASES ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">03</div>
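    <h2 class="section-title">Optimization Loop</h2>
    <p class="section-lead">Five phases, from initialization to the final report. The system runs autonomously within each phase but pauses between phases for human review.</p>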

    <div class="phases">
      <div class="phase">
        <div class="phase-id">
          0
          <span>Initialize</span>
        </div>
        <div class="phase-body">
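          <h3>Scope and Branch Setup</h3>
          <p>Determine the optimization scope, set up version control, and load prior history.</p>
          <ol class="phase-steps">
            <li>Confirm scope: all Skills or a user-specified subset</li>
            <li>Scan .claude/skills/*/SKILL.md to build the target list</li>
            <li>Create a git branch: auto-optimize/YYYYMMDD-HHMM</li>
            <li>Initialize or load results.tsv for history tracking</li>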
          </ol>
        </div>
      </div>

      <div class="phase">
        <div class="phase-id">
          0.5
          <span>Design</span>
        </div>
        <div class="phase-body">
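          <h3>Test Prompt Engineering</h3>
          <p>Before any scoring, design the test prompts that will measure effectiveness. Without good tests, the optimizer is flying blind.</p>
          <ol class="phase-steps">
            <li>Read each SKILL.md to understand the capabilities it claims</li>
            <li>Design 2-3 prompts per Skill, including one happy-path case and one ambiguous scenario</li>
            <li>Save them to test-prompts.json in each Skill's directory</li>
            <li>Submit all test prompts for human approval before continuing</li>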
          </ol>
        </div>
      </div>

      <div class="phase">
        <div class="phase-id">
          1
          <span>Baseline</span>
        </div>
        <div class="phase-body">
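          <h3>Full-Dimension Scoring</h3>
          <p>Establish a starting score for every Skill. The main agent scores structure; an independent sub-agent scores effectiveness.</p>
          <ol class="phase-steps">
            <li>Read SKILL.md and score dimensions 1-7, with a rationale for each</li>
            <li>Launch a sub-agent to run the test prompts with and without the Skill enabled</li>
            <li>Compare the outputs and score dimension 8 (mark as dry_run if no sub-agent is available)</li>
            <li>Compute the weighted total and record it in results.tsv</li>
            <li>Show the scorecard and pause for human confirmation</li>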
          </ol>
        </div>
      </div>

      <div class="phase">
        <div class="phase-id">
          2
          <span>Optimize</span>
        </div>
        <div class="phase-body">
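          <h3>Hill-Climbing Loop</h3>
          <p>Process Skills from lowest score to highest. Each round: diagnose the weakest dimension, propose one targeted fix, apply it, re-score, and decide.</p>
          <ol class="phase-steps">
            <li>Identify the Skill's lowest-scoring dimension</li>
            <li>Generate one concrete improvement (what to change, why, and the expected score change)</li>
            <li>Edit SKILL.md and git commit with a structured message</li>
            <li>Re-score: structure by the main agent, effectiveness by an independent sub-agent</li>
            <li>New score &gt; old score: keep. Otherwise: git revert and move on to the next Skill</li>
            <li>After each Skill: show the diff and score change, then wait for human confirmation</li>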
          </ol>
        </div>
      </div>

      <div class="phase">
        <div class="phase-id">
          3
          <span>Report</span>
        </div>
        <div class="phase-body">
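          <h3>Summary and Metrics</h3>
          <p>Consolidate all results into a final optimization report covering before-and-after scores, experiment counts, and key improvements.</p>
          <ol class="phase-steps">
            <li>Tally total experiments, keeps, reverts, and test modes</li>
            <li>Generate a before-and-after score table for each Skill</li>
            <li>List the highest-impact improvements and the dimensions they targeted</li>
            <li>Archive results.tsv for future baseline reference</li>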
          </ol>
        </div>
      </div>
    </div>
  </section>
</div>

<!-- ═══════════════════════════ 04 RATCHET ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">04</div>
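    <h2 class="section-title">Ratchet Mechanism</h2>
    <p class="section-lead">The score only goes up. Each round either improves the Skill or is rolled back cleanly, so local regressions never accumulate over time.</p>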

    <div class="ratchet-viz">
      <div class="ratchet-step">
        <div class="ratchet-score">72</div>
        <div style="height:144px" class="ratchet-bar"></div>
        <div class="ratchet-label ratchet-label--baseline">Baseline</div>
        <div class="ratchet-round">Round 0</div>
        <div class="ratchet-arrow"></div>
      </div>
      <div class="ratchet-step">
        <div class="ratchet-score">78</div>
        <div style="height:156px" class="ratchet-bar"></div>
        <div class="ratchet-label ratchet-label--keep">Keep</div>
        <div class="ratchet-round">Round 1</div>
        <div class="ratchet-arrow"></div>
      </div>
      <div class="ratchet-step">
        <div class="ratchet-score ratchet-score--revert">75</div>
        <div style="height:150px" class="ratchet-bar ratchet-bar--revert"></div>
        <div class="ratchet-label ratchet-label--revert">Revert</div>
        <div class="ratchet-round">Round 2</div>
        <div class="ratchet-arrow"></div>
      </div>
      <div class="ratchet-step">
        <div class="ratchet-score">84</div>
        <div style="height:168px" class="ratchet-bar"></div>
        <div class="ratchet-label ratchet-label--keep">Keep</div>
        <div class="ratchet-round">Round 3</div>
        <div class="ratchet-arrow"></div>
      </div>
      <div class="ratchet-step">
        <div class="ratchet-score">87</div>
        <div style="height:174px" class="ratchet-bar"></div>
        <div class="ratchet-label ratchet-label--keep">Keep</div>
        <div class="ratchet-round">Round 4</div>
      </div>
    </div>
  </section>
</div>

<!-- ═══════════════════════════ 05 COMPARISON ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">05</div>
    <h2 class="section-title">Why Dual<br>Evaluation</h2>
    <p class="section-lead">Structure alone cannot tell you whether a Skill actually works well. Effectiveness alone cannot tell you why it fails.</p>

    <div class="comparison">
      <div class="comparison-col">
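        <div class="comparison-tag">Traditional approach</div>
        <h3>Structure-Only Review</h3>
        <ul class="comparison-list">
          <li>Checks that frontmatter exists and is well-formed</li>
          <li>Verifies that steps are numbered and described</li>
          <li>Confirms that file paths and references are valid</li>
          <li>Cannot detect whether the Skill <strong>actually improves</strong> output quality</li>
          <li>Cannot detect misleading instructions that <strong>look correct</strong> but produce poor results</li>
          <li>Cannot detect over-constraints that <strong>do more harm than good</strong></li>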
        </ul>
      </div>
      <div class="comparison-col comparison-col--highlight">
        <div class="comparison-tag">Auto Skill Optimizer</div>
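        <h3>Dual evaluation</h3>
        <ul class="comparison-list">
          <li><strong>Structure scoring</strong> catches formatting, completeness, and readability problems</li>
          <li><strong>Live test execution</strong> reveals behavioral impact in real scenarios</li>
          <li><strong>Baseline comparison</strong> measures whether a Skill adds or subtracts value</li>
          <li><strong>Independent sub-agents</strong> prevent self-congratulatory scoring bias</li>
          <li><strong>Test prompt design</strong> keeps evaluation targeted at real user scenarios</li>
          <li><strong>Dry-run fallback</strong> provides coverage when live testing is unavailable</li>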
        </ul>
      </div>
    </div>
  </section>
</div>

<!-- ═══════════════════════════ 06 MAPPING ═══════════════════════════ -->
<div class="container">
  <section class="section">
    <div class="section-num">06</div>
    <h2 class="section-title">Concept mapping</h2>
    <p class="section-lead">How autoresearch's core abstractions translate to Skill optimization. Same machine, different domain.</p>

    <table class="mapping-table">
      <thead>
        <tr>
          <th style="width:220px">Autoresearch</th>
          <th style="width:220px">Skill Optimizer</th>
          <th>Implementation details</th>
        </tr>
      </thead>
      <tbody>
        <tr>
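          <td>Research paper draft</td>
          <td>SKILL.md file</td>
          <td>The only editable artifact. Every improvement is expressed as an edit to this one file.</td>
        </tr>
        <tr>
          <td>Evaluation metric</td>
          <td>8-dimension rubric</td>
          <td>Weighted score across structure (60 points) and effectiveness (40 points), 100 points in total.</td>
        </tr>
        <tr>
          <td>Experiment loop</td>
          <td>Phase 2 hill-climbing</td>
          <td>Diagnose the weakest dimension, propose a fix, apply it, re-score, then keep or revert. At most 3 rounds per Skill.</td>
        </tr>
        <tr>
          <td>Version control</td>
          <td>Git branch + revert</td>
          <td>Every edit is a commit. Regressions are rolled back with revert (a new commit). Full audit trail.</td>
        </tr>
        <tr>
          <td>Automated evaluation</td>
          <td>Sub-agent test execution</td>
          <td>Independent agents run the test prompts with and without the Skill enabled and compare output quality.</td>
        </tr>
        <tr>
          <td>Human review gate</td>
          <td>Pause at phase transitions</td>
          <td>The system pauses after baseline scoring and after each Skill is optimized, showing the diff and the score change.</td>
        </tr>
        <tr>
          <td>Explore vs. exploit</td>
          <td>Phase 2.5 exploratory rewrite</td>
          <td>When hill-climbing stalls (it breaks off in round 1 twice in a row), propose a full structural rewrite.</td>
        </tr>
        <tr>
          <td>Experiment log</td>
          <td>results.tsv</td>
          <td>Timestamped records: commit hash, Skill name, old and new scores, keep/revert status, evaluation mode.</td>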
        </tr>
      </tbody>
    </table>
  </section>
</div>

<!-- ═══════════════════════════ FOOTER ═══════════════════════════ -->
<div class="container">
  <footer class="footer">
    <div class="footer-left">Auto Skill Optimizer</div>
    <div class="footer-right">Inspired by Karpathy's autoresearch &mdash; built for the Claude Code Skill ecosystem</div>
  </footer>
</div>

</body>
</html>

English | [中文](README.md)

</div>


darwin.skill 2.0

Optimize your Agent Skills the way you train models.

Inspired by Karpathy's autoresearch. Autonomous experiment loops, applied to skill optimization. A ratchet that only turns forward.

v2.0 · Updated 2026-05-28 · A structural upgrade integrating Microsoft Research's SkillLens and SkillOpt papers.


npx skills add alchaincyf/darwin-skill

</div>

---

[!NOTE]

🤝 Microsoft Research lists darwin-skill as an official SkillOpt integration.

On 2026-06-03, the SkillOpt repo noted:

"gbrain, gbrain-evals, and darwin-skill have all integrated SkillOpt."

We absorbed its validation-gated framework; it added darwin to its integration list. A two-way nod. 👉 Visit the SkillOpt repo

---

What's New in 2.0

v2.0 is not a patch release. It's a structural upgrade absorbing two Microsoft Research papers published on 2026-05-22. Five concrete changes:

1. Rubric expanded from 8 → 9 dimensions (integrating SkillLens's empirically validated 73.8% rubric recipe)

The legacy "error handling" dimension is upgraded to Failure Mechanism Encoding: not just "tell the agent to be careful," but explicitly encode known failure paths into the skill.
The legacy "clarity" dimension is upgraded to Actionable Specificity: explicitly bans vague hedge words like "suggest / could consider / depending on / use judgment / case by case."
A new ninth dimension High-Risk Action Blacklist: destructive operations like rm / git reset --hard / force push must be explicitly listed as forbidden in the skill.

2. Validation aligned with SkillOpt's validation-gated design

Multi-judge independent review: 2 independent judges per round
Judges never reused: each new round spawns fresh judges to avoid anchoring bias
Early stopping: if a round gains less than 1 point, halt automatically to prevent edits that only pad the score
Dry-run control: warn when dry-run ratio exceeds 30%
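
As a rough illustration of the judge and dry-run rules above, the sketch below combines one round's judge scores and flags a high dry-run ratio. The function names, the `'dry_run'` mode label, and the use of a plain average are assumptions made for illustration; the source does not say how the two judges' scores are combined.

```javascript
// Threshold quoted in the list above; everything else here is a hypothetical sketch.
const MAX_DRY_RUN_RATIO = 0.3; // warn when more than 30% of evaluations are dry runs

// Combine the scores of one round's independent judges.
// Assumption: a plain average; the source does not specify the aggregation.
function consensusScore(judgeScores) {
  if (judgeScores.length < 2) {
    throw new Error('each round needs at least 2 independent judges');
  }
  const sum = judgeScores.reduce((total, score) => total + score, 0);
  return sum / judgeScores.length;
}

// evalModes holds one entry per evaluation, e.g. ['full_test', 'dry_run'].
function dryRunWarning(evalModes) {
  const dryRuns = evalModes.filter((mode) => mode === 'dry_run').length;
  const ratio = evalModes.length === 0 ? 0 : dryRuns / evalModes.length;
  return { ratio, warn: ratio > MAX_DRY_RUN_RATIO };
}

console.log(consensusScore([91, 92]));
console.log(dryRunWarning(['full_test', 'dry_run', 'full_test', 'full_test']));
```

Spawning a fresh pair of judges each round happens outside these helpers; they only cover the arithmetic.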

3. Human-in-the-loop at three checkpoints (the core differentiator from SkillOpt's fully autonomous design)

Phase 1 baseline eval: runs automatically; the human then reviews the report and decides what to optimize
Phase 2 single-dimension edit: 🔴 CHECKPOINT mandatory pause for user confirmation
Phase 2.5 test-prompt run (optional)
Phase 3 regression test: 🛑 STOP if gain falls below threshold

4. Anti-pattern blacklist with 8 explicit forbidden behaviors

1. Same AI both edits and scores (SkillLens empirical: LLM self-eval accuracy only 46.4%)
2. Using git reset --hard as a rollback mechanism (use git revert)
3. Padding edits just to push the score up
4. Skipping test prompts and scoring directly
5. Changing multiple dimensions in one round
6. Dry-run ratio > 30%
7. Silently swallowing exceptions
8. Ignoring correlated dimension clusters

5. Empirical validation data

huashu-gpt-image skill: 80.8 → 91.5 → 91.65 (+10.85, consensus across 6 independent judges)
darwin-skill self-eval: 86.05 → 92.05 → 92.7

---

The Core Loop

[Figure: Core Loop]

Evaluate → Improve → Test → Human Confirm → Keep or Revert. Repeat.

---

Why This Exists

Agent skill ecosystems are expanding fast. Claude Code, Codex, OpenClaw, Trae, CodeBuddy and more all support the SKILL.md format. When you have 10 skills, you can maintain them by hand. When you have 60+, you need a system.

Traditional skill review is purely structural: does the frontmatter look right? Are the steps numbered? Do the file paths exist? But a perfectly formatted skill can still produce terrible output.

darwin.skill evaluates both structure and real-world effectiveness, then keeps only the changes that actually improve things.

---

From autoresearch to Skill Optimization

This project maps Karpathy's autoresearch directly onto skill optimization:

| autoresearch | darwin.skill | Why |
| --- | --- | --- |
| `program.md` | This SKILL.md | Defines evaluation criteria and constraints |
| `train.py` | Each target SKILL.md | The single editable asset per experiment |
| `val_bpb` | 9-dimension weighted score (max 100) | Quantifiable optimization target |
| `git ratchet` | keep / revert mechanism | Only improving commits survive |
| `test set` | test-prompts.json | Validates whether improvements are real |
| Fully autonomous | Human in the loop | Skill quality is more subjective than loss |

The key difference: autoresearch is fully autonomous (loss is just a number). Skill quality sometimes needs human judgment. So darwin.skill pauses after each skill's optimization cycle, shows you the diff and score delta, and waits for your confirmation.

---

Five Core Principles

| # | Principle | Details |
| --- | --- | --- |
| 01 | Single editable asset | One SKILL.md per experiment. One change, one measurement, one decision |
| 02 | Dual evaluation | Structure scoring (static analysis) + effectiveness scoring (live test execution) |
| 03 | Ratchet mechanism | Score can only go up. Regressions are auto-reverted |
| 04 | Independent scoring | The agent that edits is never the agent that scores (SkillLens: LLM self-eval is only 46.4% accurate) |
| 05 | Human in the loop | System pauses after each skill. You review, then continue |

---

9-Dimension Evaluation Rubric

Total: 100 points. Structure + Effectiveness. v2.0's three new or upgraded dimensions (two reworked legacy dimensions plus one new ninth dimension) come directly from SkillLens's empirically validated rubric.

[Figure: Evaluation Rubric]

The three new or upgraded dimensions (SkillLens 73.8% rubric recipe):

| Dimension | Description |
| --- | --- |
| Failure Mechanism Encoding | Explicitly encode known failure paths, not just "be careful" reminders |
| Actionable Specificity | Ban vague hedge words like "suggest / could consider / depending on / use judgment / case by case" |
| High-Risk Action Blacklist | Destructive operations (rm / git reset --hard / force push) must be explicitly forbidden |
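
The hedge-word ban and the blacklist requirement lend themselves to a simple static scan. The following is a minimal sketch, not darwin.skill's actual scorer: `lintSkill` and both lists' matching rules are assumptions, with the phrases taken from the table above. It only checks whether each destructive operation is mentioned at all, not whether the skill really marks it as forbidden.

```javascript
// Phrases and operations come from the rubric table; the matching rules are assumptions.
const HEDGE_PHRASES = ['suggest', 'could consider', 'depending on', 'use judgment', 'case by case'];
const HIGH_RISK_OPS = [
  { name: 'rm', pattern: /\brm\b/ },
  { name: 'git reset --hard', pattern: /git reset --hard/ },
  { name: 'force push', pattern: /force[- ]push|push --force/ },
];

// Returns the hedge phrases found and the high-risk operations never mentioned.
function lintSkill(text) {
  const lower = text.toLowerCase();
  const hedges = HEDGE_PHRASES.filter((phrase) => lower.includes(phrase));
  const unlistedOps = HIGH_RISK_OPS
    .filter((op) => !op.pattern.test(lower))
    .map((op) => op.name);
  return { hedges, unlistedOps };
}

// Hypothetical SKILL.md excerpt.
const sample = [
  'Depending on the repo, you could consider cleaning old branches.',
  'Forbidden: git reset --hard, force push.',
].join('\n');

console.log(lintSkill(sample));
```

A scan like this can only support the structural part of the score; judging whether the failure paths are encoded well still needs a reader or a judge agent.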

Live test performance has the highest weight. A beautifully written skill that produces bad output is still a bad skill.

---

The Optimization Cycle

Five phases. The system runs autonomously within each phase but pauses between phases for human confirmation.

[Figure: Optimization Lifecycle]

Phase 2 (the heart, hardened in v2.0):

1. Find the lowest-scoring dimension
2. Generate one targeted improvement (one dimension per round, blacklist #5)
3. Edit SKILL.md, git commit
4. Spawn 2 independent sub-agents to re-score (next round spawns fresh judges to avoid anchoring)
5. Score up → keep. Score down → git revert (never git reset --hard, blacklist #2)
6. Round gain < 1 point → early-stop automatically (no padding for score)
7. 🔴 CHECKPOINT pauses, shows diff + score delta, waits for human confirmation
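
One way to picture the decision logic in steps 1, 5, and 6 is the sketch below. It is an illustration under stated assumptions rather than the skill's real implementation: the helper names are hypothetical, "lowest-scoring" is read as lowest score relative to the dimension's maximum, an unchanged score is treated as a revert, and a gain under 1 point is kept before stopping. The dimension maxima and scores are made-up values.

```javascript
// Step 1: pick the dimension with the lowest score relative to its maximum.
// Assumption: normalizing by max; expects a non-empty array.
function weakestDimension(dimensions) {
  return dimensions.reduce((weakest, dim) =>
    (dim.score / dim.max < weakest.score / weakest.max ? dim : weakest));
}

// Steps 5 and 6: decide what happens to a round's edit.
// Assumption: a tie counts as "no improvement" and is reverted.
function decideRound(bestScore, newScore, minGain = 1) {
  if (newScore <= bestScore) return 'revert';            // undo with git revert
  if (newScore - bestScore < minGain) return 'keep-and-stop'; // early stop
  return 'keep';
}

// Hypothetical scores; the maxima are not the real rubric weights.
const dimensions = [
  { name: 'Failure Mechanism Encoding', score: 6, max: 10 },
  { name: 'Actionable Specificity', score: 9, max: 10 },
  { name: 'High-Risk Action Blacklist', score: 2, max: 6 },
];

console.log(weakestDimension(dimensions).name);
console.log(decideRound(80.8, 91.5), decideRound(91.5, 91.65), decideRound(78, 75));
```

The human checkpoint in step 7 sits after this decision: the returned action is a proposal that the user confirms or overrides.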

---

The Ratchet

Scores can only go up. Failed experiments are cleanly reverted. No regressions accumulate over time.

[Figure: Ratchet Mechanism]

Round 2 scored 75, below the current best of 78. Auto-reverted. Effective baseline stays at 78. Subsequent improvements build from 78, not 75.
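
The ratchet can be replayed in a few lines. This sketch assumes the four round scores shown in the figure (78, 75, 84, 87) and a hypothetical `replayRatchet` helper; it tracks only the numbers, while the real mechanism also performs the git commit or revert.

```javascript
// Replay a sequence of round scores and record the effective baseline after each.
function replayRatchet(roundScores) {
  let best = -Infinity;
  return roundScores.map((score, index) => {
    const kept = score > best; // only a strict improvement moves the ratchet
    if (kept) best = score;
    return { round: index + 1, score, action: kept ? 'keep' : 'revert', baseline: best };
  });
}

const history = replayRatchet([78, 75, 84, 87]);
for (const entry of history) {
  console.log(`round ${entry.round}: ${entry.score} -> ${entry.action}, baseline ${entry.baseline}`);
}
```

Round 3 is compared against 78, not 75, which is why the reverted round leaves no trace in later decisions.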

---

Quick Start

npx skills add alchaincyf/darwin-skill

After installation, tell your agent: "optimize all skills" or "optimize [skill-name]". Works with any tool that supports the SKILL.md format.

Can't access GitHub? Download the zip: darwin-skill.zip. Extract and place SKILL.md in ~/.claude/skills/darwin-skill/.

---

Design Inspiration

Directly inspired by Andrej Karpathy's [autoresearch](https://github.com/karpathy/autoresearch).

The core mechanism is identical: keep only measurable improvements, revert everything else.

v2.0 builds on this foundation by integrating two Microsoft Research papers (published 2026-05-22): SkillLens provides the empirically validated rubric design, and SkillOpt provides the formal framework of validation-gated edits.

---

References & Credits

v2.0's design directly builds on the following academic work. Recommended reading for researchers and engineers working on the skill ecosystem:

SkillLens

Microsoft Research. From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills. arXiv:2605.23899, 2026.

Paper: https://arxiv.org/abs/2605.23899
Contribution: The empirically validated 73.8% rubric recipe. darwin.skill v2.0's three new or upgraded dimensions (Failure Mechanism Encoding / Actionable Specificity / High-Risk Action Blacklist) come directly from this paper. It is also the empirical source for the "same AI edits and scores" anti-pattern: LLM self-eval accuracy is only 46.4%.

SkillOpt

Microsoft Research. SkillOpt: Executive Strategy for Self-Evolving Agent Skills. arXiv:2605.23904, 2026.

🔗 Code repo: github.com/microsoft/SkillOpt (pip install skillopt, v0.1.0 on PyPI)
Project page: https://microsoft.github.io/SkillOpt/
Paper: https://arxiv.org/abs/2605.23904
Contribution: The formal framework of validation-gated edits. Treats a skill as the "external trainable state" of a frozen model: every edit must pass independent validation to be kept. darwin.skill v2.0's multi-judge independent review, non-reuse of judges, early stopping, and dry-run ratio control all align with this framework.
🤝 Mutual recognition: On 2026-06-03, the official SkillOpt repo listed darwin-skill as an integration: "gbrain, gbrain-evals, and darwin-skill have all integrated SkillOpt." They give us the framework; we give it real-world validation.

autoresearch

Andrej Karpathy. autoresearch. GitHub repository, 2026.

Code: https://github.com/karpathy/autoresearch
Contribution: The original inspiration for darwin.skill 1.0. The mapping of core mechanisms (program.md / train.py / val_bpb / git ratchet / test set) is inherited directly from autoresearch.

The key difference between darwin and SkillOpt: SkillOpt is fully autonomous; darwin.skill emphasizes human-in-the-loop — skill quality is more subjective than validation loss. Critical phases (baseline eval, single-dimension edit, regression test) mandatorily pause for the human to make the final judgment.

---

About the Author


🌐 Website	bookai.top · huasheng.ai
𝕏 Twitter	@AlchainHust
📺 Bilibili	花叔
▶️ YouTube	@Alchain
📕 Xiaohongshu	花叔
💬 WeChat	Search "花叔"

---

License

MIT

---

[Nuwa](https://github.com/alchaincyf/nuwa-skill) creates skills.<br> Darwin makes them evolve.<br><br> Keep only improvements. Time is on your side.

<br>

</div>

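Runtime Compatibility Review (detailed reference tables + scan command)

SKILL.md references this file from its "Runtime Compatibility Review" section. Consult it when running the red-light scan during the Phase 1 baseline evaluation.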

---

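Background

花叔's skills are built on the Agent Skills protocol opened up by Anthropic, and should work across 50+ skills-compatible runtimes, including Claude Code, Codex, Cursor, OpenClaw, Hermes Agent, CodeBuddy, Workbuddy, Gemini CLI, and OpenCode.

This is the foundation of a skill's distribution reach: a skill misjudged as "bound to a single runtime" will be refused installation outright by other agents (a real case: nuwa-skill was rejected by the Marvis agent because its README said 「在 Claude Code 里使用」, "use in Claude Code").

Scope: unless a skill's name explicitly declares a single-runtime binding (e.g. huashu-slides-codex, xxx-for-claude-code), every skill must pass this review.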

---

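Red-light signals (each hit costs points and must be fixed in the P0 optimization round)

| Red-light type | Typical form | Harm |
| --- | --- | --- |
| Badge lock-in | Single-runtime badges such as `[![Claude Code Skill]]` or `[![Cursor Only]]` | Sets the tone on the first screen; users of other runtimes leave immediately |
| Wording lock-in | 「在 Claude Code 里」 ("in Claude Code"), 「Cursor 用户可以」 ("Cursor users can"), 「Codex 中使用」 ("use in Codex"), "Claude Code skill" | Agents parsing the text misjudge the skill as "not meant for me" |
| Install-command lock-in | Only the `~/.claude/skills/` path, only `/plugin install`, or only one runtime's private CLI | An agent that does not recognize these as Claude Code commands will refuse |
| Tool-call lock-in | The workflow hard-codes single-runtime capabilities such as `mcp__claude-in-chrome__*` or `PostToolUse hook` and offers no alternative | Other runtimes lack these tools, so the flow cannot run |
| Hard-coded paths | `~/.claude/skills/xxx/` or `.claude/agents/yyy` as the only path | Other runtimes use `~/.cursor/skills/` or `~/.codex/skills/` |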

---

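Green-light wording (recommended rewrites)

| Red light | Green light |
| --- | --- |
| 「在 Claude Code 里」 ("in Claude Code") | 「在你的 agent 里」 ("in your agent") / 「在任何 skills-compatible runtime 中」 ("in any skills-compatible runtime") |
| "Claude Code skill" | "Agent Skill" |
| 「Claude Code 用户」 ("Claude Code users") | 「skills-aware agent 用户」 ("skills-aware agent users") |
| A single locked-in badge | Three neutral badges: `Agent Skills Standard` + `skills.sh Compatible` + `Multi-Runtime` |
| Only the one-line `npx skills add ...` | Three layers: (1) a one-line command with auto-detection, (2) collapsible manual paths for each runtime, (3) a fallback of "cat it into context as reference material" |
| Hard-coded tool names | "Use a browser automation tool (for example Claude's chrome MCP, Playwright, etc.)" |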

---

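Exception list (permitted "Claude Code traces")

Not every Claude Code related string has to be removed. The following appear legitimately and do not count as red lights:

1. Chinese and English trigger words in the frontmatter `description`: this is the skill's entry point, and other runtimes can match them just the same when they parse the frontmatter
2. References to skill names that link skills within the 花叔 ecosystem, such as 「调用 huashu-design」 ("call huashu-design") or 「跟 darwin-skill 配套」 ("pairs with darwin-skill")
3. Clearly labeled runtime-specific sections, such as 「### 仅 Claude Code 优化（按需触发）」 ("### Claude Code only optimization (triggered on demand)"), together with a clear explanation that the section is a nice-to-have
4. Commit messages, changelogs, and internal scripts: these are not part of the skill content that users read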

---

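When to review

During the Phase 1 baseline evaluation: run the red-light scan once per skill and write the hits as runtime_warn=N into the note column of results.tsv (no new column is added, which keeps the file backward compatible)
During the Phase 2 optimization loop: for any skill with 1 or more red-light hits, the first round's optimization direction is forced to P0 "runtime drift fix" (see the P0 section of the optimization strategy library in SKILL.md), ahead of every other dimension
In the Phase 3 summary report: a separate "runtime neutrality" column shows repair progress (hit count from X → 0)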
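
Recording the hit count in the existing note column could look like the sketch below. The column names and their order are assumptions for illustration (the source confirms only that results.tsv has a note column and that the tag reads runtime_warn=N), and the sample record is hypothetical.

```javascript
// Assumed column order; only the `note` column and the tag format come from the source.
const COLUMNS = ['timestamp', 'commit', 'skill', 'old_score', 'new_score', 'status', 'eval_mode', 'note'];

// Append the runtime_warn tag to whatever the note already contains.
function withRuntimeWarn(record, hitCount) {
  const tag = `runtime_warn=${hitCount}`;
  const note = record.note ? `${record.note}; ${tag}` : tag;
  return { ...record, note };
}

// Serialize one record as a TSV line, replacing tabs and newlines inside values.
function toTsvRow(record) {
  return COLUMNS
    .map((column) => String(record[column] ?? '').replace(/[\t\n]/g, ' '))
    .join('\t');
}

// Hypothetical baseline record for a skill with 3 red-light hits.
const baseline = {
  timestamp: '2026-05-28T10:00:00Z',
  commit: 'abc1234',
  skill: 'example-skill',
  old_score: 80,
  new_score: 80,
  status: 'baseline',
  eval_mode: 'full_test',
};

console.log(toTsvRow(withRuntimeWarn(baseline, 3)));
```

Because the tag lives inside the note value, readers of older results.tsv files see the same number of columns as before.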

---

Red-light scan quick command

# Run this grep in the skill directory; any output is a red-light hit
grep -nE "(在 Claude Code|Claude Code skill|Claude Code 用户|Cursor only|Codex 中|^\[!\[Claude Code|~/\.claude/skills/[a-z]|/plugin install\b)" SKILL.md README.md 2>/dev/null

Non-empty output = the skill has not passed the gate and must be fixed in the optimization loop.
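
For runtimes without grep, the same pattern can be applied in JavaScript. This is a minimal port of the command above, not part of the original skill: `scanRedLights` is a hypothetical name, and the sample README content is made up. It works on a string so that the caller decides how SKILL.md and README.md are read.

```javascript
// Same alternatives as the grep -E pattern, tested line by line so ^ anchors each line.
const RED_LIGHT = /(在 Claude Code|Claude Code skill|Claude Code 用户|Cursor only|Codex 中|^\[!\[Claude Code|~\/\.claude\/skills\/[a-z]|\/plugin install\b)/;

// Returns the 1-based line numbers and text of every red-light hit.
function scanRedLights(text) {
  return text
    .split(/\r?\n/)
    .map((line, index) => ({ line: index + 1, text: line }))
    .filter((entry) => RED_LIGHT.test(entry.text));
}

// Hypothetical README content with two red lights.
const readme = [
  '# my-skill',
  '[![Claude Code Skill](badge.svg)](https://example.com)',
  'Works in any skills-compatible runtime.',
  'Copy SKILL.md to ~/.claude/skills/my-skill/',
].join('\n');

const hits = scanRedLights(readme);
for (const hit of hits) {
  console.log(`${hit.line}: ${hit.text}`);
}
console.log(`runtime_warn=${hits.length}`);
```

The hit count is the N that Phase 1 writes into the note column as runtime_warn=N.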

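#!/usr/bin/env node
/**
 * Darwin Skill - high-resolution screenshot script
 *
 * Usage: node scripts/screenshot.mjs [html file path] [output png path]
 *
 * Features:
 * - 2x deviceScaleFactor for high-resolution output
 * - Captures only the .card element, with no extra background
 * - Waits for fonts to finish loading
 * - Opens the image automatically when done (macOS `open` only)
 */

import { execFileSync } from 'node:child_process';
import { createRequire } from 'node:module';
import path from 'node:path';
import { fileURLToPath, pathToFileURL } from 'node:url';

const require = createRequire(import.meta.url);

// Load playwright-core. The original hard-coded one machine's global npm path
// (/Users/alchain/.npm-global/...). PLAYWRIGHT_CORE_PATH is an assumed override
// for global installs; otherwise normal module resolution is used.
const pw = require(process.env.PLAYWRIGHT_CORE_PATH || 'playwright-core');

// fileURLToPath decodes spaces and other escaped characters that URL.pathname leaves encoded.
const htmlPath = path.resolve(
  process.argv[2] || fileURLToPath(new URL('../templates/result-card.html', import.meta.url)),
);
const outputPath = path.resolve(
  process.argv[3] || fileURLToPath(new URL('../templates/result-card.png', import.meta.url)),
);

async function screenshot() {
  const browser = await pw.chromium.launch();

  try {
    const context = await browser.newContext({
      viewport: { width: 920, height: 1600 },
      deviceScaleFactor: 2,
    });

    const page = await context.newPage();

    await page.goto(pathToFileURL(htmlPath).href, { waitUntil: 'networkidle' });

    // Wait for fonts to load
    await page.evaluate(() => document.fonts.ready);
    // Extra wait to make sure rendering has finished
    await page.waitForTimeout(2000);

    // Capture only the .card element (locator() is synchronous, so no await)
    const card = page.locator('.card');
    await card.screenshot({
      path: outputPath,
      type: 'png',
    });

    console.log(`Screenshot saved: ${outputPath}`);

    // Report the image dimensions; boundingBox() returns null if the element is not visible
    const box = await card.boundingBox();
    if (box) {
      console.log(`Card size: ${Math.round(box.width)}x${Math.round(box.height)}px (CSS)`);
      console.log(`Output size: ${Math.round(box.width * 2)}x${Math.round(box.height * 2)}px (2x)`);
    }
  } finally {
    await browser.close();
  }

  // Open the image automatically. `open` exists only on macOS, so other platforms skip this.
  // execFileSync passes the path as an argument, which avoids shell quoting problems.
  if (process.platform === 'darwin') {
    execFileSync('open', [outputPath]);
  }
}

screenshot().catch((err) => {
  console.error('Screenshot failed:', err.message);
  process.exit(1);
});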

Related skills

Tdd: Follow test-driven development with a strict red-green-refactor loop when creating reliable features or fixing bugs. (510k · 185k)

Test Driven Development: Enforce writing failing tests before any production implementation code. (176k · 260k)

Qa: Run conversational QA sessions that turn user-reported bugs into well-written, domain-aware GitHub issues without manual ticket writing. (164k · 185k)

Migrate To Shoehorn: Automatically update TypeScript test files that rely on unsafe `as` type assertions by replacing them with type-safe partial objects from @total-typescript/shoehorn. (151k · 185k)

Webapp Testing: Verify frontend behavior, debug UI issues, capture screenshots, and inspect logs of a running local web application using Playwright. (121k · 164k)

Playwright Cli: Run browser automation, generate element snapshots, inspect DOM attributes, and execute Playwright tests from the terminal. (96.3k · 12.2k)
