Ce Sessions

The primary shelf is Build agent-tooling because ce-sessions is part of the compound-engineering plugin stack: agents use it to retrieve structured session knowledge before planning or compounding. The agent-tooling shelf covers session search, synthesis gates, and handoff to ce-plan and ce-compound, not generic note-taking.

Where it fits

Example use

Ask how the synthesis gate in ce-plan was designed before implementing the next plugin phase.

Example use

Pull session history on Phase 0.7 decisions to avoid rescoping work already settled in compound sessions.

Example use

Verify that release notes align with session findings, where must-tier terms must appear verbatim and in context.

How it compares

Plugin session retrieval with terminology contracts—not a general RAG chat over random markdown.

Common Questions / FAQ

Who is ce-sessions for?

Solo builders and small teams using the compound-engineering plugin who need prior session answers that stay precise enough for automated vocabulary capture.

When should I use ce-sessions?

In Build before extending agent tooling; in Validate when scoping from past ce-plan work; in Ship when reviewing whether documented session findings still match implementation intent.

Is ce-sessions safe to install?

Treat it like any third-party agent plugin skill; review the Security Audits panel on this page and limit session sources to repositories you trust.

SKILL.md

SKILL.md - Ce Sessions

{
  "skill_name": "ce-sessions",
  "purpose": "Validate that ce-sessions findings preserve enough terminology resolution context for downstream vocabulary capture (load-bearing assumption in PR #838 — ce-compound Phase 2.4 scans ce-sessions findings for qualifying domain terms).",
  "non_purpose": "Not testing ce-sessions's general search quality or its ability to find sessions on arbitrary topics. The narrow assumption is about terminology-resolution preservation.",
  "variance_protocol": {
    "runs_per_eval": 3,
    "stability_metric": "stddev of must-tier term recall across runs",
    "pass_threshold": "must-tier recall >= 80% mean AND stddev < 20%"
  },
  "grading_pipeline": {
    "stage_1": "Programmatic substring match per criticality tier (must / should / may). Pass = all 'must' terms appear in findings.",
    "stage_2": "LLM grader (see grader.md) — judges whether each 'expected_context' item is preserved WITH resolution rationale, not only as a keyword hit. Pass = all expected_context items receive 'preserved with context' verdict."
  },
  "evals": [
    {
      "id": 1,
      "name": "synthesis-gate-recovery",
      "tests_risk": "synthesis_loss",
      "prompt": "What was the synthesis gate work in ce-plan about? I want to understand how it was designed and what problems it solved.",
      "expected_terms": [
        {"term": "synthesis gate", "tier": "must"},
        {"term": "ce-plan", "tier": "must"},
        {"term": "Phase 0.7", "tier": "should"},
        {"term": "Phase 5.1.5", "tier": "should"},
        {"term": "Stated", "tier": "should"},
        {"term": "Inferred", "tier": "should"},
        {"term": "Out of scope", "tier": "should"},
        {"term": "call-outs", "tier": "may"},
        {"term": "synthesis-summary.md", "tier": "may"},
        {"term": "silent proceeding is not allowed", "tier": "may"}
      ],
      "expected_context": [
        "synthesis gate appears with its purpose (prevent silent proceed past synthesis without user check), not only as a keyword",
        "Stated / Inferred / Out of scope appear as bucket categorization, not only as a phrase"
      ],
      "ground_truth": {
        "primary_pr": 822,
        "primary_merge_commit": "39cb9da3a1a90a7ce7418f7a64d7ff3c8f9a917c",
        "related_prs": [819, 829],
        "merged_at": "2026-05-15"
      },
      "notes": "Distinctive coined term that should be near-impossible to ignore if ce-sessions touched the originating session. Failure here indicates strong synthesis loss."
    },
    {
      "id": 2,
      "name": "mode-headless-semantic-alignment",
      "tests_risk": "synthesis_loss",
      "prompt": "How was mode:headless aligned across the compound family of skills? Why was it added and what changed?",
      "expected_terms": [
        {"term": "mode:headless", "tier": "must"},
        {"term": "ce-compound", "tier": "must"},
        {"term": "mode:autofix", "tier": "should"},
        {"term": "ce-compound-refresh", "tier": "should"},
        {"term": "sticky mode token", "tier": "should"},
        {"term": "Discoverability Check", "tier": "should"},
        {"term": "process exhaust", "tier": "may"},
        {"term": "audit content", "tier": "may"},
        {"term": "Compare per skill, not per mode", "tier": "may"},
        {"term": "Assumptions section", "tier": "may"}
      ],
      "expected_context": [
        "mode:autofix → mode:headless rename appears with reasoning (the compound family should speak the same word)",
        "process exhaust vs audit content principle appears with the refined rule (compare per skill, not per mode — interactive ce-compound doesn't validate the same inferences headless skips)"
      ],
      "ground_truth": {
        "primary_pr": 813,
        "primary_merge_commit": "9b45a83d7ed2534669656fb3abf6a2c23e2e4f59",
        "merged_at": "2026-05-10"
      },
      "notes": "More nuanced than #1 — tests preservation of a multi-piece design decision (rename + cross-skill alignment + a principle r

What is this skill?

Designed so findings preserve load-bearing domain terms for downstream Phase 2.4 vocabulary scans

Two-stage grading: programmatic must-tier term recall plus LLM context-preservation verdicts

Variance protocol: 3 runs per eval with pass threshold on must-tier recall mean and stddev

Eval scenarios include synthesis-gate-recovery and ce-plan terminology (e.g. Phase 0.7)

Narrow scope: terminology resolution in session output, not general search quality on arbitrary topics
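The stage 1 check described above (a programmatic substring match per criticality tier, passing only when every must-tier term appears) can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual grader: the function names `tier_recall` and `stage_1_passes` are hypothetical, the match is assumed to be case-sensitive, and the sample findings string is invented for the example. Stage 2 (the LLM context grader) is not shown.

```python
from collections import defaultdict


def tier_recall(findings, expected_terms):
    """Return {tier: fraction of that tier's terms found as substrings}.

    Assumption: plain case-sensitive substring matching. The source only
    says "substring match", so the exact matching rules may differ.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for entry in expected_terms:
        totals[entry["tier"]] += 1
        if entry["term"] in findings:
            hits[entry["tier"]] += 1
    return {tier: hits[tier] / totals[tier] for tier in totals}


def stage_1_passes(findings, expected_terms):
    """Stage 1 passes only when every must-tier term appears in the findings."""
    return tier_recall(findings, expected_terms).get("must", 1.0) == 1.0


# A subset of the expected_terms from the synthesis-gate-recovery eval.
expected_terms = [
    {"term": "synthesis gate", "tier": "must"},
    {"term": "ce-plan", "tier": "must"},
    {"term": "Phase 0.7", "tier": "should"},
    {"term": "Phase 5.1.5", "tier": "should"},
]

# Invented findings text, used only to exercise the functions.
sample_findings = "Findings mention the synthesis gate, ce-plan, and Phase 0.7."

recall = tier_recall(sample_findings, expected_terms)
print(recall)
print(stage_1_passes(sample_findings, expected_terms))
```

Reporting recall per tier, rather than a single pass/fail flag, keeps should-tier and may-tier misses visible even when the must-tier gate passes.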

Pass threshold: must-tier recall ≥ 80% mean AND stddev < 20%

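To make the variance protocol concrete, the sketch below applies the stated pass threshold (mean must-tier recall of at least 80% and stddev below 20%) to the recall values from three runs. It rests on assumptions the source does not settle: recall is expressed as a fraction between 0 and 1, "20%" is read as 0.20 on that scale, and the sample standard deviation is used (the listing does not say whether sample or population stddev is intended). The function name `variance_verdict` is hypothetical.

```python
from statistics import mean, stdev


def variance_verdict(must_recalls, min_mean=0.80, max_stddev=0.20):
    """Apply the mean and stddev thresholds to per-run must-tier recall.

    Assumptions: recall values are fractions in [0, 1], and the sample
    standard deviation (statistics.stdev) is the intended spread metric.
    """
    avg = mean(must_recalls)
    spread = stdev(must_recalls)  # needs at least two runs
    return {
        "mean": avg,
        "stddev": spread,
        "passed": avg >= min_mean and spread < max_stddev,
    }


# Three runs per eval, as in the variance protocol.
stable = variance_verdict([1.0, 1.0, 1.0])    # every must term found in every run
unstable = variance_verdict([1.0, 1.0, 0.5])  # one run missed half the must terms

print(stable["passed"], unstable["passed"])
```

The second case shows why both conditions matter: its mean recall is still above 80%, but one weak run out of three pushes the spread past the stddev limit, so the eval fails on stability.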

Compatible agents: Claude Code, Cursor, Codex, and other compatible agents

Adoption & trust: 1.6k installs on skills.sh; 20.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What do I get? / Deliverables

Findings include must-tier domain terms and expected_context with resolution rationale so ce-compound and related skills can scan and compound without synthesis loss.

Session findings text with must/should-tier terminology preserved

Synthesis suitable for downstream vocabulary capture when grading criteria pass

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Where it fits

Example use