
Ce Sessions
Search and synthesize prior compound-engineering sessions so findings keep terminology resolution context for downstream ce-compound vocabulary capture.
Overview
ce-sessions is an agent skill most often used in Build (also Validate and Ship) that retrieves compound-engineering sessions with terminology and resolution context preserved for downstream compounding skills.
Install
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill ce-sessions
What is this skill?
- Designed so findings preserve load-bearing domain terms for downstream Phase 2.4 vocabulary scans
- Two-stage grading pipeline: programmatic substring matching per criticality tier (must / should / may), then LLM verdicts on whether context is preserved
- Variance protocol: 3 runs per eval
- Pass threshold: must-tier recall ≥ 80% mean AND stddev < 20%
- Eval scenarios include synthesis-gate-recovery and ce-plan terminology (e.g. Phase 0.7)
- Narrow scope: terminology resolution in session output, not arbitrary topic search quality
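The variance protocol and pass threshold listed above can be sketched roughly as follows. This is an illustration only, not the plugin's grader: `check_variance` is a hypothetical name, recall is assumed to be a fraction between 0 and 1, and the population standard deviation is assumed because the spec does not say which variant it uses.

```python
from statistics import mean, pstdev


def check_variance(recalls, min_mean=0.80, max_stddev=0.20, expected_runs=3):
    """Apply the stated threshold to per-run must-tier recall values.

    Assumptions (not confirmed by the source): recall is a fraction in
    [0, 1], and "stddev" means the population standard deviation.
    """
    if len(recalls) != expected_runs:
        raise ValueError(f"expected {expected_runs} runs, got {len(recalls)}")
    mean_recall = mean(recalls)
    spread = pstdev(recalls)
    # Both conditions must hold: mean >= 80% AND stddev < 20%.
    passed = mean_recall >= min_mean and spread < max_stddev
    return {"mean": mean_recall, "stddev": spread, "passed": passed}


# One weak run out of three can fail the stability check even when the
# mean still clears 80%.
print(check_variance([1.0, 1.0, 1.0]))
print(check_variance([1.0, 1.0, 0.5]))
```

Under these assumptions, an eval with only two must-tier terms is sensitive to a single miss, since one missing term in one run drops that run's recall to 0.5.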
Adoption & trust: 1.6k installs on skills.sh; 20.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Prior session search returns shallow summaries, so downstream compound skills lose the named terms and rationale they need to extend vocabulary safely.
Who is it for?
Builders on the everyinc compound-engineering plugin who chain ce-plan, sessions lookup, and ce-compound and need consistent term fidelity.
Skip if: you need arbitrary enterprise search across unrelated repos, or your team does not use compound-engineering session artifacts and the grading assumptions from PR #838.
When should I use this skill?
You need compound-engineering session answers where domain terms and resolution rationale must survive for downstream ce-compound vocabulary work.
What do I get? / Deliverables
Findings include must-tier domain terms and expected_context with resolution rationale so ce-compound and related skills can scan and compound without synthesis loss.
- Session findings text with must/should-tier terminology preserved
- Synthesis suitable for downstream vocabulary capture when grading criteria pass
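As a rough sketch of how the tiered terms in a findings text might be checked programmatically: the function name `grade_stage_1` and the sample findings string are hypothetical, the term entries follow the `{"term", "tier"}` shape used in the eval spec, and case-insensitive matching is an assumption (the spec only says "substring match").

```python
def grade_stage_1(findings, expected_terms):
    """Report which expected terms are missing from the findings, per tier.

    Passes only when every must-tier term appears as a substring.
    Case-insensitive matching is an assumption made for this sketch.
    """
    text = findings.lower()
    missing = {"must": [], "should": [], "may": []}
    for entry in expected_terms:
        if entry["term"].lower() not in text:
            missing[entry["tier"]].append(entry["term"])
    return {"passed": not missing["must"], "missing": missing}


# Hypothetical findings text, using a few terms from the first eval.
terms = [
    {"term": "synthesis gate", "tier": "must"},
    {"term": "ce-plan", "tier": "must"},
    {"term": "Phase 0.7", "tier": "should"},
]
result = grade_stage_1(
    "The synthesis gate in ce-plan stops silent proceeding past synthesis.",
    terms,
)
print(result["passed"], result["missing"])
```

A keyword hit alone is not enough for the second grading stage, which checks that each term appears with its resolution rationale. That part depends on an LLM grader and is not sketched here.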
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Primary shelf is Build agent-tooling because ce-sessions is part of the compound-engineering plugin stack for agents retrieving structured session knowledge before planning or compounding. Agent-tooling captures session search, synthesis gates, and handoff to ce-plan/ce-compound—not generic note-taking.
Where it fits
Ask how the synthesis gate in ce-plan was designed before implementing the next plugin phase.
Pull session history on Phase 0.7 decisions to avoid rescoping work already settled in compound sessions.
Verify release notes align with session findings where must-tier terms must appear verbatim with context.
How it compares
Plugin session retrieval with terminology contracts—not a general RAG chat over random markdown.
Common Questions / FAQ
Who is ce-sessions for?
Solo builders and small teams using the compound-engineering plugin who need prior session answers that stay precise enough for automated vocabulary capture.
When should I use ce-sessions?
In Build before extending agent tooling; in Validate when scoping from past ce-plan work; in Ship when reviewing whether documented session findings still match implementation intent.
Is ce-sessions safe to install?
Treat it like any third-party agent plugin skill; review the Security Audits panel on this page and limit session sources to repositories you trust.
SKILL.md
{
  "skill_name": "ce-sessions",
  "purpose": "Validate that ce-sessions findings preserve enough terminology resolution context for downstream vocabulary capture (load-bearing assumption in PR #838 — ce-compound Phase 2.4 scans ce-sessions findings for qualifying domain terms).",
  "non_purpose": "Not testing ce-sessions's general search quality or its ability to find sessions on arbitrary topics. The narrow assumption is about terminology-resolution preservation.",
  "variance_protocol": {
    "runs_per_eval": 3,
    "stability_metric": "stddev of must-tier term recall across runs",
    "pass_threshold": "must-tier recall >= 80% mean AND stddev < 20%"
  },
  "grading_pipeline": {
    "stage_1": "Programmatic substring match per criticality tier (must / should / may). Pass = all 'must' terms appear in findings.",
    "stage_2": "LLM grader (see grader.md) — judges whether each 'expected_context' item is preserved WITH resolution rationale, not only as a keyword hit. Pass = all expected_context items receive 'preserved with context' verdict."
  },
  "evals": [
    {
      "id": 1,
      "name": "synthesis-gate-recovery",
      "tests_risk": "synthesis_loss",
      "prompt": "What was the synthesis gate work in ce-plan about? I want to understand how it was designed and what problems it solved.",
      "expected_terms": [
        {"term": "synthesis gate", "tier": "must"},
        {"term": "ce-plan", "tier": "must"},
        {"term": "Phase 0.7", "tier": "should"},
        {"term": "Phase 5.1.5", "tier": "should"},
        {"term": "Stated", "tier": "should"},
        {"term": "Inferred", "tier": "should"},
        {"term": "Out of scope", "tier": "should"},
        {"term": "call-outs", "tier": "may"},
        {"term": "synthesis-summary.md", "tier": "may"},
        {"term": "silent proceeding is not allowed", "tier": "may"}
      ],
      "expected_context": [
        "synthesis gate appears with its purpose (prevent silent proceed past synthesis without user check), not only as a keyword",
        "Stated / Inferred / Out of scope appear as bucket categorization, not only as a phrase"
      ],
      "ground_truth": {
        "primary_pr": 822,
        "primary_merge_commit": "39cb9da3a1a90a7ce7418f7a64d7ff3c8f9a917c",
        "related_prs": [819, 829],
        "merged_at": "2026-05-15"
      },
      "notes": "Distinctive coined term that should be near-impossible to ignore if ce-sessions touched the originating session. Failure here indicates strong synthesis loss."
    },
    {
      "id": 2,
      "name": "mode-headless-semantic-alignment",
      "tests_risk": "synthesis_loss",
      "prompt": "How was mode:headless aligned across the compound family of skills? Why was it added and what changed?",
      "expected_terms": [
        {"term": "mode:headless", "tier": "must"},
        {"term": "ce-compound", "tier": "must"},
        {"term": "mode:autofix", "tier": "should"},
        {"term": "ce-compound-refresh", "tier": "should"},
        {"term": "sticky mode token", "tier": "should"},
        {"term": "Discoverability Check", "tier": "should"},
        {"term": "process exhaust", "tier": "may"},
        {"term": "audit content", "tier": "may"},
        {"term": "Compare per skill, not per mode", "tier": "may"},
        {"term": "Assumptions section", "tier": "may"}
      ],
      "expected_context": [
        "mode:autofix → mode:headless rename appears with reasoning (the compound family should speak the same word)",
        "process exhaust vs audit content principle appears with the refined rule (compare per skill, not per mode — interactive ce-compound doesn't validate the same inferences headless skips)"
      ],
      "ground_truth": {
        "primary_pr": 813,
        "primary_merge_commit": "9b45a83d7ed2534669656fb3abf6a2c23e2e4f59",
        "merged_at": "2026-05-10"
      },
      "notes": "More nuanced than #1 — tests preservation of a multi-piece design decision (rename + cross-skill alignment + a principle r