
Summon
Keep long-running egregore agent sessions inside token windows and recover cleanly from Claude API rate limits.
Overview
Summon is an agent skill for the Operate phase that manages egregore token budget windows, rate-limit detection, and cooldown-driven graceful shutdown.
Install
npx skills add https://github.com/athola/claude-night-market --skill summon
What is this skill?
- Tracks cumulative token usage and session count inside a configurable budget window (default 5 hours).
- Detects rate limits via HTTP 429, API error text, or explicit retry-after headers and stops work immediately.
- Computes cooldown as retry-after minutes plus configurable padding (default 10 minutes) to avoid relaunch loops.
- Falls back to a 30-minute default cooldown plus padding when retry-after is missing.
- Persists window start, estimated tokens, last rate limit, and cooldown_until in structured JSON state.
- Default budget window type 5h
- Default cooldown padding 10 minutes
- Default 30-minute cooldown when retry-after is absent
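The cooldown rule in the bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own code: the function name `cooldown_minutes` and the two constants are hypothetical, and only the defaults (30-minute fallback, 10-minute padding) come from the description.

```python
from typing import Optional

# Defaults taken from the skill description; the names are hypothetical.
DEFAULT_COOLDOWN_MINUTES = 30.0  # used when no retry-after value is present
DEFAULT_PADDING_MINUTES = 10.0   # extra wait to avoid relaunching too early


def cooldown_minutes(retry_after_seconds: Optional[float],
                     padding_minutes: float = DEFAULT_PADDING_MINUTES) -> float:
    """Return the padded cooldown in minutes."""
    if retry_after_seconds is None:
        base = DEFAULT_COOLDOWN_MINUTES    # fallback when retry-after is missing
    else:
        base = retry_after_seconds / 60    # convert seconds to minutes
    return base + padding_minutes


print(cooldown_minutes(300))   # retry-after of 300 s: 5 + 10 minutes
print(cooldown_minutes(None))  # no retry-after: 30 + 10 minutes
```

With the defaults, a retry-after of 300 seconds gives a 15-minute cooldown, and a missing retry-after gives 40 minutes.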
Adoption & trust: 1 install on skills.sh; 304 GitHub stars; 0/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You run chained agent sessions but have no shared token window or safe cooldown when Claude rate-limits you mid-orchestration.
Who is it for?
Solo builders operating custom egregore or Night Market orchestrators across multiple agent sessions in one day.
Skip if: you only run short single-turn Claude Code tasks and never hit cumulative token or rate-limit windows.
When should I use this skill?
Use it when you are managing token budget windows, detecting API rate limits, or configuring cooldown before resuming egregore sessions.
What do I get? / Deliverables
After applying the protocol, the orchestrator persists budget state, enters a padded cooldown on limits, and avoids relaunching until the window is safe again.
- Updated budget.json with window, tokens, and cooldown fields
- Orchestrator stop and cooldown schedule on rate limit
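One way these deliverables could fit together is sketched below. It is an illustration under stated assumptions: the field names `last_rate_limit_at` and `cooldown_until` follow the `budget.json` layout in SKILL.md, but the function bodies, the `now` parameter, and `save_budget` are hypothetical, and timestamps are assumed to be timezone-aware ISO 8601 strings.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def record_rate_limit(budget: dict, cooldown_minutes: float,
                      now: Optional[datetime] = None) -> dict:
    """Stamp the rate limit and the time at which work may resume."""
    now = now or datetime.now(timezone.utc)
    budget["last_rate_limit_at"] = now.isoformat()
    budget["cooldown_until"] = (now + timedelta(minutes=cooldown_minutes)).isoformat()
    return budget


def is_in_cooldown(budget: dict, now: Optional[datetime] = None) -> bool:
    """True while the current time is before cooldown_until."""
    until = budget.get("cooldown_until")
    if until is None:
        return False  # no cooldown recorded
    now = now or datetime.now(timezone.utc)
    # Assumes cooldown_until carries a UTC offset, as in the SKILL.md example.
    return now < datetime.fromisoformat(until)


def save_budget(budget: dict, path: Path) -> None:
    """Write the budget state as indented JSON, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(budget, indent=2))
```

An orchestrator following this sketch would call `record_rate_limit`, then `save_budget(budget, Path(".egregore/budget.json"))`, and check `is_in_cooldown` before starting any new work.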
Journey fit
The canonical shelf is Operate because budget windows, cooldowns, and graceful shutdown govern production-style agent orchestration once sessions are already running. Infra fits because the skill deals with persistent budget state in `.egregore/budget.json`, window timing, and orchestrator shutdown logic rather than feature coding.
How it compares
Use instead of ad-hoc sleep-and-retry loops that ignore shared session budget files.
Common Questions / FAQ
Who is summon for?
Summon is for indie developers and agent operators who run Athola egregore-style orchestration and need disciplined token and rate-limit handling across sessions.
When should I use summon?
Use summon during Operate when you maintain `.egregore/budget.json`, detect 429 or retry-after from the API, or need default 5-hour windows and padded cooldowns before resuming skill chains.
Is summon safe to install?
Review the Security Audits panel on this Prism page and inspect the skill source in your repo before letting an orchestrator write budget state or stop production runs.
SKILL.md
# Budget Module

Manages token budget windows, detects rate limits, calculates cooldown periods, and triggers graceful shutdown when limits are reached.

## Budget Window

An egregore session operates within a budget window (default: 5 hours). The window tracks cumulative token usage and rate limit events across multiple sessions. The budget state is in `.egregore/budget.json`:

```json
{
  "window_type": "5h",
  "window_started_at": "2026-03-04T10:00:00+00:00",
  "estimated_tokens_used": 0,
  "session_count": 1,
  "last_rate_limit_at": null,
  "cooldown_until": null
}
```

## Rate Limit Detection

The orchestrator detects rate limits through two signals:

1. **API error response**: a skill invocation fails with an HTTP 429 or a rate-limit error message from the Claude API.
2. **Explicit retry-after header**: the error includes a retry-after duration in seconds.

When either signal is detected, the orchestrator must stop work immediately and enter cooldown.

## Cooldown Calculation

The cooldown duration is computed as follows:

```
cooldown_minutes = retry_after_seconds / 60 + config.budget.cooldown_padding_minutes
```

The padding (default: 10 minutes) prevents the watchdog from relaunching too early and hitting the same rate limit again. If no retry-after header is present, use a default cooldown of 30 minutes plus padding.

## Rate Limit Recovery

When a rate limit is detected:

1. **Save manifest**: write the current pipeline state to `.egregore/manifest.json` so no progress is lost.
2. **Record rate limit**: call `budget.record_rate_limit(cooldown_minutes)` to update the budget state.
3. **Save budget**: write `.egregore/budget.json` with the updated cooldown timestamp.
4. **Alert overseer**: send a notification via the configured channel (see `notify.py`) with the rate limit details and expected resume time.

### In-Session Recovery (2.1.71+, all providers 2.1.73+)

Use `CronCreate` to schedule a one-shot resume at the cooldown expiry time.
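
The minute and hour fields for that one-shot schedule can be derived from `cooldown_until`. The helper below is a hypothetical sketch, not part of the skill: `cron_fields` and its `tz` parameter are invented for illustration, and it assumes the cron expression is evaluated in the machine's local time zone, which the source does not state.

```python
from datetime import datetime, timedelta, tzinfo
from typing import Optional


def cron_fields(cooldown_until: str, tz: Optional[tzinfo] = None) -> str:
    """Return "<min> <hour> * * *" for the first whole minute at or after cooldown_until.

    tz=None converts to the system's local zone (assumed to be what cron uses).
    """
    until = datetime.fromisoformat(cooldown_until).astimezone(tz)
    # Round up to the next whole minute so the prompt never fires early.
    if until.second or until.microsecond:
        until = until.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return f"{until.minute} {until.hour} * * *"
```

Because the expression carries only a minute and an hour, this sketch suits cooldowns shorter than 24 hours; longer waits would need the day and month fields as well.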
As of 2.1.73, `/loop` and `CronCreate` are available on Bedrock, Vertex, Foundry, and with telemetry disabled (previously first-party API only).

```
CronCreate(
  cron: "<min> <hour> * * *",
  prompt: "Cooldown expired. Read .egregore/manifest.json and resume the pipeline. Invoke Skill(egregore:summon) to continue.",
  recurring: false
)
```

**Advantages over watchdog restart:**

- Session stays alive: no context loss, no manifest re-read overhead, no fresh session startup cost
- Exact timing: fires at cooldown_until instead of polling every 5 minutes
- No OS-level setup: works without launchd/systemd

The session remains idle between the rate limit and the scheduled prompt. When the cron fires, the orchestration loop resumes with the full conversation context intact.

### Fallback: Exit and Watchdog

If `CronCreate` is unavailable (pre-2.1.71, or pre-2.1.73 on Bedrock/Vertex/Foundry), the cooldown exceeds 7 days (cron task auto-expiry), or the session itself needs to exit for other reasons:

5. **Exit cleanly**: exit with code 0. A non-zero exit would trigger the watchdog's crash handler instead of the cooldown-aware restart path.

The watchdog checks `budget.json` before relaunching and waits until the cooldown expires.

## Pre-Launch Cooldown Check

Before starting any work, the orchestrator must check:

```python
if is_in_cooldown(budget):
    # Do not start. Exit and let the watchdog retry later.
    sys.exit(0)
```

The watchdog also performs this check before launching a new session. This double-check prevents races where the watchdog reads a stale budget file.

## Window Reset

The budget window resets when `window_started_at` is older than the configured `window_type` duration. On reset:

1. Set `estimated_tokens_used` to 0.
2. Set `session_count` to 0.
3. Set `window_started_at` to the current time.
4. Clear `last_rate