
Token Budget Advisor
Let the user pick response depth and token spend before the agent answers long or expensive threads.
Overview
Token Budget Advisor is a journey-wide agent skill that offers depth and token-budget choices before answering whenever the user explicitly wants to control response length or detail.
Install
npx skills add https://github.com/affaan-m/everything-claude-code --skill token-budget-advisorWhat is this skill?
- Triggers on explicit token-budget, depth, or short-vs-long answer requests—not auth/session “tokens”
- Step 1 estimates input tokens before offering depth choices
- Maintains a chosen depth silently for the rest of the session once the user sets it
- Skips activation for trivial one-line answers or when depth was already specified
- Bilingual trigger phrases (EN/ES) for budget and brevity control
- Multi-step flow: estimate input tokens, then offer depth choice before the main answer
Adoption & trust: 3.4k installs on skills.sh; 210k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want a long agent answer sometimes but cannot afford open-ended max-detail replies on every prompt in a paid coding session.
Who is it for?
Power users on metered Claude/Cursor/Codex plans who routinely ask for TL;DR, partial depth, or token-aware answers.
Skip if: Sessions where you already named a depth level for this chat, one-line factual lookups, or discussions about auth/payment/session tokens.
When should I use this skill?
User explicitly wants token budget, response length, answer depth, short/brief/detailed/exhaustive variants, or Spanish phrasing for controlling how much the agent uses—not auth/session payment tokens.
What do I get? / Deliverables
The agent pauses to estimate input size, lets you pick depth upfront, then answers at that level for the rest of the session until you change it.
- Depth choice prompt before the substantive answer
- Session-consistent depth until the user changes it
Recommended Skills
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Ask for a deep architecture review but cap follow-up explanations at brief depth to save tokens.
Request an exhaustive security pass once, then maintain a shorter depth for nit fixes.
Draft marketing copy with an explicit ‘short version’ before expanding only the sections you approve.
Triage production logs with TL;DR summaries first, then escalate to detailed traces on one incident.
How it compares
Session meta-control for answer depth—not a linter, test runner, or codebase integration skill.
Common Questions / FAQ
Who is token-budget-advisor for?
Solo builders and agent power users who explicitly want to trade detail for token savings during long coding and planning sessions.
When should I use token-budget-advisor?
Whenever you mention token budget, response length, short vs detailed answers, or depth control—during Build implementation, Ship reviews, Grow content drafts, or Operate incident triage alike.
Is token-budget-advisor safe to install?
It is behavioral guidance only; confirm package integrity and audit results on this page’s Security Audits panel before enabling skills in production agents.
SKILL.md
READMESKILL.md - Token Budget Advisor
# Token Budget Advisor (TBA) Intercept the response flow to offer the user a choice about response depth **before** Claude answers. ## When to Use - User wants to control how long or detailed a response is - User mentions tokens, budget, depth, or response length - User says "short version", "tldr", "brief", "al 25%", "exhaustive", etc. - Any time the user wants to choose depth/detail level upfront **Do not trigger** when: user already set a level this session (maintain it silently), or the answer is trivially one line. ## How It Works ### Step 1 — Estimate input tokens Use the repository's canonical context-budget heuristics to estimate the prompt's token count mentally. Use the same calibration guidance as [context-budget](../context-budget/SKILL.md): - prose: `words × 1.3` - code-heavy or mixed/code blocks: `chars / 4` For mixed content, use the dominant content type and keep the estimate heuristic. ### Step 2 — Estimate response size by complexity Classify the prompt, then apply the multiplier range to get the full response window: | Complexity | Multiplier range | Example prompts | |--------------|------------------|------------------------------------------------------| | Simple | 3× – 8× | "What is X?", yes/no, single fact | | Medium | 8× – 20× | "How does X work?" | | Medium-High | 10× – 25× | Code request with context | | Complex | 15× – 40× | Multi-part analysis, comparisons, architecture | | Creative | 10× – 30× | Stories, essays, narrative writing | Response window = `input_tokens × mult_min` to `input_tokens × mult_max` (but don’t exceed your model’s configured output-token limit). ### Step 3 — Present depth options Present this block **before** answering, using the actual estimated numbers: ``` Analyzing your prompt... Input: ~[N] tokens | Type: [type] | Complexity: [level] | Language: [lang] Choose your depth level: [1] Essential (25%) -> ~[tokens] Direct answer only, no preamble [2] Moderate (50%) -> ~[tokens] Answer + context + 1 example [3] Detailed (75%) -> ~[tokens] Full answer with alternatives [4] Exhaustive (100%) -> ~[tokens] Everything, no limits Which level? (1-4 or say "25% depth", "50% depth", "75% depth", "100% depth") Precision: heuristic estimate ~85-90% accuracy (±15%). ``` Level token estimates (within the response window): - 25% → `min + (max - min) × 0.25` - 50% → `min + (max - min) × 0.50` - 75% → `min + (max - min) × 0.75` - 100% → `max` ### Step 4 — Respond at the chosen level | Level | Target length | Include | Omit | |------------------|---------------------|-----------------------------------------------------|---------------------------------------------------| | 25% Essential | 2-4 sentences max | Direct answer, key conclusion