
Context Window Management
Design tiered summarization, trimming, and routing so long agent sessions stay coherent without blowing token budgets.
Overview
Context-window-management is an agent skill most often used in Build (also Ship perf, Operate iterate) that defines tiered summarization, trimming, and routing patterns to control LLM context rot.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill context-window-management
What is this skill?
- Tiered context strategy by token ceilings with full, summarize, and rag modes
- Capabilities: summarization, trimming, routing, prioritization, and token counting
- Explicit scope boundary: optimization strategies, not full RAG or fine-tuning implementations
- Ecosystem notes for tiktoken, LangChain utilities, and large-context Claude APIs with caching
- Prerequisites call out LLM fundamentals and recommended prompt-engineering skill
Adoption & trust: 1k installs on skills.sh; 40.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your multi-turn agent silently forgets instructions or stalls because the context window fills with tool logs and chat history you never prioritize or compress.
Who is it for?
Indie builders shipping chat or tool-using agents who need predictable token budgets across long sessions.
Skip if: Teams that only need single-shot prompts, or projects where a managed RAG platform already owns retrieval and chunking end to end.
When should I use this skill?
Building or debugging any multi-turn conversation system where context size, summarization, trimming, or routing decisions affect quality and cost.
What do I get? / Deliverables
You adopt tiered strategies—token counting, summarization, trimming, and model routing—so sessions stay on-policy without re-implementing full RAG stacks.
- Tiered context policy table (thresholds, strategy, model)
- Summarization and trimming rules for transcripts
- Routing guidance aligned to token ceilings
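As an illustration of what a trimming rule for transcripts might look like, here is a minimal sketch. It is not taken from the skill itself: the `ChatMessage` shape, the `estimateTokens` helper, and the 4-characters-per-token estimate are all assumptions made for the example, and a real tokenizer such as tiktoken would replace the estimate in practice.

```typescript
// Hypothetical message shape for this sketch; not defined by the skill.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Rough assumption: about 4 characters per token for English text.
// Replace with a real tokenizer before relying on the numbers.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep every system message, then add the newest non-system messages
// until the budget is exhausted. The original order is preserved.
function trimToBudget(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const keep = new Set<number>();
  let used = 0;

  // System messages are always retained and counted first.
  messages.forEach((m, i) => {
    if (m.role === 'system') {
      keep.add(i);
      used += estimateTokens(m.content);
    }
  });

  // Walk from newest to oldest so recent turns win over older ones.
  const newestFirst = messages.map((m, i) => ({ m, i })).reverse();
  for (const { m, i } of newestFirst) {
    if (m.role === 'system') continue;
    const cost = estimateTokens(m.content);
    if (used + cost > maxTokens) break; // stop at the first turn that does not fit
    keep.add(i);
    used += cost;
  }

  return messages.filter((_, i) => keep.has(i));
}
```

Stopping at the first message that does not fit keeps the retained history contiguous, which is usually easier to reason about than a transcript with holes in it. Note that system messages are kept even if they alone exceed the budget.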
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build agent-tooling is the first journey moment where builders choose context policies, before wiring chat, tools, and memory into a product. Agent-tooling fits because the skill covers procedural patterns for multi-turn systems, token counting, and model routing, not one-off frontend polish.
Where it fits
Define ContextTier thresholds before your copilot chooses Haiku for short threads and Sonnet only when history stays under 32k summarized tokens.
Stress-test a 50-turn support flow with trimming rules so latency and cost stay inside production SLOs.
Tune summarization prompts after users report the agent lost refund policy details mid-ticket.
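To make the stress-test scenario concrete, the sketch below finds the turn at which a transcript's cumulative token count first crosses a ceiling. The function name `firstTurnOverBudget` and the figure of 700 tokens per turn are hypothetical, chosen only for illustration. The 8,000 and 32,000 ceilings mirror the tier thresholds in the skill's SKILL.md.

```typescript
// Returns the 1-based turn at which the running token total first
// exceeds the ceiling, or null if the transcript never exceeds it.
function firstTurnOverBudget(tokensPerTurn: number[], ceiling: number): number | null {
  let total = 0;
  let turn = 0;
  for (const tokens of tokensPerTurn) {
    turn += 1;
    total += tokens;
    if (total > ceiling) return turn;
  }
  return null;
}

// Hypothetical 50-turn flow where every turn adds 700 tokens.
const perTurn = Array.from({ length: 50 }, () => 700);

console.log(firstTurnOverBudget(perTurn, 8000));  // 12 (12 * 700 = 8400)
console.log(firstTurnOverBudget(perTurn, 32000)); // 46 (46 * 700 = 32200)
```

Running a check like this against worst-case transcripts shows roughly where trimming or summarization would need to kick in before a tier ceiling is reached.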
How it compares
Use instead of dumping entire repos into every turn; it is a strategy skill, not an MCP server or hosted memory product.
Common Questions / FAQ
Who is context-window-management for?
Solo developers and small teams building LLM-powered agents who must reason about tokens, summarization, and model choice without a platform team.
When should I use context-window-management?
While designing Build agent-tooling chat loops, before Ship perf testing on worst-case transcripts, and during Operate iterate when users report forgotten rules or duplicated answers.
Is context-window-management safe to install?
It is documentation-heavy strategy content. Still, review the Prism Security Audits, and ensure your trimming policies do not strip secrets or compliance text your product must retain.
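One way to guard against a trimming policy dropping text that must be retained is to mark such messages and verify they survive. The sketch below assumes a hypothetical `pinned` flag and `id` field on each message. Neither is part of the skill; they are introduced here only to show the check.

```typescript
// Hypothetical message shape; `pinned` marks text that must survive trimming.
interface PinnableMessage {
  id: string;
  content: string;
  pinned?: boolean;
}

// Returns the ids of pinned messages that are missing from the trimmed
// transcript. An empty array means nothing required was dropped.
function missingPinned(original: PinnableMessage[], trimmed: PinnableMessage[]): string[] {
  const keptIds = new Set(trimmed.map((m) => m.id));
  return original
    .filter((m) => m.pinned === true && !keptIds.has(m.id))
    .map((m) => m.id);
}
```

A check like this could run in tests or as a runtime assertion after each trimming pass, failing loudly instead of letting a required policy statement disappear from the context.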
SKILL.md
# Context Window Management

Strategies for managing LLM context windows including summarization, trimming, routing, and avoiding context rot

## Capabilities

- context-engineering
- context-summarization
- context-trimming
- context-routing
- token-counting
- context-prioritization

## Prerequisites

- Knowledge: LLM fundamentals, Tokenization basics, Prompt engineering
- Skills_recommended: prompt-engineering

## Scope

- Does_not_cover: RAG implementation details, Model fine-tuning, Embedding models
- Boundaries: Focus is context optimization, Covers strategies not specific implementations

## Ecosystem

### Primary_tools

- tiktoken - OpenAI's tokenizer for counting tokens
- LangChain - Framework with context management utilities
- Claude API - 200K+ context with caching support

## Patterns

### Tiered Context Strategy

Different strategies based on context size

**When to use**: Building any multi-turn conversation system

```typescript
// Message, PreparedContext, countTokens, summarizeOldMessages,
// recentMessages, and retrieveRelevant are assumed to be defined elsewhere.
interface ContextTier {
  maxTokens: number;
  strategy: 'full' | 'summarize' | 'rag';
  model: string;
}

const TIERS: ContextTier[] = [
  { maxTokens: 8000, strategy: 'full', model: 'claude-3-haiku' },
  { maxTokens: 32000, strategy: 'full', model: 'claude-3-5-sonnet' },
  { maxTokens: 100000, strategy: 'summarize', model: 'claude-3-5-sonnet' },
  { maxTokens: Infinity, strategy: 'rag', model: 'claude-3-5-sonnet' }
];

// Pick the first tier whose ceiling fits the current token count.
async function selectStrategy(messages: Message[]): Promise<ContextTier> {
  const tokens = await countTokens(messages);

  for (const tier of TIERS) {
    if (tokens <= tier.maxTokens) {
      return tier;
    }
  }
  return TIERS[TIERS.length - 1];
}

async function prepareContext(messages: Message[]): Promise<PreparedContext> {
  const tier = await selectStrategy(messages);

  switch (tier.strategy) {
    case 'full':
      return { messages, model: tier.model };

    case 'summarize': {
      // Replace older messages with a summary, keep recent ones verbatim.
      const summary = await summarizeOldMessages(messages);
      return { messages: [summary, ...recentMessages(messages)], model: tier.model };
    }

    case 'rag': {
      // Retrieve only the relevant history, plus the recent messages.
      const relevant = await retrieveRelevant(messages);
      return { messages: [...relevant, ...recentMessages(messages)], model: tier.model };
    }
  }
}
```

### Serial Position Optimization

Place important content at start and end

**When to use**: Constructing prompts with significant context

```typescript
// LLMs tend to weight the beginning and end of a prompt more heavily.
// Structure prompts to leverage this.
// summarize, formatMessages, and extractKeyConstraints are assumed helpers.
function buildOptimalPrompt(components: {
  systemPrompt: string;
  criticalContext: string;
  conversationHistory: Message[];
  currentQuery: string;
}): string {
  // START: System instructions (always first)
  const parts = [components.systemPrompt];

  // CRITICAL CONTEXT: Right after system (high primacy)
  if (components.criticalContext) {
    parts.push(`## Key Context\n${components.criticalContext}`);
  }

  // MIDDLE: Conversation history (lower weight)
  // Summarize if long, keep recent messages full
  const history = components.conversationHistory;
  if (history.length > 10) {
    const oldSummary = summarize(history.slice(0, -5));
    const recent = history.slice(-5);
    parts.push(`## Earlier Conversation (Summary)\n${oldSummary}`);
    parts.push(`## Recent Messages\n${formatMessages(recent)}`);
  } else {
    parts.push(`## Conversation\n${formatMessages(history)}`);
  }

  // END: Current query (high recency)
  // Restate critical requirements here
  parts.push(`## Current Request\n${components.currentQuery}`);

  // FINAL: Reminder of key constraints
  parts.push(`Remember: ${extractKeyConstraints(components.systemPrompt)}`);

  return parts.join('\n\n');
}
```

### Intelligent