
Context Optimization
Stretch effective context budget for long agent runs by compressing, masking, caching, and partitioning conversation state.
Overview
context-optimization is an agent skill used mostly in the Build phase (and also for Ship performance work and Operate monitoring). It teaches compression, masking, caching, and partitioning so that limited context windows are used more effectively.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill context-optimization
What is this skill?
- Four strategies: compaction, observation masking, KV-cache optimization, context partitioning
- Targets cost and latency reduction by cutting low-signal tokens, not chasing larger models
- Guidance for long-running agents, large documents, and production-scale conversations
- Claims effective capacity can roughly double or triple without longer native context windows
- Emphasizes context quality over raw token count
Adoption & trust: 451 installs on skills.sh; 40.1k GitHub stars.
What problem does it solve?
Long agent sessions and fat tool logs exhaust the context window, raising cost and failure rates before the task finishes.
Who is it for?
Indie builders running multi-step coding agents, large-repo tasks, or chatty tool loops who need token economics without upgrading models.
Skip if: One-shot Q&A, tiny repos where full files fit trivially, or teams unwilling to maintain summarization and masking conventions.
When should I use this skill?
Use it when context limits constrain task complexity, when you need to cut cost or latency on long conversations, when you implement long-running agent systems, or when you handle larger documents at production scale.
What do I get? / Deliverables
You apply the four optimization strategies so more useful state fits the same window, improving completion rates on complex runs.
- Context optimization strategy applied to the session
- Compaction or masking plan for verbose tool output
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Build agent-tooling is the primary shelf because optimization techniques are applied while designing and implementing agent loops and prompts. Agent-tooling fits skills that govern how the agent manages tokens and memory—not generic frontend or backend feature work.
Where it fits
Design compaction rules before a 40-step refactor agent starts logging entire file reads.
Benchmark latency after masking verbose test output in CI-fix loops.
Track token spend per workflow and tighten partitioning when nightly jobs drift over budget.
Keep support-triage agents under window limits when attaching long ticket histories.
How it compares
Technique playbook for context engineering—not a RAG stack installer or automatic context-window expansion product.
Common Questions / FAQ
Who is context-optimization for?
Solo developers building or operating agentic workflows who hit context limits, latency, or token bills during long conversations or big documents.
When should I use context-optimization?
Use it in Build while designing agent memory, in Ship when performance-testing long sessions, and in Operate when production agents need a stable cost per task.
Is context-optimization safe to install?
It is instructional skill content without inherent shell access; review the Security Audits panel on this page like any community-sourced skill.
SKILL.md
# Context Optimization Techniques

Context optimization extends the effective capacity of limited context windows through strategic compression, masking, caching, and partitioning. The goal is not to magically increase context windows but to make better use of available capacity. Effective optimization can double or triple effective context capacity without requiring larger models or longer contexts.

## When to Use

Activate this skill when:

- Context limits constrain task complexity
- Optimizing for cost reduction (fewer tokens = lower costs)
- Reducing latency for long conversations
- Implementing long-running agent systems
- Needing to handle larger documents or conversations
- Building production systems at scale

## Core Concepts

Context optimization extends effective capacity through four primary strategies: compaction (summarizing context near limits), observation masking (replacing verbose outputs with references), KV-cache optimization (reusing cached computations), and context partitioning (splitting work across isolated contexts).

The key insight is that context quality matters more than quantity. Optimization preserves signal while reducing noise. The art lies in selecting what to keep versus what to discard, and when to apply each technique.

## Detailed Topics

### Compaction Strategies

**What is Compaction**

Compaction is the practice of summarizing context contents when approaching limits, then reinitializing a new context window with the summary. This distills the contents of a context window in a high-fidelity manner, enabling the agent to continue with minimal performance degradation.

Compaction typically serves as the first lever in context optimization. The art lies in selecting what to keep versus what to discard.

**Compaction Implementation**

Compaction works by identifying sections that can be compressed, generating summaries that capture essential points, and replacing full content with summaries.
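
The three steps above (identify compressible sections, generate summaries, replace full content) can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the `Message` type, the role labels, `estimate_tokens`, `placeholder_summarize`, and the `budget` and `keep_recent` parameters are all hypothetical names introduced here. The summarizer is a truncating stand-in for a real model call, and the token count is a crude whitespace split rather than a real tokenizer. For simplicity the sketch rewrites messages in place instead of reinitializing a fresh context window.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    role: str          # hypothetical labels: "system", "user", "assistant", "tool"
    content: str
    compacted: bool = False


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy for a tokenizer.
    return len(text.split())


def placeholder_summarize(text: str, max_words: int = 12) -> str:
    # Stand-in for an LLM summarization call: keeps only the first words.
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " [summary truncated]"


def compact(
    messages: List[Message],
    budget: int,
    summarize: Callable[[str], str] = placeholder_summarize,
    keep_recent: int = 2,
) -> List[Message]:
    """Summarize older messages, oldest first, until the estimate fits the budget."""
    result = list(messages)
    total = sum(estimate_tokens(m.content) for m in result)
    cutoff = max(0, len(result) - keep_recent)  # leave the newest turns intact

    for i in range(cutoff):
        if total <= budget:
            break
        msg = result[i]
        # Step 1: identify compressible sections (skip system prompt and prior summaries).
        if msg.role == "system" or msg.compacted:
            continue
        # Step 2: generate a summary.
        summary = summarize(msg.content)
        saved = estimate_tokens(msg.content) - estimate_tokens(summary)
        if saved <= 0:
            continue  # summary is not shorter, so keep the original
        # Step 3: replace the full content with the summary.
        result[i] = replace(msg, content=summary, compacted=True)
        total -= saved

    return result
```

A caller would run `compact(history, budget=...)` when the estimated size approaches the window limit and continue the session with the returned list. Swapping `placeholder_summarize` for a model-backed function is where summary fidelity is actually decided.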
Priority for compression goes to tool outputs (replace with summaries), old turns (summarize early conversation), and retrieved docs (summarize if recent versions exist). Never compress the system prompt.

**Summary Generation**

Effective summaries preserve different elements depending on message type:

- Tool outputs: Preserve key findings, metrics, and conclusions. Remove verbose raw output.
- Conversational turns: Preserve key decisions, commitments, and context shifts. Remove filler and back-and-forth.
- Retrieved documents: Preserve key facts and claims. Remove supporting evidence and elaboration.

### Observation Masking

**The Observation Problem**

Tool outputs can comprise 80%+ of token usage in agent trajectories. Much of this is verbose output that has already served its purpose. Once an agent has used a tool output to make a decision, keeping the full output provides diminishing value while consuming significant context.

Observation masking replaces verbose tool outputs with compact references. The information remains accessible if needed but does not consume context continuously.

**Masking Strategy Selection**

Not all observations should be masked equally:

- Never mask: Observations critical to the current task, observations from the most recent turn, observations used in active reasoning.
- Consider masking: Observations from 3+ turns ago, verbose outputs with extractable key points, observations whose purpose has been served.
- Always mask: Repeated outputs, boilerplate headers/footers, outputs already summarized in conversation.

### KV-Cache Optimization

**Understanding KV-Cache**

The KV-cache stores Key and Value tensors computed during inference, growing linearly with sequenc