
Prompt Caching
Cut repeated LLM token cost and latency by applying Anthropic prompt caching, response caching, and CAG patterns in agent apps.
Overview
Prompt Caching is an agent skill most often used in Build (also Operate) that teaches Anthropic prompt caching, response caching, and CAG patterns for LLM APIs.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill prompt-caching
What is this skill?
- Covers prompt-cache, response-cache, kv-cache, cag-patterns, and cache-invalidation capabilities
- Anthropic pattern: mark stable system blocks as cacheable parts of a Claude API request
- Primary tools: Anthropic Prompt Caching, Redis, OpenAI automatic caching
- Explicit scope boundary: LLM prompt/response only—not CDN or static assets
- Recommends context-window-management as a companion skill
Adoption & trust: 643 installs on skills.sh; 40.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You resend the same long system prompt and context on every agent call and watch latency and API bills climb.
Who is it for?
Indie builders running Claude or OpenAI in production agents with repeated system instructions or retrieved document prefixes.
Skip if: you are tuning static site performance or optimizing SQL queries, or your team has no recurring prompt structure to cache.
When should I use this skill?
When you are calling the Claude API with stable system prompts or context and need caching fundamentals for LLM APIs.
What do I get? / Deliverables
You structure prompts and side caches so stable prefixes hit provider or Redis caches and invalidation stays explicit under changing RAG context.
- Caching architecture choice (prompt vs response vs CAG)
- Implementation sketch for provider cache blocks
- Invalidation policy notes
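One way to keep invalidation explicit is to put a knowledge base version into every response-cache key, so that bumping the version makes older entries unreachable. The sketch below is a minimal illustration under that assumption. The names `kbVersion`, `versionedKey`, `getCached`, and `setCached`, and the in-memory `Map` standing in for Redis, are hypothetical and not part of the skill.

```typescript
import { createHash } from "crypto";

// Hypothetical shape: every input that should change the cached answer.
interface CacheKeyParts {
  prompt: string;
  model: string;
  kbVersion: string; // assumed to be bumped whenever knowledge base documents change
}

// Build a key that embeds the knowledge base version, so entries written
// against old documents are never read after a version bump.
function versionedKey(parts: CacheKeyParts): string {
  const digest = createHash("sha256")
    .update(JSON.stringify([parts.model, parts.kbVersion, parts.prompt]))
    .digest("hex");
  return `response:${parts.kbVersion}:${digest}`;
}

// In-memory stand-in for Redis so the sketch runs without a server.
const store = new Map<string, string>();

function getCached(parts: CacheKeyParts): string | null {
  return store.get(versionedKey(parts)) ?? null;
}

function setCached(parts: CacheKeyParts, response: string): void {
  store.set(versionedKey(parts), response);
}

const before: CacheKeyParts = {
  prompt: "What is the refund window?",
  model: "model-a",
  kbVersion: "kb-v1",
};
setCached(before, "30 days");

const hit = getCached(before); // same version: served from cache
const miss = getCached({ ...before, kbVersion: "kb-v2" }); // documents changed: not found
console.log(hit, miss);
```

With a real Redis store, stale entries under the old version prefix would simply age out through their TTL, so no bulk delete is needed in this design.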
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Agent-tooling is the primary shelf because caching is configured while wiring Claude/OpenAI clients; the same patterns matter again in Operate when traffic grows. Agent-tooling covers prompt assembly, KV-style response stores, and cache invalidation tied to model APIs—not generic CDN or database query caching.
Where it fits
Split stable system instructions into cacheable blocks before wiring user queries in Claude Sonnet.
Add Redis keyed by prompt hash for identical support-bot answers across sessions.
Define invalidation when embedded knowledge base documents change in production.
Compare cached and uncached token counts from API usage data to justify the caching investment.
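For the cost comparison, a rough estimate can be computed from the token counts that the Claude API reports in `usage`. The sketch below assumes cache writes and cache reads are billed as multiples of the base input price. The `Pricing` type, the `inputCost` helper, and the numbers passed in are illustrative assumptions, not quoted pricing, so substitute current provider rates.

```typescript
// Token counts as reported in the usage object of a Claude API response.
interface Usage {
  input_tokens: number; // input tokens neither read from nor written to the cache
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number; // tokens served from the cache
}

// Hypothetical pricing description; all values are assumptions for illustration.
interface Pricing {
  basePerMTok: number; // base input price per million tokens
  writeMultiplier: number; // cache write price relative to base
  readMultiplier: number; // cache read price relative to base
}

// Estimate input cost for one call with caching, and what the same tokens
// would have cost if every one of them were billed at the base rate.
function inputCost(u: Usage, p: Pricing): { cached: number; uncached: number } {
  const perToken = p.basePerMTok / 1_000_000;
  const cached =
    perToken *
    (u.input_tokens +
      u.cache_creation_input_tokens * p.writeMultiplier +
      u.cache_read_input_tokens * p.readMultiplier);
  const uncached =
    perToken *
    (u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens);
  return { cached, uncached };
}

const pricing: Pricing = { basePerMTok: 3, writeMultiplier: 1.25, readMultiplier: 0.1 };
const warmCall: Usage = {
  input_tokens: 200,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 10_000,
};

const { cached, uncached } = inputCost(warmCall, pricing);
console.log(`cached: $${cached.toFixed(6)}, uncached: $${uncached.toFixed(6)}`);
```

Summing these two figures over a day of logged `usage` objects gives a like-for-like comparison that also accounts for the extra cost of cache writes on cold calls.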
How it compares
Knowledge skill for LLM cache design—not a drop-in MCP server or CDN configuration guide.
Common Questions / FAQ
Who is prompt-caching for?
Developers building LLM-powered agents or APIs who understand basic hashing and API usage and want provider-native and Redis response caching patterns.
When should I use prompt-caching?
During Build while integrating Claude or OpenAI clients with stable system prompts, and during Operate when you need cache invalidation and cost control under real traffic.
Is prompt-caching safe to install?
It is documentation and code-pattern guidance, and its metadata lists risk: none. Review the Security Audits panel on this page before wiring Redis or API keys into your repo.
Workflow Chain
Requires first: context window management
SKILL.md
# Prompt Caching

Caching strategies for LLM prompts including Anthropic prompt caching, response caching, and CAG (Cache Augmented Generation)

## Capabilities

- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation

## Prerequisites

- Knowledge: Caching fundamentals, LLM API usage, Hash functions
- Skills_recommended: context-window-management

## Scope

- Does_not_cover: CDN caching, Database query caching, Static asset caching
- Boundaries: Focus is LLM-specific caching, Covers prompt and response caching

## Ecosystem

### Primary_tools

- Anthropic Prompt Caching - Native prompt caching in Claude API
- Redis - In-memory cache for responses
- OpenAI Caching - Automatic caching in OpenAI API

## Patterns

### Anthropic Prompt Caching

Use Claude's native prompt caching for repeated prefixes

**When to use**: Using Claude API with stable system prompts or context

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Placeholder values; replace with your own stable content.
// Very short prompts fall below the minimum cacheable length and are not cached.
const LONG_SYSTEM_PROMPT = 'Your detailed instructions';
const KNOWLEDGE_BASE = 'Large static context';

// Cache the stable parts of your prompt
async function queryWithCaching(userQuery: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: LONG_SYSTEM_PROMPT,
        cache_control: { type: "ephemeral" } // Cache this!
      },
      {
        type: "text",
        text: KNOWLEDGE_BASE,
        cache_control: { type: "ephemeral" }
      }
    ],
    messages: [
      { role: "user", content: userQuery } // Dynamic part
    ]
  });

  // Check cache usage
  console.log(`Cache read: ${response.usage.cache_read_input_tokens}`);
  console.log(`Cache write: ${response.usage.cache_creation_input_tokens}`);

  return response;
}

// Cost savings: 90% reduction on cached (read) tokens
// Latency savings: Up to 2x faster
```

### Response Caching

Cache full LLM responses for identical or similar queries

**When to use**: Same queries asked repeatedly

```typescript
import { createHash } from 'crypto';
import Redis from 'ioredis';

// The localhost fallback is an assumption for local development.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Hypothetical dependencies (not defined in the original): an embedding
// function and a vector index whose ids match the hashes used as Redis keys.
type Embed = (text: string) => Promise<number[]>;
interface VectorCache {
  search(embedding: number[], k: number): Promise<{ id: string; similarity: number }[]>;
}

class ResponseCache {
  private ttl = 3600; // 1 hour default

  constructor(private embed: Embed, private vectorCache: VectorCache) {}

  // Exact match caching
  async getCached(prompt: string): Promise<string | null> {
    const key = this.hashPrompt(prompt);
    return await redis.get(`response:${key}`);
  }

  async setCached(prompt: string, response: string): Promise<void> {
    const key = this.hashPrompt(prompt);
    await redis.set(`response:${key}`, response, 'EX', this.ttl);
  }

  private hashPrompt(prompt: string): string {
    return createHash('sha256').update(prompt).digest('hex');
  }

  // Semantic similarity caching
  async getSemanticallySimilar(
    prompt: string,
    threshold: number = 0.95
  ): Promise<string | null> {
    const embedding = await this.embed(prompt);
    const similar = await this.vectorCache.search(embedding, 1);
    if (similar.length && similar[0].similarity > threshold) {
      return await redis.get(`response:${similar[0].id}`);
    }
    return null;
  }

  // Temperature-aware caching
  // Entries must be written under the same composite key to be found here.
  async getCachedWithParams(
    prompt: string,
    params: { temperature: number; model: string }
  ): Promise<string | null> {
    // Only cache low-temperature responses
    if (params.temperature > 0.5) return null;
    const key = this.hashPrompt(
      `${prompt}|${params.model}|${params.temperature}`
    );
    return await redis.get(`response:${key}`);
  }
}
```

### Cache Augmented Generation (CAG)

Pre-cache