
Llm App Patterns
Apply production-oriented RAG, agent, and LLMOps patterns when designing or hardening an LLM-powered product.
Overview
llm-app-patterns is an agent skill most often used in Build (also Ship perf, Grow analytics) that documents production-ready RAG, agent, and LLMOps patterns for LLM applications.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill llm-app-patternsWhat is this skill?
- Three-stage RAG pipeline: ingest documents, retrieve context, generate with LLM
- Document ingestion covers chunking strategies including fixed-size and richer approaches in the full skill
- Patterns framed for production LLM applications inspired by Dify and industry practice
- Guidance for AI agents with tools and LLMOps monitoring setup
- Helps choose between agent architectures when scope is unclear
- RAG overview diagrams three stages: ingest, retrieve, generate
Adoption & trust: 610 installs on skills.sh; 40.1k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are building an LLM feature but lack a concrete ingest-retrieve-generate layout, chunking choices, or agent architecture, so prototypes stay fragile.
Who is it for?
Indie builders implementing RAG chat, tool agents, or internal copilots who want Dify-inspired structure in agent sessions.
Skip if: Pure static sites with no model calls, or teams that only need a single prompt tweak with no retrieval or ops concerns.
When should I use this skill?
Designing LLM-powered applications, implementing RAG, building AI agents with tools, setting up LLMOps monitoring, or choosing agent architectures.
What do I get? / Deliverables
You leave with implementable pattern choices for RAG pipelines, tool agents, and monitoring aligned to production practice instead of one-off demos.
- Architecture notes for RAG ingest/retrieve/generate
- Chunking and agent pattern decisions documented for implementation
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build agent-tooling because the skill centers on architecture for LLM apps, RAG pipelines, and tool-using agents before you ship and monitor them. Agent-tooling is where solo builders pick retrieval, chunking, vector search, and agent loop designs—not generic frontend polish.
Where it fits
Map ingest-chunk-embed steps before implementing your first Pinecone or pgvector index.
Pick agent-with-tools loop structure for a support bot backed by your API.
Revisit chunk sizes and retrieval depth when latency spikes in production.
Align LLMOps dashboards with funnel metrics after launch.
How it compares
Pattern and architecture reference—not a hosted vector DB integration or MCP server by itself.
Common Questions / FAQ
Who is llm-app-patterns for?
Solo and small-team builders creating LLM-powered SaaS, agents, or APIs who need retrieval, tooling, and ops patterns in one skill.
When should I use llm-app-patterns?
Use during Build agent-tooling when designing RAG or agents, during Ship perf when tuning retrieval latency, and during Grow analytics when aligning LLMOps metrics with product loops.
Is llm-app-patterns safe to install?
Check this page’s Security Audits panel for community skill risk; the content is documentation and example code patterns—validate any snippet before production deploy.
SKILL.md
READMESKILL.md - Llm App Patterns
# 🤖 LLM Application Patterns > Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langgenius/dify) and industry best practices. ## When to Use This Skill Use this skill when: - Designing LLM-powered applications - Implementing RAG (Retrieval-Augmented Generation) - Building AI agents with tools - Setting up LLMOps monitoring - Choosing between agent architectures --- ## 1. RAG Pipeline Architecture ### Overview RAG (Retrieval-Augmented Generation) grounds LLM responses in your data. ``` ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Ingest │────▶│ Retrieve │────▶│ Generate │ │ Documents │ │ Context │ │ Response │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │ ▼ ▼ ▼ ┌─────────┐ ┌───────────┐ ┌───────────┐ │ Chunking│ │ Vector │ │ LLM │ │Embedding│ │ Search │ │ + Context│ └─────────┘ └───────────┘ └───────────┘ ``` ### 1.1 Document Ingestion ```python # Chunking strategies class ChunkingStrategy: # Fixed-size chunks (simple but may break context) FIXED_SIZE = "fixed_size" # e.g., 512 tokens # Semantic chunking (preserves meaning) SEMANTIC = "semantic" # Split on paragraphs/sections # Recursive splitting (tries multiple separators) RECURSIVE = "recursive" # ["\n\n", "\n", " ", ""] # Document-aware (respects structure) DOCUMENT_AWARE = "document_aware" # Headers, lists, etc. # Recommended settings CHUNK_CONFIG = { "chunk_size": 512, # tokens "chunk_overlap": 50, # token overlap between chunks "separators": ["\n\n", "\n", ". ", " "], } ``` ### 1.2 Embedding & Storage ```python # Vector database selection VECTOR_DB_OPTIONS = { "pinecone": { "use_case": "Production, managed service", "scale": "Billions of vectors", "features": ["Hybrid search", "Metadata filtering"] }, "weaviate": { "use_case": "Self-hosted, multi-modal", "scale": "Millions of vectors", "features": ["GraphQL API", "Modules"] }, "chromadb": { "use_case": "Development, prototyping", "scale": "Thousands of vectors", "features": ["Simple API", "In-memory option"] }, "pgvector": { "use_case": "Existing Postgres infrastructure", "scale": "Millions of vectors", "features": ["SQL integration", "ACID compliance"] } } # Embedding model selection EMBEDDING_MODELS = { "openai/text-embedding-3-small": { "dimensions": 1536, "cost": "$0.02/1M tokens", "quality": "Good for most use cases" }, "openai/text-embedding-3-large": { "dimensions": 3072, "cost": "$0.13/1M tokens", "quality": "Best for complex queries" }, "local/bge-large": { "dimensions": 1024, "cost": "Free (compute only)", "quality": "Comparable to OpenAI small" } } ``` ### 1.3 Retrieval Strategies ```python # Basic semantic search def semantic_search(query: str, top_k: int = 5): query_embedding = embed(query) results = vector_db.similarity_search( query_embedding, top_k=top_k ) return results # Hybrid search (semantic + keyword) def hybrid_search(query: str, top_k: int = 5, alpha: float = 0.5): """ alpha=1.0: Pure semantic alpha=0.0: Pure keyword (BM25) alpha=0.5: Balanced """ semantic_results = vector_db.similarity_search(query) keyword_results = bm25_search(query) # Reciprocal Rank Fusion return rrf_merge(semantic_results, keyword_results, alpha) # Multi-query retr