
Langfuse
Trace LLM calls, version prompts, score quality, and track cost once your agent app is in production.
Overview
Langfuse is an agent skill for the Operate phase that helps solo builders instrument LLM applications with tracing, prompt versioning, evaluation, and cost monitoring via the Langfuse platform.
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill langfuse
What is this skill?
- Open-source LLM observability: traces, spans, and production-oriented metrics
- Prompt management, versioning, and A/B-style prompt experiments
- Evaluation, scoring, and dataset workflows to catch quality regressions
- Integrations with LangChain, LlamaIndex, and OpenAI-style stacks
- Cost tracking and performance monitoring for token-heavy agent paths
Adoption & trust: 554 installs on skills.sh; 40.1k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your LLM feature works in dev but you cannot see which prompt version failed, what it cost, or why quality dropped in production.
Who is it for?
Indie builders running agents or chat features in production who need Langfuse-style traces, prompt history, and eval loops.
Skip if: you are still in pre-product ideation with no API traffic, or you only want static code review without runtime telemetry.
When should I use this skill?
Debugging, monitoring, or improving LLM applications in production with tracing, prompts, evaluation, and cost visibility.
What do I get? / Deliverables
You integrate Langfuse tracing and scoring so every critical model call is observable, prompts are versioned, and regressions show up in data—not user complaints.
- Instrumented traces and spans for key LLM paths
- Versioned prompts with evaluation or scoring hooks
- Cost and latency visibility dashboard or export workflow
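As a rough illustration of the cost and latency deliverable, the sketch below aggregates spend and average latency from per-call token counts. It does not call Langfuse; the `GenerationRecord` shape, the `example-model` name, and the prices in `PRICES_PER_1K` are all hypothetical placeholders, so substitute your provider's real rates and whatever fields your export actually contains.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1,000 tokens (assumption, not real provider rates).
PRICES_PER_1K = {
    "example-model": {"input": 0.50, "output": 1.50},
}


@dataclass
class GenerationRecord:
    """One LLM call as it might appear in an exported trace (hypothetical shape)."""
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float


def cost_usd(record: GenerationRecord) -> float:
    """Price a single call from its input and output token counts."""
    price = PRICES_PER_1K[record.model]
    return (
        (record.input_tokens / 1000) * price["input"]
        + (record.output_tokens / 1000) * price["output"]
    )


def summarize(records: list[GenerationRecord]) -> dict:
    """Return call count, total cost, and mean latency for a batch of calls."""
    total_cost = sum(cost_usd(r) for r in records)
    avg_latency = sum(r.latency_ms for r in records) / len(records) if records else 0.0
    return {
        "calls": len(records),
        "total_cost_usd": round(total_cost, 6),
        "avg_latency_ms": avg_latency,
    }
```

Grouping the same summary by prompt version or feature tag is one way to see which path is driving spend.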
Journey fit
Operate is where LLM apps need traces, regression detection, and spend visibility, once you have something users actually hit. Monitoring matches observability, latency, cost, and quality dashboards rather than initial model prompting in Build.
How it compares
Production LLM observability and prompt ops—not a multi-agent topology playbook or a generic application error tracker with no token semantics.
Common Questions / FAQ
Who is langfuse for?
Solo builders and small teams shipping LLM-backed SaaS, agents, or APIs who need tracing, prompt versioning, and evaluation in one observability stack.
When should I use langfuse?
Use it in Operate while monitoring production agents; in Ship when hardening launch checks with trace-backed smoke tests; and in Grow when analytics on prompt quality informs lifecycle tweaks.
Is langfuse safe to install?
The skill describes third-party observability APIs and app instrumentation—review credentials handling and the Security Audits panel on this Prism page before wiring production keys.
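One common way to handle credentials is to keep keys out of source code and read them from the environment. The sketch below assumes the `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST` variable names; `load_langfuse_config` and `redact` are hypothetical helpers written for this example, not part of the Langfuse SDK.

```python
import os

# Assumed environment variable names; confirm them against your SDK version.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
DEFAULT_HOST = "https://cloud.langfuse.com"  # or a self-hosted URL


def load_langfuse_config(env=None):
    """Build client keyword arguments from the environment, failing fast if keys are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Langfuse credentials: " + ", ".join(missing))
    return {
        "public_key": env["LANGFUSE_PUBLIC_KEY"],
        "secret_key": env["LANGFUSE_SECRET_KEY"],
        "host": env.get("LANGFUSE_HOST", DEFAULT_HOST),
    }


def redact(secret, visible=4):
    """Return a log-safe form of a secret that shows only its first few characters."""
    return secret[:visible] + "***" if len(secret) > visible else "***"
```

The returned dictionary could then be passed to the client as keyword arguments, and `redact` used wherever a key might otherwise end up in logs.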
SKILL.md
# Langfuse

Expert in Langfuse - the open-source LLM observability platform. Covers tracing, prompt management, evaluation, datasets, and integration with LangChain, LlamaIndex, and OpenAI. Essential for debugging, monitoring, and improving LLM applications in production.

**Role**: LLM Observability Architect

You are an expert in LLM observability and evaluation. You think in terms of traces, spans, and metrics. You know that LLM applications need monitoring just like traditional software - but with different dimensions (cost, quality, latency). You use data to drive prompt improvements and catch regressions.

### Expertise

- Tracing architecture
- Prompt versioning
- Evaluation strategies
- Cost optimization
- Quality monitoring

## Capabilities

- LLM tracing and observability
- Prompt management and versioning
- Evaluation and scoring
- Dataset management
- Cost tracking
- Performance monitoring
- A/B testing prompts

## Prerequisites

- LLM application basics
- API integration experience
- Understanding of tracing concepts
- Required skills: Python or TypeScript/JavaScript, Langfuse account (cloud or self-hosted), LLM API keys

## Scope

- Self-hosted requires infrastructure
- High-volume may need optimization
- Real-time dashboard has latency
- Evaluation requires setup

## Ecosystem

### Primary

- Langfuse Cloud
- Langfuse Self-hosted
- Python SDK
- JS/TS SDK

### Common integrations

- LangChain
- LlamaIndex
- OpenAI SDK
- Anthropic SDK
- Vercel AI SDK

### Platforms

- Any Python/JS backend
- Serverless functions
- Jupyter notebooks

## Patterns

### Basic Tracing Setup

Instrument LLM calls with Langfuse

**When to use**: Any LLM application

    import openai  # used for the direct LLM call below
    from langfuse import Langfuse

    # Initialize client
    langfuse = Langfuse(
        public_key="pk-...",
        secret_key="sk-...",
        host="https://cloud.langfuse.com"  # or self-hosted URL
    )

    # Create a trace for a user request
    trace = langfuse.trace(
        name="chat-completion",
        user_id="user-123",
        session_id="session-456",  # Groups related traces
        metadata={"feature": "customer-support"},
        tags=["production", "v2"]
    )

    # Log a generation (LLM call)
    generation = trace.generation(
        name="gpt-4o-response",
        model="gpt-4o",
        model_parameters={"temperature": 0.7},
        input={"messages": [{"role": "user", "content": "Hello"}]},
        metadata={"attempt": 1}
    )

    # Make actual LLM call
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )

    # Complete the generation with output
    generation.end(
        output=response.choices[0].message.content,
        usage={
            "input": response.usage.prompt_tokens,
            "output": response.usage.completion_tokens
        }
    )

    # Score the trace
    trace.score(
        name="user-feedback",
        value=1,  # 1 = positive, 0 = negative
        comment="User clicked helpful"
    )

    # Flush before exit (important in serverless)
    langfuse.flush()

### OpenAI Integration

Automatic tracing with OpenAI SDK

**When to use**: OpenAI-based applications

    from langfuse.openai import openai  # Drop-in replacement for OpenAI client

    # All calls automatically traced
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        # Langfuse-specific parameters
        name="greeting",  # Trace name
        session_id="session-123",
        user_id="user-456",
        tags=["test"],
        metadata={"feature": "chat"}
    )

    # Works with streaming
    stream = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a story"}],
        stream=T