
Guidance
Configure Microsoft Guidance backends—Anthropic, OpenAI, and local models—with correct options when building constrained, template-driven LLM flows in Python.
Overview
Guidance is an agent skill for the Build phase that documents how to configure Guidance LLM backends (Anthropic, OpenAI, and local runtimes) for structured Python generation workflows.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill guidance
What is this skill?
- Anthropic Claude setup via models.Anthropic with env or explicit api_key and documented Sonnet, Opus, and Haiku model ids
- Generation controls: max_tokens, temperature, top_p, timeout, max_retries on API backends
- Coverage of API-based and local backends (Transformers, llama.cpp) with comparison and performance tuning sections
- Context manager patterns for switching active Guidance language models inside multi-step templates
- The documented Anthropic configuration example sets max_retries to 3
- Guide sections cover API-based models, local models, backend comparison, performance tuning, and advanced configuration
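The generation controls listed above can be collected and sanity-checked in one place before they reach a backend constructor. The sketch below is only an illustration: `GenerationConfig` and `as_kwargs` are hypothetical names, not part of Guidance, the defaults mirror the values in the guide's Anthropic example, and the 0 to 1 temperature range is the one the guide gives for Anthropic. Whether a given Guidance release accepts all of these keyword arguments is an assumption to check against the installed version.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the generation controls named in the list above."""

    max_tokens: int = 4096
    temperature: float = 0.7
    top_p: float = 0.9
    timeout: float = 30.0  # request timeout in seconds
    max_retries: int = 3   # value used in the guide's Anthropic example

    def __post_init__(self) -> None:
        # Reject obviously invalid values early, before any network call is made.
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be in [0, 1]")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")
        if self.max_retries < 0:
            raise ValueError("max_retries must be non-negative")

    def as_kwargs(self) -> dict:
        """Return the settings as a plain dict suitable for **kwargs expansion."""
        return asdict(self)
```

Under those assumptions, a call such as `models.Anthropic("claude-sonnet-4-5-20250929", **GenerationConfig(max_tokens=1024).as_kwargs())` would keep tuning values out of the template code itself.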
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are adding Guidance to an agent pipeline but are unsure which model constructor, API key pattern, and tuning knobs match your target Claude or local backend.
Who is it for?
Indie builders implementing constrained LLM outputs in Python who need a quick backend matrix before committing to one provider.
Skip if: Non-Python stacks, pure prompt-only chat with no Guidance templates, or production monitoring setup with no model integration work yet.
When should I use this skill?
You are implementing or refactoring Guidance Python code and need backend-specific constructor options and model id examples.
What do I get? / Deliverables
You have copy-ready Python setup for chosen backends with model ids, auth, and generation parameters aligned to your Guidance templates.
- Backend-specific Guidance model initialization snippets
- Tuning notes for latency, retries, and local vs API tradeoffs
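To make the retry tradeoff concrete, here is a generic retry wrapper with exponential backoff. It is not how Guidance or any provider SDK implements `max_retries`; `call_with_retries` and its parameters are hypothetical names chosen for this sketch, and catching every `Exception` is a simplification that real code would narrow to transient errors.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            # Give up once the retry budget is spent and surface the last error.
            if attempt >= max_retries:
                raise
            # Delay doubles on each retry: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With `max_retries=3` the function makes at most four attempts in total, so worst-case latency is roughly four request timeouts plus the backoff delays.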
Journey fit
Build → agent-tooling is the primary shelf because the skill documents how to wire Guidance model objects and generation constraints into agent pipelines. Backend selection, API keys, timeouts, and performance tuning are integration setup tasks for agent runtimes—not generic product docs or ship-time QA.
How it compares
Reference skill for the Guidance Python library—not a hosted MCP server or a no-code prompt UI.
Common Questions / FAQ
Who is guidance for?
Python-focused solo builders and researchers wiring Guidance templates to Claude, OpenAI, or local models inside custom agents and evaluation harnesses.
When should I use guidance?
During Build agent-tooling when selecting models.Anthropic or local backends, setting API keys, and tuning max_tokens, temperature, and retries before shipping structured generation.
Is guidance safe to install?
Following the guide implies API keys and network calls to model providers—review the Security Audits panel on this page and never commit secrets into repos your agent can read.
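One way to keep secrets out of a repository is to read the key from the environment and fail fast when it is missing. The helper below is a minimal sketch using only the standard library; `require_api_key` is a hypothetical name, and `ANTHROPIC_API_KEY` is the variable the guide says the Anthropic backend reads.

```python
import os


def require_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError when the variable is unset or blank, so a missing key
    is reported at startup and not on the first model call.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell instead of hard-coding a key."
        )
    return key
```

Calling `require_api_key()` once at startup and passing the result as `api_key=` (or simply relying on the environment variable, as the guide's basic setup does) means no key ever appears in source files an agent can read.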
SKILL.md
# Backend Configuration Guide

Complete guide to configuring Guidance with different LLM backends.

## Table of Contents

- API-Based Models (Anthropic, OpenAI)
- Local Models (Transformers, llama.cpp)
- Backend Comparison
- Performance Tuning
- Advanced Configuration

## API-Based Models

### Anthropic Claude

#### Basic Setup

```python
from guidance import models

# Using environment variable
lm = models.Anthropic("claude-sonnet-4-5-20250929")
# Reads ANTHROPIC_API_KEY from environment

# Explicit API key
lm = models.Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key-here"
)
```

#### Available Models

```python
# Claude Sonnet 4.5 (Latest, recommended)
lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Claude 3.7 Sonnet (Fast, cost-effective)
lm = models.Anthropic("claude-3-7-sonnet-20250219")

# Claude 3 Opus (Most capable)
lm = models.Anthropic("claude-3-opus-20240229")

# Claude 3.5 Haiku (Fastest, cheapest)
lm = models.Anthropic("claude-3-5-haiku-20241022")
```

#### Configuration Options

```python
lm = models.Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key",
    max_tokens=4096,   # Max tokens to generate
    temperature=0.7,   # Sampling temperature (0-1)
    top_p=0.9,         # Nucleus sampling
    timeout=30,        # Request timeout (seconds)
    max_retries=3      # Retry failed requests
)
```

#### With Context Managers

```python
from guidance import models, system, user, assistant, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

with system():
    lm += "You are a helpful assistant."

with user():
    lm += "What is the capital of France?"

with assistant():
    lm += gen(max_tokens=50)

print(lm)
```

### OpenAI

#### Basic Setup

```python
from guidance import models

# Using environment variable
lm = models.OpenAI("gpt-4o")
# Reads OPENAI_API_KEY from environment

# Explicit API key
lm = models.OpenAI(
    model="gpt-4o",
    api_key="your-api-key-here"
)
```

#### Available Models

```python
# GPT-4o (Latest, multimodal)
lm = models.OpenAI("gpt-4o")

# GPT-4o Mini (Fast, cost-effective)
lm = models.OpenAI("gpt-4o-mini")

# GPT-4 Turbo
lm = models.OpenAI("gpt-4-turbo")

# GPT-3.5 Turbo (Cheapest)
lm = models.OpenAI("gpt-3.5-turbo")
```

#### Configuration Options

```python
lm = models.OpenAI(
    model="gpt-4o-mini",
    api_key="your-api-key",
    max_tokens=2048,
    temperature=0.7,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    timeout=30
)
```

#### Chat Format

```python
from guidance import models, system, user, assistant, gen

lm = models.OpenAI("gpt-4o-mini")

# OpenAI models use chat format; set each turn inside a role block
with system():
    lm += "You are a helpful assistant."

with user():
    lm += "What is 2+2?"

# Generate response
with assistant():
    lm += gen(max_tokens=50)
```

### Azure OpenAI

```python
from guidance import models

lm = models.AzureOpenAI(
    model="gpt-4o",
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    deployment_name="your-deployment-name"
)
```

## Local Models

### Transformers (Hugging Face)

#### Basic Setup

```python
from guidance.models import Transformers

# Load model from Hugging Face
lm = Transformers("microsoft/Phi-4-mini-instruct")
```

#### GPU Configuration

```python
# Use GPU
lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda"
)

# Use specific GPU
lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda:0"  # GPU 0
)

# Use CPU
lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cpu"
)
```

#### Advanced Configuration

```python
lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda",
    torch_dtype="float16",       # Use FP16 (faster, less memory)
    load_in_8bit=True,           # 8-bit quantization
    max_memory={0: "20GB"},      # GPU memory limit
    offload_folder="./offload"   # Offload to disk if nee