
Embedding Strategies
Copy proven embedding client patterns (Voyage, OpenAI, domain models) when building RAG, search, or agent memory features.
Install
npx skills add https://github.com/wshobson/agents --skill embedding-strategies
What is this skill?
- Template 1: Voyage AI embeddings including voyage-3-large and Claude-oriented guidance
- Domain-specialized Voyage models: voyage-code-3, voyage-finance-2, voyage-law-2
- Template 2: OpenAI embeddings with text-embedding-3-small and optional dimension reduction
- Python patterns for batching large document lists (batch size of 100 in the OpenAI example)
- Separate query vs document embedding helpers for retrieval pipelines
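The batching and query-versus-document points above can be sketched without tying the code to any one provider. This is a minimal illustration, not the skill's own implementation: `batched`, `embed_in_batches`, `rank_documents`, and the `toy_embed_batch` stand-in are hypothetical names, and the toy embedder only counts letters so that the example runs offline. In practice `embed_batch` would wrap a real provider call.

```python
import math
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 100,
) -> List[Vector]:
    """Embed texts batch by batch, preserving input order."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec: Vector, doc_vecs: List[Vector]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    return sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )


def toy_embed_batch(texts: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for a provider call: counts of 'a', 'b', 'c'."""
    return [[float(t.count(ch)) for ch in "abc"] for t in texts]


docs = ["aaa", "bbb", "abc"]
doc_vecs = embed_in_batches(docs, toy_embed_batch, batch_size=2)  # indexing path
query_vec = toy_embed_batch(["aab"])[0]                           # query path
order = rank_documents(query_vec, doc_vecs)
```

Keeping the indexing path and the query path as separate calls leaves room for providers or models that treat queries and documents differently, for example by adding a query prefix.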
Adoption & trust: 7.5k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Choosing and wiring up embedding providers is core backend AI work during the build phase, and the same templates come back into play when you iterate on retrieval during the grow and operate phases. The skill provides code-first templates for embedding APIs, batching, and domain-specific models: backend integration for vector pipelines.
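One way to keep the choice of domain-specific model in a single place is a small lookup, sketched below. The model names are the ones listed in this skill's templates; the `DOMAIN_MODELS` mapping, its domain keys, and the `pick_embedding_model` helper are hypothetical and only illustrate the idea.

```python
from typing import Dict

# Model names taken from the skill's Voyage template; the keys are assumptions.
DEFAULT_MODEL = "voyage-3-large"
DOMAIN_MODELS: Dict[str, str] = {
    "code": "voyage-code-3",
    "finance": "voyage-finance-2",
    "legal": "voyage-law-2",
}


def pick_embedding_model(domain: str = "") -> str:
    """Return the domain-specific model name, or the general default."""
    return DOMAIN_MODELS.get(domain.strip().lower(), DEFAULT_MODEL)


model_for_code = pick_embedding_model("Code")
model_for_other = pick_embedding_model("medical")
```

Whichever model is used to index a collection should also be used to embed queries against it, since vectors produced by different models are not comparable.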
Common Questions / FAQ
Is Embedding Strategies safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# embedding-strategies: templates and worked examples

## Templates

### Template 1: Voyage AI Embeddings (Recommended for Claude)

```python
from langchain_voyageai import VoyageAIEmbeddings
from typing import List
import os

# Initialize Voyage AI embeddings (recommended by Anthropic for Claude)
embeddings = VoyageAIEmbeddings(
    model="voyage-3-large",
    voyage_api_key=os.environ.get("VOYAGE_API_KEY")
)

def get_embeddings(texts: List[str]) -> List[List[float]]:
    """Get embeddings from Voyage AI."""
    return embeddings.embed_documents(texts)

def get_query_embedding(query: str) -> List[float]:
    """Get single query embedding."""
    return embeddings.embed_query(query)

# Specialized models for domains
code_embeddings = VoyageAIEmbeddings(model="voyage-code-3")
finance_embeddings = VoyageAIEmbeddings(model="voyage-finance-2")
legal_embeddings = VoyageAIEmbeddings(model="voyage-law-2")
```

### Template 2: OpenAI Embeddings

```python
from openai import OpenAI
from typing import List, Optional

client = OpenAI()

def get_embeddings(
    texts: List[str],
    model: str = "text-embedding-3-small",
    dimensions: Optional[int] = None
) -> List[List[float]]:
    """Get embeddings from OpenAI with optional dimension reduction."""
    # Handle batching for large lists
    batch_size = 100
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]

        kwargs = {"input": batch, "model": model}
        if dimensions is not None:
            # Matryoshka dimensionality reduction
            kwargs["dimensions"] = dimensions

        response = client.embeddings.create(**kwargs)
        embeddings = [item.embedding for item in response.data]
        all_embeddings.extend(embeddings)

    return all_embeddings

def get_embedding(text: str, **kwargs) -> List[float]:
    """Get single embedding."""
    return get_embeddings([text], **kwargs)[0]

# Dimension reduction with Matryoshka embeddings
def get_reduced_embedding(text: str, dimensions: int = 512) -> List[float]:
    """Get embedding with reduced dimensions (Matryoshka)."""
    return get_embedding(
        text,
        model="text-embedding-3-small",
        dimensions=dimensions
    )
```

### Template 3: Local Embeddings with Sentence Transformers

```python
from sentence_transformers import SentenceTransformer
from typing import List, Optional
import numpy as np

class LocalEmbedder:
    """Local embedding with sentence-transformers."""

    def __init__(
        self,
        model_name: str = "BAAI/bge-large-en-v1.5",
        device: str = "cuda"
    ):
        self.model = SentenceTransformer(model_name, device=device)
        self.model_name = model_name

    def embed(
        self,
        texts: List[str],
        normalize: bool = True,
        show_progress: bool = False
    ) -> np.ndarray:
        """Embed texts with optional normalization."""
        embeddings = self.model.encode(
            texts,
            normalize_embeddings=normalize,
            show_progress_bar=show_progress,
            convert_to_numpy=True
        )
        return embeddings

    def embed_query(self, query: str) -> np.ndarray:
        """Embed a query with appropriate prefix for retrieval models."""
        # BGE and similar models benefit from query prefix
        if "bge" in self.model_name.lower():
            query = f"Represent this sentence for searching relevant passages: {query}"
        return self.embed([query])[0]

    def embed_documents(self, documents: List[str]) -> np.ndarray:
        """Embed documents for indexing."""
        return self.embed(documents)

# E5 model with instructions
class E5Embedder:
    def __init__(self, model_name: str = "intfloat/multilingual-e5-large"):
        self.model = SentenceTransformer(model_name)

    def embed_query(self, query: str) -> np.ndarray:
        """E5 requires 'query:' prefix for queries."""
        return self.model.encode(f"query: {query}")

    def embed_docume