
Pinecone
Choose and deploy Pinecone indexes (serverless vs pods) with hybrid dense+sparse patterns when shipping RAG or semantic search in production.
Overview
Pinecone is an agent skill most often used in Operate (also Build integrations) that documents production Pinecone deployment patterns—serverless vs pod indexes and hybrid dense-plus-sparse upserts—for solo builders ship
Install
npx skills add https://github.com/davila7/claude-code-templates --skill pinecone
What is this skill?
- Documents serverless index creation with ServerlessSpec (AWS, GCP, or Azure regions) and pay-per-usage auto-scaling
- Documents pod-based PodSpec with pod types, pod count, and replicas for predictable p95 latency and throughput
- Explains when to pick serverless (variable load, cost) vs pods (consistent production SLAs)
- Shows hybrid upsert pattern combining dense semantic vectors with sparse_values token indices for keyword-aware retrieval
- Includes copy-paste Python examples using the official Pinecone client API
- Compares 2 deployment models: serverless (ServerlessSpec) and pod-based (PodSpec)
- Hybrid upsert example uses dense values plus sparse_values with indices and TF-style weights
- ServerlessSpec supports 3 clouds: aws, gcp, and azure
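To make the sparse_values shape in the bullets above concrete, here is a minimal sketch of building `indices` and TF-style weights from raw text. The helper name `build_sparse_values`, the CRC32 hashing scheme, and the `vocab_size` default are illustrative assumptions, not part of the skill; a production setup would more likely use a trained BM25 or learned sparse encoder.

```python
import zlib
from collections import Counter


def build_sparse_values(text: str, vocab_size: int = 30_000) -> dict:
    """Hypothetical helper: hash whitespace tokens to integer indices and
    weight each index by its normalized term frequency."""
    tokens = text.lower().split()
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    counts = Counter(zlib.crc32(tok.encode("utf-8")) % vocab_size for tok in tokens)
    total = sum(counts.values())
    indices = sorted(counts)
    return {
        "indices": indices,
        "values": [counts[i] / total for i in indices],
    }


sparse = build_sparse_values("pinecone hybrid search hybrid")
```

The resulting dict has the same `indices`/`values` layout as the sparse_values field in the hybrid upsert example, so it could be attached to a record next to the dense `values` list.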
Adoption & trust: 512 installs on skills.sh; 27.8k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need a production Pinecone index but are unsure whether serverless or pods fit your latency, cost, and traffic profile, or how to store hybrid vectors correctly.
Who is it for?
Indie builders launching RAG, semantic search, or agent memory who already have embeddings and want clear serverless-versus-pod guidance with runnable Pinecone client examples.
Skip if: you only need a local in-memory vector toy, have no network or API access to Pinecone, or have already standardized on a different vector database with no Pinecone in your stack.
When should I use this skill?
When deploying Pinecone to production, choosing serverless versus pod indexes, or implementing hybrid dense plus sparse vector upserts.
What do I get? / Deliverables
You can create the right index spec, upsert dense and sparse vectors with working Python snippets, and align deployment choice with variable load versus consistent SLA needs before scaling traffic.
- Production-ready index creation snippet (serverless or pod)
- Hybrid vector upsert pattern for dense + sparse fields
- Documented tradeoff choice between cost-variable serverless and latency-stable pods
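Before upserting, it can help to check that each record matches the index it is headed for. The sketch below is one way to do that under stated assumptions: `validate_hybrid_record` is a hypothetical helper, and the record layout (`values`, optional `sparse_values` with `indices` and `values`) mirrors the hybrid upsert example in the skill.

```python
def validate_hybrid_record(record: dict, dimension: int) -> None:
    """Hypothetical pre-upsert check: raise ValueError if the record does not
    fit an index of the given dimension or its sparse part is misaligned."""
    dense = record["values"]
    if len(dense) != dimension:
        raise ValueError(
            f"dense vector has {len(dense)} values, index expects {dimension}"
        )
    sparse = record.get("sparse_values")
    if sparse is not None:
        indices, weights = sparse["indices"], sparse["values"]
        if len(indices) != len(weights):
            raise ValueError("sparse indices and values must have the same length")


record = {
    "id": "doc1",
    "values": [0.1, 0.2, 0.3],  # toy 3-dimensional dense vector
    "sparse_values": {"indices": [10, 45], "values": [0.5, 0.3]},
}
validate_hybrid_record(record, dimension=3)  # passes silently
```

Running a check like this client-side surfaces a dimension mismatch before the request is sent, rather than as an API error in the middle of a bulk load.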
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The readme is framed as production deployment patterns—index specs, cloud regions, scaling, and latency tradeoffs—so the canonical shelf is Operate where vector search actually runs in prod. Infra is the right subphase because the skill compares ServerlessSpec vs PodSpec, replicas, and cloud/region placement rather than app UI or marketing work.
Where it fits
Wire your embedding pipeline and create a serverless index with the correct dimension and cosine metric before first upsert.
Model document IDs and hybrid sparse_values alongside dense vectors so keyword-heavy queries still retrieve the right chunks.
Pick cloud region and replica strategy before go-live so launch traffic does not surprise you with latency or cost spikes.
Re-evaluate serverless pay-per-use versus dedicated pods when p95 latency or throughput becomes a support issue.
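The last point, re-evaluating when p95 latency becomes a problem, needs a p95 number to act on. A minimal sketch, assuming you already log per-query latencies in milliseconds; the function names and the latency budget are illustrative, and the nearest-rank percentile used here is one of several common definitions.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latency samples."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_reevaluate(latencies_ms: list[float], budget_ms: float) -> bool:
    """True when observed p95 exceeds the latency budget (an assumed SLA)."""
    return p95(latencies_ms) > budget_ms


samples = [float(ms) for ms in range(1, 101)]  # synthetic samples: 1..100 ms
```

With these synthetic samples, `p95(samples)` is the 95th smallest value. Comparing it against a budget gives a simple trigger for revisiting the serverless-versus-pods decision.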
How it compares
Use this as a procedural deployment playbook in SKILL.md—not a hosted MCP server that queries indexes on your behalf.
Common Questions / FAQ
Who is pinecone for?
Solo and indie builders shipping AI features (RAG, search, agents) who manage their own Pinecone account and need production index configuration guidance inside their coding agent.
When should I use pinecone?
Use it in Build when integrating embeddings and upserts; in Ship when hardening launch configs; and in Operate when tuning serverless vs pod capacity, regions, replicas, and hybrid dense+sparse retrieval under real load.
Is pinecone safe to install?
Treat it as documentation-oriented agent guidance: review the Security Audits panel on this Prism page, rotate API keys, and scope Pinecone credentials instead of pasting production secrets into chat logs.
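One way to keep keys out of chat logs and source files is to read them from the environment. This is a sketch, not the skill's own code: `load_pinecone_api_key` is a hypothetical helper, and the `PINECONE_API_KEY` variable name is an assumption you can change to match your deployment.

```python
import os


def load_pinecone_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment or fail loudly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it instead of hard-coding the key"
        )
    return key


# Usage (requires the pinecone package and a real key in the environment):
# from pinecone import Pinecone
# pc = Pinecone(api_key=load_pinecone_api_key())
```

Failing at startup when the variable is missing is a deliberate choice: it is easier to diagnose than an authentication error on the first query.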
SKILL.md
# Pinecone Deployment Guide

Production deployment patterns for Pinecone.

## Serverless vs Pod-based

### Serverless (Recommended)

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")

# Create serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)
```

**Benefits:**
- Auto-scaling
- Pay per usage
- No infrastructure management
- Cost-effective for variable load

**Use when:**
- Variable traffic
- Cost optimization important
- Don't need consistent latency

### Pod-based

```python
from pinecone import PodSpec

# Reuses the `pc` client created in the serverless example
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",  # or p1.x2, p1.x4, p1.x8
        pods=2,            # Number of pods
        replicas=2         # High availability
    )
)
```

**Benefits:**
- Consistent performance
- Predictable latency
- Higher throughput
- Dedicated resources

**Use when:**
- Production workloads
- Need consistent p95 latency
- High throughput required

## Hybrid search

### Dense + Sparse vectors

```python
# Sparse-dense vectors require an index created with metric="dotproduct"
# (the cosine indexes created above will reject sparse_values).
index = pc.Index("my-index")

# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense (semantic)
        "sparse_values": {
            "indices": [10, 45, 123],   # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF/BM25 scores
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query
# query() has no alpha argument. To weight dense vs sparse, scale the dense
# values by alpha and the sparse values by (1 - alpha) before querying
# (0 = sparse only, 1 = dense only, 0.5 = balanced).
results = index.query(
    vector=[0.1, 0.2, ...],  # Dense query
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=10
)
```

**Benefits:**
- Best of both worlds
- Semantic + keyword matching
- Better recall than either alone

## Namespaces for multi-tenancy

```python
# Separate data by user/tenant
index.upsert(
    vectors=[{"id": "doc1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])
```

**Use cases:**
- Multi-tenant SaaS
- User-specific data isolation
- A/B testing (prod/staging namespaces)

## Metadata filtering

### Exact match

```python
results = index.query(
    vector=[...],
    filter={"category": "tutorial"},
    top_k=5
)
```

### Range queries

```python
results = index.query(
    vector=[...],
    filter={"price": {"$gte": 100, "$lte": 500}},
    top_k=5
)
```

### Complex filters

```python
results = index.query(
    vector=[...],
    filter={
        "$and": [
            {"category": {"$in": ["tutorial", "guide"]}},
            {"difficulty": {"$lte": 3}},
            # Range operators compare numbers, so store dates as Unix
            # timestamps (1704067200 = 2024-01-01 00:00:00 UTC).
            {"published": {"$gte": 1704067200}}
        ]
    },
    top_k=5
)
```

## Best practices

1. **Use serverless for development** - Cost-effective
2. **Switch to pods for production** - Consistent performance
3. **Implement namespaces** - Multi-tenancy
4. **Add metadata strategically** - Enable filtering
5. **Use hybrid search** - Better quality
6. **Batch upserts** - 100-200 vectors per batch
7. **Monitor usage** - Check Pinecone dashboard
8. **Set up alerts** - Usage/cost thresholds
9. **Regular backups** - Export important data
10. **Test filters** - Verify performance

## Resources

- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io

---
name: pinecone
description: Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [RAG, Pinecone, Vector Database, Managed S