Pinecone

Name: Pinecone
Author: davila7

davila7/claude-code-templates

529 installs
29.9k repo stars
Updated July 27, 2026

Pinecone is a Claude Code skill that walks developers through choosing serverless versus pod-based Pinecone indexes and deploying hybrid dense+sparse vector patterns for production RAG and semantic search.

About

Pinecone is an agent skill that teaches developers how to deploy and operate Pinecone vector indexes in production using the official Python client. It walks through creating serverless indexes with ServerlessSpec across AWS, GCP, or Azure regions, which suits spiky traffic where you want auto-scaling without managing pods. It also covers pod-based PodSpec setups with pod types, pod counts, and replica counts for dedicated capacity when you need stable latency and higher query throughput. The guide adds hybrid search by upserting dense embeddings alongside sparse token indices so retrieval can blend semantic similarity with lexical signals. Use it while building RAG, agent memory, or product search, then revisit it when you promote an index from dev keys to real traffic, regions, and cost controls.

Documents serverless index creation with ServerlessSpec (AWS, GCP, or Azure regions) and pay-per-usage auto-scaling
Documents pod-based PodSpec with pod types, pod count, and replicas for predictable p95 latency and throughput
Explains when to pick serverless (variable load, cost) vs pods (consistent production SLAs)
Shows hybrid upsert pattern combining dense semantic vectors with sparse_values token indices for keyword-aware retrieval
Includes copy-paste Python examples using the official Pinecone client API

Pinecone by the numbers

529 all-time installs (skills.sh)
Ranked #364 of 1,041 Cloud & Infrastructure skills by installs in the Skillselion catalog
Security screen: HIGH risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/davila7/claude-code-templates --skill pinecone

Installs	529
repo stars	★ 29.9k
Security audit	2 / 3 scanners passed
Last updated	July 27, 2026
Repository	davila7/claude-code-templates ↗

How do you deploy Pinecone indexes for production RAG?

Choose and deploy Pinecone indexes (serverless vs pods) with hybrid dense+sparse patterns when shipping RAG or semantic search in production.

Who is it for?

Backend developers shipping RAG, semantic search, or agent memory who must pick Pinecone serverless versus pod indexes and wire hybrid retrieval correctly.

Skip if: Teams still evaluating whether vector search is needed, or projects using only local embeddings without a managed Pinecone account.

When should I use this skill?

A developer asks to create, migrate, or optimize Pinecone indexes for RAG, semantic search, or hybrid dense+sparse retrieval in production.

What you get

Configured Pinecone index specs, Python create_index snippets, and a serverless-versus-pods decision record for your RAG stack.

create_index Python snippets
serverless-versus-pods decision notes
hybrid retrieval configuration outline

By the numbers

Example index configuration uses 1536-dimension vectors with cosine metric
ServerlessSpec supports AWS, GCP, and Azure cloud regions

Files

SKILL.md (Markdown, on GitHub)

Pinecone - Managed Vector Database

The vector database for production AI applications.

When to use Pinecone

Use when:

Need managed, serverless vector database
Production RAG applications
Auto-scaling required
Low latency critical (<100ms)
Don't want to manage infrastructure
Need hybrid search (dense + sparse vectors)

Metrics:

Fully managed SaaS
Auto-scales to billions of vectors
p95 latency <100ms
99.9% uptime SLA

Use alternatives instead:

Chroma: Self-hosted, open-source
FAISS: Offline, pure similarity search
Weaviate: Self-hosted with more features

Quick start

Installation

pip install pinecone  # previously published as pinecone-client

Basic usage

from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="my-index",
    dimension=1536,  # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True
)

print(results["matches"])

Core operations

Create index

# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",         # or "gcp", "azure"
        region="us-east-1"
    )
)

# Pod-based (for consistent performance)
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1"
    )
)
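
A new index is not always queryable the moment create_index returns, and calling create_index twice with the same name fails. The sketch below wraps both concerns in one helper. The name ensure_index and its timeout parameters are hypothetical; the client calls it relies on (list_indexes().names(), create_index, describe_index(...).status["ready"], Index) are taken to behave as in recent versions of the Python client, so check them against the version you have installed.

```python
import time


def ensure_index(pc, name, dimension, metric, spec, timeout_s=60.0, poll_s=1.0):
    """Create the index if it is missing, then wait until it reports ready.

    `pc` is an already-initialized Pinecone client. This helper is a
    hypothetical convenience wrapper, not part of the Pinecone SDK.
    """
    # Skip creation when an index with this name already exists.
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)

    deadline = time.monotonic() + timeout_s
    # Poll the index description until the ready flag turns True.
    while not pc.describe_index(name).status["ready"]:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {name!r} not ready after {timeout_s}s")
        time.sleep(poll_s)

    return pc.Index(name)
```

Usage would look like index = ensure_index(pc, "my-index", 1536, "cosine", ServerlessSpec(cloud="aws", region="us-east-1")), which is safe to run on every deploy.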

Upsert vectors

# Single upsert
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # 1536 dimensions
        "metadata": {
            "text": "Document content",
            "category": "tutorial",
            "timestamp": "2025-01-01"
        }
    }
])

# Batch upsert (recommended)
vectors = [
    {"id": f"vec{i}", "values": embedding, "metadata": metadata}
    for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]

index.upsert(vectors=vectors, batch_size=100)
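
When the vectors come from a generator or are too numerous to hold in one list, the batching can also be done by hand. This is a minimal sketch: chunked and upsert_in_batches are hypothetical helper names, and the only client call assumed is index.upsert(vectors=..., namespace=...) as used elsewhere in this guide.

```python
from itertools import islice


def chunked(items, batch_size=100):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, vectors, batch_size=100, namespace=None):
    """Upsert vectors batch by batch and return how many were sent."""
    total = 0
    for batch in chunked(vectors, batch_size):
        if namespace is None:
            index.upsert(vectors=batch)
        else:
            index.upsert(vectors=batch, namespace=namespace)
        total += len(batch)
    return total
```

Because chunked consumes its input lazily, embeddings can be produced and uploaded in a streaming fashion without materializing the full list.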

Query vectors

# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}}
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production"
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
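
In a RAG pipeline the matches are usually trimmed before they reach the prompt. The sketch below keeps only matches above a score threshold and stops adding text once a character budget is used up. It is hypothetical throughout: select_context, min_score, and max_chars are illustrative names, it assumes a metric where a higher score means a closer match (cosine or dotproduct), and it assumes each match was returned with include_metadata=True and stores its content under a "text" metadata key as in the upsert example.

```python
def select_context(matches, min_score=0.75, max_chars=2000):
    """Return the texts of the best matches that fit within max_chars.

    `matches` is a list of items with "score" and "metadata" keys,
    such as results["matches"] from a query.
    """
    picked = []
    used = 0
    # Visit matches from highest to lowest score.
    for match in sorted(matches, key=lambda m: m["score"], reverse=True):
        if match["score"] < min_score:
            break  # every remaining match scores lower still
        text = match["metadata"].get("text", "")
        if not text or used + len(text) > max_chars:
            continue  # skip empty texts and texts that would exceed the budget
        picked.append(text)
        used += len(text)
    return picked
```

The right threshold depends on the embedding model and metric, so treat 0.75 as a placeholder to tune against your own data.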

Metadata filtering

# Exact match
filter = {"category": "tutorial"}

# Comparison
filter = {"price": {"$gte": 100}}  # $gt, $gte, $lt, $lte, $ne

# Logical operators
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}}
    ]
}  # Also: $or

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}

Namespaces

# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])

Hybrid search (dense + sparse)

# Upsert with sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF scores
        },
        "metadata": {"text": "..."}
    }
])

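# Hybrid query (the index must use metric="dotproduct")
# query() takes no alpha argument. To weight dense against sparse, scale the
# vectors yourself before querying: dense * alpha and sparse * (1 - alpha),
# where alpha=0 is sparse only and alpha=1 is dense only.
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=5
)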
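
Pinecone's documented way to balance dense against sparse relevance is to scale both query vectors on the client before calling query, using a convex combination controlled by alpha. The sketch below shows that weighting; the function name hybrid_scale is an assumption, and the sparse vector is expected in the {"indices": [...], "values": [...]} shape used above.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense and a sparse query vector by alpha and (1 - alpha).

    alpha=1.0 keeps only the dense signal, alpha=0.0 only the sparse one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse
```

The two return values are then passed as vector= and sparse_vector= to index.query. Only the query vectors are scaled; the stored vectors stay as upserted.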

LangChain integration

from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index"
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"}
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

LlamaIndex integration

from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Index management

# List indices
indexes = pc.list_indexes()

# Describe index
index_info = pc.describe_index("my-index")
print(index_info)

# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")

# Delete index
pc.delete_index("my-index")

Delete vectors

# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by filter
index.delete(filter={"category": "old"})

# Delete all in namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors in the default namespace (the index itself remains;
# use pc.delete_index to remove the index)
index.delete(delete_all=True)

Best practices

1. Use serverless - Auto-scaling, cost-effective
2. Batch upserts - More efficient (100-200 per batch)
3. Add metadata - Enable filtering
4. Use namespaces - Isolate data by user/tenant
5. Monitor usage - Check Pinecone dashboard
6. Optimize filters - Index frequently filtered fields
7. Test with free tier - 1 index, 100K vectors free
8. Use hybrid search - Better quality
9. Set appropriate dimensions - Match embedding model
10. Regular backups - Export important data

Performance

Operation	Latency	Notes
Upsert	~50-100ms	Per batch
Query (p50)	~50ms	Depends on index size
Query (p95)	~100ms	SLA target
Metadata filter	~+10-20ms	Additional overhead
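
The figures in this table are indicative, so it is worth measuring latency against your own index and region. Below is a minimal sketch that times a callable and reports nearest-rank p50 and p95 in milliseconds. The names percentile and measure_latency_ms are hypothetical, and run_query stands for any zero-argument function that performs one query, for example a lambda wrapping index.query.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(run_query, runs=50):
    """Call run_query `runs` times and return p50/p95 wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"p50": percentile(samples, 50), "p95": percentile(samples, 95)}
```

The measurement includes network round-trip time from wherever the script runs, so run it from the same region as the application for a representative number.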

Pricing (as of 2025)

Serverless:

$0.096 per million read units
$0.06 per million write units
$0.06 per GB storage/month

Free tier:

1 serverless index
100K vectors (1536 dimensions)
Great for prototyping
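
As a rough illustration of how the serverless rates combine, the sketch below turns monthly usage into an estimated bill. It uses only the rates quoted in this section, which may be out of date, and the function name estimate_monthly_cost is hypothetical; confirm current rates on the pricing page before relying on the result.

```python
# Rates quoted in this guide (USD); they may no longer be current.
READ_PER_MILLION = 0.096
WRITE_PER_MILLION = 0.06
STORAGE_PER_GB_MONTH = 0.06


def estimate_monthly_cost(read_units, write_units, storage_gb):
    """Estimate a monthly serverless bill in USD, rounded to cents."""
    reads = read_units / 1_000_000 * READ_PER_MILLION
    writes = write_units / 1_000_000 * WRITE_PER_MILLION
    storage = storage_gb * STORAGE_PER_GB_MONTH
    return round(reads + writes + storage, 2)
```

For example, 10 million read units, 5 million write units, and 20 GB of storage come to 0.96 + 0.30 + 1.20 = 2.46 USD at these rates.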

Resources

Website: https://www.pinecone.io
Docs: https://docs.pinecone.io
Console: https://app.pinecone.io
Pricing: https://www.pinecone.io/pricing

Pinecone Deployment Guide

Production deployment patterns for Pinecone.

Serverless vs Pod-based

Serverless (Recommended)

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")

# Create serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)

Benefits:

Auto-scaling
Pay per usage
No infrastructure management
Cost-effective for variable load

Use when:

Variable traffic
Cost optimization important
Don't need consistent latency

Pod-based

from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",  # or p1.x2, p1.x4, p1.x8
        pods=2,  # Number of pods
        replicas=2  # High availability
    )
)

Benefits:

Consistent performance
Predictable latency
Higher throughput
Dedicated resources

Use when:

Production workloads
Need consistent p95 latency
High throughput required

Hybrid search

Dense + Sparse vectors

# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense (semantic)
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF/BM25 scores
        },
        "metadata": {"text": "..."}
    }
])

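# Hybrid query (the index must use metric="dotproduct")
# query() takes no alpha argument. To weight dense against sparse, scale the
# vectors yourself before querying: dense * alpha and sparse * (1 - alpha),
# where alpha=0 is sparse only, alpha=1 is dense only, and 0.5 is balanced.
results = index.query(
    vector=[0.1, 0.2, ...],  # Dense query
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=10
)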

Benefits:

Best of both worlds
Semantic + keyword matching
Better recall than either alone
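
The sparse_values in the upsert above have to come from somewhere. As a toy illustration of the shape only, the sketch below hashes whitespace tokens to 32-bit indices and uses raw term counts as weights. This is an assumption made for the example, not the method the guide prescribes: the function name sparse_from_text is hypothetical, and a production setup would more likely use a trained BM25 or learned sparse encoder so that the weights reflect term importance.

```python
import zlib
from collections import Counter


def sparse_from_text(text):
    """Build a {"indices", "values"} sparse vector from term frequencies.

    Tokens are lowercased, split on whitespace, and hashed with CRC32,
    which yields unsigned 32-bit integers in Python 3.
    """
    counts = Counter(text.lower().split())
    weights = {}
    for token, count in counts.items():
        idx = zlib.crc32(token.encode("utf-8"))
        # Tokens that hash to the same index share one entry.
        weights[idx] = weights.get(idx, 0.0) + float(count)
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}
```

The same function must be applied to documents at upsert time and to the query text at search time so that matching tokens land on the same indices.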

Namespaces for multi-tenancy

# Separate data by user/tenant
index.upsert(
    vectors=[{"id": "doc1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])

Use cases:

Multi-tenant SaaS
User-specific data isolation
A/B testing (prod/staging namespaces)
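
Tenant isolation through namespaces only holds if every call passes the right namespace. One way to reduce the chance of a slip is a thin wrapper that pins the namespace once per tenant. The class below is a hypothetical sketch, not part of the Pinecone SDK; the "tenant-" prefix is an arbitrary naming choice, and it assumes only that the wrapped index accepts namespace= on upsert and query as shown above.

```python
class TenantIndex:
    """Wrap an index so every upsert and query is pinned to one tenant's namespace."""

    def __init__(self, index, tenant_id):
        if not tenant_id:
            raise ValueError("tenant_id must not be empty")
        self._index = index
        self.namespace = f"tenant-{tenant_id}"

    def upsert(self, vectors, **kwargs):
        self._reject_namespace(kwargs)
        return self._index.upsert(vectors=vectors, namespace=self.namespace, **kwargs)

    def query(self, **kwargs):
        self._reject_namespace(kwargs)
        return self._index.query(namespace=self.namespace, **kwargs)

    @staticmethod
    def _reject_namespace(kwargs):
        # Callers must not override the pinned namespace.
        if "namespace" in kwargs:
            raise ValueError("namespace is fixed by TenantIndex")
```

Request handlers would then build TenantIndex(index, tenant_id) from the authenticated tenant and never touch the raw index directly.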

Metadata filtering

Exact match

results = index.query(
    vector=[...],
    filter={"category": "tutorial"},
    top_k=5
)

Range queries

results = index.query(
    vector=[...],
    filter={"price": {"$gte": 100, "$lte": 500}},
    top_k=5
)

Complex filters

results = index.query(
    vector=[...],
    filter={
        "$and": [
            {"category": {"$in": ["tutorial", "guide"]}},
            {"difficulty": {"$lte": 3}},
            {"published": {"$gte": 1704067200}}  # 2024-01-01 UTC as a Unix timestamp; range operators compare numbers, not date strings
        ]
    },
    top_k=5
)
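
Pinecone's range operators ($gt, $gte, $lt, $lte) compare numbers, so dates are commonly stored as Unix timestamps in metadata and converted at query time. The sketch below shows one way to build such a filter. The helper names and the metadata field published_ts are hypothetical; the field must have been written as a number at upsert time for the filter to match.

```python
from datetime import datetime, timezone


def to_epoch_seconds(date_str):
    """Convert a 'YYYY-MM-DD' date to a Unix timestamp at midnight UTC."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def published_between(start, end):
    """Build a filter matching start <= published_ts < end (dates as 'YYYY-MM-DD')."""
    return {
        "published_ts": {
            "$gte": to_epoch_seconds(start),
            "$lt": to_epoch_seconds(end),
        }
    }
```

The returned dict can be passed directly as filter= to index.query, or placed inside an "$and" list alongside other conditions.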

Best practices

1. Use serverless for development - Cost-effective
2. Switch to pods for production - Consistent performance
3. Implement namespaces - Multi-tenancy
4. Add metadata strategically - Enable filtering
5. Use hybrid search - Better quality
6. Batch upserts - 100-200 vectors per batch
7. Monitor usage - Check Pinecone dashboard
8. Set up alerts - Usage/cost thresholds
9. Regular backups - Export important data
10. Test filters - Verify performance

Resources

Docs: https://docs.pinecone.io
Console: https://app.pinecone.io

How it compares

Pick this skill when you already chose Pinecone and need index-tier and hybrid retrieval configuration, not when comparing vector databases from scratch.

FAQ

When should developers choose Pinecone serverless over pods?

Pinecone serverless indexes suit variable traffic, pay-per-usage cost control, and teams that skip infrastructure management. Pod-based indexes fit workloads needing consistent latency when traffic is steady and predictable.

What embedding setup does the Pinecone skill assume?

The Pinecone skill examples use 1536-dimension vectors with cosine similarity, matching common OpenAI embedding sizes. Developers adjust dimension and metric to match their chosen embedding model.

Is Pinecone safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Cloud & Infrastructure, agents, llm, automation
