Pinecone

Name: Pinecone
Author: orchestra-research

orchestra-research/ai-research-skills

394 installs
11.2k repo stars
Updated June 16, 2026

pinecone is a Claude Code skill that wires a managed Pinecone index into production RAG or semantic search for developers who need hybrid vectors, metadata filters, and serverless scaling without self-hosting vector infrastructure.

About

pinecone is a version 1.0.0 Claude Code skill from orchestra-research/ai-research-skills that teaches agents to provision and query Pinecone's managed vector database for production AI workloads. The skill covers serverless auto-scaling indexes, hybrid search combining dense and sparse vectors, metadata filtering, namespaces, and sub-100ms p95 latency targets suited to RAG pipelines and recommendation systems. It declares a pinecone-client dependency and includes MIT licensing guidance so integrations stay aligned with Orchestra Research conventions. Developers reach for pinecone when semantic search or retrieval must run on managed infrastructure instead of local embedding stores or DIY vector hosts.

Managed serverless Pinecone with auto-scaling toward billion-vector scale
Hybrid search combining dense and sparse vectors plus metadata filtering and namespaces
Documented **p95 latency under 100ms** and 99.9% uptime SLA positioning
Quick start with `pinecone-client` install and ServerlessSpec index creation
Guidance on when to prefer self-hosted Chroma, FAISS, or Weaviate instead

Pinecone by the numbers

394 all-time installs (skills.sh)
+35 installs in the week ending Jul 18, 2026 (Skillselion tracking)
Ranked #1,957 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: HIGH risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill pinecone

Installs	394
repo stars	★ 11.2k
Security audit	2 / 3 scanners passed
Last updated	June 16, 2026
Repository	orchestra-research/ai-research-skills ↗

How do you integrate Pinecone for production RAG?

Wire a managed Pinecone index into production RAG or semantic search with hybrid vectors, metadata filters, and serverless scaling.

Who is it for?

Backend developers shipping production RAG, semantic search, or recommendation features on managed Pinecone infrastructure.

Skip if: Local-only prototypes or teams that require fully self-hosted vector databases without managed services.

When should I use this skill?

A developer asks to add Pinecone vector search, hybrid retrieval, metadata filters, or serverless index scaling to an AI app.

What you get

Pinecone index configuration, pinecone-client query code, and hybrid search setup with metadata filters and namespaces.

Pinecone index setup
hybrid retrieval query code

By the numbers

Skill version 1.0.0 with pinecone-client dependency
Documents sub-100ms p95 latency for managed Pinecone queries

Files

SKILL.md (Markdown)

Pinecone - Managed Vector Database

The vector database for production AI applications.

When to use Pinecone

Use when:

Need managed, serverless vector database
Production RAG applications
Auto-scaling required
Low latency critical (<100ms)
Don't want to manage infrastructure
Need hybrid search (dense + sparse vectors)

Metrics:

Fully managed SaaS
Auto-scales to billions of vectors
p95 latency <100ms
99.9% uptime SLA

Use alternatives instead:

Chroma: Self-hosted, open-source
FAISS: Offline, pure similarity search
Weaviate: Self-hosted with more features

Quick start

Installation

pip install pinecone-client

Basic usage

from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="my-index",
    dimension=1536,  # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True
)

print(results["matches"])
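The index dimension must match the embedding dimension, so it can help to check records locally before an upsert. The sketch below is one way to do that; `validate_records` is a hypothetical helper, not part of the Pinecone client, and it only assumes the record shape shown above (`id` and `values`).

```python
from typing import Dict, List, Sequence


def validate_records(records: Sequence[Dict], dimension: int) -> List[str]:
    """Return a list of problems found in `records`; an empty list means OK.

    Hypothetical pre-flight check: it looks only at ids and vector length.
    """
    problems: List[str] = []
    seen_ids = set()
    for pos, record in enumerate(records):
        rec_id = record.get("id")
        if not isinstance(rec_id, str) or not rec_id:
            problems.append(f"record {pos}: missing or non-string id")
        elif rec_id in seen_ids:
            problems.append(f"record {pos}: duplicate id {rec_id!r}")
        else:
            seen_ids.add(rec_id)

        values = record.get("values", [])
        if len(values) != dimension:
            problems.append(
                f"record {pos}: expected {dimension} values, got {len(values)}"
            )
    return problems


records = [
    {"id": "vec1", "values": [0.0] * 4},
    {"id": "vec1", "values": [0.0] * 3},  # duplicate id and wrong length
]
for problem in validate_records(records, dimension=4):
    print(problem)
```

Running a check like this before `index.upsert(...)` surfaces a mismatched embedding model early instead of as a failed API call.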

Core operations

Create index

# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",         # or "gcp", "azure"
        region="us-east-1"
    )
)

# Pod-based (for consistent performance)
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1"
    )
)

Upsert vectors

# Single upsert
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # 1536 dimensions
        "metadata": {
            "text": "Document content",
            "category": "tutorial",
            "timestamp": "2025-01-01"
        }
    }
])

# Batch upsert (recommended)
vectors = [
    {"id": f"vec{i}", "values": embedding, "metadata": metadata}
    for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]

index.upsert(vectors=vectors, batch_size=100)
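If you prefer to control batching yourself, for example to retry or log each batch separately, a small chunking helper is enough. This is a minimal sketch; `chunked` is a hypothetical name and the upsert call is shown only as a comment because it needs a live index.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 100) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Usage sketch (assumes `index` and `vectors` exist as in the snippet above):
# for batch in chunked(vectors, 100):
#     index.upsert(vectors=batch)

batch_sizes = [len(batch) for batch in chunked(range(250), 100)]
print(batch_sizes)
```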

Query vectors

# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}}
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production"
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
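RAG pipelines often drop weak matches before building a prompt. The sketch below filters on score; `matches_above` is a hypothetical helper that assumes a plain dict with the `matches`, `id`, `score`, and `metadata` keys used above (convert a client response object to a dict first if needed). A sensible threshold depends on the metric: with cosine, higher scores mean closer matches.

```python
from typing import Any, Dict, List


def matches_above(results: Dict[str, Any], min_score: float) -> List[Dict[str, Any]]:
    """Keep only matches whose score is at least `min_score`."""
    kept = []
    for match in results["matches"]:
        if match["score"] >= min_score:
            kept.append(
                {
                    "id": match["id"],
                    "score": match["score"],
                    # Metadata is absent unless include_metadata=True was set.
                    "metadata": match.get("metadata") or {},
                }
            )
    return kept


# Hand-written stand-in for a query response, for illustration only.
fake_results = {
    "matches": [
        {"id": "doc1", "score": 0.91, "metadata": {"text": "Document content"}},
        {"id": "doc2", "score": 0.42},
    ]
}
print(matches_above(fake_results, min_score=0.5))
```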

Metadata filtering

# Exact match
filter = {"category": "tutorial"}

# Comparison
filter = {"price": {"$gte": 100}}  # $gt, $gte, $lt, $lte, $ne

# Logical operators
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}}
    ]
}  # Also: $or

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
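To sanity-check a filter expression without calling the service, a small local evaluator can be handy in unit tests. The sketch below is an approximation written for this page, not Pinecone's implementation; it covers only the operators listed above, and edge cases (such as missing fields) may behave differently on the real service.

```python
from typing import Any, Dict

_COMPARATORS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Approximate, local evaluation of a metadata filter (hypothetical helper)."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                return False  # assumption: a missing field never matches
            value = metadata[key]
            if not isinstance(cond, dict):
                cond = {"$eq": cond}  # shorthand: {"field": value}
            for op, operand in cond.items():
                if op == "$in":
                    values = value if isinstance(value, list) else [value]
                    if not any(v in operand for v in values):
                        return False
                elif op in _COMPARATORS:
                    if not _COMPARATORS[op](value, operand):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
    return True


doc = {"category": "tutorial", "difficulty": 2, "price": 150, "tags": ["python", "rag"]}
print(matches(doc, {"$and": [{"category": "tutorial"}, {"difficulty": {"$lte": 3}}]}))
```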

Namespaces

# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])

Hybrid search (dense + sparse)

# Upsert with sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF scores
        },
        "metadata": {"text": "..."}
    }
])

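# Hybrid query
# Note: query() takes no alpha argument. To weight dense against sparse,
# scale both vectors before the call. Sparse-dense queries also require an
# index created with metric="dotproduct".
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=5
)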
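One way to apply an alpha-style weighting is to scale both vectors on the client before querying, so that alpha=1 keeps only the dense signal and alpha=0 keeps only the sparse signal. The sketch below shows that convex weighting; `hybrid_scale` is a hypothetical helper name, and the commented query call assumes the `index` object from the snippets above.

```python
from typing import Dict, List, Tuple


def hybrid_scale(
    dense: List[float], sparse: Dict[str, list], alpha: float
) -> Tuple[List[float], Dict[str, list]]:
    """Weight a dense and a sparse vector: alpha * dense, (1 - alpha) * sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense_q, sparse_q = hybrid_scale(
    [4.0, 8.0], {"indices": [10, 45], "values": [4.0, 2.0]}, alpha=0.25
)
print(dense_q, sparse_q)

# Usage sketch:
# results = index.query(vector=dense_q, sparse_vector=sparse_q, top_k=5)
```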

LangChain integration

from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index"
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"}
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

LlamaIndex integration

from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Index management

# List indices
indexes = pc.list_indexes()

# Describe index
index_info = pc.describe_index("my-index")
print(index_info)

# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")

# Delete index
pc.delete_index("my-index")

Delete vectors

# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by filter
index.delete(filter={"category": "old"})

# Delete all in namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors in the default namespace
# (the index itself remains; use pc.delete_index to remove it)
index.delete(delete_all=True)

Best practices

1. Use serverless - Auto-scaling, cost-effective
2. Batch upserts - More efficient (100-200 per batch)
3. Add metadata - Enable filtering
4. Use namespaces - Isolate data by user/tenant
5. Monitor usage - Check Pinecone dashboard
6. Optimize filters - Index frequently filtered fields
7. Test with free tier - 1 index, 100K vectors free
8. Use hybrid search - Better quality
9. Set appropriate dimensions - Match embedding model
10. Regular backups - Export important data

Performance

Operation	Latency	Notes
Upsert	~50-100ms	Per batch
Query (p50)	~50ms	Depends on index size
Query (p95)	~100ms	SLA target
Metadata filter	~+10-20ms	Additional overhead
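The figures in the table are targets, so it is worth measuring p50 and p95 against your own index. The sketch below computes nearest-rank percentiles from latency samples you collect yourself (for example by timing each query with `time.perf_counter`); `percentile` is a hypothetical helper and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative latencies in milliseconds, not measurements of Pinecone.
latencies_ms = [40, 45, 50, 55, 120]
print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
```

A single slow request dominates p95 on a small sample, so collect a few hundred queries before comparing against the table.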

Pricing (as of 2025)

Serverless:

$0.096 per million read units
$0.06 per million write units
$0.06 per GB storage/month

Free tier:

1 serverless index
100K vectors (1536 dimensions)
Great for prototyping
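As a rough worked example, the serverless rates listed above can be turned into a monthly estimate. The sketch hard-codes those 2025 figures as constants; the function name and the example workload are assumptions, and current pricing should be checked before relying on the result.

```python
# Rates copied from the serverless list above (USD, as of 2025).
READ_USD_PER_MILLION = 0.096
WRITE_USD_PER_MILLION = 0.06
STORAGE_USD_PER_GB_MONTH = 0.06


def estimate_monthly_cost(read_units: int, write_units: int, storage_gb: float) -> float:
    """Estimate a monthly serverless bill in USD from usage totals."""
    reads = read_units / 1_000_000 * READ_USD_PER_MILLION
    writes = write_units / 1_000_000 * WRITE_USD_PER_MILLION
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    return round(reads + writes + storage, 2)


# Hypothetical workload: 10M read units, 5M write units, 20 GB stored.
cost = estimate_monthly_cost(10_000_000, 5_000_000, 20)
print(f"${cost:.2f} per month")
```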

Resources

Website: https://www.pinecone.io
Docs: https://docs.pinecone.io
Console: https://app.pinecone.io
Pricing: https://www.pinecone.io/pricing

Pinecone Deployment Guide

Production deployment patterns for Pinecone.

Serverless vs Pod-based

Serverless (Recommended)

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")

# Create serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)

Benefits:

Auto-scaling
Pay per usage
No infrastructure management
Cost-effective for variable load

Use when:

Variable traffic
Cost optimization important
Don't need consistent latency

Pod-based

from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",  # or p1.x2, p1.x4, p1.x8
        pods=2,  # Number of pods
        replicas=2  # High availability
    )
)

Benefits:

Consistent performance
Predictable latency
Higher throughput
Dedicated resources

Use when:

Production workloads
Need consistent p95 latency
High throughput required

Hybrid search

Dense + Sparse vectors

# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense (semantic)
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF/BM25 scores
        },
        "metadata": {"text": "..."}
    }
])

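# Hybrid query
# Note: query() takes no alpha argument. To balance dense against sparse,
# scale both vectors before the call. Sparse-dense queries also require an
# index created with metric="dotproduct".
results = index.query(
    vector=[0.1, 0.2, ...],  # Dense query
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=10
)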

Benefits:

Best of both worlds
Semantic + keyword matching
Better recall than either alone
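To make the `indices`/`values` shape of a sparse vector concrete, here is a toy encoder that hashes tokens to integer indices and uses raw term counts as weights. It is only an illustration of the data structure; `toy_sparse_encode` is a hypothetical helper, and production systems typically use a trained BM25 or learned sparse encoder instead.

```python
import re
import zlib
from collections import Counter
from typing import Dict, List, Union


def toy_sparse_encode(text: str) -> Dict[str, List[Union[int, float]]]:
    """Map text to {"indices": [...], "values": [...]} using hashed term counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # crc32 gives a stable unsigned 32-bit id per token; colliding tokens
    # simply share an index and their counts add up.
    counts = Counter(zlib.crc32(token.encode("utf-8")) for token in tokens)
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


encoded = toy_sparse_encode("Vector search, vector DB")
print(encoded)
```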

Namespaces for multi-tenancy

# Separate data by user/tenant
index.upsert(
    vectors=[{"id": "doc1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])

Use cases:

Multi-tenant SaaS
User-specific data isolation
A/B testing (prod/staging namespaces)
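For multi-tenant ingestion, records usually need to be grouped by tenant before they are written to per-tenant namespaces. The sketch below does that grouping locally; the `tenant_id` metadata key and the `user-<id>` naming scheme are assumptions chosen to match the `user-123` example above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_namespace(
    records: List[Dict[str, Any]], tenant_key: str = "tenant_id"
) -> Dict[str, List[Dict[str, Any]]]:
    """Group records into namespaces named after a tenant id in their metadata."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        tenant = record["metadata"][tenant_key]
        grouped[f"user-{tenant}"].append(record)
    return dict(grouped)


records = [
    {"id": "doc1", "values": [0.1, 0.2], "metadata": {"tenant_id": 123}},
    {"id": "doc2", "values": [0.3, 0.4], "metadata": {"tenant_id": 456}},
    {"id": "doc3", "values": [0.5, 0.6], "metadata": {"tenant_id": 123}},
]
by_namespace = group_by_namespace(records)
print({name: [r["id"] for r in recs] for name, recs in by_namespace.items()})

# Usage sketch (assumes `index` exists as in the snippet above):
# for namespace, recs in by_namespace.items():
#     index.upsert(vectors=recs, namespace=namespace)
```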

Metadata filtering

Exact match

results = index.query(
    vector=[...],
    filter={"category": "tutorial"},
    top_k=5
)

Range queries

results = index.query(
    vector=[...],
    filter={"price": {"$gte": 100, "$lte": 500}},
    top_k=5
)

Complex filters

results = index.query(
    vector=[...],
    filter={
        "$and": [
            {"category": {"$in": ["tutorial", "guide"]}},
            {"difficulty": {"$lte": 3}},
            {"published": {"$gte": 1704067200}}  # Unix seconds for 2024-01-01 UTC; range operators compare numbers, not strings
        ]
    },
    top_k=5
)
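Range operators such as `$gte` compare numbers, so dates are commonly stored as Unix timestamps rather than strings. The sketch below converts ISO dates to epoch seconds and builds a range filter; the helper names are hypothetical, and it assumes the `published` field was upserted as a number.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def to_epoch_seconds(iso_date: str) -> int:
    """Convert a YYYY-MM-DD date to Unix seconds at midnight UTC."""
    moment = datetime.fromisoformat(iso_date).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def published_between(start: str, end: str) -> Dict[str, Any]:
    """Build a numeric range filter on a `published` timestamp field."""
    return {
        "published": {
            "$gte": to_epoch_seconds(start),
            "$lte": to_epoch_seconds(end),
        }
    }


date_filter = published_between("2024-01-01", "2024-12-31")
print(date_filter)
```

The resulting dict can be passed as `filter=` on its own or placed inside an `$and` list alongside other conditions.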

Best practices

1. Use serverless for development - Cost-effective
2. Switch to pods for production - Consistent performance
3. Implement namespaces - Multi-tenancy
4. Add metadata strategically - Enable filtering
5. Use hybrid search - Better quality
6. Batch upserts - 100-200 vectors per batch
7. Monitor usage - Check Pinecone dashboard
8. Set up alerts - Usage/cost thresholds
9. Regular backups - Export important data
10. Test filters - Verify performance

Resources

Docs: https://docs.pinecone.io
Console: https://app.pinecone.io

How it compares

Use pinecone when you want a fully managed, auto-scaling vector index with hybrid search instead of operating your own vector database cluster.

FAQ

What latency does the pinecone skill target?

The pinecone skill documents Pinecone's managed service for low-latency retrieval, citing under 100ms p95 latency for production RAG, recommendation, and semantic search workloads.

Which client library does pinecone require?

The pinecone skill lists pinecone-client as its dependency and covers serverless auto-scaling indexes with hybrid dense-plus-sparse search, metadata filtering, and namespaces.

Is Pinecone safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
