Qdrant Vector Search

Name: Qdrant Vector Search
Author: davila7

davila7/claude-code-templates

860 installs
29.9k repo stars
Updated July 27, 2026

Qdrant Vector Search is a Claude Code skill that generates production-grade Qdrant distributed cluster and sharding configurations for developers who need semantic search and vector memory in AI agents.

About

Qdrant Vector Search is an advanced usage guide skill that helps developers stand up Qdrant for semantic search and agent vector memory. The skill covers distributed deployment with Raft consensus, docker-compose cluster topology, HTTP and gRPC port mapping, and sharding configuration patterns. The readme excerpt documents a 3-node Qdrant cluster with per-node storage volumes and cluster environment variables such as QDRANT__CLUSTER__ENABLED. Developers reach for Qdrant Vector Search when embedding pipelines are ready but production cluster YAML, sharding rules, and multi-node coordination still need correct defaults. The skill targets agent builders and backend engineers integrating Qdrant instead of guessing distributed vector-database settings.

Generates complete 3-node Raft-based Docker Compose cluster configurations
Produces Python client code for sharding, replication, and vector index setup
Supports both single-node and high-availability distributed deployments
Includes environment variable and volume mapping best practices
Delivers ready-to-run YAML and Python snippets for immediate use

Qdrant Vector Search by the numbers

860 all-time installs (skills.sh)
+22 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #450 of 4,386 Backend & APIs skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/davila7/claude-code-templates --skill qdrant-vector-search

Installs	860
repo stars	★ 29.9k
Security audit	3 / 3 scanners passed
Last updated	July 27, 2026
Repository	davila7/claude-code-templates ↗

How do you configure Qdrant distributed clusters?

Quickly generate production-grade Qdrant distributed cluster and sharding configurations for semantic search and vector memory in AI agents.

Who is it for?

Backend and agent developers deploying Qdrant for semantic search who need distributed cluster and sharding configs beyond a single-node docker run.

Skip if: you only need a local single-node Qdrant instance, with no clustering or production sharding requirements.

When should I use this skill?

A developer asks for Qdrant cluster setup, distributed vector search configuration, or sharding YAML for agent memory.

What you get

docker-compose.yml cluster configs, sharding YAML, and Qdrant node environment variable sets.

By the numbers

Documents a 3-node Qdrant cluster in docker-compose
Maps HTTP port 6333 and gRPC port 6334 per node

Files

SKILL.md (Markdown, on GitHub)

Qdrant - Vector Similarity Search Engine

High-performance vector database written in Rust for production RAG and semantic search.

When to use Qdrant

Use Qdrant when:

Building production RAG systems requiring low latency
Need hybrid search (vectors + metadata filtering)
Require horizontal scaling with sharding/replication
Want on-premise deployment with full data control
Need multi-vector storage per record (dense + sparse)
Building real-time recommendation systems

Key features:

Rust-powered: Memory-safe, high performance
Rich filtering: Filter by any payload field during search
Multiple vectors: Dense, sparse, multi-dense per point
Quantization: Scalar, product, binary for memory efficiency
Distributed: Raft consensus, sharding, replication
REST + gRPC: Both APIs with full feature parity

Use alternatives instead:

Chroma: Simpler setup, embedded use cases
FAISS: Maximum raw speed, research/batch processing
Pinecone: Fully managed, zero ops preferred
Weaviate: GraphQL preference, built-in vectorizers

Quick start

Installation

# Python client
pip install qdrant-client

# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Basic usage

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Insert vectors with payload
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.1, 0.2, ...],  # 384-dim vector
            payload={"title": "Doc 1", "category": "tech"}
        ),
        PointStruct(
            id=2,
            vector=[0.3, 0.4, ...],
            payload={"title": "Doc 2", "category": "science"}
        )
    ]
)

# Search with filtering
results = client.search(
    collection_name="documents",
    query_vector=[0.15, 0.25, ...],
    query_filter={
        "must": [{"key": "category", "match": {"value": "tech"}}]
    },
    limit=10
)

for point in results:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")

Core concepts

Points - Basic data unit

from qdrant_client.models import PointStruct

# Point = ID + Vector(s) + Payload
point = PointStruct(
    id=123,                              # Integer or UUID string
    vector=[0.1, 0.2, 0.3, ...],        # Dense vector
    payload={                            # Arbitrary JSON metadata
        "title": "Document title",
        "category": "tech",
        "timestamp": 1699900000,
        "tags": ["python", "ml"]
    }
)

# Batch upsert (recommended)
client.upsert(
    collection_name="documents",
    points=[point1, point2, point3],
    wait=True  # Block until the write has been applied
)

Collections - Vector containers

from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

# Create with HNSW configuration
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=384,                        # Vector dimensions
        distance=Distance.COSINE         # COSINE, EUCLID, DOT, MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                            # Connections per node (default 16)
        ef_construct=100,                # Build-time accuracy (default 100)
        full_scan_threshold=10000        # Switch to brute force below this
    ),
    on_disk_payload=True                 # Store payload on disk
)

# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Indexed vectors: {info.indexed_vectors_count}")

Distance metrics

Metric	Use Case	Score Range
`COSINE`	Text embeddings, normalized vectors	-1 to 1 (higher is closer)
`EUCLID`	Spatial data, image features	0 to ∞ (lower is closer)
`DOT`	Recommendations, unnormalized	-∞ to ∞ (higher is closer)
`MANHATTAN`	Sparse features, discrete data	0 to ∞ (lower is closer)
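
To make the four metrics concrete, here is a minimal plain-Python sketch of the underlying formulas. It is only an illustration and not Qdrant's implementation; the function names are hypothetical, and how Qdrant reports each score (for example, whether cosine is returned as a similarity or a distance) is outside this sketch.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar for unnormalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after length normalization."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Straight-line (L2) distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1): smaller means closer."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_similarity(a, b), euclidean(a, b), dot(a, b), manhattan(a, b))
```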

Search operations

Basic search

# Simple nearest neighbor search
results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],
    limit=10,
    with_payload=True,
    with_vectors=False  # Don't return vectors (faster)
)

Filtered search

from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Complex filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="tech")),
            FieldCondition(key="timestamp", range=Range(gte=1699000000))
        ],
        must_not=[
            FieldCondition(key="status", match=MatchValue(value="archived"))
        ]
    ),
    limit=10
)

# Shorthand filter syntax
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "tech"}},
            {"key": "price", "range": {"gte": 10, "lte": 100}}
        ]
    },
    limit=10
)

Batch search

from qdrant_client.models import SearchRequest

# Multiple queries in one request
results = client.search_batch(
    collection_name="documents",
    requests=[
        SearchRequest(vector=[0.1, ...], limit=5),
        SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
        SearchRequest(vector=[0.3, ...], limit=10)
    ]
)

RAG integration

With sentence-transformers

from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Index documents
documents = [
    {"id": 1, "text": "Python is a programming language", "source": "wiki"},
    {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]

points = [
    PointStruct(
        id=doc["id"],
        vector=encoder.encode(doc["text"]).tolist(),
        payload={"text": doc["text"], "source": doc["source"]}
    )
    for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)

# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    query_vector = encoder.encode(query).tolist()
    results = client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k
    )
    return [{"text": r.payload["text"], "score": r.score} for r in results]

# Use in RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"

With LangChain

from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

With LlamaIndex

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()

Multi-vector support

Named vectors (different embedding models)

from qdrant_client.models import VectorParams, Distance, PointStruct

# Collection with multiple vector types
client.create_collection(
    collection_name="hybrid_search",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE),
        "sparse": VectorParams(size=30000, distance=Distance.DOT)
    }
)

# Insert with named vectors
client.upsert(
    collection_name="hybrid_search",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_embedding,
                "sparse": sparse_embedding
            },
            payload={"text": "document text"}
        )
    ]
)

# Search specific vector
results = client.search(
    collection_name="hybrid_search",
    query_vector=("dense", query_dense),  # Specify which vector
    limit=10
)

Sparse vectors (BM25, SPLADE)

from qdrant_client.models import PointStruct, SparseVectorParams, SparseIndexParams, SparseVector

# Collection with sparse vectors
client.create_collection(
    collection_name="sparse_search",
    vectors_config={},
    sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
)

# Insert sparse vector
client.upsert(
    collection_name="sparse_search",
    points=[PointStruct(id=1, vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])}, payload={"text": "document"})]
)

Quantization (memory optimization)

from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# Scalar quantization (4x memory reduction)
client.create_collection(
    collection_name="quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,        # Clip outliers
            always_ram=True      # Keep quantized in RAM
        )
    )
)

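# Search with rescoring
from qdrant_client.models import SearchParams, QuantizationSearchParams

results = client.search(
    collection_name="quantized",
    query_vector=query,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(rescore=True)  # Rescore top results
    ),
    limit=10
)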

Payload indexing

from qdrant_client.models import PayloadSchemaType

# Create payload index for faster filtering
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

client.create_payload_index(
    collection_name="documents",
    field_name="timestamp",
    field_schema=PayloadSchemaType.INTEGER
)

# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL

Production deployment

Qdrant Cloud

from qdrant_client import QdrantClient

# Connect to Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster.cloud.qdrant.io",
    api_key="your-api-key"
)

Performance tuning

from qdrant_client.models import HnswConfigDiff, OptimizersConfigDiff

# Optimize for recall (denser graph; slower indexing and more memory)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)

# Tune the indexing threshold for bulk loads
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=20000)
)

Best practices

1. Batch operations - Use batch upsert/search for efficiency
2. Payload indexing - Index fields used in filters
3. Quantization - Enable for large collections (>1M vectors)
4. Sharding - Use for collections >10M vectors
5. On-disk storage - Enable on_disk_payload for large payloads
6. Connection pooling - Reuse client instances
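
The batch-operations advice can be sketched with a small chunking helper. This is a minimal illustration, not part of qdrant-client: `batched` is a hypothetical helper name, and the batch size you pick is an assumption to tune for your payload sizes.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with an existing client and a list of PointStruct objects:
# for batch in batched(points, 256):
#     client.upsert(collection_name="documents", points=batch)
```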

Common issues

Slow search with filters:

# Create payload index for filtered fields
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

Out of memory:

# Enable quantization and on-disk storage
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(...),
    on_disk_payload=True
)

Connection issues:

# Set an explicit timeout and prefer gRPC
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=30,
    prefer_grpc=True  # gRPC for better performance
)

References

[Advanced Usage](references/advanced-usage.md) - Distributed mode, hybrid search, recommendations
[Troubleshooting](references/troubleshooting.md) - Common issues, debugging, performance tuning

Resources

GitHub: https://github.com/qdrant/qdrant (22k+ stars)
Docs: https://qdrant.tech/documentation/
Python Client: https://github.com/qdrant/qdrant-client
Cloud: https://cloud.qdrant.io
Version: 1.12.0+
License: Apache 2.0

Qdrant Advanced Usage Guide

Distributed Deployment

Cluster Setup

Qdrant uses Raft consensus for distributed coordination.

# docker-compose.yml for 3-node cluster
version: '3.8'
services:
  qdrant-node-1:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
      - "6335:6335"
    volumes:
      - ./node1_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__SERVICE__GRPC_PORT=6334
    # First peer: advertise its own URI so the other peers can reach it
    command: ./qdrant --uri http://qdrant-node-1:6335

  qdrant-node-2:
    image: qdrant/qdrant:latest
    ports:
      - "6343:6333"
      - "6344:6334"
      - "6345:6335"
    volumes:
      - ./node2_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
    # Join the cluster through the first peer (bootstrap is a CLI flag)
    command: ./qdrant --bootstrap http://qdrant-node-1:6335 --uri http://qdrant-node-2:6335
    depends_on:
      - qdrant-node-1

  qdrant-node-3:
    image: qdrant/qdrant:latest
    ports:
      - "6353:6333"
      - "6354:6334"
      - "6355:6335"
    volumes:
      - ./node3_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
    # Join the cluster through the first peer (bootstrap is a CLI flag)
    command: ./qdrant --bootstrap http://qdrant-node-1:6335 --uri http://qdrant-node-3:6335
    depends_on:
      - qdrant-node-1

Sharding Configuration

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(host="localhost", port=6333)

# Create sharded collection
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    shard_number=6,  # Number of shards
    replication_factor=2,  # Replicas per shard
    write_consistency_factor=1  # Required acks for write
)

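# Check cluster status (REST endpoint GET /cluster)
import requests

cluster_info = requests.get("http://localhost:6333/cluster", timeout=10).json()["result"]
print(f"Peers: {cluster_info['peers']}")
print(f"Raft state: {cluster_info['raft_info']}")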
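
As a rough sizing aid for the settings above, the arithmetic behind `shard_number` and `replication_factor` can be sketched as follows. This assumes shard replicas are spread as evenly as possible across nodes; actual placement is decided by Qdrant, and `replicas_per_node` is a hypothetical helper, not a client API.

```python
from typing import List


def replicas_per_node(shard_number: int, replication_factor: int, node_count: int) -> List[int]:
    """Estimate how many shard replicas each node holds under an even spread."""
    if replication_factor > node_count:
        # Assumption: replicas of one shard live on distinct nodes
        raise ValueError("replication_factor cannot exceed the number of nodes")
    total = shard_number * replication_factor
    base, extra = divmod(total, node_count)
    return [base + (1 if i < extra else 0) for i in range(node_count)]


# 6 shards x 2 replicas on the 3-node cluster: 12 shard replicas in total
print(replicas_per_node(shard_number=6, replication_factor=2, node_count=3))
```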

Replication and Consistency

from qdrant_client.models import WriteOrdering

# Strong write ordering: operations are serialized through the leader
client.upsert(
    collection_name="critical_data",
    points=points,
    ordering=WriteOrdering.STRONG  # Strict ordering, slowest
)

# Weak write ordering (faster): operations may be reordered
client.upsert(
    collection_name="logs",
    points=points,
    ordering=WriteOrdering.WEAK  # No ordering guarantee, fastest
)

# Read with majority consistency
results = client.search(
    collection_name="documents",
    query_vector=query,
    consistency="majority"  # Read from majority of replicas
)
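
The replica counts involved can be worked out with simple arithmetic. The sketch below assumes a write needs `write_consistency_factor` acknowledgements and that a majority read needs more than half of the replicas; both helper names are hypothetical.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def replicas_allowed_down(replication_factor: int, write_consistency_factor: int) -> int:
    """Replicas that may be unavailable while writes can still be acknowledged."""
    if not 1 <= write_consistency_factor <= replication_factor:
        raise ValueError("write_consistency_factor must be between 1 and replication_factor")
    return replication_factor - write_consistency_factor


print(majority(2), majority(3), replicas_allowed_down(2, 1))
```

With `replication_factor=2`, a majority is both replicas, so majority reads gain fault tolerance only from three replicas upward.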

Hybrid Search

Dense + Sparse Vectors

Combine semantic (dense) and keyword (sparse) search:

from qdrant_client.models import (
    VectorParams, SparseVectorParams, SparseIndexParams,
    Distance, PointStruct, SparseVector, Prefetch, Query
)

# Create hybrid collection
client.create_collection(
    collection_name="hybrid",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=SparseIndexParams(on_disk=False)
        )
    }
)

# Insert with both vector types
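def encode_sparse(text: str) -> SparseVector:
    """Simple term-frequency sparse encoding (a stand-in for BM25/SPLADE)"""
    import zlib
    from collections import Counter
    # Map tokens to indices (use a real vocabulary in production).
    # crc32 is stable across processes, unlike the built-in hash() for strings.
    counts = Counter(zlib.crc32(t.encode("utf-8")) % 30000 for t in text.lower().split())
    # Tokens that collide share one index, so indices stay unique
    return SparseVector(
        indices=list(counts.keys()),
        values=[float(c) for c in counts.values()]
    )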

client.upsert(
    collection_name="hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_encoder.encode("Python programming").tolist(),
                "sparse": encode_sparse("Python programming language code")
            },
            payload={"text": "Python programming language code"}
        )
    ]
)

# Hybrid search with Reciprocal Rank Fusion (RRF)
from qdrant_client.models import FusionQuery

results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=dense_query, using="dense", limit=20),
        Prefetch(query=sparse_query, using="sparse", limit=20)
    ],
    query=FusionQuery(fusion="rrf"),  # Combine results
    limit=10
)
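
To show what the fusion step does conceptually, here is a minimal plain-Python sketch of Reciprocal Rank Fusion. It is not Qdrant's implementation: the constant `k=60` is a commonly used default and an assumption here, and the tie-breaking rule is arbitrary.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism
    return sorted(scores, key=lambda point_id: (-scores[point_id], point_id))


dense_hits = [1, 2, 3]   # IDs ranked by the dense prefetch
sparse_hits = [3, 1, 4]  # IDs ranked by the sparse prefetch
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

IDs that appear near the top of both lists win over IDs that rank first in only one, which is why the fused order favors points 1 and 3 here.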

Multi-Stage Search

from qdrant_client.models import Prefetch, SearchParams, QuantizationSearchParams

# Two-stage retrieval: coarse then fine
results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(
            query=query_vector,
            limit=100,  # Broad first stage
            params=SearchParams(
                quantization=QuantizationSearchParams(rescore=False)  # Fast, approximate
            )
        )
    ],
    query=query_vector,  # Nearest-neighbor query over the prefetched candidates
    limit=10,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(rescore=True)  # Accurate reranking
    )
)

Recommendations

Item-to-Item Recommendations

# Find similar items
recommendations = client.recommend(
    collection_name="products",
    positive=[1, 2, 3],  # IDs user liked
    negative=[4],         # IDs user disliked
    limit=10
)

# With filtering
recommendations = client.recommend(
    collection_name="products",
    positive=[1, 2],
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "electronics"}},
            {"key": "in_stock", "match": {"value": True}}
        ]
    },
    limit=10
)

Lookup from Another Collection

from qdrant_client.models import RecommendStrategy, LookupLocation

# Recommend using vectors from another collection
results = client.recommend(
    collection_name="products",
    positive=[123],  # Point ID in "user_history" (IDs must be integers or UUIDs)
    lookup_from=LookupLocation(collection="user_history"),
    strategy=RecommendStrategy.AVERAGE_VECTOR,
    limit=10
)

Advanced Filtering

Nested Payload Filtering

from qdrant_client.models import Filter, FieldCondition, MatchValue, Nested, NestedCondition

# Filter on nested objects
results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            NestedCondition(
                nested=Nested(
                    key="metadata",
                    filter=Filter(
                        must=[
                            FieldCondition(
                                key="author.name",
                                match=MatchValue(value="John")
                            )
                        ]
                    )
                )
            )
        ]
    ),
    limit=10
)

Geo Filtering

from qdrant_client.models import Filter, FieldCondition, GeoRadius, GeoPoint

# Find within radius
results = client.search(
    collection_name="locations",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_radius=GeoRadius(
                    center=GeoPoint(lat=40.7128, lon=-74.0060),
                    radius=5000  # meters
                )
            )
        ]
    ),
    limit=10
)

# Geo bounding box
from qdrant_client.models import GeoBoundingBox

results = client.search(
    collection_name="locations",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_bounding_box=GeoBoundingBox(
                    top_left=GeoPoint(lat=40.8, lon=-74.1),
                    bottom_right=GeoPoint(lat=40.6, lon=-73.9)
                )
            )
        ]
    ),
    limit=10
)
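
For sanity-checking a radius filter on the client side, a great-circle distance can be computed with the haversine formula. This is a minimal sketch under the assumption of a spherical Earth with a 6,371 km radius; it is not necessarily the formula Qdrant uses internally, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # Assumed mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, point, radius_m: float) -> bool:
    """True when point lies within radius_m meters of center; both are (lat, lon)."""
    return haversine_m(center[0], center[1], point[0], point[1]) <= radius_m


center = (40.7128, -74.0060)  # Same center as the radius filter above
print(within_radius(center, (40.7580, -73.9855), 5000))
```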

Full-Text Search

from qdrant_client.models import TextIndexParams, TokenizerType

# Create text index
client.create_payload_index(
    collection_name="documents",
    field_name="content",
    field_schema=TextIndexParams(
        type="text",
        tokenizer=TokenizerType.WORD,
        min_token_len=2,
        max_token_len=15,
        lowercase=True
    )
)

# Full-text filter
from qdrant_client.models import MatchText

results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="content",
                match=MatchText(text="machine learning")
            )
        ]
    ),
    limit=10
)

Quantization Strategies

Scalar Quantization (INT8)

from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# ~4x memory reduction, minimal accuracy loss
client.create_collection(
    collection_name="scalar_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,       # Clip extreme values
            always_ram=True     # Keep quantized vectors in RAM
        )
    )
)
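
The idea behind INT8 scalar quantization can be sketched in plain Python: clip each component to a range, then map that range linearly onto 256 integer codes. This is an illustration only, not Qdrant's implementation; the clipping bounds `lo` and `hi` are assumed to come from the configured quantile, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(vector: List[float], lo: float, hi: float) -> Tuple[List[int], float]:
    """Clip to [lo, hi] and map linearly onto the int8 range [-128, 127]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scale = (hi - lo) / 255.0
    codes = [round((min(max(x, lo), hi) - lo) / scale) - 128 for x in vector]
    return codes, scale


def dequantize_int8(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate the original floats from their int8 codes."""
    return [(code + 128) * scale + lo for code in codes]


codes, scale = quantize_int8([-1.0, 1.0, 5.0], lo=-1.0, hi=1.0)
print(codes)                              # The outlier 5.0 is clipped to hi
print(dequantize_int8(codes, -1.0, scale))
```

Each component shrinks from 4 bytes (float32) to 1 byte, which is where the roughly 4x reduction comes from; the clipped outlier shows the accuracy trade-off.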

Product Quantization

from qdrant_client.models import ProductQuantization, ProductQuantizationConfig, CompressionRatio

# ~16x memory reduction, some accuracy loss
client.create_collection(
    collection_name="product_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ProductQuantization(
        product=ProductQuantizationConfig(
            compression=CompressionRatio.X16,
            always_ram=True
        )
    )
)

Binary Quantization

from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig

# ~32x memory reduction, requires oversampling
client.create_collection(
    collection_name="binary_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True)
    )
)

# Search with oversampling
from qdrant_client.models import SearchParams, QuantizationSearchParams

results = client.search(
    collection_name="binary_quantized",
    query_vector=query,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            rescore=True,
            oversampling=2.0  # Retrieve 2x candidates, rescore
        )
    ),
    limit=10
)
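
The reduction factors quoted in this section follow from bits per dimension. The sketch below estimates raw vector storage only; it ignores the HNSW graph, payloads, and any per-vector overhead, and treats the 16x product-quantization setting as 2 bits per dimension, which is an assumption made for the arithmetic. The helper names are hypothetical.

```python
import math

# Bits stored per vector component under each scheme (assumed for estimation)
BITS_PER_DIMENSION = {"float32": 32, "scalar int8": 8, "product x16": 2, "binary": 1}


def vector_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw bytes needed to store num_vectors vectors of dim components."""
    return num_vectors * dim * bits_per_dim // 8


def candidates_to_rescore(limit: int, oversampling: float) -> int:
    """Candidates fetched from the quantized index before rescoring."""
    return math.ceil(limit * oversampling)


for name, bits in BITS_PER_DIMENSION.items():
    megabytes = vector_bytes(1_000_000, 384, bits) / 1_000_000
    print(f"{name}: {megabytes:.0f} MB for 1M x 384-dim vectors")

print(candidates_to_rescore(limit=10, oversampling=2.0))
```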

Snapshots and Backups

Create Snapshot

# Create collection snapshot
snapshot_info = client.create_snapshot(collection_name="documents")
print(f"Snapshot: {snapshot_info.name}")

# List snapshots
snapshots = client.list_snapshots(collection_name="documents")
for s in snapshots:
    print(f"{s.name}: {s.size} bytes")

# Full storage snapshot
full_snapshot = client.create_full_snapshot()

Restore from Snapshot

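import requests

# Download snapshot (REST API)
snapshot_name = "documents-2024-01-01.snapshot"
response = requests.get(
    f"http://localhost:6333/collections/documents/snapshots/{snapshot_name}",
    stream=True
)
response.raise_for_status()
with open(f"./backup/{snapshot_name}", "wb") as f:  # ./backup/ must already exist
    for chunk in response.iter_content(chunk_size=1 << 20):
        f.write(chunk)

# Restore (via REST API); the location must be readable by the Qdrant server
response = requests.put(
    "http://localhost:6333/collections/documents/snapshots/recover",
    json={"location": "file:///backup/documents-2024-01-01.snapshot"}
)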

Collection Aliases

# Create alias
client.update_collection_aliases(
    change_aliases_operations=[
        {"create_alias": {"alias_name": "production", "collection_name": "documents_v2"}}
    ]
)

# Blue-green deployment
# 1. Create new collection with updates
client.create_collection(collection_name="documents_v3", ...)

# 2. Populate new collection
client.upsert(collection_name="documents_v3", points=new_points)

# 3. Atomic switch
client.update_collection_aliases(
    change_aliases_operations=[
        {"delete_alias": {"alias_name": "production"}},
        {"create_alias": {"alias_name": "production", "collection_name": "documents_v3"}}
    ]
)

# Search via alias
results = client.search(collection_name="production", query_vector=query, limit=10)

Scroll and Iteration

Scroll Through All Points

# Paginated iteration
offset = None
all_points = []

while True:
    results, offset = client.scroll(
        collection_name="documents",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False
    )
    all_points.extend(results)

    if offset is None:
        break

print(f"Total points: {len(all_points)}")
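
When the full result set is too large to hold in a list, the same offset loop can be wrapped in a generator. This is a minimal sketch: `iter_points` and `make_fake_scroll` are hypothetical helpers, and the fake scroll function only mimics the `(records, next_offset)` return shape so the example runs without a server.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

ScrollPage = Tuple[List[Any], Optional[Any]]


def iter_points(scroll_page: Callable[[Optional[Any]], ScrollPage]) -> Iterator[Any]:
    """Yield records page by page until the returned offset is None."""
    offset = None
    while True:
        records, offset = scroll_page(offset)
        yield from records
        if offset is None:
            return


def make_fake_scroll(data: List[Any], page_size: int) -> Callable[[Optional[Any]], ScrollPage]:
    """Stand-in for a scroll call, paging over an in-memory list."""
    def scroll_page(offset: Optional[int]) -> ScrollPage:
        start = offset or 0
        end = start + page_size
        next_offset = end if end < len(data) else None
        return data[start:end], next_offset
    return scroll_page


print(list(iter_points(make_fake_scroll(list(range(7)), page_size=3))))
```

With a real client, the callable could be `lambda off: client.scroll(collection_name="documents", limit=100, offset=off, with_payload=True, with_vectors=False)`.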

Filtered Scroll

# Scroll with filter
results, _ = client.scroll(
    collection_name="documents",
    scroll_filter=Filter(
        must=[
            FieldCondition(key="status", match=MatchValue(value="active"))
        ]
    ),
    limit=1000
)

Async Client

import asyncio
from qdrant_client import AsyncQdrantClient

async def main():
    client = AsyncQdrantClient(host="localhost", port=6333)

    # Async operations
    await client.create_collection(
        collection_name="async_docs",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

    await client.upsert(
        collection_name="async_docs",
        points=points
    )

    results = await client.search(
        collection_name="async_docs",
        query_vector=query,
        limit=10
    )

    return results

results = asyncio.run(main())

gRPC Client

from qdrant_client import QdrantClient

# Prefer gRPC for better performance
client = QdrantClient(
    host="localhost",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True  # Use gRPC when available
)

# gRPC-only client
from qdrant_client import QdrantClient

client = QdrantClient(
    host="localhost",
    grpc_port=6334,
    prefer_grpc=True,
    https=False
)

Multitenancy

Payload-Based Isolation

# Single collection, filter by tenant
client.upsert(
    collection_name="multi_tenant",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={"tenant_id": "tenant_a", "text": "..."}
        )
    ]
)

# Search within tenant
results = client.search(
    collection_name="multi_tenant",
    query_vector=query,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="tenant_a"))]
    ),
    limit=10
)
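
With payload-based isolation, every query must carry the tenant condition. One way to avoid forgetting it is to build filters through a single helper, sketched here with the dictionary filter syntax. `tenant_filter` is a hypothetical helper, not a client API.

```python
from typing import Any, Dict, List, Optional

Condition = Dict[str, Any]


def tenant_filter(tenant_id: str, extra_must: Optional[List[Condition]] = None) -> Dict[str, List[Condition]]:
    """Return a filter that always pins tenant_id, plus optional extra conditions."""
    must: List[Condition] = [{"key": "tenant_id", "match": {"value": tenant_id}}]
    must.extend(extra_must or [])
    return {"must": must}


print(tenant_filter("tenant_a", [{"key": "category", "match": {"value": "tech"}}]))
```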

Collection-Per-Tenant

# Create tenant collection
def create_tenant_collection(tenant_id: str):
    client.create_collection(
        collection_name=f"tenant_{tenant_id}",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

# Search tenant collection
def search_tenant(tenant_id: str, query_vector: list, limit: int = 10):
    return client.search(
        collection_name=f"tenant_{tenant_id}",
        query_vector=query_vector,
        limit=limit
    )

Performance Monitoring

Collection Statistics

# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Segments: {info.segments_count}")
print(f"Status: {info.status}")
print(f"Optimizer status: {info.optimizer_status}")

Telemetry

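# Get telemetry data (REST endpoint GET /telemetry)
import requests

telemetry = requests.get("http://localhost:6333/telemetry", timeout=10).json()["result"]
print(f"Collections: {telemetry.get('collections')}")
print(f"Requests: {telemetry.get('requests')}")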

Qdrant Troubleshooting Guide

Installation Issues

Docker Issues

Error: Cannot connect to Docker daemon

Fix:

# Start Docker daemon
sudo systemctl start docker

# Or start Docker Desktop (macOS shown; on Windows launch it from the Start menu)
open -a Docker

Error: Port 6333 already in use

Fix:

# Find process using port
lsof -i :6333

# Kill the process, or map a different free host port to the container's 6333
docker run -p 6340:6333 qdrant/qdrant

Python Client Issues

Error: ModuleNotFoundError: No module named 'qdrant_client'

Fix:

pip install qdrant-client

# With specific version
pip install 'qdrant-client>=1.12.0'

Error: grpc._channel._InactiveRpcError

Fix:

# gRPC listens on port 6334; make sure it is published alongside the HTTP port
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Or disable gRPC
client = QdrantClient(host="localhost", port=6333, prefer_grpc=False)

Connection Issues

Cannot Connect to Server

Error: ConnectionRefusedError: [Errno 111] Connection refused

Solutions:

1. Check server is running:

docker ps | grep qdrant
curl http://localhost:6333/healthz

2. Verify port binding:

# Check listening ports
netstat -tlnp | grep 6333

# Docker port mapping
docker port <container_id>

3. Use correct host:

# Docker on Linux
client = QdrantClient(host="localhost", port=6333)

# Docker on Mac/Windows with networking issues
client = QdrantClient(host="127.0.0.1", port=6333)

# Inside Docker network
client = QdrantClient(host="qdrant", port=6333)

Timeout Errors

Error: TimeoutError: Connection timed out

Fix:

# Increase timeout
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=60  # seconds
)

# For large operations
client.upsert(
    collection_name="documents",
    points=large_batch,
    wait=False  # Return without waiting for the write to be applied
)
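
For transient timeouts, a retry wrapper with exponential backoff is a common complement to a longer timeout. The sketch below is generic Python, not a qdrant-client feature: `with_retries` is a hypothetical helper, and the exception types to retry on are an assumption that should be adjusted to what your client version actually raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on the given exceptions with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


# Hypothetical usage with an existing client:
# with_retries(lambda: client.upsert(collection_name="documents", points=large_batch))
```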

SSL/TLS Errors

Error: ssl.SSLCertVerificationError

Fix:

# Qdrant Cloud
client = QdrantClient(
    url="https://cluster.cloud.qdrant.io",
    api_key="your-api-key"
)

# Self-signed certificate
client = QdrantClient(
    host="localhost",
    port=6333,
    https=True,
    verify=False  # Disable verification (not recommended for production)
)

Collection Issues

Collection Already Exists

Error: ValueError: Collection 'documents' already exists

Fix:

# Check before creating
collections = client.get_collections().collections
names = [c.name for c in collections]

if "documents" not in names:
    client.create_collection(...)

# Or recreate (this deletes the existing collection and all of its points)
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

Collection Not Found

Error: NotFoundException: Collection 'docs' not found

Fix:

# List available collections
collections = client.get_collections()
print([c.name for c in collections.collections])

# Check exact name (case-sensitive)
try:
    info = client.get_collection("documents")
except Exception as e:
    print(f"Collection not found: {e}")

Vector Dimension Mismatch

Error: ValueError: Vector dimension mismatch. Expected 384, got 768

Fix:

# Check collection config
info = client.get_collection("documents")
print(f"Expected dimension: {info.config.params.vectors.size}")

# Recreate with the correct dimension (this deletes all existing points)
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)  # Match your embeddings
)
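Recreating the collection is destructive, so it can help to catch the mismatch on the client side before any request is sent. A minimal sketch, assuming vectors are plain Python lists; `check_dimensions` is a hypothetical helper, not a client API.

```python
def check_dimensions(vectors, expected_dim):
    """Raise ValueError naming the first vector whose length is not expected_dim."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index}: expected {expected_dim} dimensions, got {len(vector)}"
            )
    return True
```

The expected dimension can be read from the collection config as shown above and passed in together with the batch of embeddings that is about to be upserted.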

Search Issues

Empty Search Results

Problem: Search returns empty results.

Solutions:

1. Verify data exists:

info = client.get_collection("documents")
print(f"Points: {info.points_count}")

# Scroll to check data
points, _ = client.scroll(
    collection_name="documents",
    limit=10,
    with_payload=True
)
print(points)

2. Check vector format:

# Must be list of floats
query_vector = embedding.tolist()  # Convert numpy to list

# Check dimensions
print(f"Query dimension: {len(query_vector)}")

3. Verify filter conditions:

# Test without filter first
results = client.search(
    collection_name="documents",
    query_vector=query,
    limit=10
    # No filter
)

# Then add filter incrementally

Slow Search Performance

Problem: Search takes too long.

Solutions:

1. Create payload indexes:

# Index fields used in filters
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema="keyword"
)

2. Enable quantization:

client.update_collection(
    collection_name="documents",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8)
    )
)

3. Tune HNSW parameters:

# Faster search (less accurate)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=64, m=8)
)

# Use ef search parameter
from qdrant_client.models import SearchParams

results = client.search(
    collection_name="documents",
    query_vector=query,
    search_params=SearchParams(hnsw_ef=64),  # Lower = faster, less accurate
    limit=10
)

4. Use gRPC:

client = QdrantClient(
    host="localhost",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True
)

Inconsistent Results

Problem: Same query returns different results.

Solutions:

1. Wait for indexing:

client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Block until the change has been applied
)

2. Check replication consistency:

# Strong consistency read
results = client.search(
    collection_name="documents",
    query_vector=query,
    consistency="all"  # Read from all replicas
)

Upsert Issues

Batch Upsert Fails

Error: PayloadError: Payload too large

Fix:

# Split into smaller batches
def batch_upsert(client, collection, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(
            collection_name=collection,
            points=batch,
            wait=True
        )

batch_upsert(client, "documents", large_points_list)
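A fixed batch size can still exceed the request limit when individual payloads are large. The sketch below closes a batch when either an item count or an approximate JSON byte budget is reached. It assumes points are represented as plain dicts, and the name `batches_by_size` and the 8 MiB default are illustrative values to tune, not limits taken from Qdrant.

```python
import json


def batches_by_size(items, max_items=100, max_bytes=8 * 1024 * 1024):
    """Yield lists of items, closing a batch at max_items or max_bytes of JSON."""
    batch, batch_bytes = [], 0
    for item in items:
        item_bytes = len(json.dumps(item).encode("utf-8"))
        # Close the current batch if adding this item would break either limit
        if batch and (len(batch) >= max_items or batch_bytes + item_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        yield batch
```

An item that is larger than the byte budget on its own is still emitted, as a batch of one, so nothing is silently dropped.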

Invalid Point ID

Error: ValueError: Invalid point ID

Fix:

# Valid ID types: int or UUID string
from uuid import uuid4

# Integer ID
PointStruct(id=123, vector=vec, payload={})

# UUID string
PointStruct(id=str(uuid4()), vector=vec, payload={})

# NOT valid: arbitrary strings are rejected, use an int or a UUID string instead
# PointStruct(id="custom-string-123", vector=vec, payload={})
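When records already carry their own string keys, one option is to derive a deterministic UUID from each key so that repeated upserts hit the same point. A minimal sketch with the standard `uuid` module; the namespace value and the helper name `point_id_for` are assumptions for illustration.

```python
import uuid

# Hypothetical application namespace. Any fixed UUID works, but changing it
# changes every derived ID.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def point_id_for(key: str) -> str:
    """Map an arbitrary string key to a stable UUID string."""
    return str(uuid.uuid5(APP_NAMESPACE, key))
```

`point_id_for("custom-string-123")` always returns the same UUID string. Storing the original key in the payload keeps it searchable.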

Payload Validation Errors

Error: ValidationError: Invalid payload

Fix:

# Ensure JSON-serializable payload
import json
from datetime import datetime

payload = {
    "title": "Document",
    "count": 42,
    "tags": ["a", "b"],
    "nested": {"key": "value"}
}

# Validate before upsert
json.dumps(payload)  # Raises TypeError if any value is not serializable

# Convert non-serializable types first
# (datetime, numpy arrays, and custom objects are NOT valid as-is)
payload = {
    "timestamp": datetime.now().isoformat(),  # datetime -> ISO 8601 string
    "vector": embedding.tolist()  # numpy array -> list
}
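For nested payloads, the conversions above can be applied recursively. The following sketch assumes that any object exposing `tolist()` (as numpy arrays and scalars do) should be converted through it; `to_json_safe` is a hypothetical helper name.

```python
import json
from datetime import date, datetime


def to_json_safe(value):
    """Recursively convert a payload into JSON-serializable types."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if hasattr(value, "tolist"):  # numpy arrays and scalars
        return to_json_safe(value.tolist())
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_json_safe(v) for v in value]
    json.dumps(value)  # raises TypeError for anything still unsupported
    return value
```

Values that cannot be converted still raise `TypeError`, so bad payloads fail on the client with a clear message instead of at upsert time.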

Memory Issues

Out of Memory

Error: MemoryError or container killed

Solutions:

1. Enable on-disk storage:

client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    on_disk_payload=True,  # Store payloads on disk
    hnsw_config=HnswConfigDiff(on_disk=True)  # Store HNSW on disk
)

2. Use quantization:

# 4x memory reduction
client.update_collection(
    collection_name="large_collection",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=False  # Keep on disk
        )
    )
)

3. Increase Docker memory:

docker run -m 8g -p 6333:6333 qdrant/qdrant

4. Configure Qdrant storage:

# config.yaml
storage:
  performance:
    max_search_threads: 2
  optimizers:
    memmap_threshold_kb: 20000
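Before choosing among these options, a rough estimate of vector memory helps decide how much has to move to disk or be quantized. The sketch below multiplies vector count, dimension, and bytes per component by an overhead factor. The 1.5 factor is a rule-of-thumb assumption, and payloads and payload indexes are not included.

```python
def estimate_vector_ram_bytes(num_vectors, dimension,
                              bytes_per_component=4, overhead=1.5):
    """Rough RAM estimate for vectors plus index overhead (assumed factor)."""
    return int(num_vectors * dimension * bytes_per_component * overhead)


float32_bytes = estimate_vector_ram_bytes(1_000_000, 384)
int8_bytes = estimate_vector_ram_bytes(1_000_000, 384, bytes_per_component=1)
print(f"float32: {float32_bytes / 2**30:.2f} GiB")
print(f"int8:    {int8_bytes / 2**30:.2f} GiB")
```

Under these assumptions, one million 384-dimensional float32 vectors need about 2.3 GB, and INT8 quantization brings the quantized copy down to roughly a quarter of that.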

High Memory Usage During Indexing

Fix:

# Increase indexing threshold for bulk loads
from qdrant_client.models import OptimizersConfigDiff

client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=50000)  # Delay indexing
)

# Bulk insert
client.upsert(collection_name="documents", points=all_points, wait=False)

# Then optimize
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(indexing_threshold=10000)  # Resume normal indexing
)

Cluster Issues

Node Not Joining Cluster

Problem: New node fails to join cluster.

Fix:

# Check network connectivity
docker exec qdrant-node-2 ping qdrant-node-1

# Verify bootstrap URL
docker logs qdrant-node-2 | grep bootstrap

# Check Raft state
curl http://localhost:6333/cluster
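The JSON returned by the `/cluster` endpoint can be reduced to a few fields that answer "did this node join, and is there a leader?". The sketch below works on an already-parsed response. The field names (`result`, `status`, `peers`, `raft_info`, `leader`, `role`, `pending_operations`) are assumptions about the response shape and should be checked against the output of the curl command above.

```python
def summarize_cluster(response: dict) -> dict:
    """Extract join and leader information from a parsed /cluster response."""
    result = response.get("result") or {}
    raft = result.get("raft_info") or {}
    peers = result.get("peers") or {}
    return {
        "enabled": result.get("status") == "enabled",
        "peer_count": len(peers),
        "has_leader": raft.get("leader") is not None,
        "role": raft.get("role"),
        "pending_operations": raft.get("pending_operations", 0),
    }
```

A node that reports fewer peers than expected, or no leader, has most likely not completed the bootstrap handshake.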

Split Brain

Problem: Cluster has inconsistent state.

Fix:

# Ask this peer to try to recover its Raft state
curl -X POST http://localhost:6333/cluster/recover

# Or restart minority nodes
docker restart qdrant-node-2 qdrant-node-3

Replication Lag

Problem: Replicas fall behind.

Fix:

# Check collection status
info = client.get_collection("documents")
print(f"Status: {info.status}")

# Use strong consistency for critical writes
client.upsert(
    collection_name="documents",
    points=points,
    ordering=WriteOrdering.STRONG
)

Performance Tuning

Benchmark Configuration

import time
import numpy as np

def benchmark_search(client, collection, n_queries=100, dimension=384):
    # Generate random queries
    queries = [np.random.rand(dimension).tolist() for _ in range(n_queries)]

    # Warmup
    for q in queries[:10]:
        client.search(collection_name=collection, query_vector=q, limit=10)

    # Benchmark
    start = time.perf_counter()
    for q in queries:
        client.search(collection_name=collection, query_vector=q, limit=10)
    elapsed = time.perf_counter() - start

    print(f"QPS: {n_queries / elapsed:.2f}")
    print(f"Mean latency: {elapsed / n_queries * 1000:.2f}ms")

benchmark_search(client, "documents")
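Mean latency hides tail behaviour, so it can be useful to time each query separately and report percentiles as well. A minimal sketch using the standard `statistics` module; `run_query` is any callable that takes one query, and both helper names are assumptions for this example.

```python
import statistics
import time


def measure_latencies_ms(run_query, queries):
    """Time each call to run_query(q) and return the latencies in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def latency_summary(latencies_ms):
    """Return mean, p50, p95 and p99 for a list of at least two latencies."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

With the benchmark above, `run_query` could be `lambda q: client.search(collection_name="documents", query_vector=q, limit=10)`, run after the same warmup step.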

Optimal HNSW Parameters

# High recall (slower)
client.create_collection(
    collection_name="high_recall",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=32,              # More connections
        ef_construct=200   # Higher build quality
    )
)

# High speed (lower recall)
client.create_collection(
    collection_name="high_speed",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=8,               # Fewer connections
        ef_construct=64    # Lower build quality
    )
)

# Balanced
client.create_collection(
    collection_name="balanced",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,              # Default
        ef_construct=100   # Default
    )
)

Debugging Tips

Enable Verbose Logging

import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("qdrant_client").setLevel(logging.DEBUG)

Check Server Logs

# Docker logs
docker logs -f qdrant

# With timestamps
docker logs --timestamps qdrant

# Last 100 lines
docker logs --tail 100 qdrant

Inspect Collection State

# Collection info
info = client.get_collection("documents")
print(f"Status: {info.status}")
print(f"Points: {info.points_count}")
print(f"Segments: {info.segments_count}")
print(f"Config: {info.config}")

# Sample points
points, _ = client.scroll(
    collection_name="documents",
    limit=5,
    with_payload=True,
    with_vectors=True
)
for p in points:
    print(f"ID: {p.id}, Payload: {p.payload}")

Test Connection

def test_connection(host="localhost", port=6333):
    try:
        client = QdrantClient(host=host, port=port, timeout=5)
        collections = client.get_collections()
        print(f"Connected! Collections: {len(collections.collections)}")
        return True
    except Exception as e:
        print(f"Connection failed: {e}")
        return False

test_connection()

Getting Help

1. Documentation: https://qdrant.tech/documentation/
2. GitHub Issues: https://github.com/qdrant/qdrant/issues
3. Discord: https://discord.gg/qdrant
4. Stack Overflow: Tag qdrant

Reporting Issues

Include:

Qdrant version: curl http://localhost:6333/
Python client version: pip show qdrant-client
Full error traceback
Minimal reproducible code
Collection configuration
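The client-side part of this checklist can be gathered automatically. The sketch below collects the Python version, platform, and installed package versions with the standard library; the function name `environment_report` and the default package list are assumptions.

```python
import platform
from importlib import metadata


def environment_report(packages=("qdrant-client",)):
    """Collect interpreter, platform and package versions for a bug report."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The server version still has to come from the `curl` call listed above, since it is reported by the running Qdrant instance rather than by the client.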


How it compares

Pick Qdrant Vector Search when you already chose Qdrant and need distributed cluster YAML rather than general embedding pipeline guidance.

FAQ

What cluster size does Qdrant Vector Search document?

Qdrant Vector Search documents a 3-node Qdrant cluster in docker-compose with Raft consensus, per-node storage volumes, and separate HTTP, gRPC, and P2P ports for distributed coordination.

What problem does Qdrant Vector Search solve for agents?

Qdrant Vector Search solves production deployment of Qdrant for semantic search and vector memory by generating cluster and sharding configurations instead of trial-and-error single-node setups.

Is Qdrant Vector Search safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Backend & APIs, agents, automation