Chroma

Name: Chroma
Author: orchestra-research

orchestra-research/ai-research-skills

432 installs
11.2k repo stars
Updated June 16, 2026

chroma is a coding-agent skill that wires Chroma vector storage and similarity search into LangChain or LlamaIndex RAG pipelines for developers building document retrieval backends.

About

chroma is an orchestra-research/ai-research-skills guide for integrating ChromaDB into Python RAG stacks. The skill walks through LangChain's Chroma vectorstore with OpenAI embeddings, persist_directory setup, similarity_search queries, and retriever configuration, plus LlamaIndex's ChromaVectorStore with chromadb PersistentClient collections. Developers use chroma when they need a local or persisted vector index inside LangChain or LlamaIndex rather than hand-rolling chromadb client code. The skill covers from_documents ingestion, k-neighbor retrieval, and retriever wiring so agents can stand up semantic search quickly during backend or agent-knowledge work.

LangChain Chroma.from_documents with persist_directory and similarity_search
LlamaIndex ChromaVectorStore with PersistentClient collections
Vector plus full-text search with metadata filtering for RAG
Simple four-function API scaling from notebooks to clusters
Open-source self-hosted path via chromadb and sentence-transformers

Chroma by the numbers

432 all-time installs (skills.sh)
+31 installs in the week ending Jul 26, 2026 (Skillselion tracking)
Ranked #1,882 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/orchestra-research/ai-research-skills --skill chroma



How do you integrate Chroma vector search with LangChain or LlamaIndex?

Wire Chroma vector storage and similarity search into LangChain or LlamaIndex RAG pipelines from your coding agent.

Who is it for?

Python developers adding Chroma-backed semantic retrieval to LangChain or LlamaIndex agent or API projects.

Skip if: Teams standardizing on Qdrant, Pinecone, or pgvector who do not plan to run Chroma as their vector backend.

When should I use this skill?

A RAG pipeline needs Chroma persistence, embedding ingestion, or retriever setup in LangChain or LlamaIndex.

What you get

Persisted Chroma collections, configured retrievers, and working similarity_search results in a RAG pipeline.

Chroma vector store
Configured retriever
Similarity search queries

Files

SKILL.md (Markdown)

Chroma - Open-Source Embedding Database

The AI-native database for building LLM applications with memory.

When to use Chroma

Use Chroma when:

Building RAG (retrieval-augmented generation) applications
Running a local or self-hosted vector database
Choosing an open-source solution (Apache 2.0)
Prototyping in notebooks
Running semantic search over documents
Storing embeddings with metadata

Metrics:

24,300+ GitHub stars
1,900+ forks
v1.3.3 (stable, weekly releases)
Apache 2.0 license

Use alternatives instead:

Pinecone: Managed cloud, auto-scaling
FAISS: Pure similarity search, no metadata
Weaviate: Production ML-native database
Qdrant: High performance, Rust-based

Quick start

Installation

# Python
pip install chromadb

# JavaScript/TypeScript
npm install chromadb @chroma-core/default-embed

Basic usage (Python)

import chromadb

# Create an in-memory client (data is lost when the process exits)
client = chromadb.Client()

# Create collection
collection = client.create_collection(name="my_collection")

# Add documents
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)

# Query
results = collection.query(
    query_texts=["document about topic"],
    n_results=2
)

print(results)

Core operations

1. Create collection

# Simple collection
collection = client.create_collection("my_docs")

# With custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

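# create_collection raises an error if the name already exists,
# so this variant uses a distinct name
collection = client.create_collection(
    name="my_docs_openai",
    embedding_function=openai_ef
)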

# Get existing collection
collection = client.get_collection("my_docs")

# Delete collection
client.delete_collection("my_docs")

2. Add documents

# Add with explicit IDs and metadata
collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"}
    ],
    ids=["id1", "id2", "id3"]
)

# Add with precomputed embeddings (vectors are truncated here;
# every vector must have the same dimensionality)
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"]
)
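
Each record passed to collection.add needs its own unique ID. The following is a minimal sketch of one way to derive IDs from document text so that re-ingesting the same content yields the same IDs. The helper name make_ids and the "doc-" prefix are hypothetical and not part of chromadb; whether content-derived IDs suit your data is an assumption to check.

```python
import hashlib

def make_ids(documents, prefix="doc-"):
    """Return one deterministic ID per document, derived from its text."""
    ids = []
    seen = {}
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        # Identical texts produce the same digest, so append a counter
        # to keep IDs unique within the batch.
        count = seen.get(digest, 0)
        seen[digest] = count + 1
        ids.append(f"{prefix}{digest}" if count == 0 else f"{prefix}{digest}-{count}")
    return ids

documents = ["Doc 1", "Doc 2", "Doc 1"]
ids = make_ids(documents)
```

The resulting list can be passed as ids=ids alongside documents=documents in collection.add.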

3. Query (similarity search)

# Basic query
results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5
)

# Query with filters
results = collection.query(
    query_texts=["Python programming"],
    n_results=3,
    where={"source": "web"}
)

# Query with combined metadata filters ($and)
results = collection.query(
    query_texts=["advanced topics"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$gte": 3}}
        ]
    }
)

# Access results (each field holds one inner list per query text)
print(results["documents"])      # Matching documents
print(results["metadatas"])      # Metadata for each doc
print(results["distances"])      # Distances (lower = more similar)
print(results["ids"])            # Document IDs
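
Because collection.query accepts several query texts at once, every field in the result is a list of lists, with one inner list per query. The sketch below flattens one query's results into rows. The sample dict is hand-written to mimic that nesting and is not real output, and to_rows is a hypothetical helper.

```python
def to_rows(results, query_index=0):
    """Flatten one query's results into a list of dicts, in returned order."""
    return [
        {"id": id_, "document": doc, "metadata": meta, "distance": dist}
        for id_, doc, meta, dist in zip(
            results["ids"][query_index],
            results["documents"][query_index],
            results["metadatas"][query_index],
            results["distances"][query_index],
        )
    ]

# Hand-written sample with the same nesting as a query result (illustrative values).
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["This is document 1", "This is document 2"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.31, 0.42]],
}

rows = to_rows(sample)
for row in rows:
    # A lower distance means a closer match.
    print(row["id"], row["distance"], row["document"])
```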

4. Get documents

# Get by IDs
docs = collection.get(
    ids=["id1", "id2"]
)

# Get with filters
docs = collection.get(
    where={"category": "tutorial"},
    limit=10
)

# Get all documents
docs = collection.get()

5. Update documents

# Update document content
collection.update(
    ids=["id1"],
    documents=["Updated content"],
    metadatas=[{"source": "updated"}]
)

6. Delete documents

# Delete by IDs
collection.delete(ids=["id1", "id2"])

# Delete with filter
collection.delete(
    where={"source": "outdated"}
)

Persistent storage

# Persist to disk
client = chromadb.PersistentClient(path="./chroma_db")

# get_or_create_collection keeps this snippet safe to rerun against the same path
collection = client.get_or_create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])

# Data persisted automatically
# Reload later with same path
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")
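
Since the data lives under the path given to PersistentClient, a backup can be as simple as copying that directory. The sketch below copies it to a timestamped folder. It assumes no process is writing to the database during the copy, and backup_chroma_dir is a hypothetical helper; the demo uses a stand-in directory rather than a real database.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def backup_chroma_dir(db_path, backup_root):
    """Copy the persisted directory to a timestamped folder and return its path."""
    source = Path(db_path)
    if not source.is_dir():
        raise FileNotFoundError(f"No Chroma directory at {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    shutil.copytree(source, target)
    return target

# Demo with a stand-in directory so the sketch runs without a real database.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "chroma_db"
    db_dir.mkdir()
    (db_dir / "placeholder.bin").write_bytes(b"stand-in for persisted data")
    backup_path = backup_chroma_dir(db_dir, Path(tmp) / "backups")
    backup_files = sorted(p.name for p in backup_path.iterdir())
```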

Embedding functions

Default (Sentence Transformers)

# No embedding_function is passed, so Chroma falls back to its default
collection = client.create_collection("my_docs")
# Default model: all-MiniLM-L6-v2 (a Sentence Transformers model)

OpenAI

from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef
)

HuggingFace

from chromadb.utils import embedding_functions

huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-key",
    model_name="sentence-transformers/all-mpnet-base-v2"
)

collection = client.create_collection(
    name="hf_docs",
    embedding_function=huggingface_ef
)

Custom embedding function

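from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Placeholder logic: one 1-dimensional vector per document.
        # Replace with a call to your own embedding model.
        embeddings = [[float(len(text))] for text in input]
        return embeddings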

my_ef = MyEmbeddingFunction()
collection = client.create_collection(
    name="custom_docs",
    embedding_function=my_ef
)
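
Whatever logic goes inside __call__, it has to return one vector per input document, and all vectors must share the same length. The sketch below is a toy stand-in that hashes tokens into a fixed number of buckets and normalizes the result. It is not a semantically meaningful embedding; hash_embed and dim are hypothetical names chosen for illustration.

```python
import hashlib
import math

def hash_embed(texts, dim=8):
    """Map each text to a unit-length vector of hashed token counts (toy logic)."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Hash the token to a stable bucket index in [0, dim).
            bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Empty texts stay as the zero vector to avoid dividing by zero.
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors

vectors = hash_embed(["vector database", "vector database", ""])
```

Inside a custom embedding function, __call__ could return hash_embed(input) while you wire up a real model.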

Metadata filtering

# Exact match
results = collection.query(
    query_texts=["query"],
    where={"category": "tutorial"}
)

# Comparison operators
results = collection.query(
    query_texts=["query"],
    where={"page": {"$gt": 10}}  # $gt, $gte, $lt, $lte, $ne
)

# Logical operators
results = collection.query(
    query_texts=["query"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$lte": 3}}
        ]
    }  # Also: $or
)

# Membership: the metadata value is one of the listed values ($in; $nin for the inverse)
results = collection.query(
    query_texts=["query"],
    where={"tags": {"$in": ["python", "ml"]}}
)
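
To make the operator semantics concrete, here is a small pure-Python sketch that evaluates a where-style filter against plain metadata dicts. It illustrates the meaning of the operators shown above and is not Chroma's implementation; matches and select are hypothetical helpers, and treating a missing key as a non-match is an assumption of this sketch.

```python
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}

def matches(metadata, where):
    """Return True if a metadata dict satisfies a where-style filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # assumption: a missing key never matches
            # A bare value is shorthand for an equality check.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            for op, operand in condition.items():
                if not OPERATORS[op](metadata[key], operand):
                    return False
    return True

def select(records, where):
    """Keep only the records whose metadata satisfies the filter."""
    return [record for record in records if matches(record, where)]

records = [
    {"category": "tutorial", "difficulty": 2, "page": 12},
    {"category": "tutorial", "difficulty": 5, "page": 3},
    {"category": "reference", "difficulty": 1, "page": 40},
]

easy_tutorials = select(
    records,
    {"$and": [{"category": "tutorial"}, {"difficulty": {"$lte": 3}}]},
)
```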

LangChain integration

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split documents (documents: a list of LangChain Document objects loaded elsewhere)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)

# Create Chroma vector store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# Query
results = vectorstore.similarity_search("machine learning", k=3)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

LlamaIndex integration

from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
import chromadb

# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index (documents: LlamaIndex Document objects loaded elsewhere)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")

Server mode

# Run Chroma server
# Terminal: chroma run --path ./chroma_db --port 8000

# Connect to server
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(anonymized_telemetry=False)
)

# Use as normal
collection = client.get_or_create_collection("my_docs")

Best practices

1. Use persistent client - Don't lose data on restart
2. Add metadata - Enables filtering and tracking
3. Batch operations - Add multiple docs at once
4. Choose right embedding model - Balance speed/quality
5. Use filters - Narrow search space
6. Unique IDs - Avoid collisions
7. Regular backups - Copy chroma_db directory
8. Monitor collection size - Scale up if needed
9. Test embedding functions - Ensure quality
10. Use server mode for production - Better for multi-user
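
The batching advice above can be sketched as a small helper that slices the inputs and calls collection.add once per slice. The helper names and the batch size of 100 are assumptions chosen for illustration, and RecordingCollection is a stand-in object so the sketch runs without chromadb; a real collection would be passed in its place.

```python
def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def add_in_batches(collection, documents, ids, batch_size=100):
    """Call collection.add once per batch and return the number of calls."""
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    calls = 0
    for doc_batch, id_batch in zip(batched(documents, batch_size),
                                   batched(ids, batch_size)):
        collection.add(documents=doc_batch, ids=id_batch)
        calls += 1
    return calls

class RecordingCollection:
    """Stand-in that records add() calls instead of storing embeddings."""
    def __init__(self):
        self.batches = []

    def add(self, documents, ids):
        self.batches.append((list(documents), list(ids)))

docs = [f"Doc {i}" for i in range(250)]
doc_ids = [f"id{i}" for i in range(250)]
fake = RecordingCollection()
calls = add_in_batches(fake, docs, doc_ids, batch_size=100)
```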

Performance

Operation	Latency	Notes
Add 100 docs	~1-3s	With embedding
Query (top 10)	~50-200ms	Depends on collection size
Metadata filter	~10-50ms	Fast with proper indexing

Resources

GitHub: https://github.com/chroma-core/chroma ⭐ 24,300+
Docs: https://docs.trychroma.com
Discord: https://discord.gg/MMeYNTmh3x
Version: 1.3.3+
License: Apache 2.0

Chroma Integration Guide

Integration with LangChain, LlamaIndex, and frameworks.

LangChain

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# Query
results = vectorstore.similarity_search("query", k=3)

# As retriever
retriever = vectorstore.as_retriever()

LlamaIndex

from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")

vector_store = ChromaVectorStore(chroma_collection=collection)

Resources

Docs: https://docs.trychroma.com

How it compares

Choose chroma for lightweight local ChromaDB RAG wiring; pick qdrant-vector-search when you need distributed clusters, sharding, or production-scale Qdrant ops.

FAQ

Which frameworks does the chroma skill support?

The chroma skill supports LangChain via langchain_chroma.Chroma with from_documents, similarity_search, and as_retriever, and LlamaIndex via ChromaVectorStore backed by chromadb PersistentClient collections.

How does chroma handle persistent vector storage?

The chroma skill configures persist_directory for LangChain Chroma stores and chromadb PersistentClient paths for LlamaIndex collections, keeping embeddings on disk between ingestion and query sessions.

Is Chroma safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Building, llm, automation
