
Chroma
Wire Chroma vector storage and similarity search into LangChain or LlamaIndex RAG pipelines from your coding agent.
Overview
chroma is an agent skill for the Build phase that integrates the Chroma embedding database with LangChain and LlamaIndex for RAG retrieval.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill chroma
What is this skill?
- LangChain Chroma.from_documents with persist_directory and similarity_search
- LlamaIndex ChromaVectorStore with PersistentClient collections
- Vector plus full-text search with metadata filtering for RAG
- Simple four-function API scaling from notebooks to clusters
- Open-source self-hosted path via chromadb and sentence-transformers
- Documented simple 4-function Chroma API surface
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have chunked documents and an LLM but no persisted vector index or framework-specific Chroma wiring in your repo.
Who is it for?
Indie builders shipping self-hosted RAG on a budget who want copy-paste Chroma setup across popular LLM frameworks.
Skip if: your team requires a managed, proprietary vector SaaS, or needs production multi-tenant operations without self-hosting Chroma.
When should I use this skill?
Use it when you need semantic search, RAG, document retrieval, metadata filtering, or Chroma wired into LangChain or LlamaIndex.
What do I get? / Deliverables
You get working LangChain and LlamaIndex snippets with persistent Chroma collections and similarity_search or retriever hooks.
- LangChain Chroma vectorstore and retriever code
- LlamaIndex ChromaVectorStore initialization snippet
How it compares
A skill package for Chroma client integration, not a hosted MCP server or a full evaluation harness for retrieval quality.
Common Questions / FAQ
Who is chroma for?
Solo developers building RAG features who want agent-guided Chroma setup with LangChain or LlamaIndex.
When should I use chroma?
Use in Build/backend when adding semantic search, document retrieval, or metadata-filtered vector stores to an AI product.
Is chroma safe to install?
It pulls open-source dependencies; review the Security Audits panel on this Prism page and pin chromadb versions in production.
SKILL.md
# Chroma Integration Guide

Integration with LangChain, LlamaIndex, and frameworks.

## LangChain

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# Query
results = vectorstore.similarity_search("query", k=3)

# As retriever
retriever = vectorstore.as_retriever()
```

## LlamaIndex

```python
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
```

## Resources

- **Docs**: https://docs.trychroma.com

---
name: chroma
description: Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering]
dependencies: [chromadb, sentence-transformers]
---

# Chroma - Open-Source Embedding Database

The AI-native database for building LLM applications with memory.
## When to use Chroma

**Use Chroma when:**
- Building RAG (retrieval-augmented generation) applications
- Need local/self-hosted vector database
- Want open-source solution (Apache 2.0)
- Prototyping in notebooks
- Semantic search over documents
- Storing embeddings with metadata

**Metrics**:
- **24,300+ GitHub stars**
- **1,900+ forks**
- **v1.3.3** (stable, weekly releases)
- **Apache 2.0 license**

**Use alternatives instead**:
- **Pinecone**: Managed cloud, auto-scaling
- **FAISS**: Pure similarity search, no metadata
- **Weaviate**: Production ML-native database
- **Qdrant**: High performance, Rust-based

## Quick start

### Installation

```bash
# Python
pip install chromadb

# JavaScript/TypeScript
npm install chromadb @chroma-core/default-embed
```

### Basic usage (Python)

```python
import chromadb

# Create client
client = chromadb.Client()

# Create collection
collection = client.create_collection(name="my_collection")

# Add documents
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)

# Query
results = collection.query(
    query_texts=["document about topic"],
    n_results=2
)
print(results)
```

## Core operations

### 1. Create collection

```python
# Simple collection
collection = client.create_collection("my_docs")

# With custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)
collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef
)

# Get existing collection
collection = client.get_collection("my_docs")

# Delete collection
client.delete_collection("my_docs")
```

### 2. Add documents

```python
# Add documents with metadata and explicit IDs
collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"}
    ],
    ids=["id1", "id2", "id3"]
)

# Add with custom embeddings
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"]
)
```

### 3. Query (similarity search)

```python
# Basic query
results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5
)

# Query with filters
results = collection.query(
    query_texts=["Python programming"],