
Mongodb Schema Design
Guide document schema and indexing choices so hot-path queries avoid costly repeated `$lookup` joins on MongoDB.
Overview
MongoDB Schema Design is an agent skill most often used in Build (also Operate) that helps you cut hot-path query cost by replacing habitual `$lookup` joins with thoughtful embedding and denormalization.
Install
npx skills add https://github.com/mongodb/agent-skills --skill mongodb-schema-designWhat is this skill?
- CRITICAL-impact rule: flag excessive `$lookup` on hot paths as an over-normalization anti-pattern
- Side-by-side incorrect multi-`$lookup` product page pipeline vs embed/denormalize correct pattern
- Explicit guidance to measure cardinality, index support, and result size before embedding
- Atlas-oriented tags in source material for schema and join-cost review
- Teaches when related category/brand fields should live inside product documents
- Tagged CRITICAL impact in source rule metadata
Adoption & trust: 1.7k installs on skills.sh; 131 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your MongoDB read pipelines keep chaining `$lookup` stages and latency grows every time you add another related collection to the page.
Who is it for?
Backend-focused indie devs shipping document APIs on MongoDB or Atlas who see join-heavy aggregations on every user-facing read.
Skip if: Greenfield SQL-only stacks or teams that rarely aggregate across collections and already validate performance with representative load tests.
When should I use this skill?
Designing or reviewing MongoDB schemas and aggregation pipelines where related data is frequently read together and `$lookup` appears on hot paths.
What do I get? / Deliverables
You choose embed-vs-reference boundaries with measured justification so common reads hit one collection instead of multi-stage joins.
- Embed-vs-lookup decision for modeled entities
- Revised aggregation shape avoiding redundant joins
- Documented rationale tied to cardinality and indexes
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Schema embedding vs referencing is decided when modeling backend data structures, though the same rules matter again when tuning production query cost. MongoDB collection design, aggregation pipelines, and denormalization tradeoffs are core backend data modeling, not frontend or launch distribution work.
Where it fits
Map product, category, and brand reads into one document shape before shipping the catalog API.
Audit staging aggregations for chained lookups before load testing the release candidate.
Refactor a slow production pipeline after metrics show multi-collection joins on every product detail request.
How it compares
Use as a schema review lens alongside raw MongoDB docs—opinionated anti-patterns for join frequency, not a generic CRUD tutorial.
Common Questions / FAQ
Who is mongodb-schema-design for?
Solo builders and small teams modeling MongoDB collections for SaaS or API backends who need clear guidance on when `$lookup` is too much.
When should I use mongodb-schema-design?
In Build → backend while designing schemas or aggregation APIs, and again in Operate → iterate when profiling shows repeated lookups on production hot paths.
Is mongodb-schema-design safe to install?
It is advisory schema content with example pipelines only; check the Security Audits panel on this page and never run untrusted example code against production without review.
SKILL.md
READMESKILL.md - Mongodb Schema Design
## Reduce Excessive $lookup Usage **Frequent $lookup operations on hot paths can indicate over-normalization.** `$lookup` is useful, but repeated joins can be slower and more resource-intensive than querying a single collection, especially when supporting indexes or match selectivity are weak. If the same related fields are read together often, consider embedding or extended references. **Incorrect (constant $lookup for common operations):** ```javascript // Every product page requires repeated joins across collections db.products.aggregate([ { $match: { _id: productId } }, { $lookup: { from: "categories", // Collection scan #2 localField: "categoryId", foreignField: "_id", as: "category" }}, { $lookup: { from: "brands", // Collection scan #3 localField: "brandId", foreignField: "_id", as: "brand" }}, { $unwind: "$category" }, { $unwind: "$brand" } ]) // Multiple join stages add planning/execution overhead on hot paths ``` Join cost depends on cardinality, stage order, index support, and result size. Measure before deciding to embed. **Correct (denormalize frequently-joined data):** Embed data that is always displayed alongside the product directly in the product document: include category fields (`_id`, `name`, `path`) and brand fields (`_id`, `name`, `logo`) as subdocuments. A single indexed query returns complete product data without `$lookup`. Listing queries (e.g. by category) also run against a single collection. **Managing denormalized data updates:** When category data changes (a rare event), use `updateMany` to update all products matching that category’s `_id` with the new field values. For frequently-changing data, keep both a reference ID (`brandId`) and a cache subdocument (`brandCache`) with a `cachedAt` timestamp; refresh the cache when it exceeds a staleness threshold. **When NOT to use this pattern:** - **Data changes frequently and independently**: If brand logos change daily, denormalization creates update overhead. - **Rarely-accessed data**: Don't embed review details if only a small fraction of product views load reviews. - **Many-to-many with high cardinality**: Avoid embedding large or fast-growing relationship sets. - **Analytics queries**: Batch jobs can afford $lookup latency; real-time queries cannot. ## Verify with ```javascript // Find pipelines with multiple $lookup stages db.setProfilingLevel(1, { slowms: 50 }) // Disable afterwards db.system.profile.find({ "command.aggregate": { $exists: true }, "command.pipeline.$lookup": { $exists: true } }).sort({ millis: -1 }) // Check if $lookup foreign fields are indexed db.reviews.aggregate([ { $indexStats: {} } ]) // Look for index supporting the query in result // Measure $lookup impact db.products.aggregate([ { $match: { category: "electronics" } }, { $lookup: { from: "brands", localField: "brandId", foreignField: "_id", as: "brand" } } ]).explain("executionStats") // Check totalDocsExamined in $lookup stage ``` Atlas Schema Suggestions flags: "Reduce $lookup operations" Reference: [Reduce Lookup Operations](https://mongodb.com/docs/manual/data-modeling/design-antipatterns/reduce-lookup-operations/) --- title: Reduce Unnecessary Collections impact: CRITICAL impactDescription: "Reduces avoidable joins when related data is repeatedly queried together" tags: schema, collections, anti-pattern, embedding, normalization, atlas-suggestion --- ## Reduce Unnecessary Collections **Collection count alone is not the anti-pattern.** The anti-pattern is using collections as a substitute for indexes — creating one collection per category, time period, or partition key instead of indexing a single collection. Eve