
Mongodb Query Optimizer
Tune slow MongoDB aggregations and lookups with index-aware pipeline patterns before they hit production limits.
Overview
MongoDB Query Optimizer is an agent skill most often used in Build (also Ship, Operate) that teaches index-aware aggregation and `$lookup` patterns to cut documents moved per pipeline stage.
Install
npx skills add https://github.com/mongodb/agent-skills --skill mongodb-query-optimizerWhat is this skill?
- Prioritize early `$match` and `$project` to shrink working sets before blocking stages
- Covers 100MB per-stage memory limits and when disk spill beats repeated heavy `$sort`/`$group`
- Contrasts unindexed vs indexed `$lookup` with concrete `createIndex` on the foreign join field
- Recommends `$limit` with `$sort`, indexed sorts, and materialized views for hot repeated aggregations
- 100MB memory limit per blocking aggregation stage (e.g. in-memory `$sort`, `$group`)
- `allowDiskUse` defaults to true when the per-stage memory cap is exceeded
Adoption & trust: 1.6k installs on skills.sh; 131 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your aggregation pipelines sort and group too much data in memory, spill to disk, or crawl on `$lookup` because foreign join fields lack indexes.
Who is it for?
Indie SaaS and API builders profiling MongoDB aggregations in Node or other drivers who can change indexes and pipeline order in the same repo.
Skip if: Teams on fully managed query layers with no access to index creation, or workloads that need SQL-specific tuning rather than aggregation pipelines.
When should I use this skill?
You are writing or reviewing MongoDB aggregation pipelines, `$lookup` joins, or index strategy for slow backend queries.
What do I get? / Deliverables
You reorder and narrow pipelines, add targeted indexes on join and sort keys, and choose limits or materialized views so queries stay within MongoDB’s per-stage memory behavior.
- Reordered pipeline with earlier filters and projections
- Index recommendations on local and foreign join fields
- Notes on `$sort`/`$group` memory vs index-backed sort and optional materialized views
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Query and aggregation design is core backend work while you are building data-heavy features. Pipeline stages, `$lookup`, and indexes are implemented in application data layers and repository code.
Where it fits
Refactor a reporting `$lookup` after adding `{ sku: 1 }` on products so order dashboards stop scanning the foreign collection.
Load-test an admin aggregate and move `$match` before `$group` to stay under the 100MB blocking-stage budget.
Investigate nightly job disk spill and decide between a compound index on the sort key versus a materialized summary collection.
How it compares
Use for MongoDB pipeline and index mechanics instead of generic “make queries faster” chat without stage-by-stage reasoning.
Common Questions / FAQ
Who is mongodb-query-optimizer for?
Solo and indie builders shipping MongoDB-backed APIs or analytics who write or review aggregation pipelines and own index strategy on their cluster.
When should I use mongodb-query-optimizer?
During Build when designing aggregations and `$lookup` joins; in Ship when load-testing dashboards; in Operate when production queries spike CPU or trigger disk spill on blocking stages.
Is mongodb-query-optimizer safe to install?
It is documentation-style procedural knowledge with no bundled executables; review the Security Audits panel on this Prism page before adding any skill from the repo to your agent.
SKILL.md
READMESKILL.md - Mongodb Query Optimizer
# Principles Aggregation pipelines process documents through sequential stages. Focus on: - Reducing documents early in the pipeline - Minimizing data moved between stages - Leveraging indexes where possible - Managing memory usage ## Memory limits and disk spilling Blocking stages (such as in-memory `$sort` and `$group`) have a 100MB memory limit per stage. Default behavior when this limit is exceeded is to spill to disk automatically (`allowDiskUse` defaults to `true`). **Better solutions:** - Filter more aggressively early in pipeline - Add indexes to enable `$sort` to use index order - Use `$limit` with `$sort` to reduce the amount of data the sort must process in memory for unindexed sorts - Consider materialized views for repeated aggregations # Optimization Examples These examples are not exhaustive but representative of some common optimization patterns. ## Unindexed $lookup vs. Indexed $lookup **Bad** — No index on the foreign collection's join field: ```javascript db.orders.aggregate([ { $lookup: { from: "products", localField: "productId", foreignField: "sku", // no index on products.sku! as: "product" }} ]) ``` **Good** — Index on `foreignField` in the foreign collection: ```javascript db.products.createIndex({ sku: 1 }) db.orders.aggregate([ { $lookup: { from: "products", localField: "productId", foreignField: "sku", as: "product" }} ]) ``` **Why:** Each `$lookup` executes a find on the `from` collection. Without an index on `foreignField`, every join does a full collection scan. This is the single most critical $lookup optimization. ## Early $project Defeating Optimization vs. Late $project **Bad** — Early `$project` prevents the optimizer from pruning unused fields, forgets to exclude `_id` which is unneeded, and includes `name` which is not used: ```javascript db.collection.aggregate([ { $project: { name: 1, status: 1, amount: 1 } }, { $match: { status: "active" } }, { $group: { _id: "$status", total: { $sum: "$amount" } } } ]) ``` **Good** — Let the optimizer handle field pruning; use `$project` only at the end for reshaping: ```javascript db.collection.aggregate([ { $match: { status: "active" } }, { $group: { _id: "$status", total: { $sum: "$amount" } } }, { $project: { _id: 0, status: "$_id", total: 1 } } // reshape at the end ]) ``` **Why:** MongoDB's pipeline optimizer automatically analyzes which fields are used and avoids fetching unused ones. An early `$project` defeats this optimization, and can inadvertently request the wrong fields. ## $facet for Divergent Processing vs. $unionWith **Bad** — `$facet` sends all documents to every branch, even if branches need very different subsets: ```javascript db.collection.aggregate([ { $facet: { "top10": [{ $sort: { score: -1 } }, { $limit: 10 }], "totalCount": [{ $count: "n" }] // gets ALL docs even though it's just counting }} ]) ``` **Good** — Separate pipelines via `$unionWith` let each branch optimize independently: ```javascript db.collection.aggregate([ { $sort: { score: -1 } }, { $limit: 10 }, { $unionWith: { coll: "collection", pipeline: [{ $count: "n" }] }} ]) ``` **Why:** `$facet` funnels every document into every branch. `$unionWith` runs independent pipelines that each benefit from their own index usage and optimization. ## $sort \+ $limit as Separate Concerns vs. Top-N Sort **Bad** — Large sort, then limit (MongoDB may sort entire dataset): ```javascript db.collection.aggregate([ { $group: { _id: "$category", total: { $sum: "$amount" } } }, { $sort: { total: -1 } }, // ... many stages later ... { $limit: 10 } ]) ``` **Good** — Place `$limit` immediately after `$sort`: ```javascript db.collection.aggregate([ { $group: { _id: "$category", total: { $sum: "$amount" } } }, { $sort: { total: -1 } }, { $limit: 10 } ]) ``` **Why:** When `$sort` is immediately followed by `$limit`, MongoDB performs a *top-N