Mongodb Query Optimizer

Query and aggregation design is core backend work while you are building data-heavy features. Pipeline stages, `$lookup`, and indexes are implemented in application data layers and repository code.

Also useful

Also useful

Where it fits

Example use

Refactor a reporting `$lookup` after adding `{ sku: 1 }` on products so order dashboards stop scanning the foreign collection.

Example use

Load-test an admin aggregate and move `$match` before `$group` to stay under the 100MB blocking-stage budget.

Example use

Investigate nightly job disk spill and decide between a compound index on the sort key versus a materialized summary collection.

How it compares

Use for MongoDB pipeline and index mechanics instead of generic “make queries faster” chat without stage-by-stage reasoning.

Common Questions / FAQ

Who is mongodb-query-optimizer for?

Solo and indie builders shipping MongoDB-backed APIs or analytics who write or review aggregation pipelines and own index strategy on their cluster.

When should I use mongodb-query-optimizer?

During Build when designing aggregations and `$lookup` joins; in Ship when load-testing dashboards; in Operate when production queries spike CPU or trigger disk spill on blocking stages.

Is mongodb-query-optimizer safe to install?

It is documentation-style procedural knowledge with no bundled executables; review the Security Audits panel on this Prism page before adding any skill from the repo to your agent.

SKILL.md

READMESKILL.md - Mongodb Query Optimizer

# Principles

Aggregation pipelines process documents through sequential stages. Focus on:

- Reducing documents early in the pipeline
- Minimizing data moved between stages
- Leveraging indexes where possible
- Managing memory usage

## Memory limits and disk spilling

Blocking stages (such as in-memory `$sort` and `$group`) have a 100MB memory limit per stage. Default behavior when this limit is exceeded is to spill to disk automatically (`allowDiskUse` defaults to `true`).

**Better solutions:**

- Filter more aggressively early in pipeline
- Add indexes to enable `$sort` to use index order
- Use `$limit` with `$sort` to reduce the amount of data the sort must process in memory for unindexed sorts
- Consider materialized views for repeated aggregations

# Optimization Examples

These examples are not exhaustive but representative of some common optimization patterns.

## Unindexed $lookup vs. Indexed $lookup

**Bad** — No index on the foreign collection's join field:

```javascript
db.orders.aggregate([
  { $lookup: {
      from: "products",
      localField: "productId",
      foreignField: "sku",   // no index on products.sku!
      as: "product"
  }}
])
```

**Good** — Index on `foreignField` in the foreign collection:

```javascript
db.products.createIndex({ sku: 1 })

db.orders.aggregate([
  { $lookup: {
      from: "products",
      localField: "productId",
      foreignField: "sku",
      as: "product"
  }}
])
```

**Why:** Each `$lookup` executes a find on the `from` collection. Without an index on `foreignField`, every join does a full collection scan. This is the single most critical $lookup optimization.

## Early $project Defeating Optimization vs. Late $project

**Bad** — Early `$project` prevents the optimizer from pruning unused fields, forgets to exclude `_id` which is unneeded, and includes `name` which is not used:

```javascript
db.collection.aggregate([
  { $project: { name: 1, status: 1, amount: 1 } },
  { $match: { status: "active" } },
  { $group: { _id: "$status", total: { $sum: "$amount" } } }
])
```

**Good** — Let the optimizer handle field pruning; use `$project` only at the end for reshaping:

```javascript
db.collection.aggregate([
  { $match: { status: "active" } },
  { $group: { _id: "$status", total: { $sum: "$amount" } } },
  { $project: { _id: 0, status: "$_id", total: 1 } }  // reshape at the end
])
```

**Why:** MongoDB's pipeline optimizer automatically analyzes which fields are used and avoids fetching unused ones. An early `$project` defeats this optimization, and can inadvertently request the wrong fields.

## $facet for Divergent Processing vs. $unionWith

**Bad** — `$facet` sends all documents to every branch, even if branches need very different subsets:

```javascript
db.collection.aggregate([
  { $facet: {
      "top10": [{ $sort: { score: -1 } }, { $limit: 10 }],
      "totalCount": [{ $count: "n" }]  // gets ALL docs even though it's just counting
  }}
])
```

**Good** — Separate pipelines via `$unionWith` let each branch optimize independently:

```javascript
db.collection.aggregate([
  { $sort: { score: -1 } }, { $limit: 10 },
  { $unionWith: {
      coll: "collection",
      pipeline: [{ $count: "n" }]
  }}
])
```

**Why:** `$facet` funnels every document into every branch. `$unionWith` runs independent pipelines that each benefit from their own index usage and optimization.

## $sort \+ $limit as Separate Concerns vs. Top-N Sort

**Bad** — Large sort, then limit (MongoDB may sort entire dataset):

```javascript
db.collection.aggregate([
  { $group: { _id: "$category", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  // ... many stages later ...
  { $limit: 10 }
])
```

**Good** — Place `$limit` immediately after `$sort`:

```javascript
db.collection.aggregate([
  { $group: { _id: "$category", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])
```

**Why:** When `$sort` is immediately followed by `$limit`, MongoDB performs a *top-N

What is this skill?

Prioritize early `$match` and `$project` to shrink working sets before blocking stages

Covers 100MB per-stage memory limits and when disk spill beats repeated heavy `$sort`/`$group`

Contrasts unindexed vs indexed `$lookup` with concrete `createIndex` on the foreign join field

Recommends `$limit` with `$sort`, indexed sorts, and materialized views for hot repeated aggregations

100MB memory limit per blocking aggregation stage (e.g. in-memory `$sort`, `$group`)

`allowDiskUse` defaults to true when the per-stage memory cap is exceeded

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.6k installs on skills.sh; 131 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What do I get? / Deliverables

You reorder and narrow pipelines, add targeted indexes on join and sort keys, and choose limits or materialized views so queries stay within MongoDB’s per-stage memory behavior.

Reordered pipeline with earlier filters and projections

Index recommendations on local and foreign join fields

Notes on `$sort`/`$group` memory vs index-backed sort and optional materialized views

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Query and aggregation design is core backend work while you are building data-heavy features. Pipeline stages, `$lookup`, and indexes are implemented in application data layers and repository code.

Also useful

Also useful

Where it fits

Example use

Refactor a reporting `$lookup` after adding `{ sku: 1 }` on products so order dashboards stop scanning the foreign collection.

Example use

Load-test an admin aggregate and move `$match` before `$group` to stay under the 100MB blocking-stage budget.

Example use