Firecrawl Knowledge Base

The canonical shelf is Build because the primary output is structured corpora and mirrors that feed agents, backends, and docs, not one-off competitive snapshots. Agent tooling is the best fit: outputs are RAG-ready chunks, fine-tuning datasets, and markdown organized for downstream LLM workflows.

Where it fits

Example use

Map and scrape competitor and category articles into a topic corpus before you commit to a product angle.

Example use

Mirror a framework docs site into .firecrawl paths so your agent retrieves accurate API examples during implementation.

Example use

Build: Docs & content

Generate an offline reference mirror of third-party integration guides bundled with your repo.

Example use

Ship: CI/CD & deploy

Refresh public doc mirrors used in release notes or support macros so snippets match the live site.

Example use

Re-crawl vendor changelog pages to update RAG chunks after a breaking API release.

How it compares

Use this skill instead of manual copy-paste or generic browse-and-summarize when you need reproducible, path-organized markdown corpora for agents.

Common Questions / FAQ

Who is firecrawl-knowledge-base for?

Indie and solo builders using Claude Code, Cursor, or Codex who want web sources converted into organized markdown for RAG, local reference, documentation mirrors, or training-style datasets.

When should I use firecrawl-knowledge-base?

During Build when preparing agent context or doc mirrors; in Idea when assembling a topic corpus for discovery; and in Operate when refreshing mirrored vendor docs—whenever you have URLs or a topic and need LLM-ready structure on disk.

Is firecrawl-knowledge-base safe to install?

It requires a Firecrawl API key and network access to hosted Firecrawl; review the Security Audits panel on this Prism page and treat scraped content and API credentials according to your policies.
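One way to handle the credential is to read it from the environment and never write it into the corpus or logs. The sketch below assumes the key lives in the `FIRECRAWL_API_KEY` variable named on this page; `load_firecrawl_key` and `redact` are hypothetical helper names, not part of Firecrawl.

```python
import os


def load_firecrawl_key(env=None):
    """Return the Firecrawl API key from the environment, or raise if missing."""
    # Accept an explicit mapping so the function is easy to test.
    env = os.environ if env is None else env
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set; export it before running")
    return key


def redact(key):
    """Mask a key for logs, keeping only the last four characters."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
```

Failing fast on a missing key avoids a half-written corpus, and logging only the redacted form keeps the secret out of run reports.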

SKILL.md

# Firecrawl Knowledge Base

Use this to turn URLs or topics into organized LLM-ready content.

## Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately.

Ask at most three concise questions, and only if blocked: for example, the source URL or topic, whether the output is for reference, RAG, training, or docs, or the training format if training is requested.

## Firecrawl Collection Plan

Use Firecrawl map for documentation sites, search for topic-based corpora, scrape pages into markdown, and preserve code examples and tables.
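The choice between map and search can be sketched as a small planning step. This is an illustration only: `plan_collection` and the keys of the returned dict are hypothetical, and no Firecrawl call is made here.

```python
from urllib.parse import urlparse


def plan_collection(source):
    """Pick a pass order: map then scrape for a site URL, search then scrape for a topic."""
    source = source.strip()
    parsed = urlparse(source)
    # Treat anything with an http(s) scheme and a host as a documentation site.
    is_url = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    discovery = "map" if is_url else "search"
    return {
        "source": source,
        "source_kind": "site" if is_url else "topic",
        "passes": [discovery, "scrape"],
        "scrape_format": "markdown",
    }
```

For example, `plan_collection("https://docs.example.com/")` selects the map pass, while a plain phrase such as `"vector database benchmarks"` selects search.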

For files, follow the Firecrawl download-style convention:

```text
.firecrawl/
  <hostname>/
    <path>/
      index.md
```
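A minimal sketch of how a page URL could map onto that layout. The helper name `local_path_for` is hypothetical, and ignoring query strings and fragments is an assumption; the convention above does not say how they are handled.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url, root=".firecrawl"):
    """Map a page URL to <root>/<hostname>/<path>/index.md."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    # Drop empty, "." and ".." segments so the file always stays under root.
    segments = [s for s in parsed.path.split("/") if s not in ("", ".", "..")]
    return str(PurePosixPath(root, parsed.hostname, *segments, "index.md"))
```

Because the path is derived only from the URL, re-running a crawl overwrites the same files instead of creating duplicates.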

## Parallel Work

If appropriate, use sub-agents or equivalent parallel task runners:

- by docs section: one researcher per section
- by source type: official docs, tutorials, community discussions, and references
- by pipeline stage: source scraping, chunk generation, and manifest generation
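The per-section split can be sketched with a thread pool standing in for sub-agents. Everything here is illustrative: `group_by_section` assumes the first URL path segment identifies a docs section, and `collect_section` is a placeholder for whatever a real researcher would do.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_section(urls):
    """Bucket URLs by their first path segment, e.g. /guides/x -> 'guides'."""
    sections = defaultdict(list)
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        sections[parts[0] if parts else "_root"].append(url)
    return dict(sections)


def collect_section(name, urls):
    """Placeholder worker: a real one would scrape each URL into markdown."""
    return {"section": name, "pages": len(urls)}


def run_parallel(urls, max_workers=4):
    """Run one worker per docs section and return results sorted by section name."""
    sections = group_by_section(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(collect_section, n, u) for n, u in sections.items()]
        results = [f.result() for f in futures]
    return sorted(results, key=lambda r: r["section"])
```

Sorting the results makes the final coverage report stable regardless of which worker finishes first.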

## Output Modes

- Reference: markdown files, `index.md`, and `sources.json`.
- RAG: markdown files plus chunk files and `manifest.json`.
- Training: scraped source files plus `training-data.jsonl` and `training-metadata.json`.
- Docs mirror: complete markdown mirror with a table of contents.
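For the RAG mode, a rough sketch of chunking a scraped page and describing the chunks for `manifest.json`. The manifest fields (`id`, `index`, `source_url`, `title`, `chars`) are assumptions, since the skill does not specify a schema. Splitting at headings while skipping `#` lines inside code fences is one simple way to keep code examples intact.

```python
import hashlib
import re

HEADING = re.compile(r"#{1,6}\s")


def chunk_markdown(text):
    """Split markdown at headings, ignoring '#' lines inside fenced code blocks."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        is_heading = not in_fence and HEADING.match(line) is not None
        if is_heading and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def build_manifest(source_url, text):
    """Return one manifest entry per chunk, with a content-derived id."""
    entries = []
    for i, chunk in enumerate(chunk_markdown(text)):
        first_line = chunk.splitlines()[0]
        entries.append({
            "id": hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:12],
            "index": i,
            "source_url": source_url,
            "title": first_line.lstrip("# ").strip(),
            "chars": len(chunk),
        })
    return entries
```

The entries are plain dicts, so they can be written to `manifest.json` with the standard `json` module. Content-derived ids let a later re-crawl detect which chunks actually changed.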

## Final Deliverable

```markdown
# Knowledge Base: [Source]

## Summary
[What was collected and why]

## Output Structure
[Files/directories created]

## Coverage
[Sections, source types, counts]

## Usage Notes
[How to use in RAG, docs, training, or agent context]

## Sources
[URLs collected]

## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
```
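The `Rerun Inputs` block is simple enough to read back mechanically. The sketch below assumes the report keeps the `key: value` layout of the template; the parser name `parse_rerun_inputs` is hypothetical, and the allowed `goal` and `depth` values are taken from the bracketed options above.

```python
GOALS = {"reference", "rag", "train", "docs"}
DEPTHS = {"quick", "thorough", "exhaustive"}


def parse_rerun_inputs(report):
    """Extract the key: value lines under '## Rerun Inputs' and validate them."""
    fields, in_section = {}, False
    for line in report.splitlines():
        if line.startswith("## "):
            # Only lines under the Rerun Inputs heading are parsed.
            in_section = line.strip() == "## Rerun Inputs"
            continue
        if in_section and ":" in line:
            # Split on the first colon only, so URLs in values survive.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    if fields.get("goal") not in GOALS:
        raise ValueError(f"goal must be one of {sorted(GOALS)}")
    if fields.get("depth") not in DEPTHS:
        raise ValueError(f"depth must be one of {sorted(DEPTHS)}")
    return fields
```

Validating `goal` and `depth` up front means a typo in a saved report fails before any crawl starts.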

## Quality Bar

- Preserve code examples and formatting.
- Remove boilerplate navigation where possible.
- Include source URLs in frontmatter or metadata.
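One possible way to satisfy the last point is to prepend YAML frontmatter to each saved page. The field names `source_url` and `title` are assumptions; the skill only asks that the source URL be recorded somewhere.

```python
import json


def with_frontmatter(markdown, source_url, title=None):
    """Prepend a YAML frontmatter block that records where the page came from."""
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    lines = ["---", f"source_url: {json.dumps(source_url)}"]
    if title:
        lines.append(f"title: {json.dumps(title)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown.lstrip("\n")
```

Quoting the values protects URLs and titles that contain colons or `#` characters, which would otherwise confuse a YAML parser.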

What is this skill?

Plans Firecrawl map, search, and scrape passes with code examples and tables preserved in markdown

Writes files under a .firecrawl/<hostname>/<path>/index.md convention for predictable local corpora

Supports parallel sub-agents per docs section or source type (official docs, tutorials, community)

Short onboarding: infer source, goal, depth, and output path; ask at most 1–3 questions only when blocked

Hosted Firecrawl via required FIRECRAWL_API_KEY

Onboarding asks at most 1–3 concise questions when source or goal is blocked

On-disk layout convention: .firecrawl/<hostname>/<path>/index.md

Compatible agents: Claude Code, Cursor, Codex, and other compatible agents

Adoption & trust: 11.6k installs on skills.sh; 29 GitHub stars; 2/3 security scanners passed (skills.sh audits).

What do I get? / Deliverables

You get a Firecrawl-driven collection plan executed into markdown on disk—ready for chunking, RAG ingestion, doc mirrors, or dataset export—with optional parallel section researchers.

Firecrawl collection plan (map/search/scrape strategy)

Organized markdown corpus under .firecrawl-style paths

RAG- or training-ready chunked content when requested
