
Firecrawl Knowledge Base
Turn documentation sites or topic URLs into organized, LLM-ready markdown and chunk trees for RAG, local reference, or training datasets using Firecrawl.
Overview
Firecrawl Knowledge Base is an agent skill, most often used in the Build phase (also Idea and Operate), that turns URLs or topics into organized, LLM-ready markdown and file trees via Firecrawl map, search, and scrape.
Install
npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-base
What is this skill?
- Plans Firecrawl map, search, and scrape passes with code examples and tables preserved in markdown
- Writes files under a .firecrawl/<hostname>/<path>/index.md convention for predictable local corpora
- Supports parallel sub-agents per docs section or source type (official docs, tutorials, community)
- Short onboarding: infer source, goal, depth, and output path; ask at most 1–3 questions only when blocked
- Uses hosted Firecrawl and requires a FIRECRAWL_API_KEY
Adoption & trust: 11.6k installs on skills.sh; 29 GitHub stars; 2/3 security scanners passed (skills.sh audits).
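The on-disk layout convention listed above can be sketched as a small URL-to-path helper. This is a minimal illustration using only the Python standard library, not the skill's own implementation: the function name `local_path_for` is hypothetical, and ignoring query strings and dropping `.`/`..` segments are assumptions made for this sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url: str, root: str = ".firecrawl") -> PurePosixPath:
    """Map a page URL to <root>/<hostname>/<path>/index.md (hypothetical helper)."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"URL has no hostname: {url!r}")
    # Assumption: drop empty and dot segments so the result stays under the root.
    segments = [s for s in parsed.path.split("/") if s and s not in (".", "..")]
    return PurePosixPath(root, parsed.hostname, *segments, "index.md")


print(local_path_for("https://docs.example.com/api/auth/"))
# prints: .firecrawl/docs.example.com/api/auth/index.md
```

Because every page resolves to a directory with its own index.md, a site root and a deeply nested page follow the same rule, which keeps the local corpus predictable.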
What problem does it solve?
You need a structured, local corpus from scattered web docs or topic pages, not copy-paste snippets that break tables, code samples, and section hierarchy.
Who is it for?
Solo builders wiring RAG or fine-tuning data from public docs, API references, or curated topic research before shipping an AI feature.
Skip if: you only need a quick paragraph answer in chat, lack a Firecrawl API key, or cannot accept network scraping of external sites.
When should I use this skill?
Build a knowledge base from web content with Firecrawl for local reference docs, RAG-ready chunks, fine-tuning datasets, documentation mirrors, topic corpora, or LLM-ready markdown.
What do I get? / Deliverables
You get a Firecrawl-driven collection plan executed into markdown on disk—ready for chunking, RAG ingestion, doc mirrors, or dataset export—with optional parallel section researchers.
- Firecrawl collection plan (map/search/scrape strategy)
- Organized markdown corpus under .firecrawl-style paths
- RAG- or training-ready chunked content when requested
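To make "RAG-ready chunked content" concrete, here is one possible way to split scraped markdown into heading-scoped chunks while leaving fenced code blocks intact. It is a sketch under stated assumptions: the `Chunk` type and `chunk_markdown` function are hypothetical names, and the skill may chunk differently (for example by token count).

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


@dataclass
class Chunk:
    heading_path: list[str]  # trail of headings leading to this chunk
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown at headings; lines inside code fences never start a chunk."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(list(path), text))
        buffer.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2))
        buffer.append(line)
    flush()
    return chunks
```

Tracking the fence state matters for scraped docs: a `# comment` line inside a code sample would otherwise be mistaken for a heading and split the example in two.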
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build because the primary output is structured corpora and mirrors that feed agents, backends, and docs—not one-off competitive snapshots. Agent-tooling is the best fit: outputs are RAG-ready chunks, fine-tuning datasets, and markdown organized for downstream LLM workflows.
Where it fits
Map and scrape competitor and category articles into a topic corpus before you commit to a product angle.
Mirror a framework docs site into .firecrawl paths so your agent retrieves accurate API examples during implementation.
Generate an offline reference mirror of third-party integration guides bundled with your repo.
Refresh public doc mirrors used in release notes or support macros so snippets match the live site.
Re-crawl vendor changelog pages to update RAG chunks after a breaking API release.
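For the refresh scenarios above, one way to limit re-chunking to pages that actually changed is to compare content hashes between runs. The sketch below is an assumption about how such a step could look, not part of the skill: `changed_pages` and the JSON state file are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def changed_pages(corpus_dir: Path, state_file: Path) -> list[str]:
    """Return relative paths of markdown files that are new or changed since the last call."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {
        path.relative_to(corpus_dir).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(corpus_dir.rglob("*.md"))
    }
    # Persist the new hashes so the next run compares against this crawl.
    state_file.write_text(json.dumps(current, indent=2))
    return [rel for rel, digest in current.items() if previous.get(rel) != digest]
```

After a re-crawl, only the returned paths would need new chunks or embeddings; unchanged files can keep their existing ones.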
How it compares
Use instead of manual copy-paste or generic browse-and-summarize when you need reproducible, path-organized markdown corpora for agents.
Common Questions / FAQ
Who is firecrawl-knowledge-base for?
Indie and solo builders using Claude Code, Cursor, or Codex who want web sources converted into organized markdown for RAG, local reference, documentation mirrors, or training-style datasets.
When should I use firecrawl-knowledge-base?
During Build when preparing agent context or doc mirrors; in Idea when assembling a topic corpus for discovery; and in Operate when refreshing mirrored vendor docs—whenever you have URLs or a topic and need LLM-ready structure on disk.
Is firecrawl-knowledge-base safe to install?
It requires a Firecrawl API key and network access to hosted Firecrawl; review the Security Audits panel on this Prism page and treat scraped content and API credentials according to your policies.
SKILL.md
# Firecrawl Knowledge Base

Use this to turn URLs or topics into organized LLM-ready content.

## Onboarding Interview

Infer the source, goal, depth, and output location from context. If the source and goal are clear, proceed immediately. Ask at most 1-3 concise questions only if blocked, such as the source URL/topic, whether the output is reference/RAG/training/docs, or training format if training is requested.

## Firecrawl Collection Plan

Use Firecrawl map for documentation sites, search for topic-based corpora, scrape pages into markdown, and preserve code examples and tables.

For files, follow the Firecrawl download-style convention:

```text
.firecrawl/
  <hostname>/
    <path>/
      index.md
```

## Parallel Work

If appropriate, use sub-agents or equivalent parallel task runners:

- one docs section per researcher
- official docs, tutorials, community discussions, and references by source type
- source scraping vs chunk generation vs manifest generation

## Output Modes

- Reference: markdown files, `index.md`, and `sources.json`.
- RAG: markdown files plus chunk files and `manifest.json`.
- Training: scraped source files plus `training-data.jsonl` and `training-metadata.json`.
- Docs mirror: complete markdown mirror with a table of contents.

## Final Deliverable

```markdown
# Knowledge Base: [Source]

## Summary
[What was collected and why]

## Output Structure
[Files/directories created]

## Coverage
[Sections, source types, counts]

## Usage Notes
[How to use in RAG, docs, training, or agent context]

## Sources
[URLs collected]

## Rerun Inputs
workflow: firecrawl-knowledge-base
source: [url/topic]
goal: [reference/rag/train/docs]
depth: [quick/thorough/exhaustive]
output_dir: [.firecrawl/]
```

## Quality Bar

- Preserve code examples and formatting.
- Remove boilerplate navigation where possible.
- Include source URLs in frontmatter or metadata.
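The Reference output mode, together with the quality-bar item about source URLs in frontmatter, could be sketched as follows. The file name `sources.json` comes from the skill description; everything else is an assumption for illustration, including the function name `write_reference_output`, the `source_url` frontmatter key, and the `url`/`path` fields in the JSON entries.

```python
import json
from pathlib import Path


def write_reference_output(output_dir: Path, pages: dict[str, tuple[str, str]]) -> Path:
    """Write each page with a frontmatter header and record it in sources.json.

    `pages` maps a relative file path to (source_url, markdown). Hypothetical shape.
    """
    sources = []
    for rel_path, (source_url, markdown) in pages.items():
        target = output_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # JSON-quoting keeps URLs containing ':' or '#' safe inside the frontmatter.
        frontmatter = f"---\nsource_url: {json.dumps(source_url)}\n---\n\n"
        target.write_text(frontmatter + markdown.strip() + "\n", encoding="utf-8")
        sources.append({"url": source_url, "path": rel_path})
    sources_file = output_dir / "sources.json"
    sources_file.write_text(json.dumps(sources, indent=2) + "\n", encoding="utf-8")
    return sources_file
```

Keeping the source URL in both the file header and a central index means a single chunk can be traced back to its origin even after it is separated from the directory tree.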