
Firecrawl Knowledge Ingest
Crawl and extract JS-heavy or login-gated documentation portals into structured markdown or JSON for agent context, support bots, or internal knowledge bases.
Overview
Firecrawl Knowledge Ingest is an agent skill for the Build phase that uses the Firecrawl browser to ingest public or authenticated documentation portals into structured markdown or JSON.
Install
npx skills add https://github.com/firecrawl/firecrawl-workflows --skill firecrawl-knowledge-ingest
What is this skill?
- Firecrawl browser for auth, pagination, load-more, and JS-rendered doc portals
- Firecrawl map as a supplement for public URLs when browser navigation is not required
- Extracts article markdown plus metadata: title, section, dates, author, tags
- Onboarding asks at most 1–3 questions on portal URL, auth, or output format when blocked
- Supports structured JSON or markdown deliverables from documentation sites
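The onboarding rule in the list above (infer what you can, ask only when blocked, never more than three questions) can be sketched in Python. This is a hypothetical helper, not part of the skill: the `IngestInputs` fields mirror the rerun inputs named in SKILL.md, and treating a missing URL, unknown auth status, or unrecognized format as the blocking gaps, as well as the default page limit, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

VALID_FORMATS = ("json", "markdown", "merged")
MAX_QUESTIONS = 3  # the skill never asks more than three questions


@dataclass
class IngestInputs:
    """Hypothetical record of what was inferred from the conversation."""
    url: Optional[str] = None
    needs_auth: Optional[bool] = None  # None means "could not infer"
    format: Optional[str] = None
    max_pages: int = 100  # assumed default when no page limit is given


def blocking_questions(inputs: IngestInputs) -> list[str]:
    """Return concise questions only for gaps that block the crawl."""
    questions = []
    if not inputs.url:
        questions.append("What is the portal URL?")
    if inputs.needs_auth is None:
        questions.append("Does the portal require a login?")
    if inputs.format not in VALID_FORMATS:
        questions.append("Which output format: json, markdown, or merged?")
    # An empty list means the portal is clear: proceed immediately.
    return questions[:MAX_QUESTIONS]
```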
Adoption & trust: 11.5k installs on skills.sh; 29 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your target docs site needs login, infinite scroll, or client-side rendering, so you cannot reliably pull a complete knowledge base for your agent or product.
Who is it for?
Indie builders shipping agents, internal tools, or support assistants that must stay in sync with JS-heavy docs portals, whether third-party or their own.
Skip if: you only need a static single-page README you can copy manually, or you cannot legally authenticate to or scrape the portal.
When should I use this skill?
Use it when you need to ingest public or authenticated knowledge bases and docs portals with the Firecrawl browser: JS-heavy docs, login-gated portals, paginated help centers, or structured extraction.
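The choice between browser navigation and map follows a simple rule in the skill: the browser handles auth, pagination, load-more, and JS rendering, while map only supplements public URLs. A minimal Python sketch of that decision follows; `PortalTraits` and `choose_methods` are hypothetical names, and the code only encodes the rule without calling Firecrawl.

```python
from dataclasses import dataclass


@dataclass
class PortalTraits:
    requires_login: bool
    js_rendered: bool
    paginated: bool  # infinite scroll, load-more buttons, or next links


def choose_methods(traits: PortalTraits) -> dict[str, bool]:
    """Encode the browser-vs-map rule; this does not call Firecrawl."""
    # Browser navigation is needed for auth, client-side rendering, and pagination.
    use_browser = traits.requires_login or traits.js_rendered or traits.paginated
    # Map only helps when page URLs are publicly discoverable.
    use_map = not traits.requires_login
    return {"browser": use_browser, "map": use_map}
```

For a public, static portal this returns map only; for a public JS-heavy portal it returns both, with map supplementing browser navigation; for a login-gated portal it returns browser only.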
What do I get? / Deliverables
You get a structured documentation export with navigated articles, markdown content, and metadata ready to index, commit, or feed into agent context pipelines.
- Structured markdown or JSON documentation export
- Per-article metadata (title, section, dates, tags)
- Navigated article corpus from portal traversal
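The JSON deliverable uses the shape named in SKILL.md (`source`, `url`, `extractedAt`, `totalArticles`, `sections[]` with per-article `title`, `url`, `section`, `content`, `metadata`). The Python sketch below assembles it from a flat list of scraped articles. `Article` and `build_export` are illustrative names, and giving each section entry a `name` plus an `articles` list is an assumption, since SKILL.md does not spell out the per-section keys.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Article:
    title: str
    url: str
    section: str
    content: str  # article markdown with nav chrome already stripped
    metadata: dict = field(default_factory=dict)  # e.g. author, tags, last updated


def build_export(source: str, url: str, articles: list[Article]) -> dict:
    """Group articles by section, keeping first-seen section order."""
    grouped: dict[str, list[dict]] = {}
    for a in articles:
        grouped.setdefault(a.section, []).append({
            "title": a.title,
            "url": a.url,
            "section": a.section,
            "content": a.content,
            "metadata": a.metadata,
        })
    return {
        "source": source,
        "url": url,
        "extractedAt": datetime.now(timezone.utc).isoformat(),
        "totalArticles": len(articles),
        # Assumed per-section keys: "name" plus the list of its articles.
        "sections": [{"name": n, "articles": arts} for n, arts in grouped.items()],
    }
```

The result is plain dicts and lists, so `json.dumps(export, indent=2)` writes it out directly; the per-section article counts also feed the "Sections" part of the markdown deliverable.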
Journey fit
The canonical shelf is Build because the skill turns external documentation into ingestible artifacts you wire into products, agents, or dev workflows, not a one-time market research pass. Docs is the right subphase: help centers, API references, and paginated portals are documentation assets you normalize for RAG, onboarding, or codegen context.
How it compares
Use it when the Firecrawl browser handles auth-gated or paginated help centers better than a plain sitemap crawler would, not as a substitute for official API/SDK importers when those exist.
Common Questions / FAQ
Who is firecrawl-knowledge-ingest for?
Solo developers and small teams building AI features or doc-backed workflows who need reliable extraction from real-world documentation portals.
When should I use firecrawl-knowledge-ingest?
Use it during Build when onboarding agents on vendor APIs, mirroring a help center for a support bot, or freezing partner docs before an integration sprint.
Is firecrawl-knowledge-ingest safe to install?
Check the Security Audits panel on this page. Storing portal credentials and API keys requires good secret hygiene on your part, and scraping gated content must match the site’s terms and your access rights.
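One way to keep credentials out of code, logs, and the final report is to read them from the environment at run time. A minimal Python sketch, assuming the variable names `FIRECRAWL_API_KEY`, `PORTAL_USERNAME`, and `PORTAL_PASSWORD` (the portal names are hypothetical; check which names your own setup expects).

```python
import os
from collections.abc import Mapping
from typing import Optional

REQUIRED = ("FIRECRAWL_API_KEY",)                  # assumed variable name
OPTIONAL = ("PORTAL_USERNAME", "PORTAL_PASSWORD")  # hypothetical; login-gated portals only


def load_secrets(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read secrets from the environment so they never appear in code or reports."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Name the missing variables, never their values.
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED + OPTIONAL if env.get(name)}
```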
SKILL.md
# Firecrawl Knowledge Ingest

Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.

## Onboarding Interview

Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately. Ask at most 1-3 concise questions only if blocked, such as the portal URL, whether authentication is required, or the desired output format.

## Firecrawl Collection Plan

Use Firecrawl browser to:

- open the portal and inspect navigation
- identify sections, categories, sidebar links, and article URLs
- follow sidebar navigation, next links, pagination, load-more controls, or search
- scrape article content as markdown
- extract metadata such as title, section, last updated date, author, and tags

Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.

## Final Deliverable

```markdown
# Knowledge Ingest: [Portal]

## Summary
[Pages extracted, sections covered, limitations]

## Output
[JSON/markdown/merged file path or content]

## Sections
[Section names and article counts]

## Failed Or Restricted Pages
[Any access/loading issues]

## Sources
[URLs extracted]

## Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
```

## JSON Shape

Use `source`, `url`, `extractedAt`, `totalArticles`, and `sections[]` with article `title`, `url`, `section`, `content`, and `metadata`.

## Quality Bar

- Preserve code examples, tables, and formatting.
- Strip nav chrome, headers, and footers.
- Track extraction progress and page failures.
- Respect authentication boundaries.
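The collection plan (follow sidebar, next, and pagination links) and the quality-bar rule to track progress and page failures can be combined into a bounded breadth-first traversal. In this Python sketch `fetch` is a hypothetical stand-in for Firecrawl browser navigation plus scraping, not a Firecrawl API: it is assumed to return a page's markdown and the doc links found on it, and to raise when a page is restricted or fails to load. `max_pages` plays the role of the rerun input of the same name.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical fetcher: url -> (markdown, outgoing doc links); raises on failure.
Fetch = Callable[[str], tuple[str, list[str]]]


@dataclass
class CrawlReport:
    pages: dict[str, str] = field(default_factory=dict)   # url -> markdown
    failed: dict[str, str] = field(default_factory=dict)  # url -> error message


def traverse(start: str, fetch: Fetch, max_pages: int) -> CrawlReport:
    """Breadth-first walk over navigation links, capped at max_pages attempts."""
    report = CrawlReport()
    queue = deque([start])
    seen = {start}  # never enqueue the same URL twice
    while queue and len(report.pages) + len(report.failed) < max_pages:
        url = queue.popleft()
        try:
            markdown, links = fetch(url)
        except Exception as exc:
            # Record the failure and keep going; it belongs in the final report.
            report.failed[url] = str(exc)
            continue
        report.pages[url] = markdown
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return report
```

Failed URLs land in `report.failed`, which maps directly onto the "Failed Or Restricted Pages" section of the final deliverable, and the keys of `report.pages` supply the "Sources" list.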