Crawl4ai

Research-heavy crawling (competitors, market pages, documentation) is where solo builders first reach for a CLI crawler before build-time ingestion pipelines. The crwl CLI is optimized for discovery and extraction during opportunity and competitor research, not for production deploy config.

Also useful

Also useful

IdeaCompetitor & landscape research

Where it fits

Example use

Crawl rival pricing and feature pages to markdown before you commit to a niche.

Example use

Pull official docs into JSON for a side-by-side API comparison memo.

Example use

Archive a target customer’s public site structure to size an MVP integration.

Example use

Run crwl with extraction config to feed an agent’s knowledge base on each release.

Example use

GrowContent & marketing

Refresh crawled competitor blogs into markdown for a weekly positioning digest.

How it compares

CLI skill package for local crwl usage, not a hosted web-scraper MCP or no-code monitor.

Common Questions / FAQ

Who is crawl4ai for?

Indie developers and agent builders who want Crawl4AI’s crwl CLI documented as procedural steps inside Claude Code, Cursor, or similar agents.

When should I use crawl4ai?

Use it in Idea/research to crawl competitors and references, in Validate when grounding a landing or scope doc in live pages, and in Build/integrations when feeding crawled markdown into RAG or automation—always via explicit crwl invocations.

Is crawl4ai safe to install?

Review the Security Audits panel on this Prism page for install risk and dependency signals before running pip install crawl4ai and crawl4ai-setup on your machine.

SKILL.md

READMESKILL.md - Crawl4ai

# Crawl4AI CLI Guide
<!-- Reference: Tier 2 - Command-line interface for Crawl4AI -->

## Table of Contents
<!-- Lines 1-24 -->

- [Crawl4AI CLI Guide](#crawl4ai-cli-guide)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
  - [Basic Usage](#basic-usage)
    - [Quick Example - Advanced Usage](#quick-example---advanced-usage)
  - [Configuration](#configuration)
    - [Browser Configuration](#browser-configuration)
    - [Crawler Configuration](#crawler-configuration)
    - [Extraction Configuration](#extraction-configuration)
  - [Advanced Features](#advanced-features)
    - [LLM Q\&A](#llm-qa)
    - [Structured Data Extraction](#structured-data-extraction)
    - [Content Filtering](#content-filtering)
  - [Output Formats](#output-formats)
  - [Complete Examples](#complete-examples)
  - [Best Practices \& Tips](#best-practices--tips)
  - [Recap](#recap)
  - [See Also](#see-also)

---

## Installation
<!-- Lines 27-38 -->

The Crawl4AI CLI (`crwl`) is installed automatically with the library:

```bash
pip install crawl4ai
crawl4ai-setup
```

---

## Basic Usage
<!-- Lines 39-68 -->

The `crwl` command provides a simple interface to the Crawl4AI library:

```bash
# Basic crawling - returns markdown
crwl https://example.com

# Specify output format
crwl https://example.com -o markdown

# Verbose JSON output with cache bypass
crwl https://example.com -o json -v --bypass-cache

# See usage examples
crwl --example
```

### Quick Example - Advanced Usage
<!-- Lines 59-70 -->

```bash
# Extract structured data using CSS schema
crwl "https://www.infoq.com/ai-ml-data-eng/" \
    -e docs/examples/cli/extract_css.yml \
    -s docs/examples/cli/css_schema.json \
    -o json
```

---

## Configuration
<!-- Lines 51-160 -->

### Browser Configuration
<!-- Lines 51-75 -->

Browser settings via YAML file or command line:

```yaml
# browser.yml
headless: true
viewport_width: 1280
user_agent_mode: "random"
verbose: true
ignore_https_errors: true
```

```bash
# Using config file
crwl https://example.com -B browser.yml

# Using direct parameters
crwl https://example.com -b "headless=true,viewport_width=1280,user_agent_mode=random"
```

**Key Parameters:**

| Parameter | Description |
|-----------|-------------|
| `headless` | Run without GUI (true/false) |
| `viewport_width` | Browser width in pixels |
| `viewport_height` | Browser height in pixels |
| `user_agent_mode` | "random" or specific UA string |

For all browser parameters: [BrowserConfig Reference](complete-sdk-reference.md#1-browserconfig--controlling-the-browser) (lines 1977-2020)

### Crawler Configuration
<!-- Lines 76-110 -->

Control crawling behavior:

```yaml
# crawler.yml
cache_mode: "bypass"
wait_until: "networkidle"
page_timeout: 30000
delay_before_return_html: 0.5
word_count_threshold: 100
scan_full_page: true
scroll_delay: 0.3
process_iframes: false
remove_overlay_elements: true
magic: true
verbose: true
```

```bash
# Using config file
crwl https://example.com -C crawler.yml

# Using direct parameters
crwl https://example.com -c "css_selector=#main,delay_before_return_html=2,scan_full_page=true"
```

**Key Parameters:**

| Parameter | Description |
|-----------|-------------|
| `cache_mode` | bypass, enabled, disabled |
| `wait_until` | networkidle, domcontentloaded |
| `page_timeout` | Max page load time (ms) |
| `css_selector` | Focus on specific element |
| `scan_full_page` | Enable infinite scroll handling |

For all crawler parameters: [CrawlerRunConfig Reference](complete-sdk-reference.md#2-crawlerrunconfig--controlling-each-crawl) (lines 2020-2330)

### Extraction Configuration
<!-- Lines 111-160 -->

Two extraction types supported:

**1. CSS/XPath-based extraction:**

```yaml
# extract_css.yml
type: "json-css"
params:
  verbose: true
```

```json
// css_schema.json
{
  "name": "ArticleExtractor",
  "baseSelector": ".article",
  "fields": [
    {
      "name": "title",
      "selector": "h1.title",
      "type": "text"
    },
    {
      "n

What is this skill?

crwl CLI with markdown, JSON, and verbose cache-bypass modes

Browser, crawler, and extraction configuration sections for tuned runs

LLM Q&A, structured data extraction, and content filtering flows

Output format switches (-o markdown/json) for agent-ready corpora

Complete examples and best-practices section in the skill guide

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 632 installs on skills.sh; 31 GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.

Primary fit

Also useful

Also useful

IdeaCompetitor & landscape research

Where it fits

Example use

Crawl rival pricing and feature pages to markdown before you commit to a niche.

Example use

Pull official docs into JSON for a side-by-side API comparison memo.

Example use

Archive a target customer’s public site structure to size an MVP integration.

Example use