Data Context Extractor

Install this during the Build phase, while setting up agent tooling and data workflows; the skill it generates later supports Grow analytics and Operate reporting. Agent tooling is the right shelf because the deliverable is a reusable SKILL.md package for Claude, not a one-off SQL query.

Where it fits

Example use: Bootstrap a Snowflake-aware analysis skill before you ship an internal metrics copilot.

Example use: Iterate reference files when marketing defines new lifecycle metrics the agent must query correctly.

Example use: Append table documentation after a production schema migration so weekly reports stay trustworthy.

How it compares

Use this meta generator instead of hand-writing a giant static prompt that drifts every time the schema changes.

Common Questions / FAQ

Who is data-context-extractor for?

Data analysts and technical founders who want Claude to understand their specific warehouse, metrics language, and common query patterns.

When should I use data-context-extractor?

Use Bootstrap mode in Build when standing up a new data skill, and Iteration mode in Build, Grow, or Operate when metrics, domains, or tables change and the skill needs richer reference files.

Is data-context-extractor safe to install?

It may guide warehouse connections and handle sensitive terminology. Review the Security Audits panel on this page, and connect only with least-privilege, read-only credentials.

SKILL.md

# Data Context Extractor

A meta-skill that extracts company-specific data knowledge from analysts and generates tailored data analysis skills.

## How It Works

This skill has two modes:

1. **Bootstrap Mode**: Create a new data analysis skill from scratch
2. **Iteration Mode**: Improve an existing skill by adding domain-specific reference files

---

## Bootstrap Mode

Use when: User wants to create a new data context skill for their warehouse.

### Phase 1: Database Connection & Discovery

**Step 1: Identify the database type**

Ask: "What data warehouse are you using?"

Common options:
- **BigQuery**
- **Snowflake**
- **PostgreSQL/Redshift**
- **Databricks**

Use `~~data warehouse` tools (query and schema) to connect. If unclear, check available MCP tools in the current session.

**Step 2: Explore the schema**

Use `~~data warehouse` schema tools to:
1. List available datasets/schemas
2. Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
3. Pull schema details for those key tables

Sample exploration queries by dialect:
```sql
-- Placeholders: replace project, dataset, my_database, and my_schema with your own names.

-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA;

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`;

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database;

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema;
```
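
The discovery statements above can also be selected by dialect from a small lookup. The following is a minimal sketch, not part of the skill itself: the `discovery_queries` helper and the `TEMPLATES` table are hypothetical names, and the PostgreSQL/Redshift and Databricks statements are assumptions based on those engines' standard catalog commands rather than anything stated in this document.

```python
import re

# Hypothetical lookup of discovery SQL per dialect. The BigQuery and Snowflake
# entries mirror the samples above; the others are assumed equivalents.
TEMPLATES = {
    "bigquery": {
        "list_schemas": "SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA",
        "list_tables": "SELECT table_name FROM `{database}.{schema}.INFORMATION_SCHEMA.TABLES`",
    },
    "snowflake": {
        "list_schemas": "SHOW SCHEMAS IN DATABASE {database}",
        "list_tables": "SHOW TABLES IN SCHEMA {schema}",
    },
    "postgresql": {
        "list_schemas": "SELECT schema_name FROM information_schema.schemata",
        "list_tables": "SELECT table_name FROM information_schema.tables "
                       "WHERE table_schema = '{schema}'",
    },
    "databricks": {
        "list_schemas": "SHOW SCHEMAS IN {database}",
        "list_tables": "SHOW TABLES IN {schema}",
    },
}
# Assumption: Redshift accepts the same information_schema queries as PostgreSQL.
TEMPLATES["redshift"] = TEMPLATES["postgresql"]

# Only plain identifiers are accepted, because names are interpolated into SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_\-]*$")


def discovery_queries(dialect: str, database: str, schema: str) -> dict[str, str]:
    """Return the list-schemas and list-tables statements for one dialect."""
    key = dialect.strip().lower()
    if key not in TEMPLATES:
        raise ValueError(f"Unsupported dialect: {dialect!r}")
    for name in (database, schema):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    return {
        label: sql.format(database=database, schema=schema)
        for label, sql in TEMPLATES[key].items()
    }
```

For BigQuery, `database` stands in for the project and `schema` for the dataset. The identifier check is deliberately strict; loosen it only if your warehouse uses quoted names.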

### Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

**Entity Disambiguation (Critical)**
> "When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:
- Multiple entity types (user vs account vs organization)
- Relationships between them (1:1, 1:many, many:many)
- Which ID fields link them together
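
One way to capture these answers is as small relationship records that can later be written into a reference file. The sketch below is illustrative only: the `Relationship` class, the `render_relationships` helper, the example `account`/`user` entities, and the output layout are all assumptions, not a format this skill prescribes.

```python
from dataclasses import dataclass

# Cardinalities named in the checklist above.
VALID_CARDINALITIES = {"1:1", "1:many", "many:many"}


@dataclass(frozen=True)
class Relationship:
    parent: str        # e.g. "account" (hypothetical entity name)
    child: str         # e.g. "user" (hypothetical entity name)
    cardinality: str   # one of VALID_CARDINALITIES
    join_key: str      # ID column that links the two entities


def render_relationships(relationships: list[Relationship]) -> str:
    """Render relationships as markdown bullet lines."""
    lines = []
    for rel in relationships:
        if rel.cardinality not in VALID_CARDINALITIES:
            raise ValueError(f"Unknown cardinality: {rel.cardinality!r}")
        lines.append(
            f"- {rel.parent} -> {rel.child} ({rel.cardinality}), "
            f"joined on `{rel.join_key}`"
        )
    return "\n".join(lines)
```

Rejecting unknown cardinalities early keeps an ambiguous interview answer such as "sort of one-to-many" from being written down as if it were settled.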

**Primary Identifiers**
> "What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:
- Primary keys vs business keys
- UUID vs integer IDs
- Legacy ID systems

**Key Metrics**
> "What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:
- Exact formulas (ARR = monthly_revenue × 12)
- Which tables/columns feed each metric
- Time period conventions (trailing 7 days, calendar month, etc.)
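
As a worked instance of the ARR formula above, the sketch below stores a metric together with its formula, source column, and time convention, then evaluates it. The `MetricDefinition` class, the `billing.subscriptions.monthly_revenue` source, and the sample revenue figure are hypothetical; only the formula ARR = monthly_revenue × 12 comes from the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    formula: str                        # human-readable formula
    source: str                         # table.column feeding the metric
    time_convention: str                # e.g. "calendar month", "trailing 7 days"
    compute: Callable[[float], float]   # executable form of the formula


# Hypothetical definition; the source column name is an assumption.
ARR = MetricDefinition(
    name="ARR",
    formula="monthly_revenue * 12",
    source="billing.subscriptions.monthly_revenue",
    time_convention="calendar month",
    compute=lambda monthly_revenue: monthly_revenue * 12,
)


def describe(metric: MetricDefinition) -> str:
    """Render a one-line summary of a metric definition."""
    return (
        f"{metric.name} = {metric.formula} "
        f"(source: {metric.source}; period: {metric.time_convention})"
    )


# A monthly revenue of 2,500 annualizes to 30,000.
annualized = ARR.compute(2500.0)
```

Keeping the readable formula next to an executable one makes it easy to spot when the documented definition and the computed value drift apart.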

**Data Hygiene**
> "What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:
- Standard WHERE clauses to always include
- Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
- Specific values to exclude (status = 'deleted')
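
The exclusions gathered here can be combined into one reusable WHERE clause. This is a minimal sketch under stated assumptions: `hygiene_where` is a hypothetical helper, and treating a NULL flag or status as "not excluded" is an assumption that should be confirmed with the analysts, since NULL handling varies by table.

```python
def hygiene_where(flag_columns: list[str], excluded_values: dict[str, list[str]]) -> str:
    """Build a WHERE-clause body that drops flagged rows and excluded values.

    flag_columns: boolean columns where TRUE means "exclude this row".
    excluded_values: column name -> string values to exclude.
    """
    # Assumption: a NULL flag means the row is NOT flagged, so it is kept.
    conditions = [f"COALESCE({col}, FALSE) = FALSE" for col in flag_columns]
    for col, values in excluded_values.items():
        # Escape single quotes by doubling them, as in standard SQL string literals.
        quoted = ", ".join("'" + value.replace("'", "''") + "'" for value in values)
        # Assumption: rows with a NULL value in this column are kept.
        conditions.append(f"({col} IS NULL OR {col} NOT IN ({quoted}))")
    return " AND ".join(conditions) if conditions else "TRUE"
```

For example, `hygiene_where(["is_test", "is_internal"], {"status": ["deleted"]})` yields a clause that filters test and internal rows and drops deleted records. Column names are interpolated as-is, so they should come from the discovered schema, not from free-form user input.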

**Common Gotchas**
> "What mistakes do new analysts typically make with this data?"

Listen for:
- Confusing column names
- Timezone issues
- NULL handling quirks
- Historical vs current state tables
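
The timezone gotcha is easy to demonstrate: the same event can fall on different reporting dates depending on which offset the date is taken in. The sketch below is illustrative, with a hypothetical `reporting_date` helper and sample timestamp; it uses a fixed UTC offset and so ignores daylight saving time, which a real warehouse timezone setting would need to handle.

```python
from datetime import datetime, timedelta, timezone


def reporting_date(event_utc: datetime, utc_offset_hours: int) -> str:
    """Return the ISO calendar date of an event in a fixed-offset reporting timezone."""
    if event_utc.tzinfo is None:
        raise ValueError("event_utc must be timezone-aware")
    local = event_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.date().isoformat()


# Hypothetical event stored in UTC at 02:00 on 1 January 2024.
event = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)

utc_day = reporting_date(event, 0)     # "2024-01-01"
local_day = reporting_date(event, -8)  # "2023-12-31" at UTC-8
```

A daily or monthly rollup built on the UTC date would count this event in January, while one built on the UTC-8 date would count it in December, which is why the generated skill should record the convention each metric uses.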

### Phase 3: Generate the Skill

Create a skill with this structure:

```
[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md          # Entity definitions and relationships
    ├── metrics.md           # KPI calculations
    ├── tables/              # One file per domain
    │   ├── [domain1].md
    │   └──
```

What is this skill?

Two modes: Bootstrap creates a new warehouse skill; Iteration adds domain reference files

Phase 1 discovers schemas after you name BigQuery, Snowflake, or similar warehouses

Structured analyst interviews capture metrics definitions, tribal terminology, and query patterns

Emits tailored reference files appended to an existing data context skill over time

Compatible agents: Claude Code, Cursor, Codex

Adoption & trust: 1.7k installs on skills.sh; 19.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
