
Data Context Extractor
Bootstrap or extend a company-specific data analysis agent skill by interviewing analysts and writing reference files for your warehouse.
Overview
data-context-extractor is an agent skill most often used in Build (also Grow, Operate) that interviews analysts and generates or updates company-specific data warehouse skills with reference files.
Install
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill data-context-extractor
What is this skill?
- Two modes: Bootstrap creates a new warehouse skill; Iteration adds domain reference files
- Phase 1 discovers schemas after you name BigQuery, Snowflake, or similar warehouses
- Structured analyst interviews capture metrics definitions, tribal terminology, and query patterns
- Emits tailored reference files appended to an existing data context skill over time
Adoption & trust: 1.7k installs on skills.sh; 19.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent keeps misinterpreting warehouse tables and metrics because company data context lives only in analysts’ heads.
Who is it for?
Solo or indie teams with a live warehouse and analysts willing to run a structured bootstrap or iteration interview.
Skip if: Greenfield products with no database yet, or teams that only need a single ad-hoc query without maintaining a skill package.
When should I use this skill?
Triggers include "Create a data context skill", "Set up data analysis for our warehouse", "Add context about [domain]", or "Update the data skill with [metrics/tables/terminology]" when analysts want Claude to understand
What do I get? / Deliverables
You finish with a bootstrapped or updated data analysis skill plus reference files that encode schemas, definitions, and query patterns for repeatable agent use.
- New or updated data analysis SKILL.md
- Domain-specific reference markdown files
- Documented metrics and table semantics for agent queries
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
You first install this while building agent tooling and data workflows, even though the generated skill later supports Grow analytics and Operate reporting. Agent-tooling is the right shelf because the deliverable is a reusable SKILL.md package for Claude, not a one-off SQL query.
Where it fits
Bootstrap a Snowflake-aware analysis skill before you ship an internal metrics copilot.
Iterate reference files when marketing defines new lifecycle metrics the agent must query correctly.
Append table documentation after a production schema migration so weekly reports stay trustworthy.
How it compares
Use this meta generator instead of hand-writing a giant static prompt that drifts every time the schema changes.
Common Questions / FAQ
Who is data-context-extractor for?
Data analysts and technical founders who want Claude to understand their specific warehouse, metrics language, and common query patterns.
When should I use data-context-extractor?
Use Bootstrap mode in Build when standing up a new data skill, and Iteration mode in Build, Grow, or Operate when metrics, domains, or tables change and the skill needs richer reference files.
Is data-context-extractor safe to install?
It may guide warehouse connections and surface sensitive terminology, so review the Security Audits panel on this page and connect with credentials limited to a least-privilege, read-only role.
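As a rough illustration of what a least-privilege read role could look like, here is a minimal sketch assuming a Snowflake warehouse. The role, warehouse, and user names (`analyst_readonly`, `my_warehouse`, `agent_service_user`) are hypothetical placeholders, and `my_database` simply mirrors the placeholder used in the skill's own Snowflake samples. Other warehouses use different grant syntax.

```sql
-- Hypothetical names throughout: analyst_readonly, my_warehouse, agent_service_user.
-- my_database mirrors the placeholder in the skill's own Snowflake samples.
CREATE ROLE IF NOT EXISTS analyst_readonly;

-- Allow the role to run queries on one warehouse and see one database.
GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE analyst_readonly;
GRANT USAGE ON DATABASE my_database TO ROLE analyst_readonly;
GRANT USAGE ON ALL SCHEMAS IN DATABASE my_database TO ROLE analyst_readonly;

-- Read-only: SELECT on existing tables, with no INSERT/UPDATE/DELETE grants.
GRANT SELECT ON ALL TABLES IN DATABASE my_database TO ROLE analyst_readonly;

-- Attach the role to the user whose credentials the agent connects with.
GRANT ROLE analyst_readonly TO USER agent_service_user;
```

These grants cover only tables that exist when they run. Tables created later would need separate future grants, and you may prefer to scope access to specific schemas rather than the whole database.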
SKILL.md
# Data Context Extractor

A meta-skill that extracts company-specific data knowledge from analysts and generates tailored data analysis skills.

## How It Works

This skill has two modes:

1. **Bootstrap Mode**: Create a new data analysis skill from scratch
2. **Iteration Mode**: Improve an existing skill by adding domain-specific reference files

---

## Bootstrap Mode

Use when: User wants to create a new data context skill for their warehouse.

### Phase 1: Database Connection & Discovery

**Step 1: Identify the database type**

Ask: "What data warehouse are you using?"

Common options:

- **BigQuery**
- **Snowflake**
- **PostgreSQL/Redshift**
- **Databricks**

Use `~~data warehouse` tools (query and schema) to connect. If unclear, check available MCP tools in the current session.

**Step 2: Explore the schema**

Use `~~data warehouse` schema tools to:

1. List available datasets/schemas
2. Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
3. Pull schema details for those key tables

Sample exploration queries by dialect:

```sql
-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema
```

### Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

**Entity Disambiguation (Critical)**

> "When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:

- Multiple entity types (user vs account vs organization)
- Relationships between them (1:1, 1:many, many:many)
- Which ID fields link them together

**Primary Identifiers**

> "What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:

- Primary keys vs business keys
- UUID vs integer IDs
- Legacy ID systems

**Key Metrics**

> "What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:

- Exact formulas (ARR = monthly_revenue × 12)
- Which tables/columns feed each metric
- Time period conventions (trailing 7 days, calendar month, etc.)

**Data Hygiene**

> "What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:

- Standard WHERE clauses to always include
- Flag columns that indicate exclusions (is_test, is_internal, is_fraud)
- Specific values to exclude (status = 'deleted')

**Common Gotchas**

> "What mistakes do new analysts typically make with this data?"

Listen for:

- Confusing column names
- Timezone issues
- NULL handling quirks
- Historical vs current state tables

### Phase 3: Generate the Skill

Create a skill with this structure:

```
[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md   # Entity definitions and relationships
    ├── metrics.md    # KPI calculations
    ├── tables/       # One file per domain
    │   ├── [domain1].md
    │   └──