
Tracing Downstream Lineage
Trace what downstream DAGs, views, and dashboards consume a table or dataset before you change schema or logic in production.
Overview
Tracing Downstream Lineage is an agent skill most often used in Operate (also Ship, Build) that traces downstream data dependencies and change blast radius before you modify tables or DAGs.
Install
npx skills add https://github.com/astronomer/agents --skill tracing-downstream-lineageWhat is this skill?
- Frames the core question: what breaks if I change this target?
- Step 1 identifies direct consumers via DAG source search (`af dags list`, `af dags source`) and SQL patterns
- Table path covers dependent views via information_schema and Snowflake-oriented queries
- Calls out BI-style naming patterns (rpt_, dashboard_) for dashboard dependencies
- Documents Astro UI Lineage tab for visual cross-DAG dataset dependencies when on Astronomer
- Step 1 direct-consumer identification spans DAG source search, dependent views, and BI connection patterns
Adoption & trust: 743 installs on skills.sh; 384 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to change a warehouse table or pipeline task but do not know which DAGs, views, or dashboards will fail when you deploy.
Who is it for?
Solo data engineers on Airflow or Astro who own shared fact tables and must justify schema edits with dependency evidence.
Skip if: Greenfield prototypes with no downstream consumers yet, or teams without access to DAG source or warehouse metadata.
When should I use this skill?
The user asks what depends on this data, what breaks if something changes, downstream dependencies, or needs change-risk assessment before modifying a table or DAG.
What do I get? / Deliverables
You produce a direct-consumer map and change-risk notes so you can schedule migrations, notify owners, or sequence DAG updates safely.
- Direct downstream consumer list (DAGs, views, BI-facing tables)
- Change-risk summary for the proposed modification
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Operate because the skill answers blast-radius and dependency risk for live data assets you maintain in production. Infra subphase fits lineage, dependency graphs, and change-impact analysis across orchestration and warehouse objects.
Where it fits
Before renaming a core fact table, search all DAG sources that SELECT or JOIN from it.
Attach downstream consumer findings to a release checklist so deploy waits on dashboard owner sign-off.
Use Astro Lineage plus SQL view checks to scope an emergency column deprecation.
How it compares
Operational impact playbook for lineage, not a generic code-review skill or a packaged lineage SaaS connector.
Common Questions / FAQ
Who is tracing-downstream-lineage for?
Builders and operators maintaining orchestrated data pipelines who need agent-guided steps to find downstream readers before production changes.
When should I use tracing-downstream-lineage?
In Build while refactoring shared models, in Ship before merging risky DAG or schema changes, and in Operate when assessing incident or migration blast radius on a dataset.
Is tracing-downstream-lineage safe to install?
The skill suggests CLI and SQL inspection against your environment; review the Security Audits panel on this Prism page and scope credentials before automated queries.
SKILL.md
READMESKILL.md - Tracing Downstream Lineage
# Downstream Lineage: Impacts Answer the critical question: "What breaks if I change this?" Use this BEFORE making changes to understand the blast radius. ## Impact Analysis ### Step 1: Identify Direct Consumers Find everything that reads from this target: **For Tables:** 1. **Search DAG source code**: Look for DAGs that SELECT from this table - Use `af dags list` to get all DAGs - Use `af dags source <dag_id>` to search for table references - Look for: `FROM target_table`, `JOIN target_table` 2. **Check for dependent views**: ```sql -- Snowflake SELECT * FROM information_schema.view_table_usage WHERE table_name = '<target_table>' -- Or check SHOW VIEWS and search definitions ``` 3. **Look for BI tool connections**: - Dashboards often query tables directly - Check for common BI patterns in table naming (rpt_, dashboard_) ### On Astro If you're running on Astro, the **Lineage tab** in the Astro UI provides visual dependency graphs across DAGs and datasets, making downstream impact analysis faster. It shows which DAGs consume a given dataset and their current status, reducing the need for manual source code searches. **For DAGs:** 1. **Check what the DAG produces**: Use `af dags source <dag_id>` to find output tables 2. **Then trace those tables' consumers** (recursive) ### Step 2: Build Dependency Tree Map the full downstream impact: ``` SOURCE: fct.orders | +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs | | | +-- TABLE: rpt.monthly_summary --> Email: Monthly Report | +-- TABLE: ml.order_features --> Model: Demand Forecasting | +-- DIRECT: Looker Dashboard "Sales Overview" ``` ### Step 3: Categorize by Criticality **Critical** (breaks production): - Production dashboards - Customer-facing applications - Automated reports to executives - ML models in production - Regulatory/compliance reports **High** (causes significant issues): - Internal operational dashboards - Analyst workflows - Data science experiments - Downstream ETL jobs **Medium** (inconvenient): - Ad-hoc analysis tables - Development/staging copies - Historical archives **Low** (minimal impact): - Deprecated tables - Unused datasets - Test data ### Step 4: Assess Change Risk For the proposed change, evaluate: **Schema Changes** (adding/removing/renaming columns): - Which downstream queries will break? - Are there SELECT * patterns that will pick up new columns? - Which transformations reference the changing columns? **Data Changes** (values, volumes, timing): - Will downstream aggregations still be valid? - Are there NULL handling assumptions that will break? - Will timing changes affect SLAs? **Deletion/Deprecation**: - Full dependency tree must be migrated first - Communication needed for all stakeholders ### Step 5: Find Stakeholders Identify who owns downstream assets: 1. **DAG owners**: Check `owners` field in DAG definitions 2. **Dashboard owners**: Usually in BI tool metadata 3. **Team ownership**: Look for team naming patterns or documentation ## Output: Impact Report ### Summary "Changing `fct.orders` will impact X tables, Y DAGs, and Z dashboards" ### Impact Diagram ``` +--> [agg.daily_sales] --> [Executive Dashboard] | [fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email] | +--> [ml.features] --> [Demand Model] ``` ### Detailed Impacts | Downstream | Type | Criticality | Owner | Notes | |------------|------|-------------|-------|-------| | agg.daily_sales | Table | Critical | data-eng | Updated hourly | | Executive Dashboard | Dashboard | Critical | analytics | CEO views d