
Checking Freshness
Run a fast SQL freshness check on warehouse tables before trusting dashboards, models, or agent-generated reports.
Overview
Checking Freshness is an agent skill for the Operate phase that verifies warehouse table currency with timestamp discovery SQL and a standardized freshness status scale.
Install
npx skills add https://github.com/astronomer/agents --skill checking-freshnessWhat is this skill?
- 3-step freshness process: discover timestamp columns, MAX() age query, 7-day row-count trend
- Documents common ETL column patterns (_loaded_at, ingestion_time, business event dates)
- Status scale with explicit age bands (Fresh, Acceptable, Stale, Critical)
- INFORMATION_SCHEMA lookup guidance when column names are unknown
- Ready-to-run SQL templates with hours_ago and minutes_ago deltas
- 3-step freshness check process
- 4-tier freshness status scale (Fresh through Critical)
- 7-day row-count activity window in optional trend query
Adoption & trust: 723 installs on skills.sh; 384 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are about to use or publish numbers from a table but do not know when it last loaded or whether the pipeline stopped overnight.
Who is it for?
Indie builders and solo analysts validating mart or event tables in Snowflake-style warehouses before reports, alerts, or agent summaries.
Skip if: Full pipeline root-cause remediation, schema migration planning, or teams that only need business KPI definitions without SQL execution.
When should I use this skill?
User asks if data is up to date, when a table was last updated, if data is stale, or needs to verify data currency before using it.
What do I get? / Deliverables
You get a concrete last-update time, recency in hours or minutes, optional weekly row trends, and a labeled freshness status you can cite before proceeding.
- Last-update timestamp and age in hours or minutes
- Freshness status label with supporting row-count trend when applicable
Recommended Skills
Journey fit
Data currency checks belong on the Operate shelf because stale pipelines and late loads are production reliability questions, not greenfield build work. Monitoring is the canonical subphase for verifying when tables last updated and whether ingestion is still healthy.
How it compares
Use instead of ad-hoc SELECT MAX guesses without a shared staleness rubric or column naming checklist.
Common Questions / FAQ
Who is checking-freshness for?
Solo builders, analytics engineers, and indie SaaS operators who query production or warehouse tables and must confirm data is current before trusting outputs.
When should I use checking-freshness?
Use it in Operate when monitoring ingestion health, in Grow before sharing analytics, or in Build when validating integrations—any time someone asks if data is up to date, stale, or when a table last updated.
Is checking-freshness safe to install?
It is procedural SQL guidance without prescribing destructive commands; review the Security Audits panel on this page and scope read-only queries in your environment.
SKILL.md
READMESKILL.md - Checking Freshness
# Data Freshness Check Quickly determine if data is fresh enough to use. ## Freshness Check Process For each table to check: ### 1. Find the Timestamp Column Look for columns that indicate when data was loaded or updated: - `_loaded_at`, `_updated_at`, `_created_at` (common ETL patterns) - `updated_at`, `created_at`, `modified_at` (application timestamps) - `load_date`, `etl_timestamp`, `ingestion_time` - `date`, `event_date`, `transaction_date` (business dates) Query INFORMATION_SCHEMA.COLUMNS if you need to see column names. ### 2. Query Last Update Time ```sql SELECT MAX(<timestamp_column>) as last_update, CURRENT_TIMESTAMP() as current_time, TIMESTAMPDIFF('hour', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as hours_ago, TIMESTAMPDIFF('minute', MAX(<timestamp_column>), CURRENT_TIMESTAMP()) as minutes_ago FROM <table> ``` ### 3. Check Row Counts by Time For tables with regular updates, check recent activity: ```sql SELECT DATE_TRUNC('day', <timestamp_column>) as day, COUNT(*) as row_count FROM <table> WHERE <timestamp_column> >= DATEADD('day', -7, CURRENT_DATE()) GROUP BY 1 ORDER BY 1 DESC ``` ## Freshness Status Report status using this scale: | Status | Age | Meaning | |--------|-----|---------| | **Fresh** | < 4 hours | Data is current | | **Stale** | 4-24 hours | May be outdated, check if expected | | **Very Stale** | > 24 hours | Likely a problem unless batch job | | **Unknown** | No timestamp | Can't determine freshness | ## If Data is Stale Check Airflow for the source pipeline: 1. **Find the DAG**: Which DAG populates this table? Use `af dags list` and look for matching names. 2. **Check DAG status**: - Is the DAG paused? Use `af dags get <dag_id>` - Did the last run fail? Use `af dags stats` - Is a run currently in progress? 3. **Diagnose if needed**: If the DAG failed, use the **debugging-dags** skill to investigate. ### On Astro If you're running on Astro, you can also: - **DAG history in the Astro UI**: Check the deployment's DAG run history for a visual timeline of recent runs and their outcomes - **Astro alerts for SLA monitoring**: Configure alerts to get notified when DAGs miss their expected completion windows, catching staleness before users report it ### On OSS Airflow - **Airflow UI**: Use the DAGs view and task logs to verify last successful runs and SLA misses ## Output Format Provide a clear, scannable report: ``` FRESHNESS REPORT ================ TABLE: database.schema.table_name Last Update: 2024-01-15 14:32:00 UTC Age: 2 hours 15 minutes Status: Fresh TABLE: database.schema.other_table Last Update: 2024-01-14 03:00:00 UTC Age: 37 hours Status: Very Stale Source DAG: daily_etl_pipeline (FAILED) Action: Investigate with **debugging-dags** skill ``` ## Quick Checks If user just wants a yes/no answer: - "Is X fresh?" -> Check and respond with status + one line - "Can I use X for my 9am meeting?" -> Check and give clear yes/no with context