Profiling Tables

Name: Profiling Tables
Author: astronomer

astronomer/agents

874 installs
412 repo stars
Updated July 27, 2026

profiling-tables is an Astronomer agent skill that runs INFORMATION_SCHEMA and statistical SQL to produce comprehensive table profiles for developers onboarding to unfamiliar warehouse datasets.

About

profiling-tables is an astronomer/agents skill that generates a deep statistical profile of one database table so a new engineer can understand structure, content, and data quality quickly. Step one queries INFORMATION_SCHEMA.COLUMNS for column names, data types, and comments, resolving unqualified table names via INFORMATION_SCHEMA.TABLES. Subsequent steps compute distribution stats, null rates, and quality signals tailored to the target warehouse. Developers invoke profiling-tables when joining a project, auditing an ETL source, or answering data-quality questions before modeling. The skill requires an explicit table name and produces onboarding-ready documentation from live SQL results.

Generates profiles a new team member could use to understand the data
Queries INFORMATION_SCHEMA for column metadata and table location
Computes table size and shape, plus min/max/avg/stddev/median, null counts, and distinct counts for numeric columns
Computes length statistics, distinct values and sample patterns for string columns
Uses the run_sql tool to execute every query inside the data environment

Profiling Tables by the numbers

874 all-time installs (skills.sh)
+15 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #323 of 2,066 Data Science & ML skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

npx skills add https://github.com/astronomer/agents --skill profiling-tables

Installs	874
Repo stars	412
Security audit	3 / 3 scanners passed
Last updated	July 27, 2026
Repository	astronomer/agents

How do you profile an unfamiliar database table?

Automatically generate comprehensive statistical profiles of any database table when working with unfamiliar datasets.

Who is it for?

Data engineers and analysts joining a project who must understand an unknown warehouse table before modeling or ETL.

Skip if: you need whole-database lineage mapping, or automated anomaly alerting across hundreds of tables without a named target.

When should I use this skill?

Use it when a developer asks to profile a table, assess data quality, or understand the schema of an unfamiliar dataset.

What you get

Column metadata report, distribution statistics, data-quality summary, and onboarding-ready table documentation.

Table metadata report
Statistical profile
Data quality summary

By the numbers

Profiles one specific table per invocation via INFORMATION_SCHEMA queries

Files

SKILL.md (Markdown)

Data Profile

Generate a comprehensive profile of a table that a new team member could use to understand the data.

Step 1: Basic Metadata

Query column metadata:

SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION

If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
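
A minimal sketch of that lookup, assuming the warehouse exposes the standard TABLE_CATALOG, TABLE_SCHEMA, and TABLE_NAME columns in INFORMATION_SCHEMA.TABLES. The helper name build_table_lookup_sql is hypothetical and not part of the skill; the string it returns would be passed to run_sql.

```python
def build_table_lookup_sql(database: str, table: str) -> str:
    """Build SQL listing every schema in `database` that holds `table`.

    Hypothetical helper: `database` is assumed to be a trusted identifier.
    """
    # Double any single quotes so the name is safe inside a string literal.
    literal = table.replace("'", "''")
    return (
        "SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME\n"
        f"FROM {database}.INFORMATION_SCHEMA.TABLES\n"
        # Compare case-insensitively; stored identifier case varies by warehouse.
        f"WHERE UPPER(TABLE_NAME) = UPPER('{literal}')\n"
        "ORDER BY TABLE_SCHEMA"
    )


lookup_sql = build_table_lookup_sql("analytics", "orders")
print(lookup_sql)
```

If the lookup returns more than one row, the table name is ambiguous, and it is probably safer to ask which schema is meant than to guess.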

Step 2: Size and Shape

Run via run_sql:

SELECT
    COUNT(*) as total_rows,
    COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>

Step 3: Column-Level Statistics

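For each column, gather statistics appropriate to its data type. The templates below use LEN and DATEDIFF('day', ...), which not every warehouse supports in this form; substitute the target warehouse's equivalents (for example LENGTH) where needed.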

Numeric Columns

SELECT
    MIN(column_name) as min_val,
    MAX(column_name) as max_val,
    AVG(column_name) as avg_val,
    STDDEV(column_name) as std_dev,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
    COUNT(DISTINCT column_name) as distinct_count
FROM <table>

String Columns

SELECT
    MIN(LEN(column_name)) as min_length,
    MAX(LEN(column_name)) as max_length,
    AVG(LEN(column_name)) as avg_length,
    SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
    COUNT(DISTINCT column_name) as distinct_count
FROM <table>

Date/Timestamp Columns

SELECT
    MIN(column_name) as earliest,
    MAX(column_name) as latest,
    DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
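
The type-based choice in Step 3 could be sketched as a small dispatch on the DATA_TYPE value returned in Step 1. The type names listed here are assumptions about what a warehouse reports, and stats_family is a hypothetical helper rather than part of the skill.

```python
# Assumed type names; extend these sets for the warehouse actually in use.
NUMERIC_TYPES = {
    "NUMBER", "DECIMAL", "NUMERIC", "INT", "INTEGER",
    "BIGINT", "SMALLINT", "FLOAT", "DOUBLE", "REAL",
}
STRING_TYPES = {"TEXT", "VARCHAR", "CHAR", "STRING"}
# DATE, DATETIME, TIME, and every TIMESTAMP variant start with one of these.
TEMPORAL_PREFIXES = ("DATE", "TIME")


def stats_family(data_type: str) -> str:
    """Map a DATA_TYPE string to the Step 3 template that should profile it."""
    # Drop any precision/length suffix, e.g. "NUMBER(38,0)" -> "NUMBER".
    base = data_type.strip().upper().split("(")[0].strip()
    if base in NUMERIC_TYPES:
        return "numeric"
    if base in STRING_TYPES:
        return "string"
    if base.startswith(TEMPORAL_PREFIXES):
        return "temporal"
    # Booleans, arrays, JSON and similar get no type-specific statistics here.
    return "other"


for declared in ("NUMBER(38,0)", "varchar(255)", "TIMESTAMP_NTZ", "BOOLEAN"):
    print(declared, "->", stats_family(declared))
```

Columns that fall into "other" can still be profiled with the null-count and distinct-count expressions shared by the templates above.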

Step 4: Cardinality Analysis

For columns that look like categorical/dimension keys:

SELECT
    column_name,
    COUNT(*) as frequency,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20

This reveals:

High-cardinality columns (likely IDs or unique values)
Low-cardinality columns (likely categories or status fields)
Skewed distributions (one value dominates)
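
One way to turn those three observations into labels is sketched below. The thresholds (90% distinct for "high", at most 50 distinct values for "low", an 80% share for skew) are illustrative assumptions, not values taken from the skill, and both function names are hypothetical.

```python
def classify_cardinality(distinct_count: int, total_rows: int,
                         low_max: int = 50, high_ratio: float = 0.9) -> str:
    """Label a column from its distinct count relative to the row count."""
    if total_rows == 0:
        return "empty"
    # Nearly one distinct value per row: likely an ID or unique key.
    if distinct_count / total_rows >= high_ratio:
        return "high"
    # A small fixed set of values: likely a category or status field.
    if distinct_count <= low_max:
        return "low"
    return "medium"


def is_skewed(top_percentage: float, threshold: float = 80.0) -> bool:
    """True when the most frequent value covers at least `threshold` percent of rows."""
    return top_percentage >= threshold


print(classify_cardinality(1000, 1000))  # distinct_count equals total_rows
print(classify_cardinality(5, 1000))
print(is_skewed(93.5))
```

The top_percentage argument corresponds to the percentage column of the first row returned by the frequency query above.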

Step 5: Sample Data

Get representative rows:

SELECT *
FROM <table>
LIMIT 10

If the table is large and you want variety, sample from different time periods or categories.
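
Sampling for variety could look like the sketch below, which takes a few random rows from each category with ROW_NUMBER(). It uses an in-memory SQLite database as a stand-in for the warehouse (SQLite 3.25 or later is needed for window functions); the tickets table, its columns, and the sample_per_category helper are made up for illustration, and identifiers are assumed to be trusted.

```python
import sqlite3


def sample_per_category(conn, table, category_column, per_category=2):
    """Return up to `per_category` randomly chosen rows for each category value."""
    sql = (
        "SELECT * FROM ("
        "SELECT t.*, ROW_NUMBER() OVER ("
        f'PARTITION BY "{category_column}" ORDER BY RANDOM()) AS rn '
        f'FROM "{table}" AS t'
        ") WHERE rn <= ?"
    )
    return conn.execute(sql, (per_category,)).fetchall()


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "open"), (2, "open"), (3, "open"), (4, "closed")],
)

sample = sample_per_category(conn, "tickets", "status", per_category=2)
for row in sample:
    print(row)  # (id, status, rn); which "open" rows appear varies per run
```

The same pattern works for time periods by partitioning on a truncated date instead of a category column.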

Step 6: Data Quality Assessment

Summarize quality across dimensions:

Completeness

Which columns have NULLs? What percentage?
Are NULLs expected or problematic?
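
The NULL percentage for several columns can be computed in one pass, because COUNT(column) skips NULLs while COUNT(*) does not. The sketch below runs against an in-memory SQLite database as a stand-in for the warehouse; the orders table and the null_percentages helper are hypothetical, and identifiers are assumed to be trusted.

```python
import sqlite3


def null_percentages(conn, table, columns):
    """Return {column: percentage of rows where the column is NULL}."""
    # COUNT(*) - COUNT(col) is the NULL count; NULLIF avoids dividing by
    # zero on an empty table (the result is then None for every column).
    select_list = ", ".join(
        f'100.0 * (COUNT(*) - COUNT("{c}")) / NULLIF(COUNT(*), 0) AS "{c}"'
        for c in columns
    )
    row = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(columns, row))


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a@x.io"), (2, None), (3, None), (4, "d@x.io")],
)

null_report = null_percentages(conn, "orders", ["id", "email"])
print(null_report)  # {'id': 0.0, 'email': 50.0}
```

Whether 50% NULLs in a column is a problem still depends on what the column means, which is the second question above.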

Uniqueness

Does the apparent primary key have duplicates?
Are there unexpected duplicate rows?
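
A common way to test an apparent primary key is to group by it and keep only groups with more than one row. The sketch below does this against an in-memory SQLite stand-in; the events table and the duplicate_keys helper are hypothetical, and identifiers are assumed to be trusted.

```python
import sqlite3


def duplicate_keys(conn, table, key_column, limit=20):
    """Return (key, occurrences) pairs for key values that appear more than once."""
    sql = (
        f'SELECT "{key_column}", COUNT(*) AS occurrences '
        f'FROM "{table}" '
        f'GROUP BY "{key_column}" '
        "HAVING COUNT(*) > 1 "
        "ORDER BY occurrences DESC "
        "LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


# Illustrative data only: id 2 appears twice, id 3 three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(1,), (2,), (2,), (3,), (3,), (3,)])

dupes = duplicate_keys(conn, "events", "id")
print(dupes)  # [(3, 3), (2, 2)]
```

An empty result suggests the column is unique in the current data, though it does not prove a uniqueness constraint exists.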

Freshness

When was data last updated? (MAX of timestamp columns)
Is the update frequency as expected?
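
Freshness can be expressed as the time elapsed since the newest timestamp. The sketch below assumes timestamps stored as ISO-8601 text in an in-memory SQLite stand-in; the loads table, its loaded_at column, and the hours_since_last_update helper are hypothetical. The current time is passed in explicitly so the result is reproducible.

```python
import sqlite3
from datetime import datetime


def hours_since_last_update(conn, table, ts_column, now):
    """Return hours between `now` and the newest value in `ts_column`, or None."""
    (latest,) = conn.execute(
        f'SELECT MAX("{ts_column}") FROM "{table}"'
    ).fetchone()
    if latest is None:
        # Empty table, or every timestamp is NULL.
        return None
    return (now - datetime.fromisoformat(latest)).total_seconds() / 3600.0


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loads (loaded_at TEXT)")
conn.executemany(
    "INSERT INTO loads VALUES (?)",
    [("2026-07-25 08:00:00",), ("2026-07-27 06:00:00",)],
)

age_hours = hours_since_last_update(
    conn, "loads", "loaded_at", now=datetime(2026, 7, 27, 18, 0, 0)
)
print(age_hours)  # 12.0
```

Comparing this age with the expected load interval answers the second question: a daily table that is 12 hours old is on schedule, while an hourly one is not.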

Validity

Are there values outside expected ranges?
Are there invalid formats (dates, emails, etc.)?
Are there orphaned foreign keys?
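
Orphaned foreign keys can be found with an anti-join: left-join the child table to its parent and keep the rows where no parent matched. The sketch below uses an in-memory SQLite stand-in; the customers and orders tables and the orphaned_keys helper are hypothetical, and identifiers are assumed to be trusted.

```python
import sqlite3


def orphaned_keys(conn, child, fk, parent, pk):
    """Return distinct non-NULL `fk` values in `child` with no match in `parent`."""
    sql = (
        f'SELECT DISTINCT c."{fk}" '
        f'FROM "{child}" AS c '
        f'LEFT JOIN "{parent}" AS p ON c."{fk}" = p."{pk}" '
        # NULL foreign keys are a completeness issue, not orphans, so skip them.
        f'WHERE c."{fk}" IS NOT NULL AND p."{pk}" IS NULL '
        f'ORDER BY c."{fk}"'
    )
    return [row[0] for row in conn.execute(sql)]


# Illustrative data only: customers 7 and 9 do not exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 7), (13, 7), (14, None), (15, 9)],
)

orphans = orphaned_keys(conn, "orders", "customer_id", "customers", "id")
print(orphans)  # [7, 9]
```

This check needs a second table, so it only applies when the parent of a foreign-key column is known or can be inferred from naming.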

Consistency

Do related columns make sense together?
Are there logical contradictions?

Step 7: Output Summary

Provide a structured profile:

Overview

2-3 sentences describing what this table contains, who uses it, and how fresh it is.

Schema

Column	Type	Nulls%	Distinct	Description
...	...	...	...	...

Key Statistics

Row count: X
Date range: Y to Z
Last updated: timestamp

Data Quality Score

Completeness: X/10
Uniqueness: X/10
Freshness: X/10
Overall: X/10

Potential Issues

List any data quality concerns discovered.

Recommended Queries

3-5 useful queries for common questions about this data.

How it compares

Use profiling-tables for single-table onboarding profiles; use broader lineage tools when mapping entire schemas or pipelines.

FAQ

What SQL does profiling-tables run first?

profiling-tables starts by querying INFORMATION_SCHEMA.COLUMNS for COLUMN_NAME, DATA_TYPE, and COMMENT filtered by schema and table. If the table name is not fully qualified, it searches INFORMATION_SCHEMA.TABLES to locate the correct schema.

When should developers use profiling-tables?

profiling-tables fits requests to profile a specific table, understand data quality, or onboard to an unfamiliar dataset. The skill needs a table name and returns a comprehensive profile a new team member can read without prior context.

Is Profiling Tables safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Tags: Data Science & ML, databases, analytics, pipelines