
Data Quality Frameworks
Define Great Expectations suites and column-level checks so pipelines and warehouse tables fail fast before bad data reaches users.
Overview
data-quality-frameworks is an agent skill, used mostly in the Ship phase and also in Build, that provides templates for Great Expectations suites covering schema, key, and categorical data quality rules.
Install
npx skills add https://github.com/wshobson/agents --skill data-quality-frameworks
What is this skill?
- Great Expectations ExpectationSuite builder for orders-style fact tables
- Schema checks via expect_table_columns_to_match_set with flexible extra columns
- Primary-key rules: not-null and unique on order_id
- Foreign-key style not-null on customer_id
- Categorical integrity with expect_column_values_to_be_in_set for status fields
- Multi-expectation orders_suite pattern (schema, PK, FK, categorical set checks)
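To make the meaning of these rule families concrete, here is a dependency-free sketch in plain Python. It does not use the Great Expectations API: the function name `check_orders` and the row-of-dicts input are hypothetical, while the column names and status values are the ones listed for the orders_suite pattern.

```python
from collections import Counter

REQUIRED_COLUMNS = {"order_id", "customer_id", "amount", "status", "created_at"}
ALLOWED_STATUSES = {"pending", "processing", "shipped", "delivered", "cancelled"}


def check_orders(rows):
    """Return a list of human-readable failures for a list of row dicts."""
    failures = []

    # Schema: every required column must be present; extra columns are allowed.
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")

    # Primary key: order_id must be non-null and unique.
    ids = [row.get("order_id") for row in rows]
    if any(value is None for value in ids):
        failures.append("order_id contains nulls")
    counts = Counter(value for value in ids if value is not None)
    dupes = sorted(value for value, n in counts.items() if n > 1)
    if dupes:
        failures.append(f"order_id has duplicates: {dupes}")

    # Foreign-key style: customer_id must be non-null.
    if any(row.get("customer_id") is None for row in rows):
        failures.append("customer_id contains nulls")

    # Categorical: non-null status values must be in the allowed set
    # (nulls are skipped in this sketch).
    seen = {row.get("status") for row in rows if row.get("status") is not None}
    bad = sorted(seen - ALLOWED_STATUSES)
    if bad:
        failures.append(f"status has unexpected values: {bad}")

    return failures
```

An empty return value means all four rule families passed; each string in a non-empty list names the rule that failed, which is roughly the information a validation report would surface.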
Adoption & trust: 7.2k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your pipeline ships broken orders or dimension tables because null keys and invalid statuses slip past manual SQL checks.
Who is it for?
Indie product teams with warehouse or Postgres analytics tables that want Great Expectations-style declarative expectations in version control.
Skip if: Real-time stream-only systems with no batch tables, or teams standardized solely on dbt tests with no Great Expectations footprint.
When should I use this skill?
When defining or extending data quality checks, Great Expectations suites, or pipeline validation for tabular datasets.
What do I get? / Deliverables
You get a documented expectation suite agents can extend, so validation runs in CI or checkpoints before downstream models consume the data.
- ExpectationSuite definition module
- Documented column and set constraints
- Reusable pattern for additional tables
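One way the "reusable pattern for additional tables" could look is a small helper that emits expectation configurations as plain dicts. This is a sketch under assumptions: `build_table_expectations` and the `customers` example are hypothetical, and only the expectation type names and kwargs already used in the orders_suite example appear here. No Great Expectations import is needed to run it.

```python
def build_table_expectations(columns, primary_key, not_null=(), value_sets=None):
    """Return expectation configs as plain dicts (expectation_type + kwargs)."""
    configs = [
        # Schema: required columns, extra columns allowed.
        {
            "expectation_type": "expect_table_columns_to_match_set",
            "kwargs": {"column_set": list(columns), "exact_match": False},
        },
        # Primary key: not-null and unique.
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": primary_key},
        },
        {
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": primary_key},
        },
    ]
    # Foreign-key style and other mandatory columns: not-null only.
    for column in not_null:
        configs.append({
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": column},
        })
    # Categorical columns: values restricted to a known set.
    for column, values in (value_sets or {}).items():
        configs.append({
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {"column": column, "value_set": list(values)},
        })
    return configs


# Hypothetical second table reusing the same pattern.
customer_configs = build_table_expectations(
    columns=["customer_id", "email", "tier"],
    primary_key="customer_id",
    not_null=["email"],
    value_sets={"tier": ["free", "pro"]},
)
```

Each dict carries the same two keys that the suite example passes to `ExpectationConfiguration`, so, assuming that constructor signature, the dicts could be unpacked into it when building a real suite.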
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Data quality gates belong on the shipping shelf as automated QA, while pipeline authors still touch them during build. Expectation suites, uniqueness, and categorical sets are testing artifacts that validate datasets like application test suites.
Where it fits
Add an orders_suite module when creating a new fact table export from your API database.
Run GE validation in CI gates before promoting warehouse builds to production.
Refresh categorical set expectations when product adds new order status values.
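The CI-gate step can be sketched as a function that turns a validation result into a process exit code. The result shape used here (a top-level `success` flag plus a `results` list whose items carry `success`, `expectation_type`, and `kwargs`) is a simplified assumption for illustration, not the exact Great Expectations result schema, and `gate` is a hypothetical name.

```python
import sys


def gate(validation_result):
    """Return an exit code: 0 if validation passed, 1 otherwise."""
    if validation_result.get("success"):
        return 0
    # Report each failed expectation so the CI log shows what broke.
    for item in validation_result.get("results", []):
        if not item.get("success"):
            print(
                f"FAILED: {item.get('expectation_type')} {item.get('kwargs')}",
                file=sys.stderr,
            )
    return 1
```

A CI job would call `sys.exit(gate(result))` after running validation, so a failed suite blocks the promotion step. A missing `success` key is treated as a failure, which keeps the gate closed when the result is malformed.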
How it compares
Declarative data-test templates in code, not a hosted observability product or one-off spreadsheet audits.
Common Questions / FAQ
Who is data-quality-frameworks for?
Solo builders and small teams running Python data pipelines who need structured table validation with Great Expectations.
When should I use data-quality-frameworks?
During ship/testing before releases that depend on analytics tables, and during build/backend when authoring new ETL outputs or marts.
Is data-quality-frameworks safe to install?
It is documentation and example code only; review the Security Audits panel on this page and run expectations against non-production data first.
SKILL.md
# data-quality-frameworks - detailed patterns and worked examples

## Patterns

### Pattern 1: Great Expectations Suite

```python
# expectations/orders_suite.py
import great_expectations as gx
from great_expectations.core import ExpectationSuite
from great_expectations.core.expectation_configuration import ExpectationConfiguration

def build_orders_suite() -> ExpectationSuite:
    """Build comprehensive orders expectation suite"""

    suite = ExpectationSuite(expectation_suite_name="orders_suite")

    # Schema expectations
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_table_columns_to_match_set",
        kwargs={
            "column_set": ["order_id", "customer_id", "amount", "status", "created_at"],
            "exact_match": False  # Allow additional columns
        }
    ))

    # Primary key
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "order_id"}
    ))
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_unique",
        kwargs={"column": "order_id"}
    ))

    # Foreign key
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "customer_id"}
    ))

    # Categorical values
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_in_set",
        kwargs={
            "column": "status",
            "value_set": ["pending", "processing", "shipped", "delivered", "cancelled"]
        }
    ))

    # Numeric ranges
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "amount",
            "min_value": 0,
            "max_value": 100000,
            "strict_min": True  # amount > 0
        }
    ))

    # Date validity
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_dateutil_parseable",
        kwargs={"column": "created_at"}
    ))

    # Freshness - data should be recent
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_max_to_be_between",
        kwargs={
            "column": "created_at",
            "min_value": {"$PARAMETER": "now - timedelta(days=1)"},
            "max_value": {"$PARAMETER": "now"}
        }
    ))

    # Row count sanity
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_table_row_count_to_be_between",
        kwargs={
            "min_value": 1000,  # Expect at least 1000 rows
            "max_value": 10000000
        }
    ))

    # Statistical expectations
    suite.add_expectation(ExpectationConfiguration(
        expectation_type="expect_column_mean_to_be_between",
        kwargs={
            "column": "amount",
            "min_value": 50,
            "max_value": 500
        }
    ))

    return suite
```

### Pattern 2: Great Expectations Checkpoint

```yaml
# great_expectations/checkpoints/orders_checkpoint.yml
name: orders_checkpoint
config_version: 1.0
class_name: Checkpoint
run_name_template: "%Y%m%d-%H%M%S-orders-validation"

validations:
  - batch_request:
      datasource_name: warehouse
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: orders
      data_connector_query:
        index: -1  # Latest batch
    expectation_suite_name: orders_suite

action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction

  - name: store_evaluation_parameters
    action:
      class_name: StoreEvaluationParametersAction

  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction

  # Slack notification on failure
  - name: send_slack_notification
    action:
      class_name: SlackNotificationAction
      slack_webhook: ${SLACK_WEBHOOK}
      notify_o