
Harvard Artifacts Etl Streamlit
Pull Harvard Art Museums collection data through an ETL into SQL and explore it with Streamlit and Plotly dashboards.
Overview
Harvard Artifacts ETL & Streamlit is an agent skill most often used in Build (also Validate for prototypes and Grow for analytics views) that engineers Harvard Art Museums API data into SQL and Streamlit dashboards.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-etl-streamlitWhat is this skill?
- Harvard Art Museums API integration with pagination for large artifact pulls
- ETL flow: nested JSON extract, relational transform, SQL load (TiDB-oriented patterns)
- Normalized tables for artifacts, media, and color metadata
- 20+ predefined SQL analytical queries for collection exploration
- Streamlit + Plotly interactive dashboards on top of query results
- Architecture path: API → ETL → SQL → Analytics → Visualization
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
You want museum collection analytics but only have raw API JSON and no relational schema, queries, or UI.
Who is it for?
Solo data engineers and indie builders prototyping collection analytics, portfolio apps, or SQL/Streamlit learning projects.
Skip if: Production deployments that need Harvard API key governance, SLAs, or zero-code BI without Python.
When should I use this skill?
Triggers include building Harvard Art Museums ETL, TiDB/SQL storage, or Streamlit visualization for collection data.
What do I get? / Deliverables
You end with a documented API→ETL→SQL→analytics→visualization stack you can extend or demo.
- ETL scripts for extract, transform, and load
- Relational schema for artifacts, media, and colors
- Streamlit dashboard with Plotly visualizations
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The skill’s core deliverable is a data pipeline and relational store—first shelf is Build → Backend data engineering. Backend subphase covers API ingestion, transformation, SQL loading, and analytics queries behind a dashboard.
Where it fits
Prove a weekend museum-search idea by pulling one paginated API slice into SQLite.
Implement normalized artifact, media, and color tables plus load jobs from nested JSON.
Wire Harvard API keys, rate limits, and pagination helpers into a repeatable extract step.
Ship Streamlit pages bound to the 20+ SQL queries for curators or investors.
How it compares
End-to-end reference pipeline skill—not a hosted Harvard dataset or generic Streamlit theme pack.
Common Questions / FAQ
Who is harvard-artifacts-etl-streamlit for?
Developers and data hobbyists building Python ETL and dashboard apps on top of the Harvard Art Museums open API.
When should I use harvard-artifacts-etl-streamlit?
In Validate when scoping a data-app prototype; in Build when implementing extract-transform-load and SQL models; in Grow when shipping an internal analytics dashboard for stakeholders.
Is harvard-artifacts-etl-streamlit safe to install?
It guides API and database access—review the Security Audits panel on this page, rotate Harvard API keys, and restrict network permissions to the museums API and your DB host only.
SKILL.md
READMESKILL.md - Harvard Artifacts Etl Streamlit
# Harvard Artifacts ETL & Analytics > Skill by [ara.so](https://ara.so) — Data Skills collection This skill enables you to build end-to-end data engineering pipelines using the Harvard Art Museums API. It covers data extraction, transformation, SQL storage, analytics queries, and interactive Streamlit dashboards. ## What This Project Does The Harvard-Artifacts-Collection-Data-Engineering-Analytics-App demonstrates: - **API Integration**: Fetch artifact data from Harvard Art Museums API with pagination - **ETL Pipeline**: Extract nested JSON, transform to relational format, load into SQL - **Database Design**: Store artifacts, media, and color data in normalized tables - **SQL Analytics**: Execute 20+ predefined analytical queries - **Interactive Dashboards**: Visualize query results with Plotly in Streamlit **Architecture**: `API → ETL → SQL → Analytics → Visualization` ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Set up environment variables export HARVARD_API_KEY="your_api_key_here" export DB_HOST="your_database_host" export DB_USER="your_db_user" export DB_PASSWORD="your_db_password" export DB_NAME="harvard_artifacts" ``` **Key Dependencies**: - `streamlit` - Web app framework - `pandas` - Data transformation - `requests` - API calls - `mysql-connector-python` or `pymysql` - Database connection - `plotly` - Visualizations ## Configuration ### API Setup Get your Harvard Art Museums API key from: https://www.harvardartmuseums.org/collections/api ```python import os import requests API_KEY = os.getenv('HARVARD_API_KEY') BASE_URL = "https://api.harvardartmuseums.org/object" def fetch_artifacts(page=1, size=100): """Fetch artifacts with pagination""" params = { 'apikey': API_KEY, 'page': page, 'size': size } response = requests.get(BASE_URL, params=params) response.raise_for_status() return response.json() ``` ### Database Connection ```python import mysql.connector from mysql.connector import Error def create_db_connection(): """Create MySQL/TiDB connection""" try: connection = mysql.connector.connect( host=os.getenv('DB_HOST'), user=os.getenv('DB_USER'), password=os.getenv('DB_PASSWORD'), database=os.getenv('DB_NAME') ) return connection except Error as e: print(f"Database connection error: {e}") return None ``` ### Table Schema ```sql -- Artifact metadata table CREATE TABLE IF NOT EXISTS artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(200), century VARCHAR(100), classification VARCHAR(200), department VARCHAR(200), period VARCHAR(200), technique VARCHAR(500), dated VARCHAR(100), url TEXT ); -- Artifact media table CREATE TABLE IF NOT EXISTS artifactmedia ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, media_type VARCHAR(100), baseimageurl TEXT, iiifbaseuri TEXT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); -- Artifact colors table CREATE TABLE IF NOT EXISTS artifactcolors ( id INT AUTO_INCREMENT PRIMARY KEY,