
Harvard Artifacts Etl Pipeline
Stand up a Harvard Art Museums API ETL into MySQL with SQL analytics and a Streamlit/Plotly dashboard for exploring artifact collections.
Overview
Harvard-artifacts-etl-pipeline is an agent skill most often used in Build (also Grow) that builds Harvard Art Museums API ETL into MySQL with SQL analytics and Streamlit visualization.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-etl-pipelineWhat is this skill?
- Fetches Harvard Art Museums collection data with pagination and API rate limiting
- ETL pipeline for artifact metadata, media, and color fields into MySQL
- Predefined SQL analytics layer on structured relational tables
- Streamlit dashboards with Plotly for interactive artifact exploration
- End-to-end flow: API → ETL → SQL → Analytics → Visualization
- Architecture chain: API → ETL → SQL → Analytics → Visualization (five layers).
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
You want Harvard Art Museums collection data in a queryable database and dashboards, but wiring pagination, transforms, and viz by ad-hoc scripts is slow and brittle.
Who is it for?
Indie data builders prototyping collection analytics, internal research tools, or portfolio data-engineering projects against the Harvard Art Museums API.
Skip if: Teams that only need a few REST calls without storage, or builders who cannot run Python, MySQL, and a local Streamlit app.
When should I use this skill?
User asks to build an ETL pipeline for Harvard Art Museums data, create artifact analytics dashboards, extract/transform Harvard API data, set up artifact data engineering, query the collection database, visualize museum
What do I get? / Deliverables
You get a documented API → ETL → MySQL → SQL analytics → Streamlit/Plotly pipeline you can extend with new queries and dashboard views.
- ETL jobs loading Harvard artifact metadata, media, and color data into MySQL
- SQL analytics queries over the loaded schema
- Streamlit dashboard with Plotly charts for interactive exploration
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Build because the skill’s core deliverable is implementing the API → ETL → SQL → visualization stack—not one-off queries or post-launch ops alone. Integrations fits best: paginated API extraction, rate limiting, transforms, and relational loads are the heart of the workflow before analytics UI work.
Where it fits
Wire Harvard API pagination and rate limits into extract jobs that land normalized artifact rows in MySQL.
Define transform rules and load steps for metadata, media links, and color fields before exposing SQL-friendly tables.
Ship Streamlit pages backed by predefined SQL to compare periods, mediums, or color distributions in the collection.
Prototype a small-batch Harvard sync and a single dashboard chart to prove the analytics story before scaling the full pipeline.
How it compares
Use this structured ETL-and-dashboard workflow instead of one-off Harvard API scripts with no schema, SQL layer, or interactive analytics UI.
Common Questions / FAQ
Who is harvard-artifacts-etl-pipeline for?
Solo and indie builders using AI coding agents who want end-to-end Harvard Art Museums data engineering—API extract, MySQL load, SQL analytics, and Streamlit dashboards—in one guided implementation.
When should I use harvard-artifacts-etl-pipeline?
Use it when you need to build an ETL pipeline for Harvard Art Museums data, create an artifact analytics dashboard, extract and transform Harvard API responses, set up a collection query database, visualize museum analytics, or analyze artifacts with SQL after load—typically whil
Is harvard-artifacts-etl-pipeline safe to install?
Review the Security Audits panel on this Prism page for ingested audit signals and risk context; the skill implies network access to external APIs and local database/filesystem work, so scope credentials and outbound calls in your agent environment before running.
SKILL.md
READMESKILL.md - Harvard Artifacts Etl Pipeline
# Harvard Artifacts ETL Pipeline > Skill by [ara.so](https://ara.so) — Data Skills collection This skill enables AI coding agents to build end-to-end data engineering pipelines using the Harvard Art Museums API. The project demonstrates ETL operations, SQL analytics, and interactive Streamlit dashboards for artifact collection data. ## What This Project Does The Harvard Artifacts Collection Data Engineering & Analytics App provides: - **API Integration**: Fetch artifact data from Harvard Art Museums API with pagination and rate limiting - **ETL Pipeline**: Extract, transform, and load artifact metadata, media, and color data into relational databases - **SQL Analytics**: Execute predefined analytical queries on structured artifact data - **Interactive Visualization**: Streamlit dashboards with Plotly charts for data exploration The architecture follows: **API → ETL → SQL → Analytics → Visualization** ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Required packages pip install streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ## Configuration ### Environment Variables Create a `.env` file in the project root: ```bash # Harvard Art Museums API HARVARD_API_KEY=your_api_key_here # MySQL/TiDB Database Connection DB_HOST=your_database_host DB_PORT=3306 DB_USER=your_database_user DB_PASSWORD=your_database_password DB_NAME=harvard_artifacts ``` ### Database Setup ```python import mysql.connector from mysql.connector import Error def create_database_schema(): """Initialize database schema for artifact storage""" connection = mysql.connector.connect( host=os.getenv('DB_HOST'), user=os.getenv('DB_USER'), password=os.getenv('DB_PASSWORD') ) cursor = connection.cursor() # Create database cursor.execute(f"CREATE DATABASE IF NOT EXISTS {os.getenv('DB_NAME')}") cursor.execute(f"USE {os.getenv('DB_NAME')}") # Artifact metadata table cursor.execute(""" CREATE TABLE IF NOT EXISTS artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(200), century VARCHAR(100), classification VARCHAR(200), department VARCHAR(200), dated VARCHAR(200), period VARCHAR(200), technique VARCHAR(500), medium VARCHAR(500), dimensions VARCHAR(500), division VARCHAR(200), verificationlevel INT, totalpageviews INT, totaluniquepageviews INT ) """) # Artifact media table cursor.execute(""" CREATE TABLE IF NOT EXISTS artifactmedia ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, imageid INT, baseimageurl VARCHAR(1000), format VARCHAR(50), height INT, width INT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ) """) # Artifact colors table cursor.execute(""" CREATE TABLE IF NOT EXISTS artifactcolors ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color VAR