
Harvard Artifacts Collection Analytics App
Stand up an end-to-end Harvard Art Museums ETL into SQL plus a Streamlit analytics dashboard for a portfolio or internal data app.
Overview
Harvard Artifacts Collection Analytics App is an agent skill for the Build phase that builds ETL from the Harvard Art Museums API into SQL and a Streamlit analytics dashboard.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-collection-analytics-appWhat is this skill?
- Extracts artifact metadata, media, and color data from the Harvard Art Museums API
- Transforms nested JSON into normalized relational tables
- Loads into MySQL or TiDB Cloud with pagination and rate-limit handling
- Runs analytical SQL and renders interactive Plotly charts in Streamlit
- End-to-end pattern: API → transform → DB → dashboard
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
You want a real data-engineering portfolio piece but do not know how to paginate the Harvard API, normalize museum JSON, and wire SQL to Streamlit.
Who is it for?
Solo builders learning or showcasing ETL plus SQL analytics with a well-documented public museum API.
Skip if: Products that need proprietary data sources only, or teams wanting managed Fivetran-style sync without writing Python pipelines.
When should I use this skill?
User asks to build ETL, SQL analytics, or a Streamlit dashboard for Harvard Art Museums artifact data (see SKILL.md triggers list).
What do I get? / Deliverables
You implement a repeatable extract-transform-load path into MySQL/TiDB and ship an interactive dashboard over curated artifact analytics queries.
- ETL scripts loading normalized artifact tables
- Analytical SQL layer
- Streamlit app with Plotly interactive charts
Recommended Skills
Journey fit
Build → backend is the primary shelf because the skill centers on API extraction, relational modeling, and database load—not distribution or ops monitoring. Backend subphase matches Python ETL jobs, normalized tables, and analytical SQL against TiDB/MySQL rather than UI-only prototyping.
How it compares
Full-stack data sample app skill—not a single SQL snippet or a generic “call any REST API” integration.
Common Questions / FAQ
Who is harvard-artifacts-collection-analytics-app for?
Developers and indie data builders who want Python ETL, relational modeling, and Streamlit visualization using Harvard Art Museums open data.
When should I use harvard-artifacts-collection-analytics-app?
During Build when you are implementing backend pipelines, loading TiDB/MySQL, and standing up an analytics UI for collection insights.
Is harvard-artifacts-collection-analytics-app safe to install?
Review the Security Audits panel on this Prism page; the workflow uses network access to external APIs and database credentials you must scope yourself.
SKILL.md
READMESKILL.md - Harvard Artifacts Collection Analytics App
# Harvard Artifacts Collection Analytics App > Skill by [ara.so](https://ara.so) — Data Skills collection ## What This Project Does The Harvard-Artifacts-Collection-Data-Engineering-Analytics-App is an end-to-end data engineering application that demonstrates real-world ETL pipelines, SQL analytics, and interactive visualization. It extracts artifact data from the Harvard Art Museums API, transforms it into relational database structures, and provides interactive analytics dashboards using Streamlit. **Key capabilities:** - Extract artifact metadata, media, and color data from Harvard Art Museums API - Transform nested JSON into normalized relational tables - Load data into MySQL/TiDB Cloud databases - Execute analytical SQL queries - Visualize results with interactive Plotly charts - Handle API pagination and rate limiting ## Installation ### Prerequisites ```bash # Python 3.7+ python --version # Install required packages pip install streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ### Project Setup ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Create .env file for configuration touch .env ``` ### Environment Configuration Create a `.env` file with the following variables: ```bash # Harvard Art Museums API HARVARD_API_KEY=your_api_key_here # Database Configuration (MySQL/TiDB Cloud) DB_HOST=your_database_host DB_PORT=3306 DB_USER=your_database_user DB_PASSWORD=your_database_password DB_NAME=harvard_artifacts ``` To get a Harvard Art Museums API key: 1. Visit https://www.harvardartmuseums.org/collections/api 2. Request an API key (free) 3. Add it to your `.env` file ## Running the Application ```bash # Start the Streamlit app streamlit run app.py # The app will open in your browser at http://localhost:8501 ``` ## Database Schema The application creates three main tables: ```sql -- Artifact metadata table CREATE TABLE artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(255), century VARCHAR(255), dated VARCHAR(255), classification VARCHAR(255), department VARCHAR(255), technique VARCHAR(500), medium VARCHAR(500), dimensions VARCHAR(500), creditline TEXT, accession_number VARCHAR(255), url TEXT ); -- Artifact media table CREATE TABLE artifactmedia ( media_id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, image_url TEXT, media_type VARCHAR(100), FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); -- Artifact colors table CREATE TABLE artifactcolors ( color_id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color VARCHAR(50), percentage FLOAT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); ``` ## Code Examples ### 1. API Data Extraction ```python import requests import os from dotenv import load_dotenv load_dotenv() def fetch_artifacts(api_key, page=1, size=100): """ Fetch artifacts from Harvard Art Museums API Args: api_key: Harvard API key page: Page number for pagination size: Number of records per page (max 100) Returns: JSON respons