
Harvard Art Museums Etl Analytics
Stand up a Harvard Art Museums API ETL into SQL and a Streamlit analytics dashboard without guessing schema, pagination, or query patterns.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-art-museums-etl-analyticsWhat is this skill?
- End-to-end Harvard Art Museums API integration with pagination and rate-limit-aware fetching
- Relational ETL into structured SQL tables for metadata, media, and color facets
- 20+ predefined analytics SQL queries for collection exploration
- Interactive Streamlit dashboards with Plotly visualizations on loaded warehouse tables
- Python-first workflow spanning extract, transform, load, and viz layers
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; 0/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Recommended Skills
Journey fit
Canonical shelf is Build because the skill centers on implementing extraction, relational modeling, and load steps—the core product engineering work for a museum-data app. Backend fits best: API clients, SQL schemas (`artifactmetadata`, `artifactmedia`, `artifactcolors`), and pipeline orchestration are server-side data-engineering concerns rather than UI-only frontend work.
Common Questions / FAQ
Is Harvard Art Museums Etl Analytics safe to install?
skills.sh reports 0 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Harvard Art Museums Etl Analytics
# Harvard Art Museums ETL & Analytics > Skill by [ara.so](https://ara.so) — Data Skills collection. This skill enables AI agents to help developers build end-to-end ETL pipelines and analytics applications using the Harvard Art Museums API. The project demonstrates extracting artifact data, transforming it into relational structures, loading into SQL databases, and creating interactive Streamlit dashboards with Plotly visualizations. ## What This Project Does The Harvard Artifacts Collection Data Engineering Analytics App provides: - **API Integration**: Fetch artifact metadata from Harvard Art Museums API with pagination and rate limiting - **ETL Pipeline**: Extract, transform, and load artifact data into relational SQL tables - **Database Design**: Structured schema with `artifactmetadata`, `artifactmedia`, and `artifactcolors` tables - **Analytics Queries**: 20+ predefined SQL queries for artifact analysis - **Interactive Dashboards**: Streamlit-based UI with Plotly visualizations ## Installation ### Prerequisites ```bash # Python 3.8+ python --version # Install dependencies pip install streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ### Environment Setup Create a `.env` file: ```bash # Harvard Art Museums API HARVARD_API_KEY=your_api_key_here # Database Connection DB_HOST=your_database_host DB_PORT=3306 DB_USER=your_username DB_PASSWORD=your_password DB_NAME=harvard_artifacts ``` Get your API key from: https://harvardartmuseums.org/collections/api ### Database Setup ```sql -- Create database CREATE DATABASE harvard_artifacts; USE harvard_artifacts; -- Artifact metadata table CREATE TABLE artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(200), period VARCHAR(200), century VARCHAR(100), dated VARCHAR(200), classification VARCHAR(200), department VARCHAR(200), technique VARCHAR(500), medium VARCHAR(500), dimensions VARCHAR(500), creditline TEXT, division VARCHAR(200), contact VARCHAR(200), description TEXT, provenance TEXT, commentary TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Artifact media table CREATE TABLE artifactmedia ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, baseimageurl VARCHAR(500), primaryimageurl VARCHAR(500), iiifbaseuri VARCHAR(500), image_count INT, video_count INT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); -- Artifact colors table CREATE TABLE artifactcolors ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color_hex VARCHAR(10), color_name VARCHAR(100), color_percent DECIMAL(5,2), FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); ``` ## Core Components ### 1. API Data Extraction ```python import requests import os from dotenv import load_dotenv load_dotenv() def fetch_artifacts(page=1, size=100): """Fetch artifacts from Harvard Art Museums API""" api_key = os.getenv('HARVARD_API_KEY') base_url = "https://api.harvardartmuseums.org/object" params = { 'apikey': api_key, 'page': page, 'size': size, 'hasimage': 1 # Only artifacts with images } response = requests.get(base_url, params=params) if response.status_code == 200: return response.json() else: raise Excep