
Harvard Artifacts Collection Etl Analytics
Stand up a Harvard Art Museums API → SQL → Streamlit analytics stack without guessing schema or query patterns.
Overview
Harvard Artifacts Collection ETL Analytics is an agent skill for the Build phase that builds Harvard Art Museums API ETL, SQL analytics, and Streamlit dashboards for artifact collections.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-collection-etl-analyticsWhat is this skill?
- Harvard Art Museums API integration with pagination and rate limiting
- ETL from nested JSON into artifactmetadata, artifactmedia, and artifactcolors tables
- 20+ predefined SQL analytical queries for collection insights
- Interactive Streamlit dashboards for museum artifact visualization
- End-to-end Python pipeline pattern from ara.so Data Skills
- 20+ predefined analytical SQL queries
- Three relational tables: artifactmetadata, artifactmedia, artifactcolors
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
What problem does it solve?
You want museum collection analytics but only have raw API JSON and no relational schema, queries, or dashboard scaffold.
Who is it for?
Solo builders shipping a portfolio-grade data project or internal analytics tool on public museum APIs with Python and SQL.
Skip if: Teams that only need a one-off curl fetch with no database, or production compliance-heavy cultural-heritity systems without your own governance review.
When should I use this skill?
build an ETL pipeline for Harvard Art Museums data; create analytics dashboards with museum artifacts; extract/transform Harvard museum API data; query and visualize Harvard artifacts
What do I get? / Deliverables
You get a working extract-load pipeline, structured SQL tables, 20+ ready analytics queries, and Streamlit visualizations over Harvard artifact data.
- ETL scripts loading API data into structured SQL tables
- Analytics query set and Streamlit dashboard for artifact exploration
Recommended Skills
Journey fit
Canonical shelf is Build because the skill delivers extraction, relational modeling, and dashboard code—not launch SEO or production ops. Backend fits ETL, SQL table design, and analytical queries; API fetch and load are core backend/data-engineering work.
How it compares
Use as a full pipeline blueprint instead of piecing together generic Python ETL snippets with no museum-specific schema.
Common Questions / FAQ
Who is harvard-artifacts-collection-etl-analytics for?
Indie and solo developers learning or demonstrating data engineering with a well-documented public arts API, SQL, and Streamlit.
When should I use harvard-artifacts-collection-etl-analytics?
During Build when you are implementing API extraction, relational models, and dashboards for Harvard Art Museums—or similar nested JSON collection sources.
Is harvard-artifacts-collection-etl-analytics safe to install?
Review the Security Audits panel on this Prism page before installing; the skill uses network API access and local Python/SQL execution you should run in a trusted environment.
SKILL.md
READMESKILL.md - Harvard Artifacts Collection Etl Analytics
# Harvard Artifacts Collection ETL Analytics > Skill by [ara.so](https://ara.so) — Data Skills collection This skill enables you to build end-to-end data engineering and analytics applications using the Harvard Art Museums API. It demonstrates real-world ETL pipelines, SQL database design, analytical queries, and interactive data visualization using Streamlit. ## What It Does The Harvard Artifacts Collection ETL Analytics project provides: - **API Integration**: Fetch artifact data from Harvard Art Museums API with pagination and rate limiting - **ETL Pipeline**: Extract, transform, and load nested JSON data into relational SQL tables - **Database Design**: Structured schema with `artifactmetadata`, `artifactmedia`, and `artifactcolors` tables - **SQL Analytics**: 20+ predefined analytical queries for insights - **Interactive Dashboards**: Streamlit-based UI with Plotly visualizations ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Required packages pip install streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ## Configuration ### Environment Variables Create a `.env` file in the project root: ```env HARVARD_API_KEY=your_api_key_here DB_HOST=your_database_host DB_PORT=3306 DB_USER=your_db_user DB_PASSWORD=your_db_password DB_NAME=harvard_artifacts ``` ### Get Harvard API Key 1. Visit [Harvard Art Museums API](https://www.harvardartmuseums.org/collections/api) 2. Register for a free API key 3. Add the key to your `.env` file ### Database Setup Create the required database schema: ```sql CREATE DATABASE IF NOT EXISTS harvard_artifacts; USE harvard_artifacts; CREATE TABLE artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(200), period VARCHAR(200), century VARCHAR(100), classification VARCHAR(200), department VARCHAR(200), technique VARCHAR(500), dated VARCHAR(200), url VARCHAR(500), description TEXT ); CREATE TABLE artifactmedia ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, media_type VARCHAR(100), base_image_url VARCHAR(500), has_image BOOLEAN, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); CREATE TABLE artifactcolors ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color_hex VARCHAR(10), color_name VARCHAR(100), percentage FLOAT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); ``` ## Core API Integration ### Fetching Artifact Data ```python import requests import os from dotenv import load_dotenv load_dotenv() def fetch_artifacts(page=1, size=100): """Fetch artifacts from Harvard Art Museums API""" api_key = os.getenv('HARVARD_API_KEY') base_url = "https://api.harvardartmuseums.org/object" params = { 'apikey': api_key, 'page': page, 'size': size, 'hasimage': 1 # Only fetch artifacts with images } response = requests.get(base_url, params=params) if response.status_code == 200: return response.json() else: raise Exception(f"API request failed: {response.status_code}") # Fetch first page