
Harvard Artifacts Data Engineering App
Scaffold a museum-API ETL project with SQL schemas, analytics queries, and a Streamlit dashboard for Harvard Art Museums collections data.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-data-engineering-app
What is this skill?
- Integrates Harvard Art Museums API with pagination and rate limiting
- ETL pipeline flattens nested JSON into artifactmetadata, artifactmedia, and artifactcolors tables
- Includes 20+ prebuilt SQL analytical queries for collection exploration
- Streamlit interactive dashboard for browsing and visualizing artifact dimensions
- Demonstrates full extract-transform-load lifecycle suitable as a portfolio or internal analytics starter
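The flattening step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `flatten_artifact`, the `METADATA_FIELDS` tuple, the chosen column names, and the shape of the sample record (an `images` list and a `colors` list nested inside each artifact) are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical subset of scalar fields copied into the artifactmetadata row.
METADATA_FIELDS = ("title", "culture", "period", "century", "classification")

Row = Dict[str, Any]


def flatten_artifact(record: Dict[str, Any]) -> Tuple[Row, Row, List[Row]]:
    """Split one nested artifact record into rows for the three tables."""
    artifact_id = record["id"]

    # artifactmetadata: one row of scalar fields; missing keys become None.
    metadata: Row = {"id": artifact_id}
    for field in METADATA_FIELDS:
        metadata[field] = record.get(field)

    # artifactmedia: one row summarising the nested image list.
    images = record.get("images") or []
    media: Row = {
        "artifact_id": artifact_id,
        "primaryimageurl": record.get("primaryimageurl"),
        "baseimageurl": images[0].get("baseimageurl") if images else None,
        "total_images": len(images),
        "has_images": bool(images),
    }

    # artifactcolors: one row per entry in the nested color list.
    colors: List[Row] = [
        {
            "artifact_id": artifact_id,
            "color": entry.get("color"),
            "spectrum": entry.get("spectrum"),
            "hue": entry.get("hue"),
            "percent": entry.get("percent"),
        }
        for entry in record.get("colors") or []
    ]
    return metadata, media, colors


# Made-up sample record, shaped like the assumed API response.
sample = {
    "id": 1,
    "title": "Example Bowl",
    "culture": "Roman",
    "images": [{"baseimageurl": "https://example.org/a.jpg"}],
    "colors": [
        {"color": "#c8c8c8", "hue": "Grey", "percent": 0.6},
        {"color": "#323232", "hue": "Black", "percent": 0.4},
    ],
}

metadata_row, media_row, color_rows = flatten_artifact(sample)
```

Keeping the child rows keyed by `artifact_id` lets each list of rows be loaded into its own table independently, with the metadata row inserted first so foreign-key references resolve.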
Adoption & trust: 1 install on skills.sh; 1 GitHub star; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Journey fit
The skill produces an end-to-end data app covering ingestion, relational storage, and visualization, which is core Build work in backend and analytics engineering. Backend is the canonical shelf because the heavy lift is ETL, schema design, and SQL analytics rather than marketing UI polish or agent routing.
Common Questions / FAQ
Is Harvard Artifacts Data Engineering App safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Harvard Artifacts Data Engineering App

> Skill by [ara.so](https://ara.so) (Data Skills collection)

This skill enables AI coding agents to build and work with end-to-end data engineering applications using the Harvard Art Museums API. The project demonstrates real-world ETL pipelines, SQL database design, analytical queries, and interactive Streamlit dashboards for museum artifact data.

## What This Project Does

The Harvard Artifacts Collection Data Engineering Analytics App provides:

- **API Integration**: Collects artifact data from Harvard Art Museums API with pagination and rate limiting
- **ETL Pipeline**: Extracts, transforms, and loads nested JSON artifact data into relational SQL tables
- **Database Design**: Structured schema with `artifactmetadata`, `artifactmedia`, and `artifactcolors` tables
- **SQL Analytics**: 20+ predefined analytical queries for insights on artifacts, cultures, centuries, and media
- **Interactive Visualization**: Streamlit dashboard with Plotly charts for real-time data exploration

## Installation

### Prerequisites

```bash
# Python 3.8+
python --version

# Install dependencies
pip install streamlit pandas requests plotly mysql-connector-python sqlalchemy
```

### Environment Setup

Create a `.env` file or set environment variables:

```bash
# Harvard Art Museums API
export HARVARD_API_KEY="your-api-key-here"

# Database Configuration
export DB_HOST="your-db-host"
export DB_PORT="4000"
export DB_USER="your-username"
export DB_PASSWORD="your-password"
export DB_NAME="harvard_artifacts"
```

### Database Setup

```sql
-- Create database
CREATE DATABASE IF NOT EXISTS harvard_artifacts;

-- Create artifactmetadata table
CREATE TABLE artifactmetadata (
    id INT PRIMARY KEY,
    title VARCHAR(500),
    culture VARCHAR(200),
    period VARCHAR(200),
    century VARCHAR(100),
    classification VARCHAR(200),
    department VARCHAR(200),
    division VARCHAR(200),
    dated VARCHAR(200),
    url TEXT,
    creditline TEXT,
    copyright TEXT,
    description TEXT,
    provenance TEXT,
    technique TEXT
);

-- Create artifactmedia table
CREATE TABLE artifactmedia (
    media_id INT AUTO_INCREMENT PRIMARY KEY,
    artifact_id INT,
    baseimageurl VARCHAR(500),
    primaryimageurl VARCHAR(500),
    iiifbaseuri VARCHAR(500),
    total_images INT,
    has_images BOOLEAN,
    FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id)
);

-- Create artifactcolors table
CREATE TABLE artifactcolors (
    color_id INT AUTO_INCREMENT PRIMARY KEY,
    artifact_id INT,
    color VARCHAR(50),
    spectrum VARCHAR(50),
    hue VARCHAR(50),
    percent DECIMAL(5, 2),
    FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id)
);
```

## Key Components and API

### 1. API Data Collection

```python
import requests
import os
from typing import Dict, List


class HarvardAPICollector:
    """Collect artifact data from Harvard Art Museums API"""

    def __init__(self, api_key: str = None):
        self.api_key = api_key or os.getenv('HARVARD_API_KEY')
        self.base_url = "https://api.harvardartmuseums.org/object"

    def fetch_artifacts(self, page: int = 1, size: int = 100) -> Dict:
        """Fetch artifacts with pagination"""
        params = {
            'apikey': self.api_key,
            'page': page,
            'size': size,
            'hasimage': 1