
Harvard Artifacts Etl Streamlit Analytics
Spin up a Harvard Art Museums API → SQL → Streamlit analytics stack to practice museum-metadata ETL and interactive collection dashboards.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-etl-streamlit-analyticsWhat is this skill?
- End-to-end flow: Harvard Art Museums API → ETL → SQL → queries → Streamlit charts
- Covers artifact metadata, media references, and color data in a relational model
- Includes clone-and-install steps and requirements-driven Python setup
- Demonstrates real-world museum/cultural-heritage collection analytics patterns
- Interactive Streamlit layer for exploring loaded collection data
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; trending (+100% hot-view momentum).
Recommended Skills
Paper Context Resolverlllllllama/ai-paper-reproduction-skill
Repo Intake And Planlllllllama/ai-paper-reproduction-skill
Env And Assets Bootstraplllllllama/ai-paper-reproduction-skill
Minimal Run And Auditlllllllama/ai-paper-reproduction-skill
Analyze Projectlllllllama/rigorpilot-skills
Ai Research Reproductionlllllllama/rigorpilot-skills
Journey fit
Primary fit
Canonical shelf is Build because the skill delivers a full extract-transform-load implementation, relational storage, and an analytics UI—not launch or growth work. Backend fits best: API ingestion, transformation rules, SQL loading, and analytical queries are core server-side data engineering.
SKILL.md
READMESKILL.md - Harvard Artifacts Etl Streamlit Analytics
# Harvard Artifacts ETL & Analytics Application > Skill by [ara.so](https://ara.so) — Data Skills collection. ## Overview This project is a complete data engineering solution that demonstrates real-world ETL (Extract, Transform, Load) patterns using the Harvard Art Museums API. It extracts artifact metadata, media files, and color information, loads them into a relational SQL database, performs analytical queries, and visualizes results through an interactive Streamlit dashboard. **Architecture Flow**: API → ETL → SQL Database → Analytics Queries → Streamlit Visualization ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt # Run the Streamlit app streamlit run app.py ``` ### Dependencies ```txt streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ## Configuration ### API Key Setup Register for a free API key at [Harvard Art Museums API](https://www.harvardartmuseums.org/collections/api). ```python # Use environment variables for secure configuration import os from dotenv import load_dotenv load_dotenv() API_KEY = os.getenv('HARVARD_API_KEY') API_BASE_URL = 'https://api.harvardartmuseums.org' ``` ### Database Configuration ```python import mysql.connector db_config = { 'host': os.getenv('DB_HOST'), 'user': os.getenv('DB_USER'), 'password': os.getenv('DB_PASSWORD'), 'database': os.getenv('DB_NAME'), 'port': int(os.getenv('DB_PORT', 3306)) } conn = mysql.connector.connect(**db_config) ``` ## Database Schema The project uses three main tables with relational design: ```sql -- Artifact metadata table CREATE TABLE artifactmetadata ( artifact_id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(200), classification VARCHAR(200), department VARCHAR(200), dated VARCHAR(200), century VARCHAR(200), technique VARCHAR(500), medium VARCHAR(500), dimensions VARCHAR(500), description TEXT, provenance TEXT, url VARCHAR(500) ); -- Artifact media table CREATE TABLE artifactmedia ( media_id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, baseimageurl VARCHAR(500), renditionnumber VARCHAR(100), format VARCHAR(50), height INT, width INT, FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(artifact_id) ); -- Artifact colors table CREATE TABLE artifactcolors ( color_id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color VARCHAR(50), spectrum VARCHAR(50), hue VARCHAR(50), percent FLOAT, css3 VARCHAR(50), FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(artifact_id) ); ``` ## Core ETL Patterns ### Extract: Fetch Data from API ```python import requests import pandas as pd def fetch_artifacts(api_key, size=100, page=1): """ Fetch artifacts from Harvard Art Museums API with pagination """ url = f'https://api.harvardartmuseums.org/object' params = { 'apikey': api_key, 'size': size, 'page': page, 'hasimage': 1 # Only artifacts with images } response = requests.get(url, params=params) if response.status_code == 200: data = response.json()