
Harvard Artifacts Etl Analytics
Scaffold Harvard Art Museums API extraction, relational loading, SQL analytics, and a Streamlit dashboard for artifact collections.
Install
npx skills add https://github.com/aradotso/data-skills --skill harvard-artifacts-etl-analyticsWhat is this skill?
- Harvard Art Museums API extract with pagination through nested JSON normalization
- MySQL or TiDB Cloud batch load into relational artifact, media, and color tables
- Analytical SQL over metadata plus Plotly visualizations in Streamlit
- Documented architecture: API → ETL → SQL → Analytics → Visualization
Adoption & trust: 1 installs on skills.sh; 1 GitHub stars; 1/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Recommended Skills
Paper Context Resolverlllllllama/ai-paper-reproduction-skill
Repo Intake And Planlllllllama/ai-paper-reproduction-skill
Env And Assets Bootstraplllllllama/ai-paper-reproduction-skill
Minimal Run And Auditlllllllama/ai-paper-reproduction-skill
Analyze Projectlllllllama/rigorpilot-skills
Ai Research Reproductionlllllllama/rigorpilot-skills
Journey fit
Primary fit
End-to-end ETL and analytics app construction is core Build work once you have an API and storage target chosen. Backend fits pipelines, batch loads, and analytical SQL more than frontend-only polish or agent-tooling meta.
Common Questions / FAQ
Is Harvard Artifacts Etl Analytics safe to install?
skills.sh reports 1 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Harvard Artifacts Etl Analytics
# Harvard Artifacts ETL Analytics Skill > Skill by [ara.so](https://ara.so) — Data Skills collection. This skill enables AI coding agents to build end-to-end data engineering and analytics applications using the Harvard Art Museums API. The project demonstrates ETL pipelines, SQL database design, analytical queries, and interactive Streamlit dashboards for museum artifact data. ## What This Project Does The Harvard Artifacts Collection application: - Extracts artifact data from the Harvard Art Museums API with pagination support - Transforms nested JSON into normalized relational tables - Loads data into MySQL/TiDB Cloud databases with batch inserts - Executes analytical SQL queries on artifact metadata, media, and color data - Visualizes results through interactive Plotly charts in a Streamlit interface **Architecture**: API → ETL → SQL → Analytics → Visualization ## Installation ```bash # Clone the repository git clone https://github.com/Manali0711/Harvard-Artifacts-Collection-Data-Engineering-Analytics-App.git cd Harvard-Artifacts-Collection-Data-Engineering-Analytics-App # Install dependencies pip install -r requirements.txt ``` **Requirements** (typical dependencies): ```txt streamlit pandas requests mysql-connector-python plotly python-dotenv ``` ## Configuration ### Environment Variables Create a `.env` file in the project root: ```env HARVARD_API_KEY=your_api_key_here MYSQL_HOST=your_mysql_host MYSQL_PORT=3306 MYSQL_USER=your_username MYSQL_PASSWORD=your_password MYSQL_DATABASE=harvard_artifacts ``` ### Harvard API Key Get your free API key from: https://docs.google.com/forms/d/e/1FAIpQLSfkmEBqH76HLMMiCC-GPPnhcvHC9aJS86E32dOd0Z1MWSWpl4/viewform ### Database Setup ```sql CREATE DATABASE harvard_artifacts; USE harvard_artifacts; -- Artifact metadata table CREATE TABLE artifactmetadata ( id INT PRIMARY KEY, title VARCHAR(500), culture VARCHAR(255), century VARCHAR(100), classification VARCHAR(255), department VARCHAR(255), dated VARCHAR(255), accession_number VARCHAR(100) ); -- Artifact media table CREATE TABLE artifactmedia ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, media_url TEXT, media_type VARCHAR(100), FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); -- Artifact colors table CREATE TABLE artifactcolors ( id INT AUTO_INCREMENT PRIMARY KEY, artifact_id INT, color_hex VARCHAR(10), color_percent DECIMAL(5,2), FOREIGN KEY (artifact_id) REFERENCES artifactmetadata(id) ); ``` ## Running the Application ```bash streamlit run app.py ``` Access at `http://localhost:8501` ## Core ETL Pipeline Pattern ### Extract: Fetching API Data with Pagination ```python import requests import os from dotenv import load_dotenv load_dotenv() def fetch_artifacts(num_records=100): """Extract artifacts from Harvard Art Museums API with pagination.""" api_key = os.getenv('HARVARD_API_KEY') base_url = "https://api.harvardartmuseums.org/object" artifacts = [] page = 1 size = 100 # Max records per request while len(artifacts) < num_records: params = { 'apikey': api_key, 'size': size, 'page': page, 'hasimage': 1 # Only artifacts with images } response = requests.get(base_url, params=params)