
Literature Search
Pull arXiv paper source and extracted .tex locally so your coding agent can cite and reason over real LaTeX instead of scraping abstracts by hand.
Overview
Literature-search is an agent skill, most often used in the Idea phase (also Validate and Build), that searches arXiv by title or ID, downloads source tarballs, and extracts the .tex files for local agent consumption.
Install
npx skills add https://github.com/lingzhi227/agent-research-skills --skill literature-search
What is this skill?
- Searches the arXiv export API by title (field ti) or direct arXiv ID with relevance sorting
- Downloads official source tarballs and extracts .tex into a configurable output directory
- Self-contained Python 3 script using only stdlib (urllib + xml.etree.ElementTree)
- CLI supports disambiguation via --max-results when titles collide
- Three entry paths: --title, --title with --max-results, and --arxiv-id
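The search step in the list above can be sketched with the same two stdlib modules. This is a minimal illustration, not the skill's own code: the helper names `build_query_url` and `parse_titles` and the hand-written `SAMPLE_FEED` are assumptions made for this example, and the sketch passes the raw title to `urlencode` so the query is percent-encoded exactly once.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(title: str, max_results: int = 5) -> str:
    """Build a title-search URL (hypothetical helper); urlencode does the encoding."""
    params = urllib.parse.urlencode({
        "search_query": f"ti:{title}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    })
    return f"{ARXIV_API}?{params}"


def parse_titles(atom_xml: str) -> list[str]:
    """Return whitespace-normalized entry titles from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    titles = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title_el = entry.find("atom:title", ATOM_NS)
        if title_el is not None and title_el.text:
            titles.append(" ".join(title_el.text.split()))
    return titles


# Hand-written sample feed, used so the example runs without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All
      You Need</title>
  </entry>
</feed>"""

url = build_query_url("Attention Is All You Need", max_results=3)
titles = parse_titles(SAMPLE_FEED)
print(url)
print(titles)
```

Parsing a canned feed keeps the example offline; against the live API the same parser would receive the bytes returned by `urllib.request.urlopen(url)`.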
Adoption & trust: 1.1k installs on skills.sh; 113 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a paper name or arXiv ID but no local LaTeX source for your agent to summarize, compare, or implement from.
Who is it for?
Solo builders doing technical literature review on arXiv who want stdlib-only automation before committing to an architecture or agent prompt chain.
Skip if: you need paywalled journals, full Zotero workflows, or PDF-only corpora outside arXiv’s open source bundles.
When should I use this skill?
When you need arXiv paper source or extracted .tex from a known title or arXiv ID for agent research, validation, or implementation grounding.
What do I get? / Deliverables
You end up with extracted .tex files in your chosen output directory, ready for the next research or implementation step in your agent session.
- Local directory containing extracted .tex from matched arXiv source
- Console or JSON metadata from arXiv title/ID search (per script invocation)
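Once the .tex files are on disk, a follow-up step might gather them into a single string for an agent prompt. The sketch below is one possible way to do that, not part of the skill: `collect_tex`, the `%%% FILE:` header convention, and the `max_chars` budget are all hypothetical choices for this example.

```python
import tempfile
from pathlib import Path


def collect_tex(output_dir: str, max_chars: int = 200_000) -> str:
    """Concatenate .tex files under output_dir, each prefixed by a path header."""
    chunks = []
    total = 0
    for path in sorted(Path(output_dir).rglob("*.tex")):
        text = path.read_text(encoding="utf-8", errors="replace")
        header = f"%%% FILE: {path.relative_to(output_dir)}\n"
        piece = header + text + "\n"
        if total + len(piece) > max_chars:
            break  # stop before exceeding the (hypothetical) context budget
        chunks.append(piece)
        total += len(piece)
    return "".join(chunks)


# Demo on a throwaway directory standing in for the skill's output directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "main.tex").write_text("\\section{Intro}", encoding="utf-8")
    (root / "sub" / "appendix.tex").write_text("\\section{Proofs}", encoding="utf-8")
    (root / "notes.txt").write_text("not LaTeX", encoding="utf-8")
    context = collect_tex(tmp)

print(context)
```

Reading with `errors="replace"` is a deliberate choice here, since older LaTeX sources are not always valid UTF-8.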
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Literature discovery on arXiv is the canonical first move when a solo builder is still choosing what to build or which technique to adopt, before implementation commitments. Title and ID search against arXiv is primary-source research—not competitor scraping or audience interviews—so the natural shelf is idea/research.
Where it fits
Download the .tex for seminal transformer papers before choosing a model stack for your SaaS feature.
Run a title search with max-results to see which arXiv entries match a vague technique keyword you heard on social media.
Pull primary-source LaTeX for papers you plan to cite in a one-page validation doc for investors or co-builders.
Feed extracted .tex into an implementation agent that must mirror equations or architecture from a specific arXiv preprint.
Ground a technical blog draft in the exact notation from downloaded source rather than error-prone PDF quotes.
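When a title search returns several candidates, as in the max-results scenario above, one simple way to pick among them is to rank by string similarity to the title you typed. The sketch below assumes result dictionaries with `title` and `arxiv_id` keys; `rank_by_title` is a hypothetical helper, and the first candidate's title and ID are made up for the example.

```python
from difflib import SequenceMatcher


def rank_by_title(query: str, papers: list[dict]) -> list[dict]:
    """Sort candidate papers by case-insensitive similarity to the query title."""
    def score(paper: dict) -> float:
        return SequenceMatcher(None, query.lower(), paper["title"].lower()).ratio()
    return sorted(papers, key=score, reverse=True)


# Illustrative candidates; the first entry is invented for this example.
candidates = [
    {"title": "A Survey of Methods After Attention Is All You Need", "arxiv_id": "0000.00001"},
    {"title": "Attention Is All You Need", "arxiv_id": "1706.03762"},
]

best = rank_by_title("attention is all you need", candidates)[0]
print(best["arxiv_id"])
```

A similarity ranking is only a heuristic; for a short or generic title it is still worth reading the candidate list before downloading.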
How it compares
Use instead of manual arXiv clicking and ad-hoc copy-paste when you specifically need source .tex for agent context—not a generic web-search MCP.
Common Questions / FAQ
Who is literature-search for?
Solo and indie builders using Claude Code, Cursor, or Codex who ground decisions in arXiv papers and want local LaTeX without maintaining a heavy reference stack.
When should I use literature-search?
During Idea research to find foundational papers, during Validate scope when you need primary sources for a memo, and during Build agent-tooling when an agent must read the actual .tex from a named preprint.
Is literature-search safe to install?
Review the Security Audits panel on this Prism page for this skill’s package hash and audit status; the skill runs a network client to arXiv and writes files to disk, so use an output directory you control and respect arXiv’s terms and rate limits.
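Because the skill unpacks a downloaded archive onto disk, it can help to see what a defensive extraction step looks like. The sketch below is an assumption about one reasonable approach, not the skill's implementation: `extract_tex_members` and the demo tarball are hypothetical, and the function writes only regular .tex members whose resolved path stays inside the destination directory.

```python
import io
import os
import tarfile
import tempfile


def extract_tex_members(tar_bytes: bytes, dest_dir: str) -> list[str]:
    """Extract only regular .tex files, refusing paths that escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    written = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(".tex"):
                continue  # skip directories, links, and non-.tex files
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue  # path traversal attempt such as ../evil.tex
            src = tar.extractfile(member)
            if src is None:
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(src.read())
            written.append(os.path.relpath(target, dest_root))
    return written


def make_demo_tarball() -> bytes:
    """Build a small in-memory archive with one good, one hostile, one non-.tex member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, body in [
            ("paper/main.tex", b"\\documentclass{article}"),
            ("../evil.tex", b"x"),
            ("fig.png", b"\x89PNG"),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "out")
    written = extract_tex_members(make_demo_tarball(), dest)
    escaped = os.path.exists(os.path.join(tmp, "evil.tex"))

print(written, escaped)
```

The path check compares resolved paths rather than raw member names, so both `../` sequences and absolute member names are rejected.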
SKILL.md - Literature Search
#!/usr/bin/env python3
"""Download arXiv paper source by title, extract .tex content.

Searches the arXiv API by title, downloads the source tarball, and extracts
.tex files into a local directory.

Self-contained: uses only stdlib (urllib + xml.etree instead of feedparser).

Usage:
    python download_arxiv_source.py --title "Attention Is All You Need" --output-dir arxiv_papers/
    python download_arxiv_source.py --title "BERT" --max-results 3 --output-dir arxiv_papers/
    python download_arxiv_source.py --arxiv-id 1706.03762 --output-dir arxiv_papers/
"""

import argparse
import json
import os
import re
import sys
import tarfile
import tempfile
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ARXIV_NS = {"atom": "http://www.w3.org/2005/Atom"}


def search_arxiv(query: str, max_results: int = 5, search_field: str = "ti") -> list[dict]:
    """Search arXiv API and return paper metadata."""
    params = urllib.parse.urlencode({
        "search_query": f"{search_field}:{urllib.parse.quote(query)}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    })
    url = f"{ARXIV_API}?{params}"

    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            xml_data = resp.read()
    except Exception as e:
        print(f"arXiv API error: {e}", file=sys.stderr)
        return []

    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", ARXIV_NS):
        title_el = entry.find("atom:title", ARXIV_NS)
        title = title_el.text.strip().replace("\n", " ") if title_el is not None else ""

        summary_el = entry.find("atom:summary", ARXIV_NS)
        summary = summary_el.text.strip() if summary_el is not None else ""

        authors = []
        for author in entry.findall("atom:author", ARXIV_NS):
            name_el = author.find("atom:name", ARXIV_NS)
            if name_el is not None:
                authors.append(name_el.text)

        published_el = entry.find("atom:published", ARXIV_NS)
        published = published_el.text if published_el is not None else ""

        abs_link = ""
        pdf_link = ""
        for link in entry.findall("atom:link", ARXIV_NS):
            href = link.get("href", "")
            link_type = link.get("type", "")
            rel = link.get("rel", "")
            if link_type == "application/pdf":
                pdf_link = href
            elif rel == "alternate":
                abs_link = href

        arxiv_id = ""
        id_el = entry.find("atom:id", ARXIV_NS)
        if id_el is not None:
            m = re.search(r"abs/(.+)", id_el.text)
            if m:
                arxiv_id = m.group(1)

        papers.append({
            "title": title,
            "authors": authors,
            "published": published,
            "summary": summary,
            "abs_link": abs_link,
            "pdf_link": pdf_link,
            "arxiv_id": arxiv_id,
        })

    return papers


def download_source(arxiv_id: str, output_dir: str) -> str | None:
    """Download arXiv source tarball and extract .tex files.

    Returns the path to the extracted content or None on failure.
    """
    # Strip version suffix for source download
    base_id = re.sub(r"v\d+$", "", arxiv_id)
    source_url = f"https://arxiv.org/src/{base_id}"
    safe_name = re.sub(r"[^a-zA-Z0-9._-]", "_", arxiv_id)

    os.makedirs(output_dir, exist_ok=True)

    try:
        req = urllib.request.Request(source_url, headers={"User-Agent": "SkillScript/1.0"})
        with urllib.request.urlopen(req, timeout=60) as resp:
            tar_data = resp.read()
    except Exception as e:
        print(f"Download failed for {arxiv_id}: {e}", file=sys.stderr)
        return None

    # Save tarball to temp file, then extract
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        tmp.write(tar_d