Literature Search

Literature discovery on arXiv is the usual first step when a solo builder is still choosing what to build or which technique to adopt, before committing to an implementation. Searching arXiv by title or ID is primary-source research, not competitor scraping or audience interviews, so the skill sits most naturally on the idea/research shelf.

Where it fits

Example use

Download the .tex for seminal transformer papers before choosing a model stack for your SaaS feature.

Example use

Idea: Find the right tools

Run a title search with max-results to see which arXiv entries match a vague technique keyword you heard on social media.
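A title search like the one above boils down to one query URL and one Atom feed. The sketch below is an illustration, not the skill's own code: `build_title_query` and `list_candidates` are hypothetical helper names, and it assumes the Atom layout shown in the SKILL.md script (an `id` element ending in `/abs/<id>` and a `title` element).

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_title_query(title: str, max_results: int = 5) -> str:
    """Return an arXiv API URL for a title-field (ti) search."""
    # urlencode percent-encodes each value once, so no separate quote() call.
    params = urllib.parse.urlencode({
        "search_query": f"ti:{title}",
        "start": 0,
        "max_results": max_results,
    })
    return f"{ARXIV_API}?{params}"


def list_candidates(atom_xml: str) -> list[tuple[str, str]]:
    """Return (arxiv_id, title) pairs from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    candidates = []
    for entry in root.findall("atom:entry", ATOM):
        id_text = entry.findtext("atom:id", default="", namespaces=ATOM)
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        # The id element is a URL; keep only the part after "/abs/".
        arxiv_id = id_text.rsplit("/abs/", 1)[-1]
        # Titles can wrap across lines in the feed; collapse whitespace.
        candidates.append((arxiv_id, " ".join(title.split())))
    return candidates
```

Printing the pairs returned by `list_candidates` makes it easy to pick the right ID by eye and pass it to `--arxiv-id` afterwards.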

Example use

Pull primary-source LaTeX for papers you plan to cite in a one-page validation doc for investors or co-builders.

Example use

Feed extracted .tex into an implementation agent that must mirror equations or architecture from a specific arXiv preprint.

Example use

Grow: Content & marketing

Ground a technical blog draft in the exact notation from downloaded source rather than error-prone PDF quotes.
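Once the .tex is on disk, pulling out the display math is a short post-processing step. This is a minimal sketch under stated assumptions: `extract_equations` is a hypothetical helper, not part of the skill, and it only recognizes `equation` and `align` environments (starred or not). Papers that use other environments or custom macros would need more patterns.

```python
import re

# Matches \begin{equation}...\end{equation} and the align / starred variants.
EQ_ENV = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}",
    re.DOTALL,
)


def extract_equations(tex: str) -> list[str]:
    """Return the bodies of equation/align environments, in document order."""
    # Strip LaTeX comments first so commented-out math is not picked up.
    # An unescaped % starts a comment that runs to the end of the line.
    uncommented = re.sub(r"(?<!\\)%.*", "", tex)
    return [m.group(2).strip() for m in EQ_ENV.finditer(uncommented)]
```

Quoting the returned strings verbatim keeps the author's notation intact, which is the point of working from source rather than from a PDF.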

How it compares

Use this skill instead of clicking through arXiv by hand and copy-pasting when you specifically need source .tex for agent context. It is not a generic web-search MCP.

Common Questions / FAQ

Who is literature-search for?

Solo and indie builders using Claude Code, Cursor, or Codex who ground decisions in arXiv papers and want local LaTeX without maintaining a heavy reference stack.

When should I use literature-search?

During Idea research to find foundational papers, during Validate scope when you need primary sources for a memo, and during Build agent-tooling when an agent must read the actual .tex from a named preprint.

Is literature-search safe to install?

Review the Security Audits panel on this Prism page for this skill’s package hash and audit status; the skill runs a network client to arXiv and writes files to disk, so use an output directory you control and respect arXiv’s terms and rate limits.
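One way to stay within rate limits when fetching several papers in a row is to space out the requests. The sketch below is an assumption-laden illustration, not something the skill is known to ship: `Throttle` is a hypothetical class, and the three-second default reflects the delay arXiv's API guidance has asked for, so check the current terms before relying on it.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(
        self,
        min_interval: float = 3.0,  # assumed default; confirm against arXiv's terms
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each network request is enough; the first call never sleeps.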

SKILL.md - Literature Search

#!/usr/bin/env python3
"""Download arXiv paper source by title, extract .tex content.

Searches the arXiv API by title, downloads the source tarball,
and extracts .tex files into a local directory.

Self-contained: uses only stdlib (urllib + xml.etree instead of feedparser).

Usage:
    python download_arxiv_source.py --title "Attention Is All You Need" --output-dir arxiv_papers/
    python download_arxiv_source.py --title "BERT" --max-results 3 --output-dir arxiv_papers/
    python download_arxiv_source.py --arxiv-id 1706.03762 --output-dir arxiv_papers/
"""

import argparse
import json
import os
import re
import sys
import tarfile
import tempfile
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ARXIV_NS = {"atom": "http://www.w3.org/2005/Atom"}


def search_arxiv(query: str, max_results: int = 5, search_field: str = "ti") -> list[dict]:
    """Search arXiv API and return paper metadata."""
    params = urllib.parse.urlencode({
        # urlencode() percent-encodes the value itself; quoting it first would double-encode it.
        "search_query": f"{search_field}:{query}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "relevance",
        "sortOrder": "descending",
    })
    url = f"{ARXIV_API}?{params}"

    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            xml_data = resp.read()
    except Exception as e:
        print(f"arXiv API error: {e}", file=sys.stderr)
        return []

    root = ET.fromstring(xml_data)
    papers = []
    for entry in root.findall("atom:entry", ARXIV_NS):
        title_el = entry.find("atom:title", ARXIV_NS)
        title = title_el.text.strip().replace("\n", " ") if title_el is not None else ""

        summary_el = entry.find("atom:summary", ARXIV_NS)
        summary = summary_el.text.strip() if summary_el is not None else ""

        authors = []
        for author in entry.findall("atom:author", ARXIV_NS):
            name_el = author.find("atom:name", ARXIV_NS)
            if name_el is not None:
                authors.append(name_el.text)

        published_el = entry.find("atom:published", ARXIV_NS)
        published = published_el.text if published_el is not None else ""

        abs_link = ""
        pdf_link = ""
        for link in entry.findall("atom:link", ARXIV_NS):
            href = link.get("href", "")
            link_type = link.get("type", "")
            rel = link.get("rel", "")
            if link_type == "application/pdf":
                pdf_link = href
            elif rel == "alternate":
                abs_link = href

        arxiv_id = ""
        id_el = entry.find("atom:id", ARXIV_NS)
        if id_el is not None:
            m = re.search(r"abs/(.+)", id_el.text)
            if m:
                arxiv_id = m.group(1)

        papers.append({
            "title": title,
            "authors": authors,
            "published": published,
            "summary": summary,
            "abs_link": abs_link,
            "pdf_link": pdf_link,
            "arxiv_id": arxiv_id,
        })

    return papers


def download_source(arxiv_id: str, output_dir: str) -> str | None:
    """Download arXiv source tarball and extract .tex files.

    Returns the path to the extracted content or None on failure.
    """
    # Strip version suffix for source download
    base_id = re.sub(r"v\d+$", "", arxiv_id)
    source_url = f"https://arxiv.org/src/{base_id}"

    safe_name = re.sub(r"[^a-zA-Z0-9._-]", "_", arxiv_id)
    os.makedirs(output_dir, exist_ok=True)

    try:
        req = urllib.request.Request(source_url, headers={"User-Agent": "SkillScript/1.0"})
        with urllib.request.urlopen(req, timeout=60) as resp:
            tar_data = resp.read()
    except Exception as e:
        print(f"Download failed for {arxiv_id}: {e}", file=sys.stderr)
        return None

    # Save tarball to temp file, then extract
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        tmp.write(tar_d

What is this skill?

Searches the arXiv export API by title (field ti) or direct arXiv ID with relevance sorting

Downloads official source tarballs and extracts .tex into a configurable output directory

Self-contained Python 3 script using only stdlib (urllib + xml.etree.ElementTree)

CLI supports disambiguation via --max-results when titles collide

Three entry paths: --title, --title with --max-results, and --arxiv-id

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1.1k installs on skills.sh; 113 GitHub stars; 0/3 security scanners passed (skills.sh audits).
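The extraction step described above can be sketched as follows. The script listing on this page is cut off before its own extraction code, so this is not the author's implementation: `extract_tex` is a hypothetical function, and it assumes the download is a tar archive (compressed or not). Single-file submissions may not be tar archives, in which case `tarfile.open` raises `tarfile.ReadError`; handling that case is left out here.

```python
import io
import os
import tarfile


def extract_tex(tar_bytes: bytes, dest_dir: str) -> list[str]:
    """Write the .tex members of a tar archive into dest_dir; return paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    written = []
    # Mode "r:*" lets tarfile detect gzip, bzip2, xz, or plain tar on its own.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".tex")):
                continue
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Skip members that would land outside dest_dir (e.g. "../x.tex").
            if os.path.commonpath([dest_root, target]) != dest_root:
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            src = tar.extractfile(member)
            if src is None:
                continue
            with src, open(target, "wb") as out:
                out.write(src.read())
            written.append(target)
    return written
```

Checking each member's resolved path before writing matters because the archive comes from the network; combined with an output directory you control, it keeps extracted files where you expect them.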

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.
