Deep Research

The Idea phase is where founders check whether a technique is already published, feasible, or saturated, and arXiv is the primary source for that evidence gathering. The Research subphase covers structured literature search, not prototyping code or writing landing copy.

Where it fits

Example use

Run cat:cs.AI and all:"language model" queries to see if your agent idea is already crowded in recent preprints.

Example use

Scan cs.MA listings to discover multi-agent system trends before writing a positioning doc.

Example use

Embed arXiv fetch parameters in a small research sidecar service with 100-result pages.

Example use

Re-run queries sorted by lastUpdatedDate each month to watch for new papers affecting your model roadmap.

How it compares

This is a structured arXiv integration reference, not a general Perplexity-style deep web crawl skill.

Common Questions / FAQ

Who is deep-research for?

Indie builders and agent authors who need programmatic arXiv literature search with field prefixes, categories, and pagination discipline.

When should I use deep-research?

Use it in the Idea phase's research step when validating novelty; in Build when picking model families from recent cs.CL or cs.LG papers; and in Operate's iterate step when monitoring new preprints in your niche.

Is deep-research safe to install?

The skill describes public HTTP queries only; review the Security Audits panel on this Prism page and avoid piping untrusted XML into unsafe parsers in your own scripts.

SKILL.md

# API Reference Guide

## arXiv API

### Base URL
```
http://export.arxiv.org/api/query
```

### Query Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| `search_query` | Search terms with field prefixes | `all:transformer+AND+cat:cs.AI` |
| `start` | Offset for pagination | `0` |
| `max_results` | Results per page (max 100) | `50` |
| `sortBy` | Sort field | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | Sort direction | `descending`, `ascending` |

### Query Syntax
- **Field prefixes**: `ti:` (title), `au:` (author), `abs:` (abstract), `all:` (all fields), `cat:` (category)
- **Boolean operators**: `AND`, `OR`, `ANDNOT`
- **Grouping**: parentheses `()`
- **Examples**:
  - `all:transformer AND cat:cs.CL` — transformers in CL
  - `au:vaswani AND ti:attention` — Vaswani papers about attention
  - `(cat:cs.AI OR cat:cs.CL) AND all:"language model"` — LM papers in AI or CL
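
As a rough sketch of how these expressions end up in a request, the snippet below URL-encodes a query and appends the parameters from the table above. `build_arxiv_url` and `ARXIV_API` are illustrative names, not part of the skill's scripts, and the range check simply mirrors the 100-result cap this guide uses.

```python
from urllib.parse import urlencode

# Base URL as given in this guide.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(search_query, start=0, max_results=50,
                    sort_by="relevance", sort_order="descending"):
    """Return a full query URL for a search expression (hypothetical helper)."""
    # Enforce the per-request cap documented in this guide.
    if not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    # urlencode turns spaces into '+' and escapes quotes, colons, and parentheses.
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url('(cat:cs.AI OR cat:cs.CL) AND all:"language model"',
                      max_results=10)
print(url)
```

Encoding the whole expression in one step avoids hand-placing `+` between terms, which is easy to get wrong once quotes and parentheses are involved.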

### Common Categories
| Category | Field |
|----------|-------|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.LG` | Machine Learning |
| `cs.CV` | Computer Vision |
| `cs.MA` | Multiagent Systems |
| `cs.SE` | Software Engineering |
| `q-bio.BM` | Biomolecules |
| `q-bio.GN` | Genomics |
| `q-bio.QM` | Quantitative Methods |
| `stat.ML` | Machine Learning (Statistics) |

### Rate Limits
- **1 request per 3 seconds** (be conservative)
- Results are returned in Atom XML format
- Max 100 results per request, paginate for more
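
One way to respect both limits is to split a large request into pages and pause between calls. The sketch below assumes a caller-supplied `fetch_page(start, max_results)` function (hypothetical; the actual HTTP call is left out), and `page_offsets` and `fetch_all` are illustrative names rather than functions from the skill's scripts.

```python
import time

PAGE_SIZE = 100      # per-request cap stated in this guide
MIN_INTERVAL = 3.0   # seconds between requests, per the guidance above


def page_offsets(total_wanted, page_size=PAGE_SIZE):
    """Yield (start, max_results) pairs that together cover total_wanted results."""
    for start in range(0, total_wanted, page_size):
        yield start, min(page_size, total_wanted - start)


def fetch_all(fetch_page, total_wanted, sleep=time.sleep):
    """Call fetch_page once per page, sleeping MIN_INTERVAL between calls."""
    pages = []
    for i, (start, count) in enumerate(page_offsets(total_wanted)):
        if i > 0:
            sleep(MIN_INTERVAL)  # no wait is needed before the first request
        pages.append(fetch_page(start, count))
    return pages
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays.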

### Script Usage
```bash
python /Users/lingzhi/.claude/skills/deep-research/scripts/search_arxiv.py \
  --query "long context reasoning LLM" \
  --max-results 50 \
  --categories cs.AI cs.CL \
  --sort-by relevance \
  --start-date 2023-01-01 \
  -o results.jsonl
```

### WebFetch Usage
```
WebFetch http://export.arxiv.org/api/query?search_query=all:transformer+AND+cat:cs.AI&max_results=10&sortBy=relevance
```
Parse the Atom XML response to extract paper entries.
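
A minimal parsing sketch using only the standard library is shown here. It assumes the response follows the standard Atom layout (`entry` elements with `id`, `title`, `summary`, `published`, and `author/name` children); `parse_arxiv_feed` is a hypothetical helper name. Note that `xml.etree.ElementTree` is not hardened against malicious input, so a package such as `defusedxml` may be a better fit for untrusted XML.

```python
import xml.etree.ElementTree as ET

# Namespace map for the standard Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def _text(node, path):
    """Return the text at path with runs of whitespace collapsed, or ''."""
    return " ".join(node.findtext(path, default="", namespaces=ATOM).split())


def parse_arxiv_feed(xml_text):
    """Extract one dict per Atom entry from a feed string (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "summary": _text(entry, "atom:summary"),
            "published": _text(entry, "atom:published"),
            "authors": [_text(a, "atom:name")
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers
```

Collapsing whitespace matters because titles and abstracts in the feed are often wrapped across several lines.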

---

## Semantic Scholar Graph API

### Base URL
```
https://api.semanticscholar.org/graph/v1
```

### Authentication
- API key from `/Users/lingzhi/Code/keys.md` (field `S2_API_Key`)
- Header: `x-api-key: <key>`
- Without key: 100 requests/5 min. With key: 1 request/second sustained.

### Endpoints

#### Paper Search
```
GET /paper/search?query=...&fields=...&offset=0&limit=100
```

| Parameter | Description |
|-----------|-------------|
| `query` | Search string |
| `fields` | Comma-separated field list |
| `offset` | Pagination offset |
| `limit` | Results per page (max 100) |
| `year` | Year range filter (e.g., `2020-2026`, `2024-`, `-2020`) |
| `fieldsOfStudy` | Filter by field (e.g., `Computer Science`) |
| `venue` | Filter by venue |
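
To illustrate how these parameters combine, the sketch below assembles a search URL and the optional `x-api-key` header described under Authentication. `build_s2_search` and `S2_BASE` are illustrative names, and only a subset of the parameters in the table is covered.

```python
from urllib.parse import urlencode

# Base URL as given in this guide.
S2_BASE = "https://api.semanticscholar.org/graph/v1"


def build_s2_search(query, fields, offset=0, limit=100, year=None, api_key=None):
    """Return (url, headers) for a paper search request (hypothetical helper)."""
    # Mirror the per-page cap listed in the table above.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {
        "query": query,
        "fields": ",".join(fields),  # the API expects a comma-separated list
        "offset": offset,
        "limit": limit,
    }
    if year:
        params["year"] = year  # e.g. "2020-2026", "2024-", or "-2020"
    headers = {"x-api-key": api_key} if api_key else {}
    return f"{S2_BASE}/paper/search?{urlencode(params)}", headers


search_url, search_headers = build_s2_search(
    "long horizon reasoning", ["title", "year", "citationCount"], year="2022-2026")
print(search_url)
```

Returning the headers separately keeps the key out of the URL, so it does not end up in logs that record request lines.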

#### Paper Details
```
GET /paper/{paper_id}?fields=...
```
`paper_id` can be: Semantic Scholar paperId, `arxiv:2401.12345`, `DOI:10.xxx`, `PMID:xxx`

#### Citations
```
GET /paper/{paper_id}/citations?fields=...&limit=1000
```
Returns papers that cite the given paper.

#### References
```
GET /paper/{paper_id}/references?fields=...&limit=1000
```
Returns papers referenced by the given paper.

#### Batch Paper Details
```
POST /paper/batch?fields=...
Body: {"ids": ["paper_id_1", "arxiv:2401.12345", ...]}
```
Get details for up to 500 papers at once.
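
When a list of IDs exceeds that limit, it has to be split across several requests. A small sketch of the chunking step follows; `batch_payloads` is a hypothetical helper, and sending each body to `POST /paper/batch` is left to the caller.

```python
import json

BATCH_LIMIT = 500  # per-request limit stated above


def batch_payloads(paper_ids, batch_size=BATCH_LIMIT):
    """Yield JSON request bodies, each holding at most batch_size IDs."""
    for i in range(0, len(paper_ids), batch_size):
        yield json.dumps({"ids": paper_ids[i:i + batch_size]})
```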

### Useful Fields
```
title,authors,abstract,year,venue,citationCount,referenceCount,
externalIds,url,publicationDate,tldr,isOpenAccess,openAccessPdf
```

### Rate Limits
- **Public**: 100 requests per 5 minutes (burst)
- **Authenticated**: 1 request/second sustained, 10/second burst
- On 429: exponential backoff (2s, 4s, 8s)
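
The backoff schedule above can be sketched as a small retry wrapper. This assumes a caller-supplied `send()` callable that performs one request and returns a `(status, body)` pair; both `send` and `with_backoff` are hypothetical names, not part of the skill's scripts.

```python
import time


def with_backoff(send, delays=(2, 4, 8), sleep=time.sleep):
    """Call send(); on HTTP 429, retry after 2s, 4s, then 8s before giving up."""
    status, body = send()
    for delay in delays:
        if status != 429:
            break  # success or a non-rate-limit error: stop retrying
        sleep(delay)
        status, body = send()
    # If every retry was rate-limited, the final 429 is returned to the caller.
    return status, body
```

The wrapper makes at most four attempts in total, so a persistent 429 surfaces to the caller instead of looping indefinitely.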

### Script Usage
```bash
python /Users/lingzhi/.claude/skills/deep-research/scripts/search_semantic_scholar.py \
  --query "long horizon reasoning LLM agent" \
  --max-results 100 \
  --min-citations 10 \
  --year-range 2022-2026 \
  --api-key <key> \
  -o results.jsonl
```

### WebFetch Usage
```
WebFetch https://api.semanticscholar.org/graph/v1/paper/search?query=long+horizon+reasoning&
```

What is this skill?

Documents arXiv export API base URL and Atom XML response format

Field prefixes: ti, au, abs, all, cat with AND/OR/ANDNOT and grouping

Maps common CS and q-bio categories (cs.AI, cs.CL, cs.LG, cs.CV, etc.)

Pagination via start and max_results with a 100-results-per-request cap

Rate limit guidance: about one request every three seconds

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 779 installs on skills.sh; 114 GitHub stars; 1/3 security scanners passed (skills.sh audits).

Journey fit

Spans multiple journey phases - primary shelf plus alternate fits below.
