
Literature Search arXiv
Query arXiv with field prefixes, boolean logic, and date windows when a solo builder needs papers before committing to a technical direction.
Overview
Literature Search arXiv is an agent skill for the Idea phase that documents arXiv advanced query syntax for scripts/search_arxiv.py.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill literature-search-arxiv
What is this skill?
- Documents eight arXiv field prefixes (ti, au, abs, co, jr, cat, rn, all) for precise queries
- Supports AND, OR, ANDNOT boolean composition with parentheses and quoted phrases
- Date filtering via submittedDate ranges in YYYYMMDDHHMM GMT format
- Pairs with scripts/search_arxiv.py for URL-encoded advanced searches
- Category-scoped queries such as cat:cs.AI for subject filtering
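The prefixes and operators listed above compose mechanically. As a rough sketch (the helper names `term` and `any_of` are hypothetical and are not part of the skill or of `scripts/search_arxiv.py`), a query string could be assembled like this:

```python
# Hypothetical helpers for composing arXiv query strings.
# They only build text; nothing here calls arXiv or the skill's script.

FIELD_PREFIXES = {"ti", "au", "abs", "co", "jr", "cat", "rn", "all"}


def term(prefix: str, value: str) -> str:
    """Return one prefixed term, quoting the value if it contains whitespace."""
    if prefix not in FIELD_PREFIXES:
        raise ValueError(f"unknown arXiv field prefix: {prefix!r}")
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'  # exact-phrase search
    return f"{prefix}:{value}"


def any_of(*terms: str) -> str:
    """Group alternatives with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


query = " ANDNOT ".join(
    [
        term("au", "del_maestro"),
        any_of(term("ti", "checkerboard"), term("ti", "Pyrochlore")),
    ]
)
print(query)  # au:del_maestro ANDNOT (ti:checkerboard OR ti:Pyrochlore)
```

Rejecting unknown prefixes early is a design choice of this sketch; it catches typos such as `title:` before the query is ever sent.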
Adoption & trust: 741 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need preprint or published literature from arXiv, but generic keyword searches cannot target the authors, categories, and date ranges you care about.
Who is it for?
Solo builders doing technical due diligence, ML feature research, or literature reviews before writing a spec or landing copy.
Skip if: you only need non-academic web SEO research, or you already have a curated paper list and need no new arXiv pulls.
When should I use this skill?
You are building or running arXiv searches via scripts/search_arxiv.py and need correct --query strings.
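Queries that contain spaces, parentheses, or double quotes are easy to mangle when passed through a shell. One way to avoid that, sketched here under the assumption that the script is launched with a plain `python` interpreter (the launcher is not confirmed by the skill), is to build the argument list first and let `shlex` do the quoting:

```python
import shlex

# The query itself uses double quotes for an exact phrase.
query = 'au:del_maestro AND ti:"quantum criticality"'

# Assumption: "python" as the launcher; only --query is documented by the skill.
argv = ["python", "scripts/search_arxiv.py", "--query", query]

# shlex.join quotes each argument so a POSIX shell passes it through unchanged.
command = shlex.join(argv)
print(command)
```

If the command is run from Python, passing `argv` directly to `subprocess.run` without `shell=True` sidesteps shell quoting entirely.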
What do I get? / Deliverables
After applying the skill, your agent emits well-formed arXiv queries with field prefixes, booleans, and optional date filters ready for the search script.
- Valid arXiv query strings with optional date and category filters
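The date filter is the part most prone to formatting mistakes, since it expects `YYYYMMDDHHMM` in GMT. A minimal sketch of producing that clause from timezone-aware datetimes follows; the function name `submitted_between` is hypothetical and not provided by the skill:

```python
from datetime import datetime, timezone


def submitted_between(start: datetime, end: datetime) -> str:
    """Format a submittedDate range as [YYYYMMDDHHMM TO YYYYMMDDHHMM] in GMT."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("pass timezone-aware datetimes so GMT conversion is unambiguous")
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y%m%d%H%M"
    lo = start.astimezone(timezone.utc).strftime(fmt)
    hi = end.astimezone(timezone.utc).strftime(fmt)
    return f"submittedDate:[{lo} TO {hi}]"


window = submitted_between(
    datetime(2023, 1, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
query = f"cat:cs.AI AND {window}"
print(query)  # cat:cs.AI AND submittedDate:[202301010600 TO 202401010600]
```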
Journey fit
The canonical shelf is Idea → research, because literature search precedes product decisions and establishes what is already known in the field. The research subphase is where competitor and academic discovery happens, and arXiv query syntax directly supports structured paper retrieval.
How it compares
Reference skill for query grammar, not a hosted literature database or MCP paper server.
Common Questions / FAQ
Who is literature-search-arxiv for?
Indie and solo builders using Claude Code, Cursor, or Codex who research arXiv preprints during Idea-phase discovery.
When should I use literature-search-arxiv?
Use it in Idea → research when scoping an AI product, validating a novel approach, or gathering citations before validate → scope; also when refining cat: filters during build → docs.
Is literature-search-arxiv safe to install?
Review the Security Audits panel on this Prism page and inspect google-deepmind/science-skills in your repo before running bundled scripts that call external APIs.
SKILL.md
# arXiv Query Syntax Reference

When using `scripts/search_arxiv.py --query "..."`, you can use the following advanced search features. The script automatically handles URL encoding.

## Field Prefixes

Prefix your search terms to target specific fields:

- `ti:` Title
- `au:` Author
- `abs:` Abstract
- `co:` Comment
- `jr:` Journal Reference
- `cat:` Subject Category (e.g., `cat:cs.AI`)
- `rn:` Report Number
- `all:` All fields

## Boolean Operators

Combine terms using `AND`, `OR`, and `ANDNOT`.

*Example*: `au:del_maestro ANDNOT ti:checkerboard`

## Grouping and Phrases

- **Parentheses `()`**: Group boolean expressions.
  *Example*: `au:del_maestro ANDNOT (ti:checkerboard OR ti:Pyrochlore)`
- **Double Quotes `""`**: Search for exact phrases.
  *Example*: `au:del_maestro AND ti:"quantum criticality"`

## Date Filtering

Filter by the date submitted to arXiv. Format: `[YYYYMMDDHHMM TO YYYYMMDDHHMM]` (GMT).

*Example*: `au:del_maestro AND submittedDate:[202301010600 TO 202401010600]`

```python
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Downloads the source (tar.gz) of a paper from arXiv given its ID.

This script allows downloading the LaTeX source files of arXiv papers and saving
them to a specified output file path.
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import os
import sys
import urllib.error

from science_skills.scienceskillscommon import http_client

_CLIENT = http_client.HttpClient("https://export.arxiv.org/", qps=1.0 / 3.0)


def parse_args() -> argparse.Namespace:
  """Parses command-line arguments for the download script.

  Returns:
    argparse.Namespace: An object containing the parsed arguments.
  """
  parser = argparse.ArgumentParser(
      description="Download paper source (tar.gz) from arXiv"
  )
  parser.add_argument(
      "--id", type=str, required=True, help="arXiv ID (e.g., 2010.11645)"
  )
  parser.add_argument(
      "--output",
      type=str,
      required=True,
      help="Output file path for the tar.gz file",
  )
  return parser.parse_args()


def download_source(args: argparse.Namespace):
  """Downloads the source of a paper from arXiv based on the provided arguments.

  This function fetches the source (tar.gz) from arXiv using the specified ID,
  saving it to the given output path. It includes error handling for common
  issues like 404 Not Found and network errors, and enforces a rate limit after
  each download attempt.

  Args:
    args: An argparse.Namespace object containing:
      - id (str): The arXiv ID of the paper.
      - output (str): The file path where the tar.gz will be saved.
  """
  # Ensure ID is clean
  paper_id = args.id.strip()
  url = f"https://export.arxiv.org/e-print/{paper_id}"
  print(f"Attempting to download source from {url}...")
  try:
    content = _CLIENT.fetch_bytes(url)
    out_dir = os.path.dirname(args.output)
    if out_dir:
      os.makedirs(out_dir, exist_ok=True)
    with open(args.output, "wb") as f:
      f.write(content)
    print(f"Success! Saved to {args.output}")
  except urllib.error.HTTPError as e:
    if e.code == 404:
      print(
          f"Error 404: Source not found (ID: {paper_id}). Not all papers have"
          " source available.",
          file=sys.stderr,
      )
    else:
      raise


if __name__ == "__main__":
  main_args = parse_args()
  download_source(mai
```
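The syntax reference above notes that the search script handles URL encoding automatically. How `scripts/search_arxiv.py` does this is not shown here, so the following is only an illustrative sketch of what that encoding involves, assuming the public arXiv API endpoint `/api/query` with its `search_query`, `start`, and `max_results` parameters. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(query: str, max_results: int = 10) -> str:
    """Percent-encode a raw arXiv query into a request URL (illustrative only)."""
    params = {"search_query": query, "start": 0, "max_results": max_results}
    # urlencode escapes spaces, quotes, colons, and brackets in the query.
    return "https://export.arxiv.org/api/query?" + urlencode(params)


raw = 'au:del_maestro AND submittedDate:[202301010600 TO 202401010600]'
url = build_search_url(raw)

# Decoding the URL recovers the original query, so nothing was lost in encoding.
decoded = parse_qs(urlsplit(url).query)["search_query"][0]
print(decoded == raw)  # True
```

Because the encoding is reversible, the query can be written in the readable form documented above and the escaping left to the tooling.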