Literature Search Arxiv

Name: Literature Search Arxiv
Author: google-deepmind

google-deepmind/science-skills

Query arXiv with field prefixes, boolean logic, and date windows when a solo builder needs papers before committing to a technical direction.

Overview

Literature Search arXiv is an agent skill for the Idea phase that documents the advanced arXiv query syntax accepted by scripts/search_arxiv.py.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill literature-search-arxiv

What is this skill?

Documents eight arXiv field prefixes (ti, au, abs, co, jr, cat, rn, all) for precise queries
Supports AND, OR, ANDNOT boolean composition with parentheses and quoted phrases
Date filtering via submittedDate ranges in YYYYMMDDHHMM GMT format
Pairs with scripts/search_arxiv.py for URL-encoded advanced searches
Category-scoped queries such as cat:cs.AI for subject filtering

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 741 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need literature from arXiv, but generic keyword searches cannot narrow results by the authors, categories, and submission dates you care about.

Who is it for?

Solo builders doing technical due diligence, ML feature research, or literature reviews before writing a spec or landing-page copy.

Skip if: you only need non-academic web or SEO research, or you already have a curated paper list and need no new arXiv pulls.

When should I use this skill?

You are building or running arXiv searches via scripts/search_arxiv.py and need correct --query strings.
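Because --query values often contain double-quoted phrases, wrapping the whole query in double quotes on a shell command line can break it. A minimal Python sketch, assuming the script is launched as `python scripts/search_arxiv.py` (the launcher is an assumption; only the `--query` flag comes from this page): pass the query as a single argument-list element, or let `shlex.join` quote it for a POSIX shell.

```python
import shlex

# Example query from the syntax reference; the phrase must stay double-quoted.
query = 'au:del_maestro AND ti:"quantum criticality"'

# One argv element per argument: no shell parsing, so no extra quoting.
# The "python" launcher is an assumption about how the script is run.
argv = ["python", "scripts/search_arxiv.py", "--query", query]

# For pasting into a POSIX shell, shlex.join quotes each element safely.
command_line = shlex.join(argv)
print(command_line)
# python scripts/search_arxiv.py --query 'au:del_maestro AND ti:"quantum criticality"'
```

With an argument list (for example via subprocess.run(argv)), no shell is involved, so the inner double quotes reach the script unchanged.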

What do I get? / Deliverables

After applying the skill, your agent emits well-formed arXiv queries, with field prefixes, boolean operators, and optional date filters, ready for the search script.

Valid arXiv query strings with optional date and category filters

Recommended Skills

Paper Context Resolver (lllllllama/ai-paper-reproduction-skill)

Optional helper-tier skill that supplements README-guided deep learning reproduction by resolving specific paper details… 140k installs · 412 stars

Repo Intake And Plan (lllllllama/ai-paper-reproduction-skill)

Rigor Intake scans repository docs and layout to classify documented commands and propose a minimal reproduction plan fo… 140k installs · 412 stars

Env And Assets Bootstrap (lllllllama/ai-paper-reproduction-skill)

Rigor Setup establishes conservative environment and asset assumptions aligned with README and config evidence before ex… 140k installs · 412 stars

Minimal Run And Audit (lllllllama/ai-paper-reproduction-skill)

RigorPilot executes the selected minimal reproduction command and produces normalized, auditable run evidence for paper … 140k installs · 412 stars

Analyze Project (lllllllama/rigorpilot-skills)

analyze-project is a read-only agent skill from the RigorPilot family aimed at solo builders and small teams inheriting … 32.3k installs · 412 stars

Ai Research Reproduction (lllllllama/rigorpilot-skills)

ai-research-reproduction is the RigorPilot Reproduce orchestrator for solo builders and small teams who need to rerun a … 32.3k installs · 412 stars

Journey fit

Primary fit

Idea → Opportunity & market research

The canonical shelf is Idea → research: literature search precedes product decisions and establishes what is already known in the field. The research subphase is where competitor and academic discovery happens, and arXiv query syntax directly supports structured paper retrieval.

Also useful

Validate → Scope & plan

Build → Docs & content

How it compares

Reference skill for query grammar, not a hosted literature database or MCP paper server.

Common Questions / FAQ

Who is literature-search-arxiv for?

Indie and solo builders using Claude Code, Cursor, or Codex who research arXiv preprints during Idea-phase discovery.

When should I use literature-search-arxiv?

Use it in Idea → research when scoping an AI product, validating a novel approach, or gathering citations before moving on to Validate → scope. It also helps when refining cat: filters during Build → docs.

Is literature-search-arxiv safe to install?

Review the skills.sh security audit results (2 of 3 scanners passed) and inspect the google-deepmind/science-skills source before running bundled scripts that call external APIs.

SKILL.md


# arXiv Query Syntax Reference

When using `scripts/search_arxiv.py --query "..."`, you can use the following
advanced search features. The script automatically handles URL encoding.
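To see what that encoding step produces, here is a minimal sketch that builds a request URL for the public arXiv API endpoint `https://export.arxiv.org/api/query` with its `search_query`, `start`, and `max_results` parameters. Whether `search_arxiv.py` calls this endpoint, or encodes exactly this way, is an assumption: its source is not shown here. `build_api_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; that search_arxiv.py uses it is an assumption.
ARXIV_API = "https://export.arxiv.org/api/query"


def build_api_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Returns an arXiv API URL with the query string percent-encoded."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    # urlencode uses quote_plus: spaces -> '+', ':' -> %3A, '"' -> %22.
    return f"{ARXIV_API}?{urlencode(params)}"


print(build_api_url('abs:"reinforcement learning" AND cat:cs.AI'))
# https://export.arxiv.org/api/query?search_query=abs%3A%22reinforcement+learning%22+AND+cat%3Acs.AI&start=0&max_results=10
```

This is why the query you pass can keep its literal spaces, colons, and quotes: encoding belongs at the URL-building step, not in the query string itself.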

## Field Prefixes

Prefix your search terms to target specific fields:

- `ti:` Title
- `au:` Author
- `abs:` Abstract
- `co:` Comment
- `jr:` Journal Reference
- `cat:` Subject Category (e.g., `cat:cs.AI`)
- `rn:` Report Number
- `all:` All fields
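A small validator keeps agents from emitting prefixes that are not in the list above. This is a minimal sketch: `FIELD_PREFIXES` and `field_term` are hypothetical names, and auto-quoting multi-word terms as exact phrases is a design choice of this sketch, not an arXiv rule. Note that the author example below writes the name with an underscore (`au:del_maestro`) rather than as a quoted phrase.

```python
# The eight field prefixes documented above.
FIELD_PREFIXES = frozenset({"ti", "au", "abs", "co", "jr", "cat", "rn", "all"})


def field_term(prefix: str, term: str) -> str:
    """Returns 'prefix:term', wrapping multi-word terms in double quotes."""
    if prefix not in FIELD_PREFIXES:
        raise ValueError(f"unknown arXiv field prefix: {prefix!r}")
    term = term.strip()
    if not term or '"' in term:
        raise ValueError("term must be non-empty and contain no double quotes")
    if any(ch.isspace() for ch in term):
        term = f'"{term}"'  # exact-phrase match for multi-word terms
    return f"{prefix}:{term}"


print(field_term("cat", "cs.AI"))  # cat:cs.AI
print(field_term("ti", "quantum criticality"))  # ti:"quantum criticality"
```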

## Boolean Operators

Combine terms using `AND`, `OR`, and `ANDNOT`.
*Example*: `au:del_maestro ANDNOT ti:checkerboard`
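Composing clauses programmatically avoids typos in the operator names. A minimal sketch with a hypothetical `combine` helper; treating `ANDNOT` as strictly binary ("include ANDNOT exclude") is a choice made here to keep intent unambiguous, not a documented arXiv restriction.

```python
BOOLEAN_OPERATORS = frozenset({"AND", "OR", "ANDNOT"})


def combine(operator: str, *clauses: str) -> str:
    """Joins clauses with an arXiv boolean operator (uppercase, as documented)."""
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if len(clauses) < 2:
        raise ValueError("need at least two clauses to combine")
    if operator == "ANDNOT" and len(clauses) != 2:
        # This sketch only allows "include ANDNOT exclude".
        raise ValueError("ANDNOT takes exactly two clauses")
    return f" {operator} ".join(clauses)


print(combine("ANDNOT", "au:del_maestro", "ti:checkerboard"))
# au:del_maestro ANDNOT ti:checkerboard
```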

## Grouping and Phrases

- **Parentheses `()`**: Group boolean expressions.
  *Example*: `au:del_maestro ANDNOT (ti:checkerboard OR ti:Pyrochlore)`
- **Double Quotes `""`**: Search for exact phrases.
  *Example*: `au:del_maestro AND ti:"quantum criticality"`
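The two examples above can be rebuilt from smaller pieces. A minimal sketch with hypothetical `group` and `phrase` helpers; rejecting embedded double quotes in a phrase is a conservative assumption, since escaping rules for quotes inside phrases are not covered here.

```python
def group(expression: str) -> str:
    """Wraps a boolean sub-expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


def phrase(text: str) -> str:
    """Wraps text in double quotes for an exact-phrase match."""
    if '"' in text:
        raise ValueError("phrase must not contain double quotes")
    return f'"{text}"'


excluded = group("ti:checkerboard OR ti:Pyrochlore")
print(f"au:del_maestro ANDNOT {excluded}")
# au:del_maestro ANDNOT (ti:checkerboard OR ti:Pyrochlore)
print(f"au:del_maestro AND ti:{phrase('quantum criticality')}")
# au:del_maestro AND ti:"quantum criticality"
```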

## Date Filtering

Filter by the date submitted to arXiv.
Format: `[YYYYMMDDHHMM TO YYYYMMDDHHMM]` (GMT).
*Example*: `au:del_maestro AND submittedDate:[202301010600 TO 202401010600]`
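Because the bounds must be in GMT, converting from local time is an easy place to slip. A minimal sketch with hypothetical `_to_utc` and `submitted_date_range` helpers; treating naive datetimes as already-UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

ARXIV_DATE_FORMAT = "%Y%m%d%H%M"  # YYYYMMDDHHMM, 24-hour clock


def _to_utc(moment: datetime) -> datetime:
    """Converts a datetime to UTC; naive values are assumed to already be UTC."""
    if moment.tzinfo is None:
        return moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc)


def submitted_date_range(start: datetime, end: datetime) -> str:
    """Builds a submittedDate:[start TO end] filter with GMT (UTC) bounds."""
    start_utc, end_utc = _to_utc(start), _to_utc(end)
    if start_utc > end_utc:
        raise ValueError("start must not be after end")
    return (
        f"submittedDate:[{start_utc.strftime(ARXIV_DATE_FORMAT)}"
        f" TO {end_utc.strftime(ARXIV_DATE_FORMAT)}]"
    )


# Same window as the example above, given as UTC datetimes.
window = submitted_date_range(
    datetime(2023, 1, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
print(f"au:del_maestro AND {window}")
# au:del_maestro AND submittedDate:[202301010600 TO 202401010600]

# 01:00 at UTC-5 is 06:00 GMT.
utc_minus_5 = timezone(timedelta(hours=-5))
print(submitted_date_range(datetime(2023, 1, 1, 1, 0, tzinfo=utc_minus_5),
                           datetime(2023, 1, 2, 1, 0, tzinfo=utc_minus_5)))
# submittedDate:[202301010600 TO 202301020600]
```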


```python
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Downloads the source (tar.gz) of a paper from arXiv given its ID.

This script downloads the LaTeX source files of an arXiv paper
and saves them to a specified output file path.
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import os
import sys
import urllib.error

from science_skills.scienceskillscommon import http_client

# Shared client for export.arxiv.org; qps=1/3 allows about one request
# every 3 seconds.
_CLIENT = http_client.HttpClient("https://export.arxiv.org/", qps=1.0 / 3.0)


def parse_args() -> argparse.Namespace:
  """Parses command-line arguments for the download script.

  Returns:
    argparse.Namespace: An object containing the parsed arguments.
  """
  parser = argparse.ArgumentParser(
      description="Download paper source (tar.gz) from arXiv"
  )
  parser.add_argument(
      "--id", type=str, required=True, help="arXiv ID (e.g., 2010.11645)"
  )
  parser.add_argument(
      "--output",
      type=str,
      required=True,
      help="Output file path for the tar.gz file",
  )
  return parser.parse_args()


def download_source(args: argparse.Namespace) -> None:
  """Downloads the source of a paper from arXiv based on the provided arguments.

  Fetches the source (tar.gz) for the given arXiv ID through the shared,
  rate-limited client and writes it to the output path, creating parent
  directories as needed. A 404 (no source available) or a network failure
  is reported on stderr and ends the script with exit status 1; any other
  HTTP error is re-raised.

  Args:
    args: An argparse.Namespace object containing:
      - id (str): The arXiv ID of the paper.
      - output (str): The file path where the tar.gz will be saved.
  """
  # Strip stray whitespace around the ID.
  paper_id = args.id.strip()

  url = f"https://export.arxiv.org/e-print/{paper_id}"
  print(f"Attempting to download source from {url}...")

  try:
    content = _CLIENT.fetch_bytes(url)
  except urllib.error.HTTPError as e:
    if e.code != 404:
      raise
    print(
        f"Error 404: Source not found (ID: {paper_id}). Not all papers have"
        " source available.",
        file=sys.stderr,
    )
    sys.exit(1)
  except urllib.error.URLError as e:
    # HTTPError subclasses URLError, so this only sees connection failures.
    print(f"Network error while fetching {url}: {e.reason}", file=sys.stderr)
    sys.exit(1)

  # Create directories and write the file only after a successful download.
  out_dir = os.path.dirname(args.output)
  if out_dir:
    os.makedirs(out_dir, exist_ok=True)
  with open(args.output, "wb") as f:
    f.write(content)
  print(f"Success! Saved to {args.output}")


if __name__ == "__main__":
  main_args = parse_args()
  download_source(main_args)
```
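The download script passes `qps=1.0 / 3.0` to its shared client, which reads as at most one request every three seconds. The `HttpClient` implementation is not shown here, so the following is only a minimal sketch of a minimum-interval limiter under that reading; `MinIntervalLimiter` and its injectable `clock` and `sleep` parameters are hypothetical names, not part of scienceskillscommon.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Spaces out calls to wait() so they are at least 1/qps seconds apart.

    Hypothetical sketch; not the scienceskillscommon HttpClient implementation.
    """

    def __init__(
        self,
        qps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.interval = 1.0 / qps  # qps=1/3 -> 3 seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # when the previous call was allowed

    def wait(self) -> None:
        """Sleeps just long enough to honor the interval, then records the call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinIntervalLimiter(qps=1.0 / 3.0)
# Call limiter.wait() immediately before each HTTP request.
```

Injecting the clock and sleep functions keeps the limiter testable without real delays; the first call never waits, and later calls wait only for whatever part of the interval has not yet elapsed.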