Ncbi Sequence Fetch

Name: Ncbi Sequence Fetch
Author: google-deepmind

google-deepmind/science-skills

Wire an agent or script to pull protein and nucleotide sequences from NCBI via E-utilities without hand-rolling rate limits and endpoints.

Overview

ncbi-sequence-fetch is an agent skill for the Build phase that retrieves protein and nucleotide sequences from NCBI E-utilities with rate-limited subcommands for search, fetch, link, and translation workflows.
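
Every subcommand ultimately sends an HTTP GET to an endpoint under the E-utilities base URL that the script uses. The following is a minimal sketch of how the search and fetch requests can be assembled with the standard library alone. The helper name `build_eutils_url`, the query term, and the placeholder UIDs are illustrative assumptions, not part of the skill:

```python
from __future__ import annotations

import urllib.parse

EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'


def build_eutils_url(
    endpoint: str, params: dict[str, str | int], api_key: str = ''
) -> str:
  """Returns the full URL for an E-utilities endpoint such as 'esearch.fcgi'."""
  # Work on a copy so injecting api_key never mutates the caller's dict.
  query = dict(params)
  if api_key:
    query['api_key'] = api_key
  url = f'{EUTILS_BASE}/{endpoint}'
  encoded = urllib.parse.urlencode(query)
  return f'{url}?{encoded}' if encoded else url


# Search: protein UIDs whose title matches an example term.
search_url = build_eutils_url(
    'esearch.fcgi',
    {'db': 'protein', 'term': 'insulin[Title]', 'retmax': 5},
)
# Fetch: FASTA text for a comma-separated list of placeholder UIDs.
fetch_url = build_eutils_url(
    'efetch.fcgi',
    {'db': 'protein', 'id': '1,2', 'rettype': 'fasta', 'retmode': 'text'},
)
```

`urlencode` percent-encodes the brackets and commas in these values (for example, `insulin%5BTitle%5D` and `1%2C2`), so field-tagged queries and ID lists need no manual escaping.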

Install

npx skills add https://github.com/google-deepmind/science-skills --skill ncbi-sequence-fetch

What is this skill?

Wraps core NCBI E-utilities for protein and nucleotide retrieval with focused subcommands
Supports efetch, esearch, elink, CDS translation, patent sequence search, and gene-to-protein lookup
Built-in rate limiting at 3 requests/second (10/s when NCBI_API_KEY is set)
Python 3.10+ script built on the shared scienceskillscommon HTTP client
Outputs structured sequence data for downstream ML, bioinformatics, or research agents
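
The script delegates throttling to `http_client.HttpClient(EUTILS_BASE, qps=qps)`, whose internals are not shown on this page. The following is an illustration only: a minimum-interval limiter that enforces the same 3 vs. 10 requests/second split. `MinIntervalLimiter` and `ncbi_qps` are hypothetical names. Injecting `clock` and `sleep` lets the spacing logic be checked without real waiting.

```python
from __future__ import annotations

import os
import threading
import time
from typing import Callable


class MinIntervalLimiter:
  """Spaces successive calls at least 1/qps seconds apart."""

  def __init__(
      self,
      qps: float,
      clock: Callable[[], float] = time.monotonic,
      sleep: Callable[[float], None] = time.sleep,
  ):
    self._interval = 1.0 / qps
    self._clock = clock
    self._sleep = sleep
    self._next_allowed = float('-inf')  # First call never waits.
    self._lock = threading.Lock()

  def wait(self) -> None:
    """Blocks until a request may be sent, then reserves the next slot."""
    with self._lock:
      now = self._clock()
      if now < self._next_allowed:
        self._sleep(self._next_allowed - now)
        now = self._next_allowed
      self._next_allowed = now + self._interval


def ncbi_qps() -> float:
  """Mirrors the script's choice: 10 req/s with NCBI_API_KEY, else 3 req/s."""
  return 10.0 if os.environ.get('NCBI_API_KEY') else 3.0


limiter = MinIntervalLimiter(ncbi_qps())
# Call limiter.wait() immediately before each E-utilities request.
```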

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 563 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need reliable NCBI sequence data in an agent or pipeline but do not want to maintain Entrez URLs, XML parsing, and rate-limit discipline by hand.
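
The XML-parsing part of that burden is small but easy to get subtly wrong. The sketch below assumes the element names of NCBI's published esearch result format (`Count`, `IdList/Id`, `WebEnv`, `QueryKey`, and a top-level `ERROR`). `SearchResult` and `parse_esearch` are hypothetical names, not the skill's internals.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SearchResult:
  count: int  # Total hits; can exceed len(ids) when retmax is small.
  ids: list[str]
  webenv: str | None = None  # Present only when usehistory=y was sent.
  query_key: str | None = None


def parse_esearch(xml_text: str) -> SearchResult:
  """Extracts the hit count, UIDs, and history keys from esearch XML."""
  root = ET.fromstring(xml_text)
  # A top-level ERROR element signals a rejected query, not zero hits.
  error = root.findtext('ERROR')
  if error:
    raise ValueError(f'esearch error: {error}')
  return SearchResult(
      # 'Count' matches only the direct child, not counts nested in
      # the translation stack.
      count=int(root.findtext('Count', default='0')),
      ids=[el.text for el in root.findall('IdList/Id') if el.text],
      webenv=root.findtext('WebEnv'),
      query_key=root.findtext('QueryKey'),
  )
```

`Count` is the total number of matching records. `IdList` holds at most `retmax` of them, so paging is needed when the two differ.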

Who is it for?

Indie builders shipping bioinformatics agents, sequence validators, or research automations that must call NCBI on every run.

Skip if: you only need one-off manual downloads in a browser, or your team already runs a mature in-house Entrez service with org-wide governance.

When should I use this skill?

You need protein or nucleotide sequence retrieval from NCBI via efetch, esearch, elink, CDS translation, patent search, or gene-to-protein lookup.
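
Gene-to-protein lookup corresponds to an elink call from `dbfrom=gene` to `db=protein`. This page does not show how the skill parses the reply, so the following is a hypothetical sketch. It assumes the standard eLinkResult layout (`LinkSet`, `IdList/Id`, `LinkSetDb`, `LinkName`, `Link/Id`). `parse_elink` is an illustrative name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_elink(
    xml_text: str, link_name: str | None = None
) -> dict[str, list[str]]:
  """Maps source UIDs to linked target UIDs from an elink XML response.

  Args:
    xml_text: Body of an elink.fcgi reply (e.g. dbfrom=gene, db=protein).
    link_name: Optional filter such as 'gene_protein_refseq'.
  """
  root = ET.fromstring(xml_text)
  result: dict[str, list[str]] = {}
  for link_set in root.findall('LinkSet'):
    # IdList carries the source UID(s) for this LinkSet.
    source_ids = [el.text for el in link_set.findall('IdList/Id') if el.text]
    targets: list[str] = []
    seen: set[str] = set()
    for link_db in link_set.findall('LinkSetDb'):
      if link_name and link_db.findtext('LinkName') != link_name:
        continue
      # Link/Id carries target UIDs; dedupe while keeping NCBI's order.
      for el in link_db.findall('Link/Id'):
        if el.text and el.text not in seen:
          seen.add(el.text)
          targets.append(el.text)
    for source_id in source_ids:
      result[source_id] = list(targets)
  return result
```

Per the E-utilities documentation, several UIDs passed in one comma-separated `id` are merged into a single LinkSet. Repeating the `id` parameter once per UID yields one LinkSet each. Only the latter gives this parser a true per-gene mapping.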

What do I get? / Deliverables

You get repeatable NCBI retrieval commands with enforced throttling and focused subcommands ready to chain into analysis, validation, or RAG steps.

Fetched sequence records from NCBI search and efetch operations
Linked and translated sequence outputs for downstream analysis
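
Fetched records in FASTA form usually need splitting before downstream use. The following is a minimal stdlib parser. It assumes efetch was called with `rettype=fasta` and `retmode=text`, which are the defaults of the script's `efetch` function. `FastaRecord` and `parse_fasta` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FastaRecord:
  identifier: str  # First space-delimited token after '>'.
  description: str  # Remainder of the header line.
  sequence: str


def _to_record(header: str, chunks: list[str]) -> FastaRecord:
  identifier, _, description = header.partition(' ')
  return FastaRecord(identifier, description.strip(), ''.join(chunks))


def parse_fasta(text: str) -> list[FastaRecord]:
  """Splits multi-record FASTA (e.g. efetch rettype=fasta) into records."""
  records: list[FastaRecord] = []
  header: str | None = None
  chunks: list[str] = []
  for raw_line in text.splitlines():
    line = raw_line.strip()
    if not line:
      continue  # Skip blank lines, e.g. between records.
    if line.startswith('>'):
      if header is not None:
        records.append(_to_record(header, chunks))
      header, chunks = line[1:], []
    elif header is not None:
      chunks.append(line)  # Sequence lines are wrapped; join them.
  if header is not None:
    records.append(_to_record(header, chunks))
  return records
```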

Journey fit

Primary fit

Build: Integrations & version control

Its canonical shelf is Build because the skill is an executable integration layer (efetch, esearch, elink, CDS translation) that you embed in pipelines and agent tooling. The Integrations category fits API-wrapper skills that connect your product to external scientific data services.

Also useful

Idea: Opportunity & market research

How it compares

Use this skill package for scripted Entrez retrieval instead of bolting raw urllib calls into every agent prompt.
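
According to its docstring, `_eutils_get` relies on the shared `http_client` for rate limiting and retries with backoff. That client's retry policy is not shown here. For contrast, the following shows roughly what a hand-rolled urllib caller would have to add. `with_backoff` is a hypothetical, stdlib-only sketch, not the skill's code.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
  """Runs `call`, retrying transient failures with exponential backoff."""
  for attempt in range(retries + 1):
    try:
      return call()
    except retry_on:
      if attempt == retries:
        raise  # Out of retries: surface the last error to the caller.
      # Delays grow as base, 2*base, 4*base, ... with up to 10% random jitter.
      delay = base_delay * 2**attempt
      sleep(delay + random.uniform(0, 0.1 * delay))
  raise AssertionError('unreachable')
```

`urllib.error.URLError` (and its subclass `HTTPError`) derives from `OSError`, so the default `retry_on` covers network and HTTP failures. A production client would more likely retry only transient statuses such as 429 and 5xx.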

Common Questions / FAQ

Who is ncbi-sequence-fetch for?

Solo and indie developers building Python-based research agents, pipelines, or skills that must fetch live NCBI protein and nucleotide records programmatically.

When should I use ncbi-sequence-fetch?

Use it during Build when integrating external scientific APIs, especially before encoding search-and-fetch loops for GenBank data, patent sequences, or gene-to-protein resolution in your agent.
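
For search-and-fetch loops over large result sets, E-utilities offers the history server. You run esearch with `usehistory=y`, then page through efetch using the returned `WebEnv` and `query_key` plus `retstart`/`retmax`. This page does not say whether the skill uses the history server. `history_batches` is therefore a hypothetical helper that only builds the per-page parameter dicts. Each dict could be passed to a GET helper such as the script's `_eutils_get('efetch.fcgi', params)`. The default page size of 200 is an arbitrary choice.

```python
from __future__ import annotations


def history_batches(
    webenv: str,
    query_key: str,
    count: int,
    batch_size: int = 200,
    db: str = 'protein',
    rettype: str = 'fasta',
) -> list[dict[str, str | int]]:
  """Builds efetch params that page through an esearch history result."""
  if batch_size <= 0:
    raise ValueError('batch_size must be positive')
  return [
      {
          'db': db,
          'query_key': query_key,
          'WebEnv': webenv,
          'retstart': start,  # 0-based offset into the stored result set.
          'retmax': batch_size,
          'rettype': rettype,
          'retmode': 'text',
      }
      for start in range(0, count, batch_size)
  ]
```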

Is ncbi-sequence-fetch safe to install?

Review the Security Audits panel on this Prism page and inspect the script’s network calls and dependencies before running it with production API keys or sensitive workloads.

SKILL.md - Ncbi Sequence Fetch

# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""NCBI E-utilities wrapper for protein and nucleotide sequence retrieval.

Provides subcommands for the core NCBI E-utilities operations needed for
protein sequence retrieval: efetch, esearch, elink, CDS translation,
patent sequence search, and gene-to-protein lookup.

Rate-limited to 3 requests/second (10/s with NCBI_API_KEY).
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
#   "python-dotenv",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

from __future__ import annotations

import argparse
import json
import os
import re
import sys
from typing import Any
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

import dotenv
from science_skills.scienceskillscommon import http_client

_CODON_TABLE = {
    'ATA': 'I',
    'ATC': 'I',
    'ATT': 'I',
    'ATG': 'M',
    'ACA': 'T',
    'ACC': 'T',
    'ACG': 'T',
    'ACT': 'T',
    'AAC': 'N',
    'AAT': 'N',
    'AAA': 'K',
    'AAG': 'K',
    'AGC': 'S',
    'AGT': 'S',
    'AGA': 'R',
    'AGG': 'R',
    'CTA': 'L',
    'CTC': 'L',
    'CTG': 'L',
    'CTT': 'L',
    'CCA': 'P',
    'CCC': 'P',
    'CCG': 'P',
    'CCT': 'P',
    'CAC': 'H',
    'CAT': 'H',
    'CAA': 'Q',
    'CAG': 'Q',
    'CGA': 'R',
    'CGC': 'R',
    'CGG': 'R',
    'CGT': 'R',
    'GTA': 'V',
    'GTC': 'V',
    'GTG': 'V',
    'GTT': 'V',
    'GCA': 'A',
    'GCC': 'A',
    'GCG': 'A',
    'GCT': 'A',
    'GAC': 'D',
    'GAT': 'D',
    'GAA': 'E',
    'GAG': 'E',
    'GGA': 'G',
    'GGC': 'G',
    'GGG': 'G',
    'GGT': 'G',
    'TCA': 'S',
    'TCC': 'S',
    'TCG': 'S',
    'TCT': 'S',
    'TTC': 'F',
    'TTT': 'F',
    'TTA': 'L',
    'TTG': 'L',
    'TAC': 'Y',
    'TAT': 'Y',
    'TAA': '*',
    'TAG': '*',
    'TGC': 'C',
    'TGT': 'C',
    'TGA': '*',
    'TGG': 'W',
}


EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

_CLIENT = None


def get_api_client():
  """Returns the lazily initialized HttpClient."""
  global _CLIENT
  if _CLIENT is None:
    api_key = os.environ.get('NCBI_API_KEY', '')
    qps = 10.0 if api_key else 3.0
    _CLIENT = http_client.HttpClient(EUTILS_BASE, qps=qps)
  return _CLIENT


def _eutils_get(endpoint: str, params: dict[str, str | int]) -> str | None:
  """Sends a GET request to an NCBI E-utilities endpoint.

  Handles rate limiting, retries with backoff, and API key
  injection.

  Args:
    endpoint: E-utilities endpoint name (e.g. 'efetch.fcgi').
    params: Query parameters as a dict.

  Returns:
    Response text or None on failure.
  """
  api_key = os.environ.get('NCBI_API_KEY', '')
  if api_key:
    # Copy before injecting the key so the caller's dict is not mutated.
    params = {**params, 'api_key': api_key}

  query_string = urllib.parse.urlencode(params)
  full_url = f'{EUTILS_BASE}/{endpoint}'
  if query_string:
    full_url += f'?{query_string}'

  try:
    return get_api_client().fetch_text(full_url)
  except http_client.HttpError as e:
    print(f'{endpoint} error after all retries: {e}', file=sys.stderr)
    return None


def efetch(
    db: str,
    db_id: str | int,
    retmode: str = 'text',
    rettype: str = 'fasta',
) -> str | None:
  """Fetches data from NCBI efetch endpoint.

  Args:
    db: Database name (protein, nuccore, gene, pubmed, etc.)
    db_id: One or more comma-separated IDs.
    retmode: Return mode (text, xml, json).
    rettype: Return type (fasta, gb, gp, fasta_cds_aa, etc.)

  Returns:
    Response text or None on
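
The `_CODON_TABLE` above encodes the standard genetic code (NCBI translation table 1), but the routine that consumes it is not shown. The following is a hypothetical sketch of CDS translation over the same 64 codon assignments. The compact table construction, the `to_stop` flag, and mapping unrecognized codons to `X` are assumptions, not confirmed behavior of the skill.

```python
from __future__ import annotations

_BASES = 'TCAG'
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
_AMINO_ACIDS = (
    'FFLLSSSSYY**CC*W'
    'LLLLPPPPHHQQRRRR'
    'IIIMTTTTNNKKSSRR'
    'VVVVAAAADDEEGGGG'
)
CODON_TABLE = {
    first + second + third: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(_BASES)
    for j, second in enumerate(_BASES)
    for k, third in enumerate(_BASES)
}


def translate_cds(dna: str, to_stop: bool = False) -> str:
  """Translates a coding sequence in frame 1 using the standard code."""
  seq = dna.upper().replace('U', 'T')
  usable = len(seq) - len(seq) % 3  # Ignore a trailing partial codon.
  protein = []
  for start in range(0, usable, 3):
    # Codons not in the table (e.g. containing N) become 'X'.
    amino_acid = CODON_TABLE.get(seq[start:start + 3], 'X')
    if amino_acid == '*' and to_stop:
      break
    protein.append(amino_acid)
  return ''.join(protein)
```

This sketch handles neither alternative start codons nor partial resolution of ambiguity codes, both of which production translators may support.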
