
NCBI Sequence Fetch
Wire an agent or script to pull protein and nucleotide sequences from NCBI via E-utilities without hand-rolling rate limits and endpoints.
Overview
ncbi-sequence-fetch is an agent skill for the Build phase that retrieves protein and nucleotide sequences from NCBI E-utilities with rate-limited subcommands for search, fetch, link, and translation workflows.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill ncbi-sequence-fetch
What is this skill?
- Wraps core NCBI E-utilities for protein and nucleotide retrieval with focused subcommands
- Supports efetch, esearch, elink, CDS translation, patent sequence search, and gene-to-protein lookup
- Built-in rate limiting at 3 requests/second (10/s when NCBI_API_KEY is set)
- Python 3.10+ script that reuses the shared scienceskillscommon HTTP client
- Outputs structured sequence data for downstream ML, bioinformatics, or research agents
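The rate limits listed above can be illustrated with a minimal throttling sketch. This is not the skill's actual implementation (the script delegates throttling to its shared HTTP client); `MinIntervalLimiter` and `qps_for_environment` are hypothetical names introduced only for this example, which assumes simple minimum-interval spacing between calls.

```python
import os
import time


class MinIntervalLimiter:
    """Hypothetical helper: spaces calls at least 1/qps seconds apart."""

    def __init__(self, qps: float, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Blocks until the next call is allowed; returns seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


def qps_for_environment(env=os.environ) -> float:
    """Mirrors the documented limits: 10/s with NCBI_API_KEY set, else 3/s."""
    return 10.0 if env.get('NCBI_API_KEY', '') else 3.0
```

Calling `limiter.wait()` before each request keeps a loop under the chosen rate; the clock and sleep functions are injectable so the behavior can be checked without real delays.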
Adoption & trust: 563 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need reliable NCBI sequence data in an agent or pipeline but do not want to maintain Entrez URLs, XML parsing, and rate-limit discipline by hand.
Who is it for?
Indie builders shipping bioinformatics agents, sequence validators, or research automations that must call NCBI on every run.
Skip if: You only need one-off manual downloads in a browser, or your team already runs a mature in-house Entrez service with org-wide governance.
When should I use this skill?
You need protein or nucleotide sequence retrieval from NCBI via efetch, esearch, elink, CDS translation, patent search, or gene-to-protein lookup.
What do I get? / Deliverables
You get repeatable NCBI retrieval commands with enforced throttling and focused subcommands ready to chain into analysis, validation, or RAG steps.
- Fetched sequence records from NCBI search and efetch operations
- Linked and translated sequence outputs for downstream analysis
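To make the "translated sequence outputs" concrete, here is a minimal sketch of CDS translation with the standard genetic code. It is an illustration under stated assumptions, not the skill's own routine: `translate_cds` and its `to_stop` parameter are hypothetical, unknown codons are rendered as `X`, and any trailing partial codon is ignored.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG base order.
_BASES = 'TCAG'
_AMINO_ACIDS = (
    'FFLLSSSSYY**CC*W'
    'LLLLPPPPHHQQRRRR'
    'IIIMTTTTNNKKSSRR'
    'VVVVAAAADDEEGGGG'
)
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_cds(dna: str, to_stop: bool = True) -> str:
    """Translates a coding sequence codon by codon (hypothetical helper).

    Whitespace is stripped and case is ignored. Codons containing
    characters outside ACGT become 'X'. With to_stop=True, translation
    ends before the first stop codon; otherwise stops appear as '*'.
    """
    seq = ''.join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], 'X')
        if aa == '*' and to_stop:
            break
        protein.append(aa)
    return ''.join(protein)


print(translate_cds('ATGGCCTAA'))  # prints "MA"
```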
Journey fit
The canonical shelf is Build because the skill is an executable integration layer (efetch, esearch, elink, CDS translation) that you embed in pipelines and agent tooling. The Integrations category fits API-wrapper skills that connect your product to external scientific data services.
How it compares
Use this skill package for scripted Entrez retrieval instead of bolting raw urllib calls into every agent prompt.
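For context, the sketch below shows roughly what the "raw urllib" alternative involves: assembling an E-utilities URL by hand. It follows the URL-building pattern visible in the script's `_eutils_get`, but `build_eutils_url` is a hypothetical helper written for this example, and the accession is only an illustrative placeholder. No network request is made.

```python
import urllib.parse

EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'


def build_eutils_url(endpoint: str, params: dict, api_key: str = '') -> str:
    """Builds a full E-utilities URL (hypothetical helper, no I/O)."""
    query = dict(params)  # Copy so the caller's dict is not mutated.
    if api_key:
        query['api_key'] = api_key
    query_string = urllib.parse.urlencode(query)
    url = f'{EUTILS_BASE}/{endpoint}'
    if query_string:
        url += f'?{query_string}'
    return url


url = build_eutils_url(
    'efetch.fcgi',
    {'db': 'protein', 'id': 'NP_000508.1', 'rettype': 'fasta', 'retmode': 'text'},
)
print(url)
```

Every agent that makes such calls directly also has to add throttling, retries, and error handling around the request, which is the part the skill packages up.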
Common Questions / FAQ
Who is ncbi-sequence-fetch for?
Solo and indie developers building Python-based research agents, pipelines, or skills that must fetch live NCBI protein and nucleotide records programmatically.
When should I use ncbi-sequence-fetch?
Use it during Build when integrating external scientific APIs, especially before hand-coding search-and-fetch loops for GenBank data, patent sequences, or gene-to-protein resolution in your agent.
Is ncbi-sequence-fetch safe to install?
Review the Security Audits panel on this Prism page and inspect the script’s network calls and dependencies before running it with production API keys or sensitive workloads.
SKILL.md
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""NCBI E-utilities wrapper for protein and nucleotide sequence retrieval.

Provides subcommands for the core NCBI E-utilities operations needed for
protein sequence retrieval: efetch, esearch, elink, CDS translation,
patent sequence search, and gene-to-protein lookup.

Rate-limited to 3 requests/second (10/s with NCBI_API_KEY).
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "scienceskillscommon",
#     "python-dotenv",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

from __future__ import annotations

import argparse
import json
import os
import re
import sys
from typing import Any
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

import dotenv
from science_skills.scienceskillscommon import http_client

_CODON_TABLE = {
    'ATA': 'I', 'ATC': 'I', 'ATT': 'I', 'ATG': 'M',
    'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T',
    'AAC': 'N', 'AAT': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGC': 'S', 'AGT': 'S', 'AGA': 'R', 'AGG': 'R',
    'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L',
    'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P',
    'CAC': 'H', 'CAT': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R',
    'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V',
    'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A',
    'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G',
    'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S',
    'TTC': 'F', 'TTT': 'F', 'TTA': 'L', 'TTG': 'L',
    'TAC': 'Y', 'TAT': 'Y', 'TAA': '*', 'TAG': '*',
    'TGC': 'C', 'TGT': 'C', 'TGA': '*', 'TGG': 'W',
}

EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

_CLIENT = None


def get_api_client():
    """Returns the lazily initialized HttpClient."""
    global _CLIENT
    if _CLIENT is None:
        api_key = os.environ.get('NCBI_API_KEY', '')
        qps = 10.0 if api_key else 3.0
        _CLIENT = http_client.HttpClient(EUTILS_BASE, qps=qps)
    return _CLIENT


def _eutils_get(endpoint: str, params: dict[str, str | int]) -> str | None:
    """Sends a GET request to an NCBI E-utilities endpoint.

    Handles rate limiting, retries with backoff, and API key injection.

    Args:
      endpoint: E-utilities endpoint name (e.g. 'efetch.fcgi').
      params: Query parameters as a dict.

    Returns:
      Response text or None on failure.
    """
    api_key = os.environ.get('NCBI_API_KEY', '')
    if api_key:
        params['api_key'] = api_key

    query_string = urllib.parse.urlencode(params)
    full_url = f'{EUTILS_BASE}/{endpoint}'
    if query_string:
        full_url += f'?{query_string}'

    try:
        return get_api_client().fetch_text(full_url)
    except http_client.HttpError as e:
        print(f'{endpoint} error after all retries: {e}', file=sys.stderr)
        return None


def efetch(
    db: str,
    db_id: str | int,
    retmode: str = 'text',
    rettype: str = 'fasta',
) -> str | None:
    """Fetches data from NCBI efetch endpoint.

    Args:
      db: Database name (protein, nuccore, gene, pubmed, etc.)
      db_id: One or more comma-separated IDs.
      retmode: Return mode (text, xml, json).
      rettype: Return type (fasta, gb, gp, fasta_cds_aa, etc.)

    Returns:
      Response text or None on