
UniProt Database
Look up curated protein sequences, functions, taxonomy, and cross-database IDs via UniProtKB, UniParc, and UniRef without hand-rolling fragile API calls.
Overview
UniProt Database is an agent skill for the Idea phase that retrieves protein metadata, sequences, and annotations from UniProt via scripted API access.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill uniprot-database
What is this skill?
- UniProtKB, UniParc, and UniRef access through provided Python wrapper scripts
- Protein search, identifier mapping, functional annotations, and publication-linked metadata
- Explicit boundary: not for alignment, folding, or sequence similarity search
- License notification workflow with timestamped LICENSE_NOTIFICATION.txt
- Requires uv on PATH per bundled setup instructions
Adoption & trust: 551 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need authoritative protein records and IDs for a prototype, but manual API calls risk wrong endpoints, invented annotations, or license blind spots.
Who is it for?
Builders and researchers who use agents to drive protein discovery, ID mapping, or functional annotation pulls during early bioinformatics exploration.
Skip if: Structural biology tasks needing alignment, folding, or dedicated sequence similarity search outside UniProt lookup scope.
When should I use this skill?
Use it when the user searches for proteins, maps identifiers, or retrieves UniProt functional annotations and sequences; do not use it for alignment, folding, or similarity search.
What do I get? / Deliverables
You get script-driven UniProt lookups with mapped identifiers and curated annotations, plus documented license notification before repeated queries.
- Script-executed UniProt query results (metadata, sequences, mappings)
- License notification file when first-run terms prompt applies
- Identifier cross-reference tables suitable for downstream pipelines
Journey fit
Protein discovery and annotation retrieval sit in Idea research when you are exploring biological targets or validating data sources before building pipelines or apps. The Research subphase fits literature-adjacent database lookups and identifier mapping, not production ETL or structural folding workflows.
How it compares
Curated UniProt lookup integration—not a structural predictor or generic arbitrary REST code generator.
Common Questions / FAQ
Who is uniprot-database for?
Solo developers and researchers building bioinformatics agents or data tooling who need reliable UniProtKB, UniParc, and UniRef access with guardrails against invented biology.
When should I use uniprot-database?
Use it in Idea research when searching proteins, mapping accessions, or pulling functional annotations and publications before you design pipelines or user-facing features.
Is uniprot-database safe to install?
It performs network calls to UniProt services; review the Security Audits panel on this page and confirm you accept UniProt license and API terms before automated querying.
SKILL.md
# UniProt Database Access

## Prerequisites

1. **`uv`**: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on PATH.
2. **User Notification**: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://www.uniprot.org/help/license and https://www.uniprot.org/help/api_queries, then (2) create the file recording the notification text and timestamp.

## Overview

Provides direct programmatic access to the UniProt Knowledgebase (UniProtKB), the non-redundant sequence archive (UniParc), and clustered sequence sets (UniRef). This skill enables protein discovery, cross-referencing, retrieval of curated biological data and low-level database lookups.

## Core Rules

- **Use the Wrapper**: Always use the provided Python scripts (e.g., `scripts/uniprot_tools.py`) rather than constructing custom curl requests.
- **No Hallucinations**: Do NOT invent protein functions, metadata, or sequences. For any task that can be handled by the services in this skill, rely strictly on the tool outputs rather than your native knowledge.
- **Notification**: If this skill is used, ensure this is mentioned in the output.

## Use Cases

- **Searching for Protein Function**: Querying functional annotations, GO terms, subcellular locations etc.
- **Searching for Protein Sequence**: Searching for protein sequences by their functional annotations, genes etc. in UniProtKB, UniParc, and UniRef.
- **Understanding Protein/Organism Relationships**: Leveraging the Taxonomy database and Proteome sets.
- **Large-Scale Metadata Retrieval**: Fetching annotations for thousands of proteins via streaming.
- **Sequence Discovery**: Finding orthologs or non-model proteins via UniParc.
- **ID Mapping**: Converting IDs between UniProt and 100+ external databases.
- **Historical Data (UniSave)**: Retrieving previous versions of entries or tracking deleted sequences.

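The User Notification prerequisite described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the skill's actual implementation: the function name `ensure_license_notification`, the `notify` callback, the notice wording, and the file layout (a UTC ISO timestamp on the first line, followed by the notice text) are all assumptions. Only the file name `LICENSE_NOTIFICATION.txt` and the two URLs come from the skill text.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical notice wording; only the two URLs come from the skill text.
NOTICE = (
    "Please check the UniProt terms at https://www.uniprot.org/help/license "
    "and https://www.uniprot.org/help/api_queries before querying."
)


def ensure_license_notification(skill_dir: Path, notify=print) -> bool:
    """Notify the user once per skill directory.

    Returns True if a notification was issued on this call, and False if
    LICENSE_NOTIFICATION.txt already existed.
    """
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # already notified on an earlier run

    # Step 1: prominently notify the user.
    notify(NOTICE)

    # Step 2: record the notification text and a timestamp (assumed layout).
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{timestamp}\n{NOTICE}\n", encoding="utf-8")
    return True
```

Calling `ensure_license_notification(Path("."))` before the first query would print the notice and create the file; later calls in the same directory would return `False` without notifying again.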
## Available Tools

Choose the right tool based on the task type and data volume:

- **`get`**: Retrieves metadata and sequence for a specific entry. Best for a **single, known accession**.
  - Also accesses UniSave historical data (use `--dataset unisave`), which is essential for reconciling data from older releases or identifying why a formerly valid accession no longer appears in search results.
- **`search`**: Searches for entries matching a query. Best for **exploration and discovery**.
  - Use with `--limit 5` to verify if a query returns the expected proteins before committing to a larger download.
  - Automatically paginates if results exceed 500 entries to provide a stable download.
  - *Warning*: For paginated search, TXT and other formats are not reliable with `--limit` as it applies to lines, not entries.
  - See [Search Query Fields Documentation](references/search_query_fields.md).
- **`stream`**: Streams all matching entries. Best for **bulk retrieval** of large datasets (up to 10,000,000 entries).
  - Does NOT support `--limit`; always returns the full result set.
  - Use `search` with `--limit` if you need a subset.
- **`count`**: Counts entries matching a query. Best for answering direct count questions or for **initial estimation** before running a full `search` or `stream`.
- **`sparql`**: Executes graph queries for complex discovery. Best for counting, exact sequence matches, and multi-database queries.
  - Se