Uniprot Database

Name: Uniprot Database
Author: google-deepmind

google-deepmind/science-skills

Look up curated protein sequences, functions, taxonomy, and cross-database IDs via UniProtKB, UniParc, and UniRef without hand-rolling fragile API calls.

Overview

UniProt Database is an agent skill for the Idea phase that retrieves protein metadata, sequences, and annotations from UniProt via scripted API access.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill uniprot-database

What is this skill?

UniProtKB, UniParc, and UniRef access through provided Python wrapper scripts
Protein search, identifier mapping, functional annotations, and publication-linked metadata
Explicit boundary: not for alignment, folding, or sequence similarity search
License notification workflow with timestamped LICENSE_NOTIFICATION.txt
Requires uv on PATH per bundled setup instructions

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 551 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need authoritative protein records and IDs for a prototype, but manual API calls risk wrong endpoints, invented annotations, or license blind spots.

Who is it for?

Builders and researchers who use agents to drive protein discovery, ID mapping, or functional annotation pulls during early bioinformatics exploration.

Skip if: you need alignment, folding, or dedicated sequence similarity search; these structural biology tasks fall outside the UniProt lookup scope.

When should I use this skill?

Use it when the user searches for proteins, maps identifiers, or retrieves UniProt functional annotations and sequences. Do not use it for alignment, folding, or similarity search.

What do I get? / Deliverables

You get script-driven UniProt lookups with mapped identifiers and curated annotations, plus documented license notification before repeated queries.

Script-executed UniProt query results (metadata, sequences, mappings)
License notification file when first-run terms prompt applies
Identifier cross-reference tables suitable for downstream pipelines

Journey fit

Primary fit

Idea: Opportunity & market research

Protein discovery and annotation retrieval sit in Idea research when you are exploring biological targets or validating data sources before building pipelines or apps. The research subphase fits literature-adjacent database lookups and identifier mapping, not production ETL or structural folding workflows.

Also useful

Build: Integrations & version control

How it compares

A curated UniProt lookup integration, not a structural predictor or a generic REST code generator.

Common Questions / FAQ

Who is uniprot-database for?

Solo developers and researchers building bioinformatics agents or data tooling who need reliable UniProtKB, UniParc, and UniRef access with guardrails against invented biology.

When should I use uniprot-database?

Use it in Idea research when searching proteins, mapping accessions, or pulling functional annotations and publications before you design pipelines or user-facing features.

Is uniprot-database safe to install?

It performs network calls to UniProt services; review the Security Audits panel on this page and confirm you accept UniProt license and API terms before automated querying.

SKILL.md


# UniProt Database Access

## Prerequisites

1.  **`uv`**: Read the `uv` skill and follow its Setup instructions to ensure
    `uv` is installed and on PATH.
2.  **User Notification**: If LICENSE_NOTIFICATION.txt does not already exist in
    this skill directory then (1) prominently notify the user to check the terms
    at https://www.uniprot.org/help/license and
    https://www.uniprot.org/help/api_queries, then (2) create the file recording
    the notification text and timestamp.
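
The notification step above can be sketched in Python. This is a minimal illustration, not part of the skill: the function name `ensure_license_notification`, the `notify` callback, and the exact notice wording are assumptions. Only the file name and the two URLs come from the text.

```python
from datetime import datetime, timezone
from pathlib import Path

# URLs taken from the prerequisite text above.
LICENSE_URLS = (
    "https://www.uniprot.org/help/license",
    "https://www.uniprot.org/help/api_queries",
)


def ensure_license_notification(skill_dir, notify=print):
    """Notify once per skill directory; return True if a notice was shown.

    Hypothetical helper: the skill itself does not define this function.
    """
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # the user was already notified on an earlier run

    notice = "Please review the UniProt terms before querying:\n" + "\n".join(
        f"  - {url}" for url in LICENSE_URLS
    )
    notify(notice)  # step (1): prominently notify the user

    # Step (2): record the notification text and a UTC timestamp.
    timestamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{notice}\n\nNotified at: {timestamp}\n", encoding="utf-8")
    return True
```

Calling the helper a second time with the same directory does nothing, because the marker file already exists.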

## Overview

Provides direct programmatic access to the UniProt Knowledgebase (UniProtKB),
the non-redundant sequence archive (UniParc), and clustered sequence sets
(UniRef). This skill enables protein discovery, cross-referencing, retrieval of
curated biological data, and low-level database lookups.

## Core Rules

-   **Use the Wrapper**: Always use the provided Python scripts (e.g.,
    `scripts/uniprot_tools.py`) rather than constructing custom curl requests.
-   **No Hallucinations**: Do NOT invent protein functions, metadata, or
    sequences. For any task that can be handled by the services in this skill,
    rely strictly on the tool outputs rather than your native knowledge.
-   **Notification**: If this skill is used, ensure this is mentioned in the
    output.

## Use Cases

-   **Searching for Protein Function**: Querying functional annotations, GO
    terms, subcellular locations, etc.
-   **Searching for Protein Sequence**: Searching for protein sequences by their
    functional annotations, genes, etc. in UniProtKB, UniParc, and UniRef.
-   **Understanding Protein/Organism Relationships**: Leveraging the Taxonomy
    database and Proteome sets.
-   **Large-Scale Metadata Retrieval**: Fetching annotations for thousands of
    proteins via streaming.
-   **Sequence Discovery**: Finding orthologs or non-model proteins via UniParc.
-   **ID Mapping**: Converting IDs between UniProt and 100+ external databases.
-   **Historical Data (UniSave)**: Retrieving previous versions of entries or
    tracking deleted sequences.

## Available Tools

Choose the right tool based on the task type and data volume:

-   **`get`**: Retrieves metadata and sequence for a specific entry. Best for a
    **single, known accession**.
    -   Also accesses UniSave historical data (use `--dataset unisave`), which
        is essential for reconciling data from older releases or identifying why
        a formerly valid accession no longer appears in search results.
-   **`search`**: Searches for entries matching a query. Best for **exploration
    and discovery**.
    -   Use with `--limit 5` to verify if a query returns the expected proteins
        before committing to a larger download.
    -   Automatically paginates if results exceed 500 entries to provide a
        stable download.
    -   *Warning*: In paginated searches, `--limit` is unreliable for TXT and
        other formats because it applies to lines, not entries.
    -   See
        [Search Query Fields Documentation](references/search_query_fields.md).
-   **`stream`**: Streams all matching entries. Best for **bulk retrieval** of
    large datasets (up to 10,000,000 entries).
    -   Does NOT support `--limit`; always returns the full result set.
    -   Use `search` with `--limit` if you need a subset.
-   **`count`**: Counts entries matching a query. Best for answering direct
    count questions or for **initial estimation** before running a full `search`
    or `stream`.
-   **`sparql`**: Executes graph queries for complex discovery. Best for
    counting, exact sequence matches, and multi-database queries.
    -   Se
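
The selection guidance in this list can be summarized as a small decision helper. This is a sketch under stated assumptions: `choose_tool` and its keyword arguments are hypothetical names invented for illustration and are not part of `scripts/uniprot_tools.py`. Only the five tool names and their stated purposes come from the list above.

```python
def choose_tool(*, accession=None, graph_query=False,
                count_only=False, want_full_result_set=False):
    """Map a task description to one of the tool names listed above.

    Hypothetical helper for illustration; the precedence order is an assumption.
    """
    if accession is not None:
        return "get"      # single, known accession (or UniSave history)
    if graph_query:
        return "sparql"   # complex graph or multi-database query
    if count_only:
        return "count"    # direct count question or size estimate
    if want_full_result_set:
        return "stream"   # bulk retrieval; does not support --limit
    return "search"       # exploration; pair with --limit for a subset
```

For example, `choose_tool(accession="P12345")` returns `"get"`, while `choose_tool()` with no arguments falls back to `"search"`. A cautious workflow would run `count` first, then `search` with `--limit 5` to check the results, and only then `stream` the full set.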
