String Database

Name: String Database
Author: google-deepmind

google-deepmind/science-skills

Wire STRING database enrichment and PPI statistics into agent-driven bioinformatics workflows without hand-crafting API calls.

Overview

string-database is an agent skill for the Build phase that runs STRING CLI enrichment, PPI enrichment, functional-term search, and functional-annotation jobs and writes TSV results for solo builders integrating proteomic

Install

npx skills add https://github.com/google-deepmind/science-skills --skill string-database

What is this skill?

Runs `enrichment` for GO, KEGG, Pfam, InterPro, and SMART with p_value and FDR in TSV output
Runs `ppi-enrichment` to test whether a protein set’s interaction count exceeds proteome-wide expectation
Runs `functional-terms` reverse lookup from term or disease text (e.g. Melanoma) to associated proteins
Runs `functional-annotation` to pull full annotation tables for a supplied identifier list
Documents four command families (`enrichment`, `ppi-enrichment`, `functional-terms`, and `functional-annotation`), each invoked through `uv run scripts/string_cli.py` with a species taxonomy ID (e.g. 9606, 10090, 511145)

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 541 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You have a protein or gene list and need trustworthy GO, KEGG, and PPI enrichment outputs without mistyping STRING API parameters or parsing responses by hand.

Who is it for?

Indie builders and small teams adding STRING-powered enrichment or PPI checks to Python/uv bioinformatics repos or agent-assisted research pipelines.

Skip if: Pure product-market or landing-page work with no molecular identifiers, or teams that forbid shell execution and external scientific API calls in the agent environment.

When should I use this skill?

Use when you have protein or gene identifiers (or a functional term string) and need STRING-backed enrichment, PPI statistics, or annotation tables via `scripts/string_cli.py`.

What do I get? / Deliverables

After the skill runs, you get standardized TSV files with enrichment statistics, PPI background comparisons, or annotation tables ready for notebooks, reports, or backend ingestion.

TSV enrichment tables with category, term, p_value, and fdr
TSV PPI enrichment summary with node and edge counts versus expected edges
TSV protein lists or per-protein functional annotation exports

Journey fit

Primary fit

Build: Integrations & version control

Canonical shelf is Build because the skill is a procedural integration layer (uv-run CLI) that connects identifiers and species IDs to STRING endpoints and writes TSV artifacts into a repo or pipeline. Integrations fits best: it is not generic PM or docs—it orchestrates external STRING commands (enrichment, PPI enrichment, functional terms, functional annotation) as repeatable agent steps.

Also useful

Validate: Scope & plan

Idea: Opportunity & market research

How it compares

Use this skill package for scripted STRING enrichment steps—not as a hosted database MCP server or a general-purpose literature search skill.

Common Questions / FAQ

Who is string-database for?

It is for solo builders and researchers who use AI coding agents to integrate STRING functional and PPI enrichment into uv-based Python projects and reproducible analysis pipelines.

When should I use string-database?

Use it in Build integrations when you need GO/KEGG/Pfam enrichment TSVs, PPI network significance tests, disease or GO term to protein lookups, or full functional annotations; also during Validate scoping when you must sanity-check a gene set before building features around it.

Is string-database safe to install?

Treat it like any third-party agent skill: review the Security Audits panel on this Prism page and restrict network, filesystem, and shell permissions to what your pipeline actually needs before running `uv run` against production data.

SKILL.md


# Functional & PPI Enrichment

Use these commands to compute Gene Ontology and KEGG pathway enrichment and
general Protein-Protein Interaction (PPI) statistical enrichment.

## Command: `enrichment`

Identifies enriched functional terms (GO, KEGG, Pfam, InterPro, SMART) for a set
of proteins.

```bash
uv run scripts/string_cli.py enrichment \
  --identifiers trpA trpB trpC trpE \
  --species 511145 \
  --output /tmp/enrichment.tsv
```

**Output fields:** `category`, `term`, `p_value`, `fdr` (False Discovery Rate),
`description`.
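
A downstream step will often filter this table by FDR. The following is a minimal sketch, assuming the TSV has a header row using the field names listed above. The helper name `significant_terms`, the 0.05 cutoff, and the sample rows are hypothetical and are not part of the skill.

```python
import csv
import io


def significant_terms(tsv_file, max_fdr=0.05):
    """Return rows whose `fdr` is at or below max_fdr, sorted by ascending fdr.

    Assumes a tab-separated file with a header row that contains at least
    `category`, `term`, and `fdr` (the documented output fields).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    hits = [row for row in reader if float(row["fdr"]) <= max_fdr]
    return sorted(hits, key=lambda row: float(row["fdr"]))


# Placeholder rows for illustration only; real values come from STRING.
SAMPLE = (
    "category\tterm\tp_value\tfdr\tdescription\n"
    "KEGG\tTERM_B\t3e-04\t0.01\tplaceholder pathway\n"
    "Process\tTERM_A\t1e-09\t2e-07\tplaceholder process\n"
    "Pfam\tTERM_C\t0.2\t0.6\tplaceholder domain\n"
)

hits = significant_terms(io.StringIO(SAMPLE))
for row in hits:
    print(row["category"], row["term"], row["fdr"])
```

To read a real result, pass an open file handle instead, for example `open("/tmp/enrichment.tsv", newline="")`.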

## Command: `ppi-enrichment`

Determines if a network has significantly more interactions than expected by
chance, comparing it to the background proteome-wide distribution.

```bash
uv run scripts/string_cli.py ppi-enrichment \
  --identifiers Trp53 Mdm2 Cdkn1a Cdk2 Cdk4 Ccnd1 Rb1 E2f1 \
  --species 10090 \
  --output /tmp/ppi_enrichment.tsv
```

**Output fields:** `number_of_nodes`, `number_of_edges`,
`expected_number_of_edges`, `p_value`.
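
The comparison between observed and expected edges can be reduced to a single ratio. The sketch below assumes the TSV contains one data row under a header with the field names above. The function name `summarize_ppi`, the `alpha` threshold, and the sample numbers are hypothetical.

```python
import csv
import io


def summarize_ppi(tsv_file, alpha=0.05):
    """Summarize the first data row of a PPI enrichment TSV.

    Assumes a header row with `number_of_nodes`, `number_of_edges`,
    `expected_number_of_edges`, and `p_value`.
    """
    row = next(csv.DictReader(tsv_file, delimiter="\t"))
    observed = int(row["number_of_edges"])
    expected = float(row["expected_number_of_edges"])
    return {
        "nodes": int(row["number_of_nodes"]),
        "observed_edges": observed,
        "expected_edges": expected,
        # Ratio is undefined when no edges are expected.
        "fold_over_expected": observed / expected if expected > 0 else None,
        "significant": float(row["p_value"]) < alpha,
    }


# Placeholder numbers for illustration only.
SAMPLE = (
    "number_of_nodes\tnumber_of_edges\texpected_number_of_edges\tp_value\n"
    "8\t20\t4\t1e-06\n"
)

summary = summarize_ppi(io.StringIO(SAMPLE))
print(summary)
```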

## Command: `functional-terms`

Searches for all proteins associated with a specific functional term or disease
(e.g., "Melanoma" or "GO:0008543"). *Note: This API takes `--term_text` instead
of `--identifiers`.*

```bash
uv run scripts/string_cli.py functional-terms \
  --term_text "Melanoma" \
  --species 9606 \
  --output /tmp/melanoma_proteins.tsv
```

## Command: `functional-annotation`

Retrieves all functional annotations (not just enriched ones) for the given
proteins.

```bash
uv run scripts/string_cli.py functional-annotation \
  --identifiers CDC28 CLB1 CLB2 CLB3 CKS1 \
  --species 4932 \
  --output /tmp/annotations.tsv
```
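
The four commands in this section share the same shape, except that `functional-terms` takes `--term_text` where the others take `--identifiers`. The sketch below builds the argument list for any of them without running it. The helper name `build_string_command` is hypothetical; only the flags shown in the examples above are used.

```python
def build_string_command(command, species, output, identifiers=None, term_text=None):
    """Build an argv list for scripts/string_cli.py (does not execute it)."""
    if command == "functional-terms":
        # This command is queried by term text rather than by identifiers.
        if not term_text:
            raise ValueError("functional-terms requires term_text")
        query = ["--term_text", term_text]
    else:
        if not identifiers:
            raise ValueError(f"{command} requires at least one identifier")
        query = ["--identifiers", *identifiers]
    return [
        "uv", "run", "scripts/string_cli.py", command,
        *query,
        "--species", str(species),
        "--output", output,
    ]


argv = build_string_command(
    "enrichment",
    species=511145,
    output="/tmp/enrichment.tsv",
    identifiers=["trpA", "trpB", "trpC", "trpE"],
)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the directory that contains `scripts/string_cli.py`.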


# Interactions & Networks

Use these commands to retrieve protein interaction networks, topologies,
mediators, and homology scores.

## Command: `network`

Retrieves interactions between the provided input proteins. If `--add_nodes` is
provided, it extends the neighborhood.

```bash
uv run scripts/string_cli.py network \
  --identifiers Trp53 Mdm2 \
  --species 10090 \
  --add_nodes 10 \
  --network_type physical \
  --output /tmp/p53_neighborhood.tsv
```

*   **Options:**
    *   `--required_score` (0-1000 threshold, e.g. 400 for medium confidence)
    *   `--network_type` (`functional` or `physical`)
    *   `--add_nodes` (number of closely interacting proteins to add to the
        network).
*   **Output columns:** `score` (combined confidence), `escore` (experimental
    evidence), `dscore` (database), `nscore` (neighborhood), `fscore` (fusion),
    `pscore` (phylogenetic), `tscore` (textmining), `ascore` (coexpression).
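
To see which evidence channel contributes most to an edge, the per-channel columns can be compared directly. This is a minimal sketch, assuming each row is a mapping from the column names above to numeric strings. The function name `strongest_channel` and the sample values are hypothetical. Because only relative order matters, the sketch does not depend on the numeric scale of the scores.

```python
# Column name -> evidence channel, as listed in the output columns above.
EVIDENCE_CHANNELS = {
    "escore": "experimental evidence",
    "dscore": "database",
    "nscore": "neighborhood",
    "fscore": "fusion",
    "pscore": "phylogenetic",
    "tscore": "textmining",
    "ascore": "coexpression",
}


def strongest_channel(row):
    """Return (channel description, score) for the highest per-channel score.

    On ties, the channel listed first in EVIDENCE_CHANNELS wins.
    """
    column = max(EVIDENCE_CHANNELS, key=lambda name: float(row[name]))
    return EVIDENCE_CHANNELS[column], float(row[column])


# Placeholder edge for illustration only.
edge = {
    "score": "0.95", "escore": "0.9", "dscore": "0.5", "nscore": "0",
    "fscore": "0", "pscore": "0", "tscore": "0.7", "ascore": "0.1",
}
print(strongest_channel(edge))
```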

## Command: `partners`

Gets the top interaction partners against the entire database for the provided
proteins.

```bash
uv run scripts/string_cli.py partners \
  --identifiers BRCA1 \
  --species 9606 \
  --limit 10 \
  --output /tmp/partners.tsv
```

## Command: `image`

Generates a visual map of the network. Output can be a PNG or SVG.

```bash
uv run scripts/string_cli.py image \
  --identifiers Trp53 Mdm2 Atm Atr Chek2 Brca1 Cdkn1a \
  --species 10090 \
  --format highres_image \
  --output /tmp/p53_pathway_network.png
```

## Command: `homology`

Gets Smith-Waterman homology (similarity) scores between the input proteins.

```bash
uv run scripts/string_cli.py homology \
  --identifiers CDK1 CDK2 \
  --species 9606 \
  --output /tmp/homology.tsv
```

## Command: `homology-best`

Gets best homology similarity hits between the input proteins and proteins in
other specified species. **Note: Target species must be exact comma-separated
taxon IDs with no spaces.**

```bash
uv run scripts/string_cli.py homology-best \
  --identifiers CDK1 \
  --species 9606 \
  --species_b 10090,7227 \
  --output /tmp/best_homology.tsv
```
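
Because the target species value must contain no spaces, it can help to build it programmatically. The sketch below assumes taxon IDs are purely numeric. The helper name `format_species_b` is hypothetical.

```python
def format_species_b(taxon_ids):
    """Join taxon IDs into a comma-separated string with no spaces."""
    cleaned = []
    for taxon in taxon_ids:
        text = str(taxon).strip()
        # Assumption: taxon IDs are plain digit strings such as 10090.
        if not text.isdigit():
            raise ValueError(f"not a numeric taxon ID: {taxon!r}")
        cleaned.append(text)
    return ",".join(cleaned)


species_b = format_species_b([10090, " 7227 "])
print(species_b)
```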


# Mapping Identifiers

Before querying for networks or enrichments, it is highly recommended to map
common protein names (e.g., "TP53", "CDK2") to STRING's internal identifiers.
Using mapped identifiers guarantees much faster server responses.

## Command: `map`

```bash
uv run scripts
```
