
String Database
Wire STRING database enrichment and PPI statistics into agent-driven bioinformatics workflows without hand-crafting API calls.
Overview
string-database is an agent skill for the Build phase that runs STRING CLI enrichment, PPI enrichment, functional-term search, and functional-annotation jobs and writes TSV results for solo builders integrating proteomic
Install
npx skills add https://github.com/google-deepmind/science-skills --skill string-database
What is this skill?
- Runs `enrichment` for GO, KEGG, Pfam, InterPro, and SMART with p_value and FDR in TSV output
- Runs `ppi-enrichment` to test whether a protein set’s interaction count exceeds proteome-wide expectation
- Runs `functional-terms` reverse lookup from term or disease text (e.g. Melanoma) to associated proteins
- Runs `functional-annotation` to pull full annotation tables for a supplied identifier list
- Documents four command families (`enrichment`, `ppi-enrichment`, `functional-terms`, and `functional-annotation`), each run with `uv run scripts/string_cli.py` and a species taxonomy ID (e.g. 9606, 10090, 511145)
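To make the `ppi-enrichment` result concrete, here is a minimal Python sketch of how a downstream step might read the summary TSV. It assumes the file has a header row using the field names listed in the skill's SKILL.md (`number_of_nodes`, `number_of_edges`, `expected_number_of_edges`, `p_value`). The helper name `summarize_ppi`, the 0.05 cutoff, and the sample row are illustrative assumptions, not STRING output.

```python
import csv
import io


def summarize_ppi(tsv_file, alpha=0.05):
    """Summarize the first data row of a ppi-enrichment TSV (hypothetical helper)."""
    row = next(csv.DictReader(tsv_file, delimiter="\t"))
    observed = float(row["number_of_edges"])
    expected = float(row["expected_number_of_edges"])
    p_value = float(row["p_value"])
    return {
        "observed_edges": observed,
        "expected_edges": expected,
        # Ratio of observed to expected edges; infinite if nothing was expected.
        "ratio": observed / expected if expected else float("inf"),
        # "Enriched" here means more edges than expected AND p below alpha.
        "enriched": observed > expected and p_value < alpha,
    }


# Made-up numbers for illustration only; not real STRING results.
SAMPLE_TSV = (
    "number_of_nodes\tnumber_of_edges\texpected_number_of_edges\tp_value\n"
    "8\t20\t5\t1e-06\n"
)

summary = summarize_ppi(io.StringIO(SAMPLE_TSV))
print(summary)
```

In a real pipeline the `io.StringIO` sample would be replaced by `open("/tmp/ppi_enrichment.tsv", newline="")` or whatever path was passed to `--output`.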
Adoption & trust: 541 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a protein or gene list and need trustworthy GO, KEGG, and PPI enrichment outputs without mistyping STRING API parameters or parsing responses by hand.
Who is it for?
Indie builders and small teams adding STRING-powered enrichment or PPI checks to Python/uv bioinformatics repos or agent-assisted research pipelines.
Skip if: Pure product-market or landing-page work with no molecular identifiers, or teams that forbid shell execution and external scientific API calls in the agent environment.
When should I use this skill?
Use when you have protein or gene identifiers (or a functional term string) and need STRING-backed enrichment, PPI statistics, or annotation tables via `scripts/string_cli.py`.
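As a rough sketch of how an agent step might assemble these calls, the Python snippet below builds the argument list for one `scripts/string_cli.py` invocation. The flags (`--identifiers`, `--term_text`, `--species`, `--output`) are the ones shown in the skill's SKILL.md. The function name `build_string_command` and the `SPECIES` dictionary keys are hypothetical names introduced here for illustration.

```python
import shlex

# Taxonomy IDs taken from the skill's own examples; the keys are illustrative.
SPECIES = {"human": 9606, "mouse": 10090, "e_coli_k12": 511145}


def build_string_command(command, species, output, identifiers=None, term_text=None):
    """Return the argv list for one scripts/string_cli.py call (hypothetical helper)."""
    # A command takes either protein identifiers or a term string, never both.
    if bool(identifiers) == bool(term_text):
        raise ValueError("pass exactly one of identifiers or term_text")
    argv = ["uv", "run", "scripts/string_cli.py", command]
    if identifiers:
        argv += ["--identifiers", *identifiers]
    else:
        argv += ["--term_text", term_text]
    argv += ["--species", str(species), "--output", output]
    return argv


argv = build_string_command(
    "enrichment",
    SPECIES["e_coli_k12"],
    "/tmp/enrichment.tsv",
    identifiers=["trpA", "trpB", "trpC", "trpE"],
)
# Print the shell-quoted command instead of running it.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it without shell quoting concerns. The sketch only prints the command, so it can be reviewed before any network call is made.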
What do I get? / Deliverables
After the skill runs, you get standardized TSV files with enrichment statistics, PPI background comparisons, or annotation tables ready for notebooks, reports, or backend ingestion.
- TSV enrichment tables with category, term, p_value, and fdr
- TSV PPI enrichment summary with node and edge counts versus expected edges
- TSV protein lists or per-protein functional annotation exports
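For notebook or backend ingestion, an enrichment table can be filtered with the standard library alone. The sketch below assumes the TSV has a header row with the `category`, `term`, `p_value`, `fdr`, and `description` columns named in the skill's SKILL.md. The helper name `significant_terms`, the 0.05 FDR cutoff, and the sample rows are placeholders invented for illustration.

```python
import csv
import io


def significant_terms(tsv_file, max_fdr=0.05):
    """Return enrichment rows with fdr <= max_fdr, most significant first."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    rows = [row for row in reader if float(row["fdr"]) <= max_fdr]
    return sorted(rows, key=lambda row: float(row["fdr"]))


# Made-up rows for illustration only; not real STRING output.
SAMPLE_TSV = (
    "category\tterm\tp_value\tfdr\tdescription\n"
    "Process\tEXAMPLE:1\t0.001\t0.02\tplaceholder term one\n"
    "KEGG\tEXAMPLE:2\t0.2\t0.4\tplaceholder term two\n"
    "Pfam\tEXAMPLE:3\t0.00001\t0.001\tplaceholder term three\n"
)

hits = significant_terms(io.StringIO(SAMPLE_TSV))
for row in hits:
    print(row["category"], row["term"], row["fdr"], sep="\t")
```

Filtering on `fdr` instead of the raw `p_value` keeps the multiple-testing correction that the enrichment output already provides.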
Journey fit
Canonical shelf is Build because the skill is a procedural integration layer (uv-run CLI) that connects identifiers and species IDs to STRING endpoints and writes TSV artifacts into a repo or pipeline. Integrations fits best: it is not generic PM or docs—it orchestrates external STRING commands (enrichment, PPI enrichment, functional terms, functional annotation) as repeatable agent steps.
How it compares
Use this skill package for scripted STRING enrichment steps—not as a hosted database MCP server or a general-purpose literature search skill.
Common Questions / FAQ
Who is string-database for?
It is for solo builders and researchers who use AI coding agents to integrate STRING functional and PPI enrichment into uv-based Python projects and reproducible analysis pipelines.
When should I use string-database?
Use it in Build integrations when you need GO/KEGG/Pfam enrichment TSVs, PPI network significance tests, disease or GO term to protein lookups, or full functional annotations; also during Validate scoping when you must sanity-check a gene set before building features around it.
Is string-database safe to install?
Treat it like any third-party agent skill: review the Security Audits panel on this Prism page and restrict network, filesystem, and shell permissions to what your pipeline actually needs before running `uv run` against production data.
SKILL.md
# Functional & PPI Enrichment

Use these commands for determining Gene Ontology, KEGG pathway enrichment, and general Protein-Protein Interaction (PPI) statistical enrichment.

## Command: `enrichment`

Identifies enriched functional terms (GO, KEGG, Pfam, InterPro, SMART) for a set of proteins.

```bash
uv run scripts/string_cli.py enrichment \
  --identifiers trpA trpB trpC trpE \
  --species 511145 \
  --output /tmp/enrichment.tsv
```

**Output fields:** `category`, `term`, `p_value`, `fdr` (False Discovery Rate), `description`.

## Command: `ppi-enrichment`

Determines if a network has significantly more interactions than expected by chance, comparing it to the background proteome-wide distribution.

```bash
uv run scripts/string_cli.py ppi-enrichment \
  --identifiers Trp53 Mdm2 Cdkn1a Cdk2 Cdk4 Ccnd1 Rb1 E2f1 \
  --species 10090 \
  --output /tmp/ppi_enrichment.tsv
```

**Output fields:** `number_of_nodes`, `number_of_edges`, `expected_number_of_edges`, `p_value`.

## Command: `functional-terms`

Searches for all proteins associated with a specific functional term or disease (e.g., "Melanoma" or "GO:0008543").

*Note: This API takes `--term_text` instead of `--identifiers`.*

```bash
uv run scripts/string_cli.py functional-terms \
  --term_text "Melanoma" \
  --species 9606 \
  --output /tmp/melanoma_proteins.tsv
```

## Command: `functional-annotation`

Retrieves all functional annotations (not just enriched ones) for the given proteins.

```bash
uv run scripts/string_cli.py functional-annotation \
  --identifiers CDC28 CLB1 CLB2 CLB3 CKS1 \
  --species 4932 \
  --output /tmp/annotations.tsv
```

# Interactions & Networks

Use these commands to retrieve protein interaction networks, topologies, mediators, and homology scores.

## Command: `network`

Retrieves interactions between the provided input proteins. If `--add_nodes` is provided, it extends the neighborhood.

```bash
uv run scripts/string_cli.py network \
  --identifiers Trp53 Mdm2 \
  --species 10090 \
  --add_nodes 10 \
  --network_type physical \
  --output /tmp/p53_neighborhood.tsv
```

* **Options:**
  * `--required_score` (0-1000 threshold, e.g. 400 for medium confidence)
  * `--network_type` (`functional` or `physical`)
  * `--add_nodes` (number of closely interacting proteins to add to the network).
* **Output columns:** `score` (combined confidence), `escore` (experimental evidence), `dscore` (database), `nscore` (neighborhood), `fscore` (fusion), `pscore` (phylogenetic), `tscore` (textmining), `ascore` (coexpression).

## Command: `partners`

Gets the top interaction partners against the entire database for the provided proteins.

```bash
uv run scripts/string_cli.py partners \
  --identifiers BRCA1 \
  --species 9606 \
  --limit 10 \
  --output /tmp/partners.tsv
```

## Command: `image`

Generates a visual map of the network. Output can be a PNG or SVG.

```bash
uv run scripts/string_cli.py image \
  --identifiers Trp53 Mdm2 Atm Atr Chek2 Brca1 Cdkn1a \
  --species 10090 \
  --format highres_image \
  --output /tmp/p53_pathway_network.png
```

## Command: `homology`

Gets Smith-Waterman homology (similarity) scores between the input proteins.

```bash
uv run scripts/string_cli.py homology \
  --identifiers CDK1 CDK2 \
  --species 9606 \
  --output /tmp/homology.tsv
```

## Command: `homology-best`

Gets best homology similarity hits between the input proteins and proteins in other specified species.

**Note: Target species must be exact comma-separated taxon IDs with no spaces.**

```bash
uv run scripts/string_cli.py homology-best \
  --identifiers CDK1 \
  --species 9606 \
  --species_b 10090,7227 \
  --output /tmp/best_homology.tsv
```

# Mapping Identifiers

Before querying for networks or enrichments, it is highly recommended to map common protein names (e.g., "TP53", "CDK2") to STRING's internal identifiers. Using mapped identifiers guarantees much faster server responses.

## Command: `map`

```bash
uv run scripts