
Interpro Database
Query InterPro protein domains, families, and GO-linked entries through the official EBI API with correct filters, pagination, and aggregations for bioinformatics research.
Overview
InterPro-database is an agent skill for the Idea phase that documents InterPro API query parameters so agents can fetch protein entries, domains, and families from the EBI InterPro service reliably.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill interpro-database
What is this skill?
- Documents global `page_size` (max 200) and `page_size=1` bulk count patterns for aggregation without full page downloads
- Covers `/entry` filters: `type`, `integrated`, `go_term`, `annotation`, and context-dependent `group_by` aggregations
- Maps Member Database constraints (e.g. `integrated` fails when `source_db=interpro`) to avoid broken queries
- Aligns with InterPro7 Swagger (`interpro7-swagger.yml`) for reproducible agent-generated API calls
- Documents `page_size` default ~20 with maximum 200 per page
- References official InterPro7 Swagger at ebi.ac.uk/interpro/api/static_files/interpro7-swagger.yml
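The `page_size=1` count pattern from the list above can be sketched as follows. This is a minimal illustration, not the skill's own helper: `build_entry_url` and `count_from_payload` are hypothetical names, the URL layout follows the `/entry/{sourceDB}` pattern, and the top-level `count` field in the response is an assumption to check against the Swagger file.

```python
from urllib.parse import urlencode

# Public InterPro API root; the path layout below is an assumption based on
# the `/entry/{sourceDB}` pattern in the Swagger reference.
BASE_URL = "https://www.ebi.ac.uk/interpro/api"
MAX_PAGE_SIZE = 200  # documented per-page maximum


def build_entry_url(source_db, query_params):
    """Return an /entry URL, clamping page_size to the documented range."""
    params = dict(query_params)  # copy so the caller's dict is not mutated
    if "page_size" in params:
        params["page_size"] = max(1, min(int(params["page_size"]), MAX_PAGE_SIZE))
    query = urlencode(params)
    url = f"{BASE_URL}/entry/{source_db}/"
    return f"{url}?{query}" if query else url


def count_from_payload(payload):
    """Read the total from a list response; assumes a top-level `count` field."""
    return int(payload["count"])


# Count query: one result per page, only the total is read.
count_url = build_entry_url("interpro", {"type": "domain", "page_size": 1})
# Full-page query: a request for 500 results is clamped to 200.
page_url = build_entry_url("pfam", {"page_size": 500})
```

Fetching `count_url` with any HTTP client and passing the decoded JSON to `count_from_payload` would give the total without paging through results.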
Adoption & trust: 542 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need curated protein family and domain data from InterPro but the API has many endpoint-specific filters, pagination limits, and aggregation rules that are easy to get wrong in ad-hoc agent code.
Who is it for?
Solo builders or indie researchers shipping scientific agents, notebooks, or small APIs that must call InterPro with Swagger-accurate filters and pagination.
Skip if: you only need one-off manual lookups in the InterPro web UI, or your project has no protein annotation or structural-biology context.
When should I use this skill?
The user needs InterPro API queries, protein entry/domain/family filters, GO terms, member database integration flags, or `fetch_interpro_data` / count helpers for scientific pipelines.
What do I get? / Deliverables
Your agent builds valid InterPro `query_params`, uses efficient count and page strategies, and returns structured API results you can feed into downstream analysis or integration code.
- Correctly formed InterPro API query parameter dictionaries
- Paginated or aggregated InterPro API responses suitable for further analysis
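As a rough illustration of what a correctly formed parameter dictionary involves, the sketch below checks a `query_params` dict against a few of the constraints listed in SKILL.md before any request is sent. `check_entry_params` and `KNOWN_MEMBER_DBS` are hypothetical names, and the member database set contains only the databases named in the reference, so treat it as incomplete.

```python
# Member databases named in the reference; the real list is longer, so pass
# your own set if you query others.
KNOWN_MEMBER_DBS = frozenset({"pfam", "smart", "panther"})


def check_entry_params(source_db, query_params, member_dbs=KNOWN_MEMBER_DBS):
    """Return human-readable problems for an /entry query (empty list if none).

    source_db is None for the root /entry endpoint.
    """
    problems = []
    is_member = source_db in member_dbs

    page_size = query_params.get("page_size")
    if page_size is not None and not 1 <= int(page_size) <= 200:
        problems.append("page_size must be between 1 and 200")

    if "integrated" in query_params and source_db == "interpro":
        problems.append("integrated fails when source_db is interpro")

    for name in ("annotation", "interpro_status"):
        if name in query_params and not is_member:
            problems.append(f"{name} needs a member database as source_db")

    if query_params.keys() & {"subfamilies", "subfamily"} and source_db != "panther":
        problems.append("subfamily filters need source_db panther")

    # IDA search is documented as root-only, with dependent flags.
    if "ida_search" in query_params and source_db is not None:
        problems.append("ida_search only works on the root /entry endpoint")
    for name in ("ida_ignore", "ordered", "exact"):
        if name in query_params and "ida_search" not in query_params:
            problems.append(f"{name} requires ida_search")
    if "exact" in query_params and "ordered" not in query_params:
        problems.append("exact requires ordered")

    return problems
```

An agent could call this before building the request and fall back to a different `source_db` or drop the offending filter when the returned list is not empty.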
Journey fit
InterPro sits at the start of protein-function and domain research—before you commit to pipelines, models, or apps that depend on curated family and domain annotations. The skill is a parameter and endpoint reference for exploratory API queries (`fetch_interpro_data`, counts, `/entry` filters), which matches early research and literature-style discovery rather than shipping code.
How it compares
Use this procedural API reference instead of hallucinating InterPro endpoint parameters in generic chat.
Common Questions / FAQ
Who is interpro-database for?
It is for solo builders and researchers using Claude Code, Cursor, or Codex to automate InterPro searches, entry exploration, and count aggregations during bioinformatics work.
When should I use interpro-database?
Use it in the Idea phase while researching protein function and domains, and again in Build when wiring backend integrations that call the InterPro API with documented filters and `page_size` behavior.
Is interpro-database safe to install?
Treat it like any third-party agent skill: review the Security Audits panel on this Prism page and avoid sending secrets in query strings; the skill itself is documentation for public EBI API usage.
SKILL.md
# InterPro API Query Parameters Reference

This document provides a comprehensive list of all query parameters available for the InterPro API endpoints, based on the official InterPro Swagger documentation (https://www.ebi.ac.uk/interpro/api/static_files/interpro7-swagger.yml). These parameters can be passed into the `query_params` dictionary in `fetch_interpro_data`.

## Global Parameters

*Available on all endpoints.*

* `page_size`: (`int`) Number of results per page (typically defaults to 20, max is 200). Use `page_size=1` with `get_interpro_count` for rapid bulk aggregations without downloading pages.

--------------------------------------------------------------------------------

## 1. `/entry` Parameters

*For exploring protein entries (genes, domains, families, repeats).*

### General Filters

* `type`: (`str`) Filter by entry type (e.g., `family`, `domain`, `active_site`, `binding_site`, `conserved_site`, `ptms`, `repeat`, `homologous_superfamily`).
* `integrated`: (`str`) Comma-separated list of Member Databases (e.g., `pfam`, `smart`) to filter integrated status. *(Fails if `source_db=interpro`)*
* `go_term`: (`str`) Filter by exact Gene Ontology term (e.g., `GO:0016301`).
* `annotation`: (`str`) Filter by annotation type (`logo`, `alignment`, `hmm`). *(Works only when `source_db` is a member database).*
* `group_by`: (`str`) Aggregation method. *Note: Valid values depend on the context!*
  - `/entry` (and `/entry/integrated`, `/entry/unintegrated`, `/entry/all`): `type`, `source_database`, `tax_id`, `go_terms`.
  - `/entry/interpro`: `type`, `tax_id`, `source_database`, `member_databases`, `go_terms`, `go_categories`.
  - `/entry/{sourceDB}`: `type`, `tax_id`, `source_database`, `go_terms`, `go_categories`.
* `sort_by`: (`str`) Sort criteria (e.g., `accession`, `name`).
* `interpro_status`: (`str`) Value `"interpro_status"` counts how many entries are integrated and how many are not. *(Fails unless sourceDB is a member Database)*.
* `ida`: (`str`) Included architectures strings.
* `extra_fields`: (`str`) Include additional data (e.g., `counters`, `entry_id`, `short_name`, `description`, `wikipedia`, `literature`, `hierarchy`, `cross_references`, `entry_date`, `is_featured`, `overlaps_with`). *(Only available for `/entry/{sourceDB}` and `/entry/{sourceDB}/{accession}`).*

### InterPro-Specific (`source_db="interpro"`)

* `go_category`: (`str`) Filter by top-level GO (`biological_process`, `molecular_function`, `cellular_component`).
* `signature_in`: (`str`) Filter to entries matching a given member database.
* `latest_entries`: (`str`) Pass `"latest_entries"` to filter for entries modified in the most recent release.
* `interactions`: (`str`) Pass `"interactions"` to limit to entries with known structural interactions.
* `pathways`: (`str`) Pass `"pathways"` to filter for entries linked to pathway datasets.
* `has_model`: (`str`) Pass `"has_model"` to filter for entries with structural models.

### Source-DB Specific

* `subfamilies` / `subfamily`: (`str`) Filter specifically against Panther subfamilies. *(Fails unless `source_db="panther"`)*.
* `model`: (`str`) Included models from `interpro` or `pfam`.

### IDA (Domain Architecture) Search

*(Can ONLY be used on the root `/entry` endpoint. Invalidates aggregations).*

* `ida_search`: (`str`) Comma-separated list of domain accessions (InterPro or Pfam) to find architectures containing them.
* `ida_ignore`: (`str`) Architectures to ignore. *(Requires `ida_search`)*.
* `ordered`: (`str`) Pass `"ordered"` to mandate domains appear sequentially. *(Requires `ida_search`)*.
* `exact`: (`str`) Pass `"exact"` to mandate exact composition (no surplus domains). *(Requires `ida_search` and `ordered`)*.

--------------------------------------------------------------------------------

## 2. `/protein` Parame