
Ensembl Database
Query Ensembl stable IDs, sequences, and annotations via the REST API when building genomics tools or science automations.
Overview
Ensembl Database is an agent skill for the Build phase that documents how to call the Ensembl REST API for lookups, sequences, and annotations.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill ensembl-database
What is this skill?
- Base URLs for GRCh38 and GRCh37 Ensembl REST hosts
- Content-Type negotiation for JSON, plain sequence, FASTA, and GFF3 responses
- 15 requests/second rate limit with Retry-After guidance on HTTP 429
- Region format CHR:START..END:STRAND and semicolon query parameter conventions
- Lookup endpoint patterns for genes, transcripts, proteins with expand and MANE options
- Documents a maximum rate limit of 15 requests per second on Ensembl REST
- Covers 2 assembly bases: GRCh38 (rest.ensembl.org) and GRCh37 (grch37.rest.ensembl.org)
Adoption & trust: 547 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need live Ensembl gene and sequence data but only have partial examples and keep hitting wrong hosts, formats, or rate limits.
Who is it for?
Solo builders shipping genomics features, lab tooling, or data pipelines that must integrate with Ensembl’s public REST service.
Skip if: Projects with no biological data needs or teams that already standardize on a private, fully wrapped Ensembl client with no custom queries.
When should I use this skill?
When you are building custom Ensembl REST queries that the bundled script does not cover.
What do I get? / Deliverables
Your agent produces valid Ensembl REST requests with correct bases, headers, region syntax, and throttling behavior for GRCh38 or GRCh37.
- Correct Ensembl REST URL and header patterns for the target endpoint
- Integration notes for rate limiting and response format choice
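As a rough illustration of the throttling behavior mentioned above, the sketch below paces calls to stay under 15 requests per second and works out how long to pause after an HTTP 429. The names `Throttle` and `retry_delay` are hypothetical and not part of the skill. It assumes `Retry-After` carries a number of seconds and falls back to an arbitrary one-second default when the header is missing or unparsable.

```python
import time


class Throttle:
    """Fixed-interval pacing: at most `rate` calls are released per second."""

    def __init__(self, rate=15.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock      # injectable so the pacing can be tested without waiting
        self.sleep = sleep
        self.next_at = 0.0      # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
            now = self.next_at
        self.next_at = now + self.interval


def retry_delay(status, headers, default=1.0):
    """Return the seconds to pause after a response; 0.0 unless the status is 429.

    `headers` is any mapping with .get(); a plain dict is case-sensitive,
    so the key must be spelled exactly "Retry-After" in that case.
    """
    if status != 429:
        return 0.0
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default  # assumption: arbitrary fallback when the header is unusable
```

Call `Throttle().wait()` before each request, and sleep for `retry_delay(status, headers)` before retrying a request that was rejected.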
Journey fit
Ensembl access is implemented as HTTP integration work once you know what biological entities you need in product code. The skill is a REST contract reference (lookup, formats, rate limits) suited to wiring backends and scripts against rest.ensembl.org.
How it compares
An API integration reference for Ensembl, not a general-purpose SQL database skill.
Common Questions / FAQ
Who is ensembl-database for?
Developers and scientist-builders wiring applications or scripts to Ensembl for gene IDs, transcripts, sequences, and related metadata.
When should I use ensembl-database?
During Build-phase integration work, when you are implementing lookup, sequence export, or annotation calls, and especially when ensembl_api.py does not cover your endpoint or assembly.
Is ensembl-database safe to install?
The skill describes public API usage; review the Security Audits panel on this Prism page and avoid sending secrets in URLs or logs.
SKILL.md - Ensembl Database
# Ensembl REST API Reference

This document provides a concise reference for the Ensembl REST API (`https://rest.ensembl.org`). Use it to build custom queries when the `ensembl_api.py` script does not cover a specific use case.

## General Conventions

- **Base URL:** `https://rest.ensembl.org` (GRCh38). For GRCh37: `https://grch37.rest.ensembl.org`
- **Content Negotiation:** Set the `Content-Type` header to control the response format:
  - `application/json`: structured JSON (default for most endpoints)
  - `text/plain`: raw sequence string
  - `text/x-fasta`: FASTA-formatted sequence
  - `text/x-gff3`: GFF3 annotation output
- **Rate Limit:** Max 15 requests/second. On HTTP 429, honour the `Retry-After` header.
- **Region Format:** `CHR:START..END:STRAND` where STRAND is `1` (forward) or `-1` (reverse). A hyphen (`START-END`) also works for most endpoints.
- **Query Parameter Separator:** Ensembl uses `;` (semicolon) to separate query parameters, e.g. `?expand=1;mane=1`. Standard `&` also works.

---

## Lookup Endpoints

### `GET /lookup/id/{id}`

Look up any Ensembl stable ID (gene, transcript, protein) and retrieve metadata.

- **`expand`** (0/1): Include child objects (Transcript array for genes, Exon array for transcripts, Translation for coding transcripts)
- **`mane`** (0/1): Include MANE Select/Plus Clinical annotations on transcripts
- **`db_type`** (string): Database (default: `core`). Options: `core`, `otherfeatures`
- **`format`** (string): `full` (default) or `condensed`
- **`species`** (string): Override species if the ID is ambiguous

**Key response fields (Gene):** `id`, `display_name` (symbol), `biotype`, `description`, `seq_region_name` (chromosome), `start`, `end`, `strand`, `assembly_name`, `Transcript[]` (when expanded).

**Key response fields (Transcript, expanded):** `id`, `biotype`, `display_name`, `is_canonical` (0 or 1), `length`, `MANE[]` (array with `type`: `MANE_Select` or `MANE_Plus_Clinical`), `TSL` (Transcript Support Level object with `value`), `Exon[]`, `Translation` (with `id`, `start`, `end`, `length`).

### `GET /lookup/symbol/{species}/{symbol}`

Resolve a gene symbol to its Ensembl stable ID.

- **`expand`** (0/1): Include child objects

Returns the same structure as `/lookup/id/`.

### `POST /lookup/id`

Batch lookup: send `{"ids": ["ENSG...", "ENST..."]}` as JSON body. Returns a dict keyed by ID.

---

## Cross-Reference Endpoints

### `GET /xrefs/id/{id}`

Retrieve external database references for an Ensembl ID.

- **`external_db`** (string): Filter by database name (e.g. `UniProt`, `HGNC`, `RefSeq_mRNA`, `UCSC`, `EntrezGene`)
- **`all_levels`** (0/1): Include xrefs from parent/child features

**Response:** Array of objects with `primary_id`, `display_id`, `db_display_name`, `dbname`, `description`, `info_type`.

### `GET /xrefs/symbol/{species}/{symbol}`

Find Ensembl IDs matching an external symbol.

### `GET /xrefs/name/{species}/{name}`

Broader search: looks up any name across all external databases.

---

## Sequence Endpoints

### `GET /sequence/id/{id}`

Fetch sequence for an Ensembl feature by stable ID.

- **`type`** (string): `genomic` (default), `cdna`, `cds`, `protein`
- **`expand_5prime`** (int): Extend N bases upstream
- **`expand_3prime`** (int): Extend N bases downstream
- **`mask`** (string): Masking: `hard` or `soft`

Set `Accept: text/x-fasta` for FASTA output, `text/plain` for raw string.

### `GET /sequence/region/{species}/{region}`

Fetch genomic DNA for a coordinate window.

- **`coord_system_version`** (string): Assembly version (e.g. `GRCh38`)
- **`expand_5prime`** (int): Extend N bases upstream
- **`expand_3prime`** (int): Extend N bases downstream
- **`mask`** (string): `hard` or `soft` repeat masking
- **`mask_feature`** (0/1): Apply feature-level masking

Region format: `CHR:START..END:STRAND` (e.g., `X:1000000..1000100:1`).

### `POST /sequence/region/{species}`

Batch: send `{"regions": ["X:1000..2000", "7:100..200"]}`.

---
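
## Example: Building Requests

To show how the conventions and endpoints above fit together, the following Python sketch builds (but does not send) three requests: an expanded lookup, a FASTA region fetch, and a batch lookup. The helper names `format_region`, `build_get`, and `build_batch_lookup` are hypothetical and exist only for this illustration. The gene ID and the `human` species alias are placeholder inputs. The `Accept` header on the POST is an assumption added so the body type and desired response type are both explicit.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Base URLs as listed under General Conventions.
BASES = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def format_region(chrom, start, end, strand=1):
    """Render a region as CHR:START..END:STRAND (strand is 1 or -1)."""
    if strand not in (1, -1):
        raise ValueError("strand must be 1 (forward) or -1 (reverse)")
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}..{end}:{strand}"


def build_get(path, params=None, content_type="application/json", assembly="GRCh38"):
    """Build a GET request object for an Ensembl REST path; nothing is sent."""
    url = BASES[assembly] + path
    if params:
        # urlencode joins with '&', which the reference says works alongside ';'.
        url += "?" + urlencode(params)
    # The reference selects the response format through the Content-Type header.
    return urllib.request.Request(url, headers={"Content-Type": content_type})


def build_batch_lookup(ids, assembly="GRCh38"):
    """Build a POST /lookup/id request with the {"ids": [...]} JSON body."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        BASES[assembly] + "/lookup/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Placeholder inputs for illustration only.
lookup = build_get("/lookup/id/ENSG00000139618", {"expand": 1, "mane": 1})
region = format_region("X", 1000000, 1000100, 1)
fasta = build_get(
    "/sequence/region/human/" + quote(region, safe=":"),
    content_type="text/x-fasta",
)
batch = build_batch_lookup(["ENSG00000139618", "ENSG00000141510"])
```

Passing any of these objects to `urllib.request.urlopen` would perform the call. Keeping construction separate from sending makes it easy to inspect `full_url` and the headers before spending any of the rate limit.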