Ensembl Database

Name: Ensembl Database
Author: google-deepmind

google-deepmind/science-skills

Query Ensembl stable IDs, sequences, and annotations via the REST API when building genomics tools or science automations.

Overview

Ensembl Database is an agent skill for the Build phase that documents how to call the Ensembl REST API for lookups, sequences, and annotations.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill ensembl-database

What is this skill?

Base URLs for both Ensembl REST hosts: GRCh38 (rest.ensembl.org) and GRCh37 (grch37.rest.ensembl.org)
Content-Type negotiation for JSON, plain sequence, FASTA, and GFF3 responses
Maximum rate limit of 15 requests/second, with Retry-After guidance on HTTP 429
Region format CHR:START..END:STRAND and semicolon query parameter conventions
Lookup endpoint patterns for genes, transcripts, and proteins with expand and MANE options

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 547 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need live Ensembl gene and sequence data but only have partial examples and keep hitting wrong hosts, formats, or rate limits.

Who is it for?

Solo builders shipping genomics features, lab tooling, or data pipelines that must integrate with Ensembl’s public REST service.

Skip if: Projects with no biological data needs or teams that already standardize on a private, fully wrapped Ensembl client with no custom queries.

When should I use this skill?

Building custom Ensembl REST queries beyond bundled script coverage.

What do I get? / Deliverables

Your agent produces valid Ensembl REST requests with correct bases, headers, region syntax, and throttling behavior for GRCh38 or GRCh37.

Correct Ensembl REST URL and header patterns for the target endpoint
Integration notes for rate limiting and response format choice

Journey fit

Primary fit

Build: Integrations & version control

Ensembl access is implemented as HTTP integration work once you know what biological entities you need in product code. The skill is a REST contract reference (lookup, formats, rate limits) suited to wiring backends and scripts against rest.ensembl.org.

Also useful

Idea: Opportunity & market research

How it compares

API integration reference for Ensembl—not a general-purpose SQL database skill.

Common Questions / FAQ

Who is ensembl-database for?

Developers and scientist-builders wiring applications or scripts to Ensembl for gene IDs, transcripts, sequences, and related metadata.

When should I use ensembl-database?

During Build integrations when implementing lookup, sequence export, or annotation calls, especially when ensembl_api.py does not cover your endpoint or assembly.

Is ensembl-database safe to install?

The skill describes public API usage; review the Security Audits panel on this Prism page and avoid sending secrets in URLs or logs.

SKILL.md

# Ensembl REST API Reference

This document provides a concise reference for the Ensembl REST API
(`https://rest.ensembl.org`). Use it to build custom queries when the
`ensembl_api.py` script does not cover a specific use case.

## General Conventions

- **Base URL:** `https://rest.ensembl.org` (GRCh38). For GRCh37:
  `https://grch37.rest.ensembl.org`
- **Content Negotiation:** Set the `Content-Type` header (the `Accept` header
  is used the same way in the sequence endpoints) to control the response
  format:
    - `application/json` — structured JSON (default for most endpoints)
    - `text/plain` — raw sequence string
    - `text/x-fasta` — FASTA-formatted sequence
    - `text/x-gff3` — GFF3 annotation output
- **Rate Limit:** Max 15 requests/second. On HTTP 429, honour the
  `Retry-After` header.
- **Region Format:** `CHR:START..END:STRAND` where STRAND is `1` (forward)
  or `-1` (reverse). A hyphen (`START-END`) also works for most endpoints.
- **Query Parameter Separator:** Ensembl uses `;` (semicolon) to separate
  query parameters, e.g. `?expand=1;mane=1`. Standard `&` also works.
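
The conventions above can be sketched as a small helper. This is a minimal illustration, not part of `ensembl_api.py`: the names `build_url`, `retry_delay`, and `ensembl_get`, the default wait of one second, and the retry count are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import quote

# The two hosts listed under "Base URL".
BASE_URLS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def build_url(path, params=None, assembly="GRCh38"):
    """Join host, endpoint path, and semicolon-separated query parameters."""
    url = BASE_URLS[assembly] + path
    if params:
        query = ";".join(
            f"{quote(str(key))}={quote(str(value))}" for key, value in params.items()
        )
        url += "?" + query
    return url


def retry_delay(headers, default=1.0):
    """Seconds to wait after HTTP 429; falls back to an assumed default."""
    try:
        return float(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default


def ensembl_get(path, params=None, content_type="application/json",
                assembly="GRCh38", max_retries=3):
    """GET an endpoint and retry on HTTP 429 after the advised delay."""
    request = urllib.request.Request(
        build_url(path, params, assembly),
        headers={"Content-Type": content_type},
    )
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(request) as response:
                return response.read().decode("utf-8")
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; everything else propagates.
            if err.code != 429 or attempt == max_retries:
                raise
            time.sleep(retry_delay(err.headers))
```

Honouring `Retry-After` handles the 429 case only. A client issuing many requests would also need to pace itself to stay under 15 requests/second.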

---

## Lookup Endpoints

### `GET /lookup/id/{id}`
Look up any Ensembl stable ID (gene, transcript, protein) and retrieve
metadata.

- **`expand`** (0/1): Include child objects (Transcript array for genes, Exon
  array for transcripts, Translation for coding transcripts)
- **`mane`** (0/1): Include MANE Select/Plus Clinical annotations on transcripts
- **`db_type`** (string): Database (default: `core`). Options: `core`,
  `otherfeatures`
- **`format`** (string): `full` (default) or `condensed`
- **`species`** (string): Override species if the ID is ambiguous

**Key response fields (Gene):**
`id`, `display_name` (symbol), `biotype`, `description`,
`seq_region_name` (chromosome), `start`, `end`, `strand`,
`assembly_name`, `Transcript[]` (when expanded).

**Key response fields (Transcript, expanded):**
`id`, `biotype`, `display_name`, `is_canonical` (0 or 1), `length`,
`MANE[]` (array with `type`: `MANE_Select` or `MANE_Plus_Clinical`),
`TSL` (Transcript Support Level object with `value`),
`Exon[]`, `Translation` (with `id`, `start`, `end`, `length`).
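
As a rough illustration of how the fields above fit together, the sketch below picks one transcript from an expanded gene response. The function name `mane_select_transcript`, the fallback to `is_canonical`, and the sample record (with placeholder IDs) are assumptions for the example, not real Ensembl data.

```python
def mane_select_transcript(gene):
    """Return the MANE_Select transcript, else the canonical one, else None."""
    transcripts = gene.get("Transcript", [])
    for transcript in transcripts:
        mane_entries = transcript.get("MANE", [])
        if any(entry.get("type") == "MANE_Select" for entry in mane_entries):
            return transcript
    # Assumed fallback: use the canonical flag when no MANE annotation exists.
    for transcript in transcripts:
        if transcript.get("is_canonical") == 1:
            return transcript
    return None


# Hypothetical, heavily trimmed response for expand=1;mane=1.
example_gene = {
    "id": "ENSG_EXAMPLE",
    "display_name": "EXAMPLE1",
    "Transcript": [
        {"id": "ENST_A", "is_canonical": 0, "MANE": []},
        {"id": "ENST_B", "is_canonical": 1, "MANE": [{"type": "MANE_Select"}]},
    ],
}

chosen = mane_select_transcript(example_gene)
print(chosen["id"])  # ENST_B
```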

### `GET /lookup/symbol/{species}/{symbol}`
Resolve a gene symbol to its Ensembl stable ID.

- **`expand`** (0/1): Include child objects

Returns the same structure as `/lookup/id/`.

### `POST /lookup/id`
Batch lookup: send `{"ids": ["ENSG...", "ENST..."]}` as JSON body.
Returns a dict keyed by ID.
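
A batch lookup could be prepared along these lines. The helper names and the `BATCH_SIZE` value are assumptions for illustration; check the endpoint's documented maximum before relying on any particular batch size.

```python
import json
import urllib.request

# Assumed batch size for illustration; not a documented limit.
BATCH_SIZE = 500


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_lookup_request(ids, base_url="https://rest.ensembl.org"):
    """Build (but do not send) a POST /lookup/id request for the given IDs."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/lookup/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Placeholder IDs; real stable IDs would go here.
stable_ids = ["ENSG_EXAMPLE_1", "ENST_EXAMPLE_2", "ENSG_EXAMPLE_3"]
requests_to_send = [batch_lookup_request(chunk) for chunk in chunked(stable_ids, BATCH_SIZE)]
```

Each prepared request can then be passed to `urllib.request.urlopen`, and the JSON response read back as a dict keyed by ID.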

---

## Cross-Reference Endpoints

### `GET /xrefs/id/{id}`
Retrieve external database references for an Ensembl ID.

- **`external_db`** (string): Filter by database name (e.g. `UniProt`, `HGNC`,
  `RefSeq_mRNA`, `UCSC`, `EntrezGene`)
- **`all_levels`** (0/1): Include xrefs from parent/child features

**Response:** Array of objects with `primary_id`, `display_id`,
`db_display_name`, `dbname`, `description`, `info_type`.
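
Since the response is a flat array, a common first step is to group it by source database. The sketch below assumes only the `dbname` and `primary_id` fields listed above; the function name `xrefs_by_db` and the sample records are made up for the example.

```python
from collections import defaultdict


def xrefs_by_db(xrefs):
    """Group xref primary IDs by their `dbname` field."""
    grouped = defaultdict(list)
    for xref in xrefs:
        grouped[xref["dbname"]].append(xref["primary_id"])
    return dict(grouped)


# Hypothetical records shaped like the /xrefs/id response.
example_xrefs = [
    {"dbname": "HGNC", "primary_id": "HGNC:0000", "display_id": "EXAMPLE1"},
    {"dbname": "EntrezGene", "primary_id": "0000", "display_id": "EXAMPLE1"},
    {"dbname": "HGNC", "primary_id": "HGNC:0001", "display_id": "EXAMPLE2"},
]

print(xrefs_by_db(example_xrefs))
# {'HGNC': ['HGNC:0000', 'HGNC:0001'], 'EntrezGene': ['0000']}
```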

### `GET /xrefs/symbol/{species}/{symbol}`
Find Ensembl IDs matching an external symbol.

### `GET /xrefs/name/{species}/{name}`
Broader search — looks up any name across all external databases.

---

## Sequence Endpoints

### `GET /sequence/id/{id}`
Fetch sequence for an Ensembl feature by stable ID.

- **`type`** (string): `genomic` (default), `cdna`, `cds`, `protein`
- **`expand_5prime`** (int): Extend N bases upstream
- **`expand_3prime`** (int): Extend N bases downstream
- **`mask`** (string): Repeat masking: `hard` (repeats replaced with `N`) or
  `soft` (repeats in lowercase)

Set `Accept: text/x-fasta` for FASTA output, `text/plain` for raw string.
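
When FASTA output is requested, the body still needs to be split into header and sequence. A minimal parser, assuming ordinary FASTA text (a `>` header line followed by wrapped sequence lines), might look like this. The function name and the sample text are illustrative only.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical FASTA body, as returned for `type=cdna` with FASTA output.
example_fasta = ">ENST_EXAMPLE cdna\nACGT\nTTGA\n"
print(parse_fasta(example_fasta))  # [('ENST_EXAMPLE cdna', 'ACGTTTGA')]
```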

### `GET /sequence/region/{species}/{region}`
Fetch genomic DNA for a coordinate window.

- **`coord_system_version`** (string): Assembly version (e.g. `GRCh38`)
- **`expand_5prime`** (int): Extend N bases upstream
- **`expand_3prime`** (int): Extend N bases downstream
- **`mask`** (string): `hard` or `soft` repeat masking
- **`mask_feature`** (0/1): Apply feature-level masking

Region format: `CHR:START..END:STRAND` (e.g., `X:1000000..1000100:1`).
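
Building the region string by hand is error-prone, so a small formatter can help. This sketch assumes 1-based coordinates with `start <= end`, which is a simplification. The function names and the species value `human` are assumptions for the example.

```python
def format_region(chrom, start, end, strand=None):
    """Format a region as CHR:START..END, with :STRAND appended when given."""
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates: {start}..{end}")
    if strand not in (None, 1, -1):
        raise ValueError(f"strand must be 1 or -1, got {strand!r}")
    region = f"{chrom}:{start}..{end}"
    if strand is not None:
        region += f":{strand}"
    return region


def region_sequence_path(species, chrom, start, end, strand=None):
    """Build the path for GET /sequence/region/{species}/{region}."""
    return f"/sequence/region/{species}/{format_region(chrom, start, end, strand)}"


print(format_region("X", 1000000, 1000100, 1))  # X:1000000..1000100:1
print(region_sequence_path("human", "X", 1000000, 1000100, 1))
# /sequence/region/human/X:1000000..1000100:1
```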

### `POST /sequence/region/{species}`
Batch: send `{"regions": ["X:1000..2000", "7:100..200"]}`.

---
