Gtex Database

Name: Gtex Database
Author: google-deepmind

google-deepmind/science-skills

Query GTEx Portal API v2 from agent-driven scripts with rate-limited, paginated access for gene expression and tissue metadata in bioinformatics side projects or health-tech prototypes.

Overview

GTEx Database is an agent skill for the Build phase that fetches GTEx Portal API v2 data through a rate-limited, paginated Python CLI aligned with GTEx Terms of Use.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill gtex-database

What is this skill?

CLI wrapper for GTEx API V2 that fetches sequentially, in line with the GTEx Portal Terms of Use
Built-in pagination handling and QPS=1.0 rate limiting via the scienceskillscommon HttpClient
Pinned dataset context: gtex_v10 with GENCODE v39 references
Optional in-memory tissue cache to avoid repeated tissue-list API calls
Python 3.10+ script with PEP 723 inline metadata and a local scienceskillscommon path dependency (declared under [tool.uv.sources])
HttpClient QPS: 1.0
Dataset ID: gtex_v10
GENCODE version: v39
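A QPS of 1.0 means request starts are spaced at least one second apart. The HttpClient internals are not shown on this page, so the following is only a minimal sketch of a minimum-interval limiter under that assumption. `MinIntervalLimiter`, its injectable `clock`/`sleep` parameters, and `FakeClock` are hypothetical names, not part of scienceskillscommon.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Hypothetical limiter: spaces request starts at least 1/qps seconds apart."""

    def __init__(
        self,
        qps: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._last_start: Optional[float] = None

    def wait(self) -> float:
        """Blocks until the next request may start; returns the seconds slept."""
        slept = 0.0
        if self._last_start is not None:
            remaining = self._last_start + self.min_interval - self._clock()
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_start = self._clock()
        return slept


class FakeClock:
    """Test double so the demo advances virtual time instead of really sleeping."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
limiter = MinIntervalLimiter(qps=1.0, clock=fake.now, sleep=fake.sleep)
print([limiter.wait() for _ in range(3)])  # [0.0, 1.0, 1.0]
```

Injecting the clock and sleep functions keeps the throttle testable without real delays; in a live script you would call `limiter.wait()` immediately before each HTTP request.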

Compatible agents: Claude Code, Codex, Cursor, any compatible agent

Adoption & trust: 535 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need programmatic GTEx gene and tissue data but do not want ad-hoc scraping that violates portal rate limits or botches pagination.

Who is it for?

Indie bioinformatics or health-data builders prototyping expression lookups against GTEx v10 in Python agents.

Skip if: you need a managed data warehouse or real-time clinical EMR data, or you cannot run Python 3.10+ with outbound network access.

When should I use this skill?

Use it when a task requires GTEx Portal API v2 data fetched from Python through sequential, paginated HTTP access that respects the portal's rate limits.
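The script's `_fetch` helper drops `None`-valued parameters and URL-encodes the rest with `doseq=True`. A standalone sketch of that query-building step using only the standard library; `build_url` is a hypothetical name, the endpoint paths and `gtex_v10`/`v39` values come from the script, and the gene symbol is illustrative.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://gtexportal.org/api/v2"


def build_url(path: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Hypothetical helper mirroring _fetch: skip None values, then urlencode."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    clean = {k: v for k, v in (params or {}).items() if v is not None}
    if not clean:
        return url
    # doseq=True expands list values into repeated keys (k=a&k=b).
    return f"{url}?{urlencode(clean, doseq=True)}"


print(build_url("dataset/tissueSiteDetail", {"datasetId": "gtex_v10", "page": 0}))
# https://gtexportal.org/api/v2/dataset/tissueSiteDetail?datasetId=gtex_v10&page=0
print(build_url("reference/gene", {"geneId": "BRCA1", "gencodeVersion": "v39", "page": None}))
# https://gtexportal.org/api/v2/reference/gene?geneId=BRCA1&gencodeVersion=v39
```

Unlike the script, this sketch also returns a bare URL (no trailing `?`) when every parameter is `None`.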

What do I get? / Deliverables

You run a documented CLI path that returns paginated JSON from GTEx V2 with QPS throttling and optional tissue caching for repeatable pipelines.

JSON API responses from GTEx V2 endpoints
Reusable CLI fetch helpers with pagination
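The pagination contract the script relies on: request `page=0, 1, ...`, append each response's `data` list, and stop once the page index reaches `paging_info.numberOfPages - 1`. A minimal sketch of that loop with the HTTP call injected as a function so it runs offline; `collect_pages`, its `max_pages` safety cap (not present in the script), and the fake pages are illustrative, not real GTEx output.

```python
from typing import Any, Callable, Dict, List


def collect_pages(
    fetch_page: Callable[[int], Dict[str, Any]], max_pages: int = 10_000
) -> List[Any]:
    """Concatenates `data` from page 0 onward until numberOfPages is exhausted."""
    records: List[Any] = []
    # max_pages is a hard cap so a misreported page count cannot loop forever;
    # if it is reached, the records collected so far are returned.
    for page in range(max_pages):
        response = fetch_page(page)
        records.extend(response.get("data", []))
        total_pages = response.get("paging_info", {}).get("numberOfPages", 0)
        if page >= total_pages - 1:
            break
    return records


FAKE_PAGES = [
    {"data": [{"tissueSiteDetailId": "Liver"}], "paging_info": {"numberOfPages": 2}},
    {"data": [{"tissueSiteDetailId": "Lung"}], "paging_info": {"numberOfPages": 2}},
]
print([r["tissueSiteDetailId"] for r in collect_pages(lambda p: FAKE_PAGES[p])])
# ['Liver', 'Lung']
```

Passing `fetch_page` in as a callable lets the same loop wrap a rate-limited client in production and a list of canned responses in tests.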

Recommended Skills

Paper Context Resolver · lllllllama/ai-paper-reproduction-skill

Optional helper-tier skill that supplements README-guided deep learning reproduction by resolving specific paper details… · 140k installs · 412 stars

Repo Intake And Plan · lllllllama/ai-paper-reproduction-skill

Rigor Intake scans repository docs and layout to classify documented commands and propose a minimal reproduction plan fo… · 140k installs · 412 stars

Env And Assets Bootstrap · lllllllama/ai-paper-reproduction-skill

Rigor Setup establishes conservative environment and asset assumptions aligned with README and config evidence before ex… · 140k installs · 412 stars

Minimal Run And Audit · lllllllama/ai-paper-reproduction-skill

RigorPilot executes the selected minimal reproduction command and produces normalized, auditable run evidence for paper … · 140k installs · 412 stars

Analyze Project · lllllllama/rigorpilot-skills

analyze-project is a read-only agent skill from the RigorPilot family aimed at solo builders and small teams inheriting … · 32.3k installs · 412 stars

Ai Research Reproduction · lllllllama/rigorpilot-skills

ai-research-reproduction is the RigorPilot Reproduce orchestrator for solo builders and small teams who need to rerun a … · 32.3k installs · 412 stars

Journey fit

Primary fit

Build: Integrations & version control

Build is the primary fit because the skill is a CLI/API integration layer you wire into pipelines, notebooks, or backend jobs during product construction. The Integrations category reflects its external GTEx V2 HTTP access, pagination, and shared http_client patterns, as opposed to UI work or pure ML modeling.

Also useful

Idea: Opportunity & market research

How it compares

Skill-packaged API client for GTEx, not a general SQL database skill or an MCP server exposing arbitrary tables.

Common Questions / FAQ

Who is gtex-database for?

Solo developers and small teams wiring GTEx Portal queries into research prototypes, CLIs, or backend fetch jobs via agent-assisted coding.

When should I use gtex-database?

During build-phase integration work, when you need tissue lists, gene-centric GTEx V2 endpoints, or JSON extracts for notebooks and ETL, once the scientific question has been defined in the Idea or Validate phase.
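For notebook or ETL hand-off, the JSON records can be flattened with the standard library. A minimal sketch, assuming records shaped like the tissue list the script reads (`tissueSiteDetailId`, `tissueSiteDetail`); `records_to_csv` is hypothetical and the sample row is illustrative.

```python
import csv
import io
from typing import Any, Iterable, Mapping, Sequence

TISSUE_FIELDS = ("tissueSiteDetailId", "tissueSiteDetail")


def records_to_csv(
    records: Iterable[Mapping[str, Any]], fields: Sequence[str] = TISSUE_FIELDS
) -> str:
    """Writes the chosen fields of each record as CSV; missing keys become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Keep only the requested columns; any other keys are dropped.
        writer.writerow({f: rec.get(f, "") for f in fields})
    return buf.getvalue()


sample = [
    {"tissueSiteDetailId": "Esophagus_Muscularis", "tissueSiteDetail": "Esophagus - Muscularis", "other": 1}
]
print(records_to_csv(sample), end="")
# tissueSiteDetailId,tissueSiteDetail
# Esophagus_Muscularis,Esophagus - Muscularis
```

Selecting columns explicitly keeps the CSV schema stable even if the API adds fields to its responses.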

Is gtex-database safe to install?

It performs outbound HTTPS to gtexportal.org; review the Security Audits panel on this page and GTEx Terms of Use before production cron jobs or bulk downloads.
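For scheduled jobs, one way to avoid re-requesting data that rarely changes (such as the tissue list) is a small on-disk cache with a maximum age. This is a sketch only: `cached_json`, the 24-hour default, and the file layout are assumptions rather than features of the skill, whose own tissue cache lives in memory for a single process.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Union


def cached_json(
    path: Union[str, Path], fetch: Callable[[], Any], max_age_s: float = 24 * 3600
) -> Any:
    """Returns JSON stored at `path` if younger than max_age_s; otherwise calls fetch() and stores it."""
    p = Path(path)
    if p.exists() and time.time() - p.stat().st_mtime < max_age_s:
        return json.loads(p.read_text(encoding="utf-8"))
    data = fetch()
    tmp = p.with_name(p.name + ".tmp")
    tmp.write_text(json.dumps(data), encoding="utf-8")
    # Swap the finished file into place with one rename, so an interrupted
    # write leaves the previous cache file untouched.
    os.replace(tmp, p)
    return data


# Demo: the second call is served from disk, so fake_fetch runs only once.
calls = []


def fake_fetch() -> list:
    calls.append(1)
    return [{"tissueSiteDetailId": "Liver"}]


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "tissues.json"
    cached_json(target, fake_fetch)
    cached_json(target, fake_fetch)
print(len(calls))  # 1
```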

SKILL.md


# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""CLI wrapper for GTEx API V2.

Follows GTEx Portal Terms of Use by fetching sequentially and handling
pagination.
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import json
import sys
import urllib.parse

from science_skills.scienceskillscommon import http_client

BASE_URL = 'https://gtexportal.org/api/v2'
DATASET_ID = 'gtex_v10'
GENCODE_VERSION = 'v39'
CLIENT = http_client.HttpClient(BASE_URL, qps=1.0)

# Optional cache for tissues to avoid fetching repeatedly
TISSUE_CACHE = None


def _fetch(url, params=None):
  """Fetches URL, with optional query parameters, using HttpClient."""
  if params:
    # Filter out None values
    params = {k: v for k, v in params.items() if v is not None}
    query_string = urllib.parse.urlencode(params, doseq=True)
    full_url = f'{url}?{query_string}'
  else:
    full_url = url

  return CLIENT.fetch_json(full_url)


def _fetch_paginated(url, params=None):
  """Fetches all pages from a paginated endpoint."""
  # Copy so that setting 'page' below does not mutate the caller's dict.
  params = dict(params) if params else {}

  all_data = []
  page = 0
  while True:
    params['page'] = page
    response = _fetch(url, params)

    # Some endpoints don't use 'data' wrapping or 'paging_info'
    if 'paging_info' not in response:
      return response

    data = response.get('data', [])
    all_data.extend(data)

    paging = response.get('paging_info', {})
    total_pages = paging.get('numberOfPages', 0)

    if page >= total_pages - 1:
      break
    page += 1

  return all_data


def get_tissue_mapping():
  """Fetches the list of tissues and maps names to tissueSiteDetailId."""
  global TISSUE_CACHE
  if TISSUE_CACHE is not None:
    return TISSUE_CACHE

  url = f'{BASE_URL}/dataset/tissueSiteDetail'
  data = _fetch_paginated(url, {'datasetId': DATASET_ID})

  mapping = {}
  for t in data:
    id_ = t.get('tissueSiteDetailId')
    name = t.get('tissueSiteDetail')
    if id_ and name:
      mapping[id_.lower()] = id_
      mapping[name.lower()] = id_
      # Allow "Esophagus - Muscularis" instead of "Esophagus_Muscularis", etc.
      mapping[id_.replace('_', ' ').lower()] = id_
      mapping[name.replace('-', ' ').lower()] = id_

  TISSUE_CACHE = mapping
  return mapping


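def resolve_tissue(tissue_str):
  """Maps a tissue name or ID (case-insensitive) to its tissueSiteDetailId."""
  mapping = get_tissue_mapping()
  cleaned = tissue_str.strip().lower()
  if cleaned in mapping:
    return mapping[cleaned]
  # Fall back to treating hyphens as spaces, matching the name variants above.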
  cleaned_no_hyphen = cleaned.replace('-', ' ')
  if cleaned_no_hyphen in mapping:
    return mapping[cleaned_no_hyphen]
  sys.stderr.write(f"Error: Unknown tissue '{tissue_str}'.\n")
  sys.exit(1)


def resolve_gencode_id(gene_symbol, output_file):
  """Maps a standard gene symbol to its versioned GENCODE ID."""
  url = f'{BASE_URL}/reference/gene'
  data = _fetch_paginated(
      url, {'geneId': gene_symbol, 'gencodeVersion': GENCODE_VERSION}
  )
  if not data:
    sys.stderr.write(f"Error: Could not find GENCODE ID for '{gene_symbol}'.\n")
    sys.exit(1)

  # Prefer an exact (case-insensitive) symbol match; otherwise keep the first hit.
  best_match = data[0]
  for d in data:
    # `or ''` guards against a null geneSymbol in the response.
    if (d.get('geneSymbol') or '').lower() == gene_symbol.lower():
      best_match = d
      break

  result = {
      'gene_symbol': best_match.get('geneSymbol'),
      'gencode_id': best_match.get('gencodeId'),
      'chro
