
GTEx Database
Query GTEx Portal API v2 from agent-driven scripts with rate-limited, paginated access for gene expression and tissue metadata in bioinformatics side projects or health-tech prototypes.
Overview
GTEx Database is an agent skill for the Build phase that fetches GTEx Portal API v2 data through a rate-limited, paginated Python CLI aligned with GTEx Terms of Use.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill gtex-database
What is this skill?
- CLI wrapper for GTEx API V2 with sequential fetch aligned to Portal Terms of Use
- Built-in pagination handling and QPS=1.0 rate limiting via scienceskillscommon HttpClient
- Pinned dataset context: gtex_v10 with GENCODE v39 references
- Optional tissue cache to avoid repeated tissue list API calls
- Python 3.10+ script block with local scienceskillscommon path dependency
- HttpClient default QPS: 1.0
- Dataset ID: gtex_v10
- GENCODE version: v39
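The QPS=1.0 limit listed above can be pictured as a minimum-interval throttle. The following is only a sketch of that idea: the internals of the scienceskillscommon HttpClient are not shown on this page, so the class name `MinIntervalLimiter` and its injectable `clock` and `sleep` parameters are hypothetical and not part of the skill.

```python
import time


class MinIntervalLimiter:
    """Keeps successive calls at least 1/qps seconds apart (hypothetical helper)."""

    def __init__(self, qps=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # No request has been made yet.

    def wait(self):
        """Sleeps if the next slot is not open yet; returns the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Reserve the following slot one interval after this request.
        self._next_allowed = now + self._min_interval
        return delay
```

Calling `wait()` before each HTTP request would keep a sequential loop at or below one request per second. The clock and sleep functions are injectable so the behaviour can be checked without real delays.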
Adoption & trust: 535 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need programmatic GTEx gene and tissue data but do not want ad-hoc scraping that violates portal rate limits or botches pagination.
Who is it for?
Indie bioinformatics or health-data builders prototyping expression lookups against GTEx v10 in Python agents.
Skip if: you need a managed warehouse or real-time clinical EMR data, or you cannot provide Python 3.10+ and network egress.
When should I use this skill?
Use it when a task requires fetching GTEx Portal API v2 data from Python with compliant sequential, paginated HTTP access.
What do I get? / Deliverables
You run a documented CLI path that returns paginated JSON from GTEx V2 with QPS throttling and optional tissue caching for repeatable pipelines.
- JSON API responses from GTEx V2 endpoints
- Reusable CLI fetch helpers with pagination
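The pagination behind these deliverables can be sketched without network access. The example below assumes the response shape the bundled script reads (`data`, `paging_info`, `numberOfPages`, zero-indexed pages). The names `fetch_all_pages`, `fake_fetch`, and `PAGES` are hypothetical stand-ins, and the three fake pages are made-up data, not GTEx output.

```python
def fetch_all_pages(fetch_page):
    """Collects 'data' items from every page of a zero-indexed paginated API.

    fetch_page is a hypothetical callable: page number -> decoded JSON dict.
    """
    items = []
    page = 0
    while True:
        response = fetch_page(page)
        # Endpoints without paging metadata are returned unchanged.
        if 'paging_info' not in response:
            return response
        items.extend(response.get('data', []))
        total_pages = response['paging_info'].get('numberOfPages', 0)
        if page >= total_pages - 1:
            break
        page += 1
    return items


# Fake three-page backend standing in for the network (illustrative data).
PAGES = [['a', 'b'], ['c', 'd'], ['e']]


def fake_fetch(page):
    return {'data': PAGES[page], 'paging_info': {'numberOfPages': len(PAGES)}}


print(fetch_all_pages(fake_fetch))  # ['a', 'b', 'c', 'd', 'e']
```

Passing the page fetcher in as a callable keeps the loop testable. In real use it would wrap one throttled HTTP request per page.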
Journey fit
Build is the canonical phase because the skill is a CLI/API integration layer you wire into pipelines, notebooks, or backend jobs during product construction. The Integrations category reflects external GTEx V2 HTTP access, pagination, and shared http_client patterns rather than UI or pure ML modeling.
How it compares
A skill-packaged API client for GTEx, not a general SQL database skill or an MCP server exposing arbitrary tables.
Common Questions / FAQ
Who is gtex-database for?
Solo developers and small teams wiring GTEx Portal queries into research prototypes, CLIs, or backend fetch jobs via agent-assisted coding.
When should I use gtex-database?
Use it during build-phase integration work when you need tissue lists, gene-centric GTEx V2 endpoints, or JSON extracts for notebooks and ETL. Define the scientific question first, during idea or validation research.
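For tissue lists, the bundled script accepts several spellings of a tissue name and resolves them to a `tissueSiteDetailId`. A minimal offline sketch of that alias table follows. The field names come from the script, while the helper names `build_tissue_aliases` and `lookup_tissue` and the two sample records are illustrative assumptions, not data fetched from the API.

```python
def build_tissue_aliases(records):
    """Maps lowercase spellings of each tissue to its tissueSiteDetailId."""
    aliases = {}
    for rec in records:
        tissue_id = rec.get('tissueSiteDetailId')
        name = rec.get('tissueSiteDetail')
        if not (tissue_id and name):
            continue
        spellings = (
            tissue_id,
            name,
            tissue_id.replace('_', ' '),  # "Esophagus Muscularis"
            name.replace('-', ' '),       # hyphen swapped for a space
        )
        for spelling in spellings:
            aliases[spelling.lower()] = tissue_id
    return aliases


def lookup_tissue(aliases, text):
    """Returns the matching tissue ID, or None when the name is unknown."""
    cleaned = text.strip().lower()
    return aliases.get(cleaned) or aliases.get(cleaned.replace('-', ' '))


# Illustrative records only; a real run would take them from the tissue endpoint.
SAMPLE = [
    {'tissueSiteDetailId': 'Esophagus_Muscularis',
     'tissueSiteDetail': 'Esophagus - Muscularis'},
    {'tissueSiteDetailId': 'Whole_Blood', 'tissueSiteDetail': 'Whole Blood'},
]
ALIASES = build_tissue_aliases(SAMPLE)
```

Building the table once and reusing it is what makes a tissue cache worthwhile: later lookups become dictionary reads instead of API calls.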
Is gtex-database safe to install?
It performs outbound HTTPS to gtexportal.org; review the Security Audits panel on this page and GTEx Terms of Use before production cron jobs or bulk downloads.
SKILL.md
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""CLI wrapper for GTEx API V2.

Follows GTEx Portal Terms of Use by fetching sequentially and handling
pagination.
"""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import json
import sys
import urllib.parse

from science_skills.scienceskillscommon import http_client

BASE_URL = 'https://gtexportal.org/api/v2'
DATASET_ID = 'gtex_v10'
GENCODE_VERSION = 'v39'

CLIENT = http_client.HttpClient(BASE_URL, qps=1.0)

# Optional cache for tissues to avoid fetching repeatedly
TISSUE_CACHE = None


def _fetch(url, params=None):
    """Fetches URL, with optional query parameters, using HttpClient."""
    if params:
        # Filter out None values
        params = {k: v for k, v in params.items() if v is not None}
        query_string = urllib.parse.urlencode(params, doseq=True)
        full_url = f'{url}?{query_string}'
    else:
        full_url = url
    return CLIENT.fetch_json(full_url)


def _fetch_paginated(url, params=None):
    """Fetches all pages from a paginated endpoint."""
    if params is None:
        params = {}
    all_data = []
    page = 0
    while True:
        params['page'] = page
        response = _fetch(url, params)
        # Some endpoints don't use 'data' wrapping or 'paging_info'
        if 'paging_info' not in response:
            return response
        data = response.get('data', [])
        all_data.extend(data)
        paging = response.get('paging_info', {})
        total_pages = paging.get('numberOfPages', 0)
        if page >= total_pages - 1:
            break
        page += 1
    return all_data


def get_tissue_mapping():
    """Fetches the list of tissues and maps names to tissueSiteDetailId."""
    global TISSUE_CACHE
    if TISSUE_CACHE is not None:
        return TISSUE_CACHE
    url = f'{BASE_URL}/dataset/tissueSiteDetail'
    data = _fetch_paginated(url, {'datasetId': DATASET_ID})
    mapping = {}
    for t in data:
        id_ = t.get('tissueSiteDetailId')
        name = t.get('tissueSiteDetail')
        if id_ and name:
            mapping[id_.lower()] = id_
            mapping[name.lower()] = id_
            # Allow "Esophagus - Muscularis" instead of "Esophagus_Muscularis", etc.
            mapping[id_.replace('_', ' ').lower()] = id_
            mapping[name.replace('-', ' ').lower()] = id_
    TISSUE_CACHE = mapping
    return mapping


def resolve_tissue(tissue_str):
    mapping = get_tissue_mapping()
    cleaned = tissue_str.strip().lower()
    if cleaned in mapping:
        return mapping[cleaned]
    # Try more fuzzy matching if needed
    cleaned_no_hyphen = cleaned.replace('-', ' ')
    if cleaned_no_hyphen in mapping:
        return mapping[cleaned_no_hyphen]
    sys.stderr.write(f"Error: Unknown tissue '{tissue_str}'.\n")
    sys.exit(1)


def resolve_gencode_id(gene_symbol, output_file):
    """Maps a standard gene symbol to its Versioned GENCODE ID."""
    url = f'{BASE_URL}/reference/gene'
    data = _fetch_paginated(
        url, {'geneId': gene_symbol, 'gencodeVersion': GENCODE_VERSION}
    )
    if not data:
        sys.stderr.write(f"Error: Could not find GENCODE ID for '{gene_symbol}'.\n")
        sys.exit(1)
    # Return the first matching exact symbol if possible, else just the first one
    best_match = data[0]
    for d in data:
        if d.get('geneSymbol', '').lower() == gene_symbol.lower():
            best_match = d
            break
    result = {
        'gene_symbol': best_match.get('geneSymbol'),
        'gencode_id': best_match.get('gencodeId'),
        'chro