
Jaspar Database
Query the JASPAR transcription-factor motif API from an agent workflow for genomics or regulatory research.
Overview
JASPAR Database is an agent skill for the Build phase that queries the JASPAR transcription-factor motif API with validated export formats.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill jaspar-database
What is this skill?
- Wraps the JASPAR REST API v1 with a rate-limited HTTP client (10 qps)
- Supports json, jsonp, jaspar, meme, transfac, pfm, and yaml response formats
- CLI-style Python entry point with argument validation and an output cap of 50,000 characters
- Apache-2.0 Google DeepMind science-skills packaging with the shared scienceskillscommon dependency
- Useful for motif lookup, matrix export, and TF binding research automations
- 10 requests per second rate limit
- 50,000 character output truncation cap
- 7 supported export formats
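The argument validation mentioned in the list above can be sketched in isolation. The regular expression and the seven format names are taken from the skill source shown further down this page; the helper name `check_arguments` and its return convention are hypothetical and chosen only for illustration (the skill itself prints an error and exits).

```python
import re

# Format names and ID pattern as they appear in the skill source.
VALID_FORMATS = ("json", "jsonp", "jaspar", "meme", "transfac", "pfm", "yaml")
MATRIX_ID_PATTERN = re.compile(r"^MA\d{4}\.\d+$")


def check_arguments(matrix_id, fmt="json"):
    """Hypothetical helper: returns a list of problems (empty if valid)."""
    problems = []
    if not MATRIX_ID_PATTERN.match(matrix_id):
        problems.append(
            f"invalid Matrix ID {matrix_id!r}; expected a form like 'MA0488.2'"
        )
    if fmt not in VALID_FORMATS:
        problems.append(
            f"unsupported format {fmt!r}; choose one of {VALID_FORMATS}"
        )
    return problems


if __name__ == "__main__":
    # A gene symbol is not a Matrix ID, and "fasta" is not a listed format.
    for problem in check_arguments("JUN", "fasta"):
        print(problem)
```

Returning a list of problems instead of exiting makes the check easy to reuse from a notebook or a test. A CLI wrapper could still print the problems to stderr and exit with a non-zero status.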
Adoption & trust: 535 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need JASPAR motif matrices in your agent pipeline but do not want brittle one-off HTTP scripts.
Who is it for?
Bioinformatics solo builders and researchers automating TF motif fetch inside agent or Python workflows.
Skip if: you are a general web developer with no genomics use case, or your team needs offline-only motif databases.
When should I use this skill?
User needs JASPAR motif matrices, TF profiles, or API access during genomics or regulatory research tasks.
What do I get? / Deliverables
You retrieve JASPAR records in the format your downstream tool expects, with rate limiting and safe output truncation.
- JASPAR API response in chosen format
- Truncated CLI-printed motif data for agent context
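The output truncation described above can be sketched as a small pure function. The 50,000-character cap and the wording of the notice follow the skill source; the name `truncate_for_context` and the choice to return a string rather than print it are assumptions made for this example.

```python
MAX_OUTPUT_CHARS = 50_000  # same cap as the skill source


def truncate_for_context(text, limit=MAX_OUTPUT_CHARS):
    """Hypothetical helper: cuts text to `limit` chars and appends a notice."""
    if len(text) <= limit:
        return text
    notice = (
        f"\n... [truncated: {len(text)} chars total, showing first {limit}]"
    )
    return text[:limit] + notice


if __name__ == "__main__":
    # A 60,000-character payload is cut to the first 50,000 characters.
    shortened = truncate_for_context("A" * 60_000)
    print(shortened[-60:])
```

The notice reports the original length, so an agent reading the output can tell that data is missing and request a narrower query or a more compact format.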
How it compares
A focused JASPAR API integration, not a full variant-calling or ChIP-seq analysis pipeline.
Common Questions / FAQ
Who is jaspar-database for?
Computational biologists and ML builders who use agents to pull transcription factor profiles from JASPAR during analysis or reporting.
When should I use jaspar-database?
During Build integrations when piping motif data into notebooks, pipelines, or literature-aware agent tools that reference JASPAR.
Is jaspar-database safe to install?
It performs outbound HTTPS to jaspar.elixir.no; review Security Audits on this page and pin dependencies in the science-skills repo.
SKILL.md - Jaspar Database
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""JASPAR API skill wrapper."""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import math
import re
import sys
import urllib.parse
import urllib.request

from science_skills.scienceskillscommon import http_client

JASPAR_URL = "https://jaspar.elixir.no/api/v1/"
_CLIENT = http_client.HttpClient(JASPAR_URL, qps=10)
_MAX_OUTPUT_CHARS = 50_000


def _print_text(text):
  """Prints text, truncating if it exceeds _MAX_OUTPUT_CHARS."""
  if len(text) > _MAX_OUTPUT_CHARS:
    print(text[:_MAX_OUTPUT_CHARS])
    print(
        f"\n... [truncated: {len(text)} chars"
        f" total, showing first {_MAX_OUTPUT_CHARS}]"
    )
  else:
    print(text)


_VALID_FORMATS = (
    "json",
    "jsonp",
    "jaspar",
    "meme",
    "transfac",
    "pfm",
    "yaml",
)


def validate_matrix_id(matrix_id: str):
  """Validates the format of a JASPAR Matrix ID."""
  if not re.match(r"^MA\d{4}\.\d+$", matrix_id):
    print(
        f"Error: Invalid Matrix ID format '{matrix_id}'. Expected format is"
        " 'MA0488.2'.",
        file=sys.stderr,
    )
    print(
        "Hint: If you have a gene symbol (e.g., 'JUN'), you must first use the"
        " 'resolve_tf_id' command.",
        file=sys.stderr,
    )
    sys.exit(1)


def resolve_tf_id(name: str, tax_id: str):
  """Resolves a TF name to a JASPAR Matrix ID."""
  url = f"{JASPAR_URL}matrix/?name={urllib.parse.quote(name)}&tax_id={tax_id}"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)
  if not data or "results" not in data or len(data["results"]) == 0:
    print(f"No results found for TF '{name}' in tax_id {tax_id}")
    return

  print(
      f"Found {len(data['results'])} matching Matrix IDs for '{name}' (tax_id:"
      f" {tax_id}):\n"
  )
  for r in data["results"]:
    matrix_id = r.get("matrix_id")
    tf_name = r.get("name")
    family = r.get("family", [])
    species = r.get("species", [])

    family_str = ", ".join(family) if isinstance(family, list) else family
    species_str = (
        ", ".join([str(s.get("tax_id")) for s in species])
        if species
        else "Unknown"
    )

    print(f"- Matrix ID: {matrix_id}")
    print(f" Name: {tf_name}")
    print(f" Family: {family_str}")
    print(f" Taxonomies: {species_str}\n")


def infer_from_sequence(sequence):
  """Infers potential TF binding matrices from a raw protein sequence."""
  url = f"{JASPAR_URL}infer/{urllib.parse.quote(sequence)}/"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)
  if not data or "results" not in data or not data["results"]:
    print("No corresponding matrices inferred from sequence.")
    return

  print(f"Inferred {len(data['results'])} potential TF profiles:")
  for r in data["results"]:
    mid = r.get("matrix_id")
    name = r.get("name")
    print(f"- {mid} ({name}): E-value {r.get('evalue')}")


def get_tffm(tffm_id):
  """Gets TF Flexible Model (TFFM) detail information."""
  url = f"{JASPAR_URL}tffm/{urllib.parse.quote(tffm_id)}/"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)
  print(dict_to_yaml(data))


def get_tf_motif(matrix_id, fmt="json"):
  """Gets the Position Frequency Matrix (PFM) for a specific TF."""
  validate_matrix_id(matrix_id)
  url = f"{JASPAR_URL}matrix/{matrix_id}/"
  if fmt != "json":
    url += f"?fo