Jaspar Database

Name: Jaspar Database
Author: google-deepmind

google-deepmind/science-skills

Query the JASPAR transcription-factor motif API from an agent workflow for genomics or regulatory research.

Overview

JASPAR Database is an agent skill for the Build phase that queries the JASPAR transcription-factor motif API with validated export formats.

Install

npx skills add https://github.com/google-deepmind/science-skills --skill jaspar-database

What is this skill?

Wraps the JASPAR REST API v1 with a rate-limited HTTP client (10 qps)
Supports json, jsonp, jaspar, meme, transfac, pfm, and yaml response formats
CLI-style Python entry point with argument validation and an output truncation cap (50,000 chars)
Apache-2.0 licensed; packaged in Google DeepMind's science-skills repo with the shared scienceskillscommon library
Useful for motif lookup, matrix export, and TF binding research automation
10 requests per second rate limit
50,000 character output truncation cap
7 supported export formats
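
The 10 qps limit is enforced by the shared `http_client.HttpClient`, whose internals are not shown on this page. As a rough illustration only, a minimum-interval limiter could look like the sketch below. The `MinIntervalLimiter` class and its injectable `clock` and `sleep` parameters are hypothetical and are not part of scienceskillscommon.

```python
import time


class MinIntervalLimiter:
  """Hypothetical helper: spaces calls at least 1/qps seconds apart."""

  def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
    self._interval = 1.0 / qps
    self._clock = clock
    self._sleep = sleep
    self._next_allowed = None  # Earliest time the next call may start.

  def wait(self):
    """Blocks until the next call is allowed, then reserves the next slot."""
    now = self._clock()
    if self._next_allowed is not None and now < self._next_allowed:
      self._sleep(self._next_allowed - now)
      now = self._next_allowed
    self._next_allowed = now + self._interval
```

Calling `wait()` before each request keeps a single-threaded caller at or below the configured rate; a shared client would additionally need a lock.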

Compatible agents: Claude Code, Codex, Cursor, any compatible agent

Adoption & trust: 535 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).

What problem does it solve?

You need JASPAR motif matrices in your agent pipeline but do not want brittle one-off HTTP scripts.

Who is it for?

Bioinformatics solo builders and researchers automating TF motif fetch inside agent or Python workflows.

Skip if: you are a general web developer with no genomics use case, or your team needs offline-only motif databases.

When should I use this skill?

Use it when you need JASPAR motif matrices, TF profiles, or API access during genomics or regulatory research tasks.
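
A practical first check is whether the input is already a matrix ID or a TF name that still needs resolving. The skill source validates IDs against the pattern `^MA\d{4}\.\d+$`; the sketch below reuses that pattern in a hypothetical `looks_like_matrix_id` helper that returns a boolean instead of exiting.

```python
import re

# Pattern copied from validate_matrix_id in the skill source.
_MATRIX_ID_PATTERN = re.compile(r"^MA\d{4}\.\d+$")


def looks_like_matrix_id(value):
  """Hypothetical helper: True for IDs shaped like 'MA0488.2'."""
  return _MATRIX_ID_PATTERN.match(value) is not None
```

An input such as `JUN` fails this check, which is the case where the source tells the caller to run `resolve_tf_id` first.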

What do I get? / Deliverables

You retrieve JASPAR records in the format your downstream tool expects, with rate limiting and safe output truncation.

JASPAR API response in chosen format
Truncated CLI-printed motif data for agent context
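
The truncation deliverable corresponds to the `_print_text` helper in the SKILL.md source on this page. Below is a minimal sketch of the same idea, written to return a string rather than print it. The name `truncate_for_context` and the `limit` parameter are hypothetical; the 50,000-character cap and the notice wording follow the source.

```python
_MAX_OUTPUT_CHARS = 50_000  # Same cap as the skill source.


def truncate_for_context(text, limit=_MAX_OUTPUT_CHARS):
  """Hypothetical variant of _print_text that returns the capped string."""
  if len(text) <= limit:
    return text
  notice = f"\n... [truncated: {len(text)} chars total, showing first {limit}]"
  return text[:limit] + notice
```

Returning the string makes the cap easy to unit-test and lets the caller decide where the text goes.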


Journey fit

Primary fit

Build: Integrations & version control

Integrating a curated external science API into analysis pipelines is build-phase integration work. The Integrations subphase fits HTTP clients, format validation, and motif/PFM retrieval against JASPAR.

How it compares

A focused JASPAR API integration, not a full variant-calling or ChIP-seq analysis pipeline.

Common Questions / FAQ

Who is jaspar-database for?

Computational biologists and ML builders who use agents to pull transcription factor profiles from JASPAR during analysis or reporting.

When should I use jaspar-database?

During Build integrations when piping motif data into notebooks, pipelines, or literature-aware agent tools that reference JASPAR.

Is jaspar-database safe to install?

It performs outbound HTTPS to jaspar.elixir.no; review Security Audits on this page and pin dependencies in the science-skills repo.
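
If you want to confirm that an agent only contacts the expected host, one option is to check each request URL before it is sent. The sketch below is not part of the skill; `is_jaspar_url` is a hypothetical helper, and the host name is taken from `JASPAR_URL` in the source.

```python
import urllib.parse

_ALLOWED_HOST = "jaspar.elixir.no"  # Host taken from JASPAR_URL in the source.


def is_jaspar_url(url):
  """Hypothetical check: True only for HTTPS URLs on the JASPAR host."""
  parts = urllib.parse.urlsplit(url)
  return parts.scheme == "https" and parts.hostname == _ALLOWED_HOST
```

Comparing the parsed `hostname` rather than using a substring match avoids accepting look-alike hosts that merely begin with the JASPAR domain.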

SKILL.md

# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""JASPAR API skill wrapper."""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import math
import re
import sys
import urllib.parse
import urllib.request

from science_skills.scienceskillscommon import http_client

JASPAR_URL = "https://jaspar.elixir.no/api/v1/"
_CLIENT = http_client.HttpClient(JASPAR_URL, qps=10)
_MAX_OUTPUT_CHARS = 50_000


def _print_text(text):
  """Prints text, truncating if it exceeds _MAX_OUTPUT_CHARS."""
  if len(text) > _MAX_OUTPUT_CHARS:
    print(text[:_MAX_OUTPUT_CHARS])
    print(
        f"\n... [truncated: {len(text)} chars"
        f" total, showing first {_MAX_OUTPUT_CHARS}]"
    )
  else:
    print(text)


_VALID_FORMATS = (
    "json",
    "jsonp",
    "jaspar",
    "meme",
    "transfac",
    "pfm",
    "yaml",
)


def validate_matrix_id(matrix_id: str):
  """Validates the format of a JASPAR Matrix ID."""
  if not re.match(r"^MA\d{4}\.\d+$", matrix_id):
    print(
        f"Error: Invalid Matrix ID format '{matrix_id}'. Expected format is"
        " 'MA0488.2'.",
        file=sys.stderr,
    )
    print(
        "Hint: If you have a gene symbol (e.g., 'JUN'), you must first use the"
        " 'resolve_tf_id' command.",
        file=sys.stderr,
    )
    sys.exit(1)


def resolve_tf_id(name: str, tax_id: str):
  """Resolves a TF name to a JASPAR Matrix ID."""
  url = f"{JASPAR_URL}matrix/?name={urllib.parse.quote(name)}&tax_id={tax_id}"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)

  if not data or "results" not in data or len(data["results"]) == 0:
    print(f"No results found for TF '{name}' in tax_id {tax_id}")
    return

  print(
      f"Found {len(data['results'])} matching Matrix IDs for '{name}' (tax_id:"
      f" {tax_id}):\n"
  )
  for r in data["results"]:
    matrix_id = r.get("matrix_id")
    tf_name = r.get("name")
    family = r.get("family", [])
    species = r.get("species", [])

    family_str = ", ".join(family) if isinstance(family, list) else family
    species_str = (
        ", ".join([str(s.get("tax_id")) for s in species])
        if species
        else "Unknown"
    )

    print(f"- Matrix ID: {matrix_id}")
    print(f"  Name: {tf_name}")
    print(f"  Family: {family_str}")
    print(f"  Taxonomies: {species_str}\n")


def infer_from_sequence(sequence):
  """Infers potential TF binding matrices from a raw protein sequence."""
  url = f"{JASPAR_URL}infer/{urllib.parse.quote(sequence)}/"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)

  if not data or "results" not in data or not data["results"]:
    print("No corresponding matrices inferred from sequence.")
    return

  print(f"Inferred {len(data['results'])} potential TF profiles:")
  for r in data["results"]:
    mid = r.get("matrix_id")
    name = r.get("name")
    print(f"- {mid} ({name}): E-value {r.get('evalue')}")


def get_tffm(tffm_id):
  """Gets TF Flexible Model (TFFM) detail information."""
  url = f"{JASPAR_URL}tffm/{urllib.parse.quote(tffm_id)}/"
  print("Request url: ", url)
  data = _CLIENT.fetch_json(url)
  print(dict_to_yaml(data))


def get_tf_motif(matrix_id, fmt="json"):
  """Gets the Position Frequency Matrix (PFM) for a specific TF."""
  validate_matrix_id(matrix_id)
  url = f"{JASPAR_URL}matrix/{matrix_id}/"
  if fmt != "json":
    url += f"?fo
