
Foldseek Structural Search
Run Foldseek structural similarity search for a PDB or mmCIF structure against curated protein structure databases through the public API.
Overview
foldseek-structural-search is an agent skill for the Idea phase that runs Foldseek structural searches for PDB/mmCIF files against named public structure databases.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill foldseek-structural-search
What is this skill?
- Queries search.foldseek.com with client-side rate limiting at 0.1 queries per second
- Supports nine named databases including afdb50, pdb100, cath50, and afdb-proteome
- Caps alignment hits at 300 with configurable E-value defaulting to 1000
- Python 3.10+ script with multipart upload payload for structure files
- Returns JSON-oriented results suitable for downstream parsing in agent workflows
- 9 allowed Foldseek database identifiers in ALLOWED_DATABASES
- MAX_ALIGNMENT_HITS capped at 300
- HTTP client rate limit of 0.1 queries per second to search.foldseek.com
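A limit of 0.1 queries per second works out to at most one request every 10 seconds. The skill's script delegates rate limiting to its http_client helper, whose internals are not shown on this page, so the following is only a minimal sketch of how such a client-side limiter could work. `MinIntervalLimiter` and its `clock`/`sleep` parameters are hypothetical names introduced for illustration, not part of the skill.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: keeps successive calls at least 1/qps seconds apart."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.min_interval = 1.0 / qps  # qps=0.1 gives a 10-second interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleeps if the previous call was too recent, then records this call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Assumed usage: call limiter.wait() immediately before each HTTP request.
limiter = MinIntervalLimiter(qps=0.1)
```

Passing the clock and sleep functions in as arguments keeps the limiter testable without real delays.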
Adoption & trust: 539 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a protein structure file and need comparable folds or homologs across standard catalogs without manually using the Foldseek web UI or self-hosting search.
Who is it for?
Builders and researchers in structural biology who automate literature-scale fold lookup from agent or CLI workflows.
Skip if: You build general web or mobile products and do not work with PDB/mmCIF structural data or interpret Foldseek results.
When should I use this skill?
You need to run Foldseek structural search for a PDB or mmCIF file against a supported public database via the hosted API.
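"Supported" here means one of the nine identifiers in the script's ALLOWED_DATABASES list. As a rough sketch of that check, the function below mirrors the comma-splitting and validation logic in the skill's script; `parse_databases` is a hypothetical helper name, and it raises an exception where the script prints an error and exits.

```python
# Identifiers copied from ALLOWED_DATABASES in the skill's script.
ALLOWED_DATABASES = [
    "afdb50",
    "afdb-swissprot",
    "pdb100",
    "BFVD",
    "mgnify_esm30",
    "cath50",
    "gmgcl_id",
    "bfmd",
    "afdb-proteome",
]


def parse_databases(arg):
    """Splits a comma-separated string and rejects unknown database names."""
    selected = [db.strip() for db in arg.split(",")]
    invalid = [db for db in selected if db not in ALLOWED_DATABASES]
    if invalid:
        raise ValueError(f"Invalid database(s) provided: {', '.join(invalid)}")
    return selected


# The script's default value for --databases is "pdb100,afdb50".
print(parse_databases("pdb100,afdb50"))
```

Matching is case-sensitive, so `BFVD` is accepted while `bfvd` is rejected.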
What do I get? / Deliverables
You get a bounded set of structural alignment hits from the chosen Foldseek database to inform hypotheses, targets, or next modeling steps.
- Structural alignment hit results from the selected Foldseek database
- JSON-parseable API response for downstream analysis
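The response schema is not documented on this page, so a cautious first step in downstream analysis is to inspect the outer shape of the saved file before relying on specific fields. The sketch below assumes only that the output is valid JSON at the script's default path, `foldseek_results.json`; `describe_json` and `describe_file` are hypothetical helper names.

```python
import json


def describe_json(value):
    """Returns a one-line description of a parsed JSON value's outer shape."""
    if isinstance(value, dict):
        return f"object with keys: {', '.join(sorted(value))}"
    if isinstance(value, list):
        return f"array with {len(value)} item(s)"
    return f"scalar of type {type(value).__name__}"


def describe_file(path="foldseek_results.json"):
    """Loads a JSON file and describes its top-level structure."""
    with open(path, encoding="utf-8") as f:
        return describe_json(json.load(f))
```

Calling `describe_file()` after a run shows whether the top level is an object or an array, which is enough to decide how to traverse it.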
Journey fit
Homology and fold search is an upstream research step before committing to a biological or structural bioinformatics build, so Prism shelves it under Idea. Database search against AFDB, PDB, CATH, and related catalogs is discovery and evidence gathering, which matches the research subphase.
How it compares
A thin API integration script for hosted Foldseek search—not a full local Foldseek install or sequence-only BLAST workflow.
Common Questions / FAQ
Who is foldseek-structural-search for?
Technical users doing protein structure research who want agent-callable Foldseek database searches from PDB or mmCIF inputs.
When should I use foldseek-structural-search?
During Idea research when you are exploring fold similarity, structural neighbors, or database coverage before designing experiments, pipelines, or validation plans.
Is foldseek-structural-search safe to install?
It makes outbound calls to search.foldseek.com and runs Python with dependencies. Check the Security Audits panel on this page, and consider how sensitive your structure data is before enabling network access, because input files are uploaded to the hosted service.
SKILL.md - Foldseek Structural Search
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Runs Foldseek search for a PDB/mmCIF file against different databases."""

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///

import argparse
import json
import os
import sys
import time
import uuid

from science_skills.scienceskillscommon import http_client

ALLOWED_DATABASES = [
    "afdb50",
    "afdb-swissprot",
    "pdb100",
    "BFVD",
    "mgnify_esm30",
    "cath50",
    "gmgcl_id",
    "bfmd",
    "afdb-proteome",
]
MAX_ALIGNMENT_HITS = 300
DEFAULT_EVALUE = 1000

# Respect 0.1 queries per second requirement.
CLIENT = http_client.HttpClient("https://search.foldseek.com", qps=1)


def build_multipart_payload(fields, files):
  """Manually constructs a multipart/form-data byte payload."""
  boundary = uuid.uuid4().hex
  body = bytearray()

  # Add standard form fields (handling lists for multiple databases)
  for key, values in fields.items():
    if not isinstance(values, list):
      values = [values]
    for value in values:
      body.extend(f"--{boundary}\r\n".encode("utf-8"))
      body.extend(
          f'Content-Disposition: form-data; name="{key}"\r\n\r\n'.encode(
              "utf-8"
          )
      )
      body.extend(f"{value}\r\n".encode("utf-8"))

  # Add file data
  for key, filepath in files.items():
    filename = os.path.basename(filepath)
    with open(filepath, "rb") as f:
      content = f.read()
    body.extend(f"--{boundary}\r\n".encode("utf-8"))
    body.extend(
        f'Content-Disposition: form-data; name="{key}";'
        f' filename="{filename}"\r\n'.encode("utf-8")
    )
    body.extend(b"Content-Type: application/octet-stream\r\n\r\n")
    body.extend(content)
    body.extend(b"\r\n")

  body.extend(f"--{boundary}--\r\n".encode("utf-8"))
  return boundary, body


# ---------------------------------------------------


def main():
  # Set up command line argument parsing
  parser = argparse.ArgumentParser(
      description="Query Foldseek with a PDB/mmCIF file and save the results."
  )
  parser.add_argument("input_file", help="Path to the mmCIF or PDB file")
  parser.add_argument(
      "-o",
      "--output",
      help="Path to save the output JSON file",
      default="foldseek_results.json",
  )
  parser.add_argument(
      "--databases",
      help="Comma-separated list of databases to search",
      default="pdb100,afdb50",
  )
  args = parser.parse_args()

  file_path = args.input_file
  output_path = args.output
  ticket_url = "https://search.foldseek.com/api/ticket"

  # Process and validate the databases
  selected_dbs = [db.strip() for db in args.databases.split(",")]
  invalid_dbs = [db for db in selected_dbs if db not in ALLOWED_DATABASES]
  if invalid_dbs:
    print(f"[!] Error: Invalid database(s) provided: {', '.join(invalid_dbs)}")
    print(f"[*] Allowed databases are: {', '.join(ALLOWED_DATABASES)}")
    sys.exit(1)

  # Standard headers for all requests
  headers = {
      "User-Agent": "",
      "Accept": "application/json",
  }

  print(f"[*] Submitting {file_path} to Foldseek API...")
  print(f"[*] Searching databases: {', '.join(selected_dbs)}")

  # 1. Submit Ticket
  try:
    if not os.path.exists(file_path):
      raise FileNotFoundError(f"No such file: '{file_path}'")

    boundary, body = build_multipart_payload(
        fields={"mode": "3diaa", "database[]": se