
google-deepmind/science-skills
33 skills · 18.9k installs · 56.1k stars · GitHub
Install
npx skills add https://github.com/google-deepmind/science-skills

Skills in this repo
1 Literature Search Arxiv
Literature Search arXiv is an agent skill that teaches correct arXiv advanced query syntax for scripts/search_arxiv.py, aimed at solo and indie builders who need reproducible academic search instead of vague keyword guesses. It covers field prefixes for title, author, abstract, comments, journal reference, category, report number, and all fields, plus boolean operators, grouped expressions, exact phrases, and submission date windows. Use it during early research when you are exploring AI, ML, or science topics, comparing prior art, or gathering citations before validation or build. The reference is procedural knowledge only, with no live network calls from the skill itself, so your agent applies the patterns when invoking the bundled search script. It matters for Prism journey placement because it anchors the Idea phase with citable, structured discovery that feeds scope and prototype choices later.
741 installs

2 Literature Search Openalex
Literature Search OpenAlex is an agent skill that teaches your coding agent how to navigate the OpenAlex scholarly graph: authors, works, institutions, and the filterable metadata fields exposed by the API. Solo builders working on AI features, biotech adjacency, or any evidence-heavy idea can use it during early research to find seminal papers, trace author affiliations, and rank sources by citations without manually clicking through disconnected search UIs. The skill documentation emphasizes which fields support sorting, grouping, and filtering (display names, ORCID, works_count, cited_by_count, affiliation lineage, and created-date windows) so queries are reproducible in agent sessions. It is phase-specific to Idea research: you install it when you need defensible references before validation or build commitments. Pair it with note-taking or PRD skills once you have a shortlist of papers.
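The filter and sort fields named in this entry can be sketched as a small URL builder. This is a minimal illustration, not the skill's own code: `build_author_query` is a hypothetical helper, the name `curie` is a placeholder, and the `display_name.search`, `works_count`, and `cited_by_count` parameters follow the public OpenAlex API as I understand it, so check them against the current documentation.

```python
from urllib.parse import urlencode

OPENALEX_AUTHORS = "https://api.openalex.org/authors"


def build_author_query(name: str, min_works: int, per_page: int = 5) -> str:
    """Return an OpenAlex authors URL filtered by name and output volume.

    Hypothetical helper for illustration; it only builds the URL and
    performs no network call.
    """
    filters = [
        f"display_name.search:{name}",  # text search within the display name
        f"works_count:>{min_works}",    # keep authors with more works than this
    ]
    params = {
        "filter": ",".join(filters),    # a comma combines filters with AND
        "sort": "cited_by_count:desc",  # most-cited authors first
        "per-page": per_page,
    }
    return f"{OPENALEX_AUTHORS}?{urlencode(params)}"


url = build_author_query("curie", min_works=50)
print(url)
```

Keeping the filter list explicit makes each agent query easy to log and replay, which is the reproducibility point this entry makes.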
It does not replace reading papers end-to-end; it accelerates structured retrieval so you spend chat tokens on synthesis rather than guessing query syntax.
691 installs

3 Workflow Skill Creator
Workflow Skill Creator is a meta skill from Google DeepMind’s science-skills line that turns a conversation or multi-step process you already completed into a durable agent skill. Solo builders who repeatedly perform the same research, analysis, or codegen ritual can capture the pattern once and invoke it by name later. The skill’s center of gravity is disciplined discovery: Phase 1 brainstorming is mandatory, with short conversational rounds that validate your understanding of the workflow, clarify inputs and outputs, and probe how often the ritual should run before any SKILL.md body is written. That gate exists because skipping discovery yields skills that are too brittle or too hand-wavy. It complements generic skill-creators by requiring an observed workflow as source material. Use after Validate or Build sessions when you have a proven path; file under agent-tooling because the deliverable extends your personal or team skill library rather than shipping end-user product code directly.
645 installs

4 Pubmed Database
PubMed Database is a science-skills agent capability focused on advanced NCBI linking, not just keyword search. It teaches how to pair a target database with the correct linkname so queries follow explicit semantic bridges: who cites whom, bibliography extraction, MeSH-adjacent similar papers, PMID-to-PMCID resolution for full text, and jumps into genes, sequences, proteins, chemicals, bioassays, and clinical variants. Solo builders validating biotech ideas, clinical decision-support prototypes, or content products grounded in primary literature can install it so coding agents stop guessing API parameters. Use during early research sprints when you need reproducible citation traversal or compound–paper linkage before designing your own data model or RAG corpus.
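Pairing a target database with a linkname, as described for the PubMed skill, might look like the sketch below. It assumes the standard NCBI E-utilities `elink.fcgi` endpoint and four commonly used linknames. The `LINKS` table, the `build_elink_url` helper, and the PMID `12345678` are illustrative placeholders, not taken from the skill.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

# Illustrative map of intent -> (target database, linkname).
LINKS = {
    "cited_by": ("pubmed", "pubmed_pubmed_citedin"),  # who cites this paper
    "references": ("pubmed", "pubmed_pubmed_refs"),   # its bibliography
    "similar": ("pubmed", "pubmed_pubmed"),           # related articles
    "full_text": ("pmc", "pubmed_pmc"),               # PMID -> PMC record
}


def build_elink_url(pmid: str, intent: str) -> str:
    """Build an ELink URL for one PMID; no request is sent."""
    db, linkname = LINKS[intent]
    params = {
        "dbfrom": "pubmed",  # the source is always PubMed in this sketch
        "db": db,
        "linkname": linkname,
        "id": pmid,
        "retmode": "json",
    }
    return f"{ELINK}?{urlencode(params)}"


cited_by_url = build_elink_url("12345678", "cited_by")  # placeholder PMID
full_text_url = build_elink_url("12345678", "full_text")
print(cited_by_url)
```

Choosing the linkname from a fixed table, rather than letting the agent compose it freely, is one way to keep traversals reproducible.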
It complements generic web search by anchoring answers in PubMed’s curated cross-links rather than ad-hoc scraping.
643 installs

5 Literature Search Biorxiv
literature-search-biorxiv is a Google DeepMind science-skills agent capability that searches bioRxiv and medRxiv preprints by date range and applies local category filtering. Solo and indie builders working on health, bio, or research-adjacent products install it when they need reproducible literature pulls instead of ad-hoc browser tabs. The skill ships as a Python script wired to https://api.biorxiv.org/ with conservative request pacing, shared HTTP utilities, and JSON output your coding agent can summarize or cross-check against hypotheses. Use it during early discovery when you are validating mechanisms, prior art, or competitive scientific positioning before scope or prototype work. It is not a full systematic-review platform; it is a focused preprint retrieval step you can chain into note-taking, citation drafting, or validation docs in later journey phases.
642 installs

6 Literature Search Europepmc
Literature Search Europe PMC is an agent skill for solo builders and researchers who need trustworthy, open-access life-science literature without breaking API etiquette. It wraps the Europe PMC corpus (millions of abstracts and full-text articles) behind a scripted workflow that always applies an open-access filter, fetches PMCIDs, downloads XML or plain text where licensed, and assembles bibliographies for downstream notes or specs. Setup depends on `uv` on PATH and an explicit license notification file so you consciously accept Europe PMC terms before bulk retrieval. The skill forbids ad-hoc `python3 -c` runs that skip dependencies, which keeps agent sessions reproducible on fresh machines. It fits early journey work: validating biotech-adjacent ideas, writing research memos, or grounding agent answers in primary sources.
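The "always apply an open-access filter" behavior can be illustrated with a short query wrapper. This is a sketch under assumptions: it targets the public Europe PMC REST search endpoint and its `OPEN_ACCESS:y` query field. `build_oa_search` is a hypothetical name, and the skill's real script may build its requests differently.

```python
from urllib.parse import urlencode

EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_oa_search(query: str, page_size: int = 25) -> str:
    """Return a Europe PMC search URL restricted to open-access records."""
    # Parenthesize the user query so the OA clause applies to all of it.
    oa_query = f"({query}) AND OPEN_ACCESS:y"
    params = {
        "query": oa_query,
        "format": "json",
        "resultType": "lite",  # compact records, enough for IDs and titles
        "pageSize": page_size,
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


url = build_oa_search("crispr base editing")
print(url)
```

Because the filter is added inside the builder, a caller cannot forget it, which matches the OA-only scope this entry describes.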
It is not a paywalled literature broker; scope stays OA-only by design.
642 installs

7 Uv
uv is a journey-wide prerequisite agent skill from Google DeepMind’s Science Skills collection that ensures the Astral uv Python package manager is installed and reachable before any dependent skill invokes Python CLIs. Solo builders hitting science workflows often fail on missing uv or PATH drift on macOS and Linux; this skill sequences version checks, optional install via the official install.sh, and a consolidated PATH fix so bare uv commands work in the same shell session. It does not manage project lockfiles or virtualenv layout (that stays in downstream science skills), but it removes the most common hard stop at session start. Treat it as meta infrastructure: run whenever another skill’s docs say uv is required, whether you are in Build experimenting with pipelines, Ship reproducing a colleague’s environment, or Operate re-running an analysis notebook driver months later.
623 installs

8 Clinical Trials Database
Clinical Trials Database is a reference agent skill for the ClinicalTrials.gov REST API version two. Solo builders in digital health, biotech tooling, or competitive intelligence install it so agents do not hallucinate endpoint shapes, enum values, or the difference between relevance-ranked queries and exact filters. The SKILL.md maps search areas across conditions, interventions, titles, locations, sponsors, and identifiers, and explains advanced Essie filtering for recruitment status and study design. Use it when you are still in discovery (mapping who is running trials, what phase they are in, and whether your wedge overlaps an active NCT program) before you commit engineering to integrations or dashboards. It also supports validate-phase scoping when you need evidence for investor or regulatory narratives.
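The distinction between relevance-ranked queries and exact filters can be made concrete with a request builder. The sketch assumes the public ClinicalTrials.gov v2 `studies` endpoint, where `query.*` parameters are ranked search areas and `filter.*` parameters are exact matches. `build_studies_url` and the example search terms are illustrative only.

```python
from urllib.parse import urlencode

CT_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def build_studies_url(condition, intervention=None,
                      status="RECRUITING", page_size=10):
    """Build a v2 studies URL; nothing is fetched here."""
    params = {
        "query.cond": condition,         # relevance-ranked condition search
        "filter.overallStatus": status,  # exact enum filter, not ranked
        "pageSize": page_size,
    }
    if intervention:
        params["query.intr"] = intervention  # relevance-ranked intervention search
    return f"{CT_STUDIES}?{urlencode(params)}"


url = build_studies_url("type 2 diabetes", intervention="metformin")
print(url)
```

Status values such as `RECRUITING` are enum members, so a misspelled value is an error or an empty result rather than a fuzzy match.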
The skill is procedural API knowledge, not a hosted MCP server; your agent still performs HTTP calls with the patterns documented here.
574 installs

9 Pdb Database
PDB Database is an agent skill from Google DeepMind’s science-skills pack that guides downloading protein structure coordinate files from RCSB. It wraps a small Python entry point using a rate-limited HTTP client against files-beta.rcsb.org, with ID sanitization and optional multi-ID input. Solo builders and indie researchers use it during early structural biology or bioinformatics exploration when they need mmCIF or PDB files on disk without writing fetch boilerplate from scratch. The skill explicitly cautions against using the script for large bulk corpora and points operators toward proper archive strategies after checking with the user. It fits agent-assisted research and prototype pipelines, not production MD simulation ops or proprietary drug-discovery compliance workflows. Pair it with downstream analysis skills once structures are local.
572 installs

10 Pubchem Database
Pubchem-database is an agent skill from Google DeepMind’s science-skills collection that teaches models how to use PubChem’s raw PUG-REST and PUG-View HTTP APIs when the included pubchem_api.py helper does not support a specific query. Solo builders shipping cheminformatics features, drug-discovery research agents, or chemistry education tools install it so Claude Code, Cursor, or Codex can construct correct NCBI endpoints instead of guessing URL shapes. The reference walks through domain selection (compound through cell), identifier namespaces, property and synonym fetches, bioassay summaries, and fast substructure or similarity searches, with explicit output format choices. It fits early research when you are validating compound ideas against public data, and later when you integrate live lookups into pipelines.
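A PUG-REST path reads as domain, namespace, identifier, operation, output format, and the sketch below assembles one in that order. `build_pug_url` is a hypothetical helper written for this illustration and is not part of pubchem_api.py. The property names used are standard PubChem compound properties.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_url(domain, namespace, identifier, operation, output="JSON"):
    """Join the five PUG-REST path segments into one URL."""
    # Escape the identifier so names containing spaces or slashes stay valid.
    safe_id = quote(str(identifier), safe="")
    return "/".join([PUG_REST, domain, namespace, safe_id, operation, output])


props_url = build_pug_url(
    "compound", "name", "aspirin",
    "property/MolecularFormula,MolecularWeight",
)
synonyms_url = build_pug_url("compound", "cid", 2244, "synonyms")
print(props_url)
print(synonyms_url)
```

Making the output format an explicit final argument mirrors the explicit format choice this entry mentions.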
Because PubChem is public REST with rate norms, the skill emphasizes precise paths and operations rather than bespoke SDK magic, which is ideal for indie teams who want reproducible agent behavior without maintaining a private chemistry database.
571 installs

11 Openfda Database
openFDA Database is an agent skill that turns the openFDA REST API into repeatable query recipes for solo builders and small teams doing pharmacovigilance, competitive intelligence, or health-product validation. It specifies how to authenticate (optional API key via environment or flag), construct `search`, `sort`, and `count` parameters, and when to use `.exact` matching so multi-word drug and reaction fields do not return noisy tokenized hits. The reference is aimed at coding agents that may run many requests in one session, so it foregrounds daily quota pitfalls and safe automation habits. Use it while you are still proving an idea with public FDA evidence, then again when you integrate live FDA lookups into backends, dashboards, or research agents. It does not replace clinical judgment or legal compliance review; it gives you a disciplined HTTP interface to authoritative open regulatory data.
566 installs

12 Ncbi Sequence Fetch
ncbi-sequence-fetch is a Google DeepMind science skill that packages NCBI E-utilities into a rate-aware CLI-oriented workflow for solo builders and small teams building research agents or data pipelines. It targets the moment you need authoritative protein or nucleotide records without maintaining your own Entrez client, parsing edge cases, or tripping NCBI throttling. Subcommands cover the usual retrieval path: search IDs, fetch records, follow links, translate CDS regions, search patent sequences, and resolve genes to proteins. The implementation is Python-first with optional API-key acceleration and shared HTTP helpers from the science-skills common library.
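"Rate-aware" with "optional API-key acceleration" can be pictured as a minimum-interval limiter. The sketch below is not the shared science-skills helper: `MinIntervalLimiter` and `limiter_for` are hypothetical names. The 3 and 10 requests-per-second figures are NCBI's published E-utilities limits without and with an API key.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most `per_second` happen each second."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / per_second
        self._clock = clock   # injectable so tests need not really wait
        self._sleep = sleep
        self._next_ok = 0.0   # earliest time the next call may proceed

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = self._next_ok - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_ok = now + self.interval
        return max(delay, 0.0)


def limiter_for(api_key=None):
    """Pick the request rate from whether an API key is configured."""
    return MinIntervalLimiter(10 if api_key else 3)


limiter = limiter_for()  # no key, so calls are spaced one third of a second apart
limiter.wait()           # call this before each E-utilities request
```

Calling `wait()` before every request keeps a long agent session under the limit without scattering `sleep` calls through the code.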
Install it when your agent must ground answers in real GenBank/RefSeq data rather than hallucinated sequences, or when you are prototyping validators and annotators that need fresh NCBI pulls on demand.
563 installs

13 Pymol
PyMOL is a quick-reference agent skill for computational biology and chemistry builders who need reproducible, headless molecular graphics. It encodes the exact boilerplate every script must use so PyMOL launches in quiet command-line mode without reversing import and finish_launching. The reference explains OSMesa constraints in environments without displays or GPUs, steering agents toward png export and away from draw paths that require OpenGL. Selection syntax tables cover chains, residue numbers and ranges, residue names, atom names, and secondary-structure filters so agents can target binding sites, helices, or C-alpha traces consistently. Solo builders using Claude Code or Cursor to batch figures for papers, decks, or structural reviews install it to stop agents from guessing PyMOL startup order or incompatible render calls. It pairs with Python scientific stacks and agent workflows that generate analysis scripts rather than clicking in the desktop GUI.
559 installs

14 Clinvar Database
clinvar-database is an agent skill that wraps the NCBI ClinVar database through E-utilities so you can programmatically search and retrieve clinical variant records without hand-copying from the web UI. It targets solo builders and small teams working on health, genomics, or research tooling who need trustworthy public variant assertions inside Claude Code, Cursor, or similar agents. Use it in the Idea phase when you are validating whether a gene, condition, or variant story is supported by ClinVar before you commit engineering time to pipelines or user-facing features. The embedded client emphasizes robust HTTP handling, rate limits, and optional API key configuration rather than one-off curl recipes.
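The search-then-retrieve pattern against ClinVar, with an optional API key, can be sketched as follows. The code assumes the standard E-utilities `esearch` and `esummary` endpoints with `db=clinvar`. `eutils_url` is a hypothetical helper rather than the skill's embedded client, and the record ID `12345` is a placeholder.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, api_key=None, **params):
    """Build a ClinVar E-utilities URL for `tool` (esearch, esummary, ...)."""
    query = {"db": "clinvar", "retmode": "json", **params}
    if api_key:
        query["api_key"] = api_key  # optional; raises the allowed request rate
    return f"{EUTILS}/{tool}.fcgi?{urlencode(query)}"


# Step 1: search for record IDs by gene symbol.
search_url = eutils_url("esearch", term="BRCA1[gene]", retmax=20)
# Step 2: summarize an ID returned by step 1 (placeholder ID here).
summary_url = eutils_url("esummary", id="12345")
print(search_url)
print(summary_url)
```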
It is narrower than a full bioinformatics stack (it is the ClinVar slice), so pair it with your own orchestration for VCF parsing, visualization, or regulated clinical workflows. For Prism’s journey map it sits on the research shelf because its job is evidence lookup, not shipping or distribution.
551 installs

15 Uniprot Database
UniProt Database is a Google DeepMind science skill that gives coding agents disciplined access to the UniProt Knowledgebase, UniParc archive, and UniRef clusters through maintained Python tooling rather than improvised HTTP calls. Solo builders and researchers prototyping bioinformatics agents, lab notebooks, or data products use it when they need real protein metadata, taxonomy, sequences, or cross-references and must not hallucinate functions or accessions. The skill enforces wrapper-only queries, user-facing license acknowledgment for UniProt terms, and clear scope limits so alignment, folding, and similarity search stay on specialized tools. It pairs naturally with uv-based Python environments and fits early research spikes before you invest in heavier computational biology pipelines.
551 installs

16 Protein Sequence Similarity Search
Protein Sequence Similarity Search is a Google DeepMind science-skills agent workflow that automates ColabFold MMseqs2 homologue search for a single protein sequence. It posts your FASTA to the ColabFold API, waits for the job to finish, unpacks the resulting MSA tarball, extracts alignment metadata from A3M headers, and emits a Markdown table of homologues ranked by E-value. Solo builders and small labs use it when they need reproducible, scriptable similarity results without manually clicking through web UIs. It targets computational biology and ML-adjacent research stacks rather than typical SaaS shipping, and requires network access to api.colabfold.com.
Pair it with downstream analysis skills once you have the hit list on disk.
548 installs

17 Ensembl Database
Ensembl Database is an agent skill that documents the Ensembl REST API so solo builders can fetch gene, transcript, protein, and sequence data without guessing endpoint shapes. It complements scripted helpers like ensembl_api.py when you need custom lookups, alternate assemblies, or response formats the script does not cover. You reach for it while building bioinformatics pipelines, research dashboards, or CLI utilities that must honor Ensembl’s content negotiation, region syntax, and throttling rules. The reference emphasizes practical integration details (JSON versus FASTA, GRCh38 versus GRCh37 hosts, and lookup expansions) so your agent generates correct requests instead of brittle one-offs. It fits indie science and health-data prototypes that talk to public genomics infrastructure rather than mock datasets alone.
547 installs

18 Dbsnp Database
dbsnp-database is an agent skill summarizing how NCBI’s Variation Services API supports dbSNP lookups and coordinate conversions for bioinformatics tooling. Solo builders and small lab-adjacent teams use it when Claude Code or similar agents must resolve RefSNP records, translate VCF chrom/pos/ref/alt tuples into contextual SPDI, or chase HGVS expressions to canonical rsIDs without misreading nested JSON fields like placements_with_allele and allele_annotations. The readme is implementation notes for scripts in the science-skills repo: which paths to call, what primary_snapshot_data contains, and how multi-step resolve flows chain. It fits build-time backend integrations and earlier research prototyping when validating variant lists against NCBI before committing pipeline logic.
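The three lookups named in the dbSNP entry (RefSNP record, VCF tuple to contextual SPDI, HGVS expression) map onto URL shapes roughly like the ones below. Treat every path here as an assumption to verify against the Variation Services documentation. The helper names are hypothetical, and the coordinates are placeholders, not a real variant.

```python
from urllib.parse import quote

VARIATION_API = "https://api.ncbi.nlm.nih.gov/variation/v0"


def refsnp_url(rsid: str) -> str:
    """URL for one RefSNP record; accepts 'rs328' or '328'."""
    digits = rsid[2:] if rsid.lower().startswith("rs") else rsid
    if not digits.isdigit():
        raise ValueError(f"not an rsID: {rsid!r}")
    return f"{VARIATION_API}/refsnp/{digits}"


def vcf_contextuals_url(chrom: str, pos: int, ref: str, alt: str) -> str:
    """URL that asks for contextual SPDI for a VCF-style tuple."""
    parts = [quote(str(p), safe="") for p in (chrom, pos, ref, alt)]
    return f"{VARIATION_API}/vcf/" + "/".join(parts) + "/contextuals"


def hgvs_contextuals_url(hgvs: str) -> str:
    """URL that asks for contextual SPDI for an HGVS expression."""
    # ':' and '>' must be percent-encoded inside a path segment.
    return f"{VARIATION_API}/hgvs/{quote(hgvs, safe='')}/contextuals"


print(refsnp_url("rs328"))
# Placeholder coordinates on the GRCh38 chromosome 1 accession.
print(vcf_contextuals_url("NC_000001.11", 12345, "A", "G"))
print(hgvs_contextuals_url("NC_000001.11:g.12345A>G"))
```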
Expect intermediate familiarity with genomic coordinates and REST error handling.
546 installs

19 Protein Sequence Msa
Protein Sequence MSA is an agent skill that runs EBI Clustal Omega to compute a multiple sequence alignment from a file of protein sequences. Solo builders working on structural biology tools, enzyme design prototypes, or ML feature pipelines install it so Claude Code or Codex can invoke the documented Python script instead of guessing curl steps against the EBI job API. The skill covers payload preparation, authenticated email parameters, rate-limited HTTP polling, and timeout behavior so alignments complete or fail predictably in automation. It sits in Build integrations because it depends on network access to European bioinformatics infrastructure and local sequence files. For indie science software, MSAs are a common prerequisite to conservation plots, homology modeling, or downstream folding workflows; this skill makes that step repeatable inside agent sessions without re-deriving the REST contract each time.
546 installs

20 Opentargets Database
Open Targets Database is an agent skill that wraps the Open Targets Platform GraphQL API for solo builders and small research teams working on biotech, health AI, or drug-discovery tooling. It documents how to run the bundled Python query script so your coding agent can fetch GWAS studies, credible sets, locus-to-gene scores, target druggability, and bidirectional target–disease associations without pasting raw GraphQL into chat. The guide emphasizes practical subcommands and pagination limits so responses stay usable inside Claude Code, Cursor, or Codex sessions. Install it when you need reproducible, script-driven access to ranked target evidence rather than one-off web browsing.
It matters because association and tractability data are dense; the skill standardizes calls and output shaping so agents can reason over real Open Targets records while you scope validators, knowledge bases, or internal research assistants.
544 installs

21 Embl Ebi Ols
EMBL-EBI OLS is a compact integration skill for developers and research-minded solo builders who must resolve biomedical and life-science terms against public ontologies. It documents the OLS4 API at the European Bioinformatics Institute: full-text search, autocomplete, ontology browsing, and rich hierarchical queries including part_of and develops_from style relations, not only direct is-a links. Use it when normalizing metadata in bioinformatics side projects, building agent tools that suggest controlled vocabulary labels, or validating that pipeline annotations match community ontologies. The skill is reference-shaped rather than a full client SDK; your agent should translate endpoint tables into fetch code with proper IRI encoding. It pairs naturally with data ingestion and labeling steps in scientific agents, not with generic CRUD app scaffolding.
542 installs

22 Interpro Database
InterPro-database is an agent skill that teaches how to query the EBI InterPro REST API for protein entries, domains, families, repeats, and related annotations. Solo builders and small research teams use it when an agent must construct valid `query_params` for endpoints like `/entry`, apply GO and member-database filters, and paginate results without guessing undocumented behavior. The SKILL body is a structured reference derived from the official Swagger spec, so agents pass dictionaries into helpers such as `fetch_interpro_data` and `get_interpro_count` consistently. It matters because wrong parameter combinations silently fail or return misleading aggregates in production bioinformatics workflows.
Install it when you are exploring InterPro-backed hypotheses, prototyping annotation lookups, or wiring scientific agents that need authoritative domain and family metadata rather than scraping static dumps.
542 installs

23 String Database
string-database is an agent skill from the Google DeepMind science-skills pack that teaches your coding agent how to call STRING-backed CLI workflows for functional enrichment, PPI significance, disease or GO term discovery, and per-protein annotation exports. Solo and indie builders who ship research tooling, internal bio pipelines, or reproducible analysis repos install it when they want consistent commands instead of one-off curl scripts. Typical flow: pass gene or protein identifiers plus an NCBI taxonomy species ID, choose enrichment versus PPI enrichment versus term search, and land structured TSV files under a known path for downstream notebooks or services. It matters because ontology and pathway interpretation steps are easy to get wrong under time pressure; the skill encodes the right subcommands, required flags (`--identifiers` versus `--term_text`), and output field semantics (category, term, p_value, fdr, edge counts). Use it while building analysis features, automating literature-adjacent discovery, or validating that a candidate gene set is functionally coherent before you commit UI or API shape.
541 installs

24 Chembl Database
ChEMBL Database is an agent skill that teaches coding agents how to use the EBI ChEMBL public REST API for bioactivity and chemical structure research. It is built for solo builders, researchers, and small teams working on life-science tooling, RAG over compound data, or validation prototypes that need authoritative compound, target, assay, and document identifiers rather than scraped HTML.
The reference covers standard JSON list and detail routes, batch fetch, which resources accept free-text search, and specialized routes for similarity and substructure matching, 2D structure images, and API status checks. Filter operators mirror Django-style query params so agents can compose precise filters on ChEMBL fields. Use it in the Idea phase when exploring competitors’ mechanisms, screening chemical space, or grounding an AI agent in verified open datasets. It is integration knowledge, not a hosted database; your agent still performs HTTPS calls and handles pagination, limits, and response parsing in your stack.
540 installs

25 Reactome Database
Reactome Database is a science-skills reference that teaches agents how to call the Reactome Analysis Service at https://reactome.org/AnalysisService. Solo builders in biotech tooling, academic side projects, or data pipelines install it when they need reliable pathway overrepresentation or expression analysis against Reactome’s curated graphs. The skill enumerates database metadata endpoints, per-identifier and batch identifier routes (including projection to human), and the full token API for fetching and filtering results after a job completes. It specifies content types, line-oriented versus TSV expression input, and common query parameters such as species, pageSize, and includeDisease. That structure matters because enrichment APIs are easy to mis-call: a wrong body format or a missing token follow-up silently wastes runs. Use it while implementing analysis features in Python, R orchestrators, or agent-driven bioinformatics scripts rather than as a general biology tutor.
540 installs

26 Foldseek Structural Search
foldseek-structural-search wraps Google DeepMind’s science-skills pattern for calling Foldseek’s hosted structural search API from a local PDB or mmCIF input.
It is aimed at researchers and technical solo builders working on protein structure, fold classification, or structural genomics (not typical SaaS shipping) who need fast homology-style lookups without standing up Foldseek infrastructure. The embedded script validates allowed database names, respects API throughput limits, and formats multipart requests for structure uploads. Agents can invoke it when comparing an experimental or predicted model against AFDB, PDB100, CATH50, Swiss-Prot subsets, and other listed catalogs. Expect intermediate-to-advanced bioinformatics literacy; outputs are hit lists and alignments for interpretation in notebooks, papers, or pipeline design rather than user-facing product features.
539 installs

27 Quickgo Database
QuickGO Database is a science-skills agent capability for querying Gene Ontology annotations through QuickGO’s annotation API using the bundled `quickgo_tool.py` CLI. Indie builders and researcher-developers use it when an agent must explain what a protein does, list genes annotated to a GO term, or gather evidence-coded annotation tables for scripts and prototypes, not when they need a generic web search summary. The README centers on `annotation search` with parameters such as gene product IDs, GO IDs, aspect filters, taxonomy, and experimental evidence (ECO). Outputs land in JSON files for pipelines or human review. Complexity is advanced: you should understand GO aspects, UniProtKB identifiers, and evidence codes. It pairs naturally with other DeepMind science skills for literature or pathway work. Prism tags it under Build integrations with a research vertical because it is a deterministic data bridge, not a journey-wide methodology skill.
538 installs

28 Human Protein Atlas Database
Human Protein Atlas Database is a reference skill for solo builders, indie bioinformatics developers, and research agents who need precise HPA search strings, not vague natural-language guesses.
It explains the case-insensitive key-value grammar, how to combine subfields with semicolons and commas, and how to chain Boolean AND, OR, and NOT with parentheses for expression, mRNA, localization, and functional class constraints. The material is especially valuable during Idea-phase research when you narrow target genes, tissues, or pathways before validation experiments or a scoped MVP. Advanced complexity: you should read field names carefully and avoid quoting multi-word values incorrectly. The skill does not run queries itself; it equips your coding agent to generate API-ready filter expressions you can paste into HPA tools or automate. Pair it with your own HTTP client or notebook workflow once you move from research into Build.
537 installs

29 Gtex Database
GTEx Database is a science agent skill from Google DeepMind’s science-skills collection that wraps the GTEx Portal REST API v2 for solo builders and researchers shipping genomics-flavored tools: expression browsers, target validation dashboards, or ETL into your own warehouse. It emphasizes compliant access: sequential requests, pagination, and conservative query-per-second throttling rather than hammering the public portal. Constants such as dataset gtex_v10 and GENCODE v39 anchor responses to a known snapshot so agents do not hallucinate version drift. The implementation is a Python CLI using a shared HttpClient helper, suitable for Codex or Claude Code executing scripted fetches in a repo with the scienceskillscommon package available. This is an integration skill, not a hosted database product on Prism; you still own storage, caching policy, and HIPAA or human-subjects compliance if you leave research context. Intermediate complexity assumes comfort with argparse CLIs, JSON responses, and API pagination loops.
535 installs

30 Jaspar Database
JASPAR Database is a science agent skill that calls the public JASPAR API for transcription factor binding profiles and motifs.
It exposes a Python wrapper with validated output formats, sensible truncation for large responses, and shared HTTP utilities from the Google DeepMind science-skills family. Solo builders in computational biology, bioinformatics startups, or ML teams augmenting regulatory genomics pipelines can invoke it from Claude Code or similar agents instead of hand-rolling urllib clients. The skill is phase-specific to integration: you already know which TF or matrix you need and want reliable CLI-style access during Build. Advanced complexity reflects domain knowledge (PFM, TRANSFAC, MEME) and correct API usage.
535 installs

31 Alphagenome Single Variant Analysis
alphagenome-single-variant-analysis is an agent skill that gives solo builders and small bioinformatics teams a concise AlphaGenome API reference for single-variant workflows using the `alphagenome` Python package. It focuses on the footguns that break science code fast: loading credentials from `.env`, initializing `dna_client` against the Google DeepMind science endpoint, and using the right coordinate systems: 0-based intervals versus 1-based variant positions. You also get the core object model for intervals and variants, plus hooks into scorers, track data, and matplotlib visualization when you need to inspect model output. It is niche compared with typical SaaS skills, but valuable when your agent-assisted pipeline must query variant effect tracks instead of generic REST glue.
534 installs

32 Gnomad Database
Gnomad-database is a science agent skill that fetches gene-level constraint statistics from the Genome Aggregation Database (gnomAD) through its public GraphQL endpoint. Solo builders and small teams working on genomics tooling, rare-disease apps, or research copilots use it when they need reproducible pLI, LOEUF, and related metrics without hand-writing API clients. The bundled Python entry point wraps HTTP calls with an enforced query rate so automated runs stay within gnomAD usage expectations.
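For orientation, the kind of GraphQL request the gnomAD entry describes might be assembled as below. This is a sketch, not the bundled entry point. The query shape and the `gnomad_constraint`, `pLI`, and `oe_lof_upper` field names (the last being the LOEUF value) follow the public gnomAD schema as I recall it and should be confirmed against the live API. `build_constraint_request` is a hypothetical helper.

```python
import json

GNOMAD_API = "https://gnomad.broadinstitute.org/api"

# Gene-level constraint query; oe_lof_upper is the LOEUF metric.
CONSTRAINT_QUERY = """
query GeneConstraint($symbol: String!) {
  gene(gene_symbol: $symbol, reference_genome: GRCh38) {
    gene_id
    symbol
    gnomad_constraint {
      pLI
      oe_lof_upper
    }
  }
}
"""


def build_constraint_request(symbol: str) -> dict:
    """Return the URL, headers, and JSON body for one gene; sends nothing."""
    body = json.dumps({
        "query": CONSTRAINT_QUERY,
        "variables": {"symbol": symbol.strip()},
    })
    return {
        "url": GNOMAD_API,
        "headers": {"Content-Type": "application/json"},
        "body": body,
    }


request = build_constraint_request("PCSK9")
print(request["body"][:60])
```

Passing the gene symbol as a GraphQL variable, rather than formatting it into the query text, avoids quoting mistakes when an agent loops over many genes.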
Output is structured JSON on disk, which agents can diff, summarize, or feed into downstream variant interpretation steps. It fits the Idea and research moment of the journey: grounding hypotheses in population genetics evidence before prototyping dashboards or clinical decision support features.
534 installs

33 Encode Ccres Database
Encode Ccres Database is a narrow agent skill for researchers and technical solo builders working on regulatory genomics. It is a schema reference for the ENCODE SCREEN GraphQL API used by the broader encode-database skill, so your agent can construct correct queries instead of guessing field names. You can search candidate cis-regulatory elements by genome assembly (for example grch38 or mm10), genomic ranges, accessions, and signal ranks; pull biosample metadata and experiment accessions; and drill into individual cCRE records with nearest genes and cell-type-specific scores. Prism lists it under early-journey research because it supports hypothesis-driven exploration and pipeline design, not launch or growth workflows. Install it when your coding agent must integrate or automate ENCODE SCREEN data access and you want procedural knowledge aligned to the actual API types and arguments.
531 installs