
Pubchem Database
Gives solo builders and agent authors procedural PubChem PUG-REST knowledge so they can fetch compound data and run structure searches when the bundled Python wrapper is not enough.
Overview
Pubchem-database is an agent skill most often used in Build (also Idea research and Validate scoping) that documents PubChem PUG-REST URL patterns so agents can query compounds, properties, and structure searches beyond
Install
npx skills add https://github.com/google-deepmind/science-skills --skill pubchem-database
What is this skill?
- Documents full PUG-REST path pattern: domain / namespace / identifiers / operation / output with optional query flags
- Covers compound, substance, assay, gene, protein, pathway, taxonomy, and cell domains with cid, name, SMILES, InChIKey,
- Includes fast synchronous searches: substructure, 2D similarity, and identity match on SMILES
- Lists operations for properties, synonyms, CIDs-only results, assay summaries, and cross-references such as PatentID and
- Explains JSON, XML, CSV, TXT, and PNG response formats for agent-friendly parsing or visualization
- 8 PUG-REST domain types documented (compound through cell)
- 5 synchronous fast search namespace patterns on SMILES
- 5 response output formats: JSON, XML, CSV, TXT, PNG
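To make the path pattern from the first bullet concrete, here is a minimal Python sketch that composes a PUG-REST URL from its five parts. The helper name `build_pug_rest_url` and its parameters are illustrative assumptions, not part of the skill or of `pubchem_api.py`; only the base URL and the segment order come from SKILL.md.

```python
from urllib.parse import quote, urlencode

# Base URL as documented in SKILL.md.
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(domain, namespace, identifiers, operation,
                       output="JSON", options=None):
    """Compose /<domain>/<namespace>/<identifiers>/<operation>/<output>[?options].

    Hypothetical helper: the name and signature are not defined by the skill.
    """
    # Percent-encode the identifier segment; commas stay literal so that
    # comma-separated identifier lists survive.
    ident = quote(str(identifiers), safe=",")
    url = f"{BASE_URL}/{domain}/{namespace}/{ident}/{operation}/{output}"
    if options:
        url += "?" + urlencode(options)
    return url


# Reproduces the "Properties by CID" example URL from SKILL.md.
print(build_pug_rest_url("compound", "cid", 2244,
                         "property/MolecularWeight,MolecularFormula"))
```

Keeping the five segments as separate arguments mirrors the documented structure, so a wrong domain or output format is easy to spot before any request is sent.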
Adoption & trust: 571 installs on skills.sh; 1.7k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need specific compound properties, synonyms, assay summaries, or SMILES-based searches from PubChem, but the shipped Python wrapper does not expose your exact PUG-REST or PUG-View operation.
Who is it for?
Solo builders adding cheminformatics or bioassay lookups to research agents, notebooks, or API backends that must hit PubChem directly with SMILES, InChIKey, or CID identifiers.
Skip if: Projects that only need the limited calls already covered by pubchem_api.py with no custom substructure, similarity, xref, or PUG-View needs.
When should I use this skill?
You need raw PubChem PUG-REST or PUG-View endpoints—for properties, synonyms, assay summaries, similarity or substructure search, or cross-references—that the pubchem_api.py wrapper does not document or support.
What do I get? / Deliverables
Your agent builds valid PubChem REST URLs, retrieves JSON or other formats, and returns structured chemical identifiers and computed properties for downstream code or analysis.
- Correctly formed PubChem REST request URLs
- Parsed compound or assay property payloads (typically JSON)
- Search result CIDs or similarity/substructure hit sets for follow-on code
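As a rough illustration of the "parsed payload" deliverable, the sketch below turns a property response into a dictionary keyed by CID. It assumes the JSON is shaped as `PropertyTable` containing a `Properties` list, which this page does not itself document, and the sample text is a hand-written stand-in rather than a captured response. The function name `properties_by_cid` is hypothetical.

```python
import json

# Hand-written stand-in for a property response (assumed shape, not captured).
SAMPLE_PAYLOAD = (
    '{"PropertyTable": {"Properties": ['
    '{"CID": 2244, "MolecularFormula": "C9H8O4", "MolecularWeight": "180.16"}'
    "]}}"
)


def properties_by_cid(payload_text):
    """Map each CID to its remaining property fields.

    Assumes the PropertyTable/Properties layout; returns an empty dict
    when that layout is absent.
    """
    data = json.loads(payload_text)
    rows = data.get("PropertyTable", {}).get("Properties", [])
    return {
        row["CID"]: {key: value for key, value in row.items() if key != "CID"}
        for row in rows
    }


print(properties_by_cid(SAMPLE_PAYLOAD))
```

Returning an empty dictionary for an unexpected layout lets downstream code treat "no data" and "unrecognised response" the same way; raising an error instead is equally reasonable if stricter handling is wanted.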
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The skill is procedural documentation for calling NCBI PubChem over HTTP—classic integration work when you are wiring external scientific data into an agent or backend during the Build phase. Integrations is the canonical shelf because every workflow here ends in composed REST URLs (compound/substance/assay domains, namespaces, operations, and output formats) rather than UI or pure product planning.
Where it fits
Compare marketed compound synonyms and xrefs before committing to a med-chem side project.
Confirm which PubChem operations (properties vs assays vs pathways) your MVP agent must support.
Implement fastsimilarity_2d lookups from user-supplied SMILES inside a Codex-driven backend.
Batch-fetch MolecularWeight, XLogP, and TPSA for a list of CIDs to enrich an internal scoring service.
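The batch-fetch scenario above could be sketched as follows, assuming several CIDs can be sent in one request as a comma-separated identifier list. The chunk size of 100 is an arbitrary illustrative choice to keep URLs short, and `batch_property_urls` is a hypothetical helper, not a function of the skill or the wrapper.

```python
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def batch_property_urls(cids, properties=("MolecularWeight", "XLogP", "TPSA"),
                        chunk_size=100):
    """Build one property URL per chunk of CIDs.

    Assumption: a comma-separated CID list is accepted in the identifier
    segment. chunk_size is illustrative, not a documented limit.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    prop_list = ",".join(properties)
    urls = []
    for start in range(0, len(cids), chunk_size):
        chunk = cids[start:start + chunk_size]
        cid_list = ",".join(str(cid) for cid in chunk)
        urls.append(
            f"{BASE_URL}/compound/cid/{cid_list}/property/{prop_list}/JSON"
        )
    return urls


for url in batch_property_urls([2244, 3672, 2519], chunk_size=2):
    print(url)
```

Building the URLs separately from sending them makes it straightforward to add rate limiting or retries around the actual HTTP calls.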
How it compares
Procedural REST reference for NCBI PubChem—not a hosted MCP server or a general-purpose web search skill.
Common Questions / FAQ
Who is pubchem-database for?
It is for solo and indie developers, plus small science teams, who embed PubChem compound and assay data in AI coding agents during research or product build.
When should I use pubchem-database?
Use it when integrating PubChem during Build, when scouting compounds or assays in Idea research, or when scoping cheminformatics features in Validate—especially for fastsubstructure, fastsimilarity_2d, property lists, or xref pulls the wrapper skips.
Is pubchem-database safe to install?
Treat it like any third-party agent skill: review the Security Audits panel on this Prism page and avoid piping untrusted SMILES or assay payloads into production without your usual input validation.
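One possible shape for that input validation is sketched below: a character allow-list and length cap for user-supplied SMILES, followed by percent-encoding before the string is placed in a URL path. The allow-list, the 500-character cap, and the name `safe_smiles_segment` are all illustrative assumptions; the check does not verify chemical validity, and this page does not confirm how PubChem treats percent-encoded characters such as `/` in the path.

```python
import re
from urllib.parse import quote

# Illustrative allow-list of characters that commonly appear in SMILES.
_SMILES_CHARS = re.compile(r"[A-Za-z0-9@+\-\[\]()\\/%=#$:.*]+")
MAX_SMILES_LENGTH = 500  # arbitrary cap chosen for this sketch


def safe_smiles_segment(smiles):
    """Return a percent-encoded SMILES string suitable for a URL path segment.

    Raises ValueError when the input is empty, too long, or contains
    characters outside the allow-list. This is a syntactic screen only.
    """
    candidate = smiles.strip()
    if (not candidate
            or len(candidate) > MAX_SMILES_LENGTH
            or not _SMILES_CHARS.fullmatch(candidate)):
        raise ValueError("input does not look like a SMILES string")
    # safe="" encodes every reserved character, including "/", "#" and "=".
    return quote(candidate, safe="")


print(safe_smiles_segment("CC(=O)Oc1ccccc1C(=O)O"))
```

Encoding matters because characters such as `#` (triple bond) and `/` (stereo bond) would otherwise be read as URL syntax rather than as part of the structure.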
SKILL.md
# Advanced PubChem API Reference

This file documents the raw PUG-REST and PUG-View APIs for cases where the `pubchem_api.py` wrapper does not support your specific query.

## PUG-REST (Computed Properties & Search)

**Base URL:** `https://pubchem.ncbi.nlm.nih.gov/rest/pug`

The URL path always follows this structure:
`/<domain>/<namespace>/<identifiers>/<operation>/<output>[?options]`

### 1. Domain

The core data type: `compound`, `substance`, `assay`, `gene`, `protein`, `pathway`, `taxonomy`, `cell`.

### 2. Namespace & Identifiers

How you are identifying the target record(s):

- `cid/<cid>`: Compound ID
- `name/<name>`: Exact chemical name
- `smiles/<smiles>`: Exact SMILES match
- `inchikey/<inchikey>`: Exact InChIKey match
- `formula/<formula>`: Exact molecular formula
- Search namespaces (use `fast` prefix for synchronous):
  - `fastsubstructure/smiles/<smiles>`
  - `fastsimilarity_2d/smiles/<smiles>`
  - `fastidentity/smiles/<smiles>`

### 3. Operation

What data you want to extract:

- `record` (default): The full raw record.
- `property/<property_list>`: Specific properties (e.g., `MolecularWeight,XLogP,TPSA`).
- `synonyms`: List of synonyms.
- `cids`: Return only the CIDs (useful after a search).
- `assaysummary`: Summary of bioassays.
- `xrefs/<xref_type>`: Cross-references (e.g., `PatentID`, `PubMedID`).

### 4. Output

Format for the response: `JSON`, `XML`, `CSV`, `TXT`, `PNG`.

### Examples

* **Properties by CID (JSON)**: `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularWeight,MolecularFormula/JSON`
* **Mass Range Search (JSON)**: `https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/molecular_weight/range/400.0/400.05/cids/JSON`
* **Patents by SID (JSON)**: `https://pubchem.ncbi.nlm.nih.gov/rest/pug/substance/sid/137349406/xrefs/PatentID/JSON`

---

## PUG-View (Third-Party Annotations & Text)

Used for retrieving comprehensive textual annotations (like GHS Safety, Pharmacology, Toxicity) compiled from external sources.
**Base URL:** `https://pubchem.ncbi.nlm.nih.gov/rest/pug_view`

The standard structure for retrieving specific sections:
`https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/<cid>/JSON?heading=<Section+Heading>`

*Note: Spaces in headings must be replaced with `+`.*

### Common Headings

* `Safety+and+Hazards`
* `Pharmacology+and+Biochemistry`
* `Toxicity`
* `Drug+and+Medication+Information`
* `Experimental+Properties`

# PubChem Workflows

Follow these checklists for complex, multi-step queries to ensure accurate results.

## Workflow 1: Comprehensive Chemical Profiling

When asked to provide a complete profile of a chemical (e.g., "Tell me everything about Aspirin"):

1. **Resolve Name**: Run `pubchem_api.py resolve` to get the primary CID.
2. **Get Properties**: Run `pubchem_api.py properties` using the CID to get basic chemical traits (Weight, XLogP).
3. **Check Safety**: Run `pubchem_api.py safety` to fetch GHS hazard information.
4. **Check Pharmacology**: Run `pubchem_api.py pharmacology` to understand its biological/medical use.
5. **Synthesize**: Read all output JSON files and compile a comprehensive markdown report.

## Workflow 2: Structure-Based BioAssay Lookup

When asked to find targets or assays for compounds similar to a given structure:

1. **Search Structure**: Run `pubchem_api.py similarity` (for 2D similarity) or `pubchem_api.py substructure` using the target SMILES string.
2. **Filter Results**: Read the resulting JSON file. The search may return hundreds of CIDs. Select the top 5-10 most relevant CIDs.
3. **Fetch Assays**: For each selected CID, run `pubchem_api.py assays`.
4. **Analyze**: Review the assay summaries to identify common biological targets (e.g., specific genes or proteins) that these compounds interact with.

# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the Li