
Cheminformatics
Run RDKit-based molecular property, ADMET, virtual screening, and docking-prep pipelines on SMILES/SDF libraries without hand-rolling every cheminformatics step.
Overview
Cheminformatics is an agent skill for the Validate phase that builds RDKit pipelines for molecular property prediction, virtual screening, ADMET analysis, docking prep, and chemical-space exploration from SMILES and SDF inputs.
Install
npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill cheminformatics
What is this skill?
- RDKit workflows from SMILES/SDF parsing through descriptors, fingerprints, and chemical-space clustering
- Virtual screening and drug-likeness filters including Lipinski’s Rule of Five
- ADMET-oriented prediction to drop compounds likely to fail downstream
- Molecular docking preparation and pose-oriented scoring hooks
- Reproducible cheminformatics pipelines with PubChem-style database integration patterns
Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You have huge compound libraries but no fast, reproducible way to filter for drug-likeness, ADMET risk, and diverse leads before synthesis or assays.
Who is it for?
Indie builders shipping chemistry-adjacent agents, internal discovery tools, or research prototypes that must score real structures with RDKit.
Skip if: Teams without chemistry inputs, builders who only need generic Python data science with no molecular structures, or production wet-lab protocols with no computational screening step.
When should I use this skill?
You have molecular structures (SMILES/SDF) and need property prediction, screening, ADMET triage, docking preparation, or chemical-space exploration with reproducible RDKit pipelines.
What do I get? / Deliverables
You get an ordered cheminformatics pipeline with computed descriptors, fingerprints, filters, and clustering output you can feed into docking, procurement, or the next modeling skill.
- Reproducible cheminformatics pipeline scripts
- Filtered or ranked compound tables
- Descriptor/fingerprint outputs and clustering summaries for lead selection
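To make the "clustering summaries" deliverable concrete, here is a minimal sketch of a leader-style clustering pass over fingerprints. It assumes each fingerprint is already available as a set of on-bit indices (in a real pipeline these would come from RDKit). The names `tanimoto` and `leader_cluster`, the compound IDs, and the 0.6 threshold are illustrative assumptions, not part of the skill.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(
    fps: Dict[str, Fingerprint], threshold: float = 0.6
) -> Dict[str, List[str]]:
    """Assign each compound to the first leader it resembles, else start a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for name, fp in fps.items():
        for leader in clusters:
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[leader].append(name)
                break
        else:
            # No existing leader is similar enough, so this compound leads a new cluster.
            clusters[name] = [name]
    return clusters


# Toy fingerprints (hypothetical bit indices, not real molecules).
toy_fps = {
    "cmpd_a": frozenset({1, 2, 3, 4}),
    "cmpd_b": frozenset({1, 2, 3, 4, 5}),
    "cmpd_c": frozenset({10, 11, 12}),
}
summary = leader_cluster(toy_fps, threshold=0.6)
for leader, members in summary.items():
    print(leader, len(members), members)
```

The result depends on input order, which is a known property of leader-style clustering. Picking one representative per cluster is one simple way to get a diverse shortlist.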
Journey fit
Computational triage sits after you have candidate structures but before expensive synthesis or wet-lab bets—classic validate/prototype for drug-discovery and chemistry tooling. Prototype phase is where you narrow libraries with Lipinski filters, fingerprints, similarity search, and ADMET screens on representative sets.
How it compares
Use this skill package for RDKit workflow scaffolding—not a hosted compound database or a clinical trial ops platform.
Common Questions / FAQ
Who is cheminformatics for?
Solo and indie builders working on drug discovery, cheminformatics SaaS, or agent tools that must reason over SMILES/SDF libraries with RDKit-backed predictions.
When should I use cheminformatics?
During Validate when prototyping lead lists—e.g. filtering a downloaded library before a landing-page demo, scoring analogs for a niche therapeutic idea, or clustering candidates before docking in a build-phase integration.
Is cheminformatics safe to install?
Treat it like any third-party agent skill: review the Security Audits panel on this Prism page and inspect generated scripts before running them on sensitive compound or IP data.
SKILL.md
# Cheminformatics

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Cheminformatics provides computational chemistry workflows using RDKit for molecular property prediction, virtual screening, ADMET analysis, molecular docking preparation, and chemical space exploration. The agent generates reproducible cheminformatics pipelines that transform molecular structures (SMILES, SDF) into actionable predictions about drug-likeness, toxicity, and binding affinity.

Drug discovery generates vast chemical libraries that cannot all be synthesized and tested. Cheminformatics narrows the search space computationally: filtering by Lipinski's Rule of Five, predicting ADMET properties (Absorption, Distribution, Metabolism, Excretion, Toxicity), scoring docking poses, and clustering chemical space to identify diverse lead candidates. Each step eliminates compounds that would fail in later, more expensive stages.

This skill covers the molecular informatics workflow from SMILES parsing through descriptor calculation, fingerprint generation, similarity searching, property prediction, and visualization. It integrates with databases like PubChem and ChEMBL for compound retrieval and benchmarking against known actives and inactives.

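Benchmarking against known actives and inactives is often summarized with an enrichment factor. The sketch below is one common formulation, not necessarily what the skill generates. It assumes higher scores mean "more likely active"; the function name `enrichment_factor`, the `top_fraction` parameter, and the toy scores are illustrative.

```python
from typing import List, Tuple


def enrichment_factor(
    scored: List[Tuple[float, bool]], top_fraction: float = 0.1
) -> float:
    """EF = (hit rate in the top-scored fraction) / (hit rate in the whole set)."""
    if not scored:
        raise ValueError("need at least one scored compound")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total_actives = sum(1 for _, is_active in scored if is_active)
    if total_actives == 0:
        raise ValueError("need at least one known active")
    # Assumption: a higher score means the model ranks the compound as more promising.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top_actives = sum(1 for _, is_active in ranked[:n_top] if is_active)
    return (top_actives / n_top) / (total_actives / len(ranked))


# Toy benchmark: (score, is_known_active) pairs with made-up values.
toy_scores = [
    (0.9, True), (0.8, False), (0.7, False), (0.6, False), (0.5, False),
    (0.4, False), (0.3, True), (0.2, False), (0.1, False), (0.05, False),
]
ef_top20 = enrichment_factor(toy_scores, top_fraction=0.2)
print(f"EF at top 20%: {ef_top20:.2f}")
```

An enrichment factor above 1 means the scoring ranks known actives earlier than random selection would. A value near 1 suggests the filter adds little on that benchmark.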
## Use When

- Calculating molecular properties and descriptors
- Screening compound libraries for drug-likeness
- Predicting ADMET properties for lead compounds
- Performing molecular similarity searches
- Preparing structures for molecular docking
- Visualizing chemical space and structure-activity relationships

## How It Works

```mermaid
graph TD
    A[Molecular Input: SMILES/SDF] --> B[Parse + Validate Structures]
    B --> C[Calculate Descriptors]
    C --> D[Drug-likeness Filters]
    D --> E{Passes Lipinski?}
    E -->|No| F[Flag as Non-Drug-like]
    E -->|Yes| G[ADMET Prediction]
    G --> H[Virtual Screening Score]
    H --> I[Docking Preparation]
    I --> J[Ranked Candidate List]
    F --> K[Report with Flags]
    J --> K
```

Compounds flow through increasingly selective filters. Drug-likeness removes obviously non-viable candidates, ADMET prediction flags absorption and toxicity risks, and virtual screening ranks the survivors by predicted activity.

## Implementation

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, AllChem, Draw, Lipinski, DataStructs
from rdkit.Chem import rdMolDescriptors
import pandas as pd


def molecular_properties(smiles: str) -> dict:
    """Compute basic descriptors and count Lipinski Rule of Five violations."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    # Compute the four Rule of Five descriptors once and reuse them below.
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    hbd = Descriptors.NumHDonors(mol)
    hba = Descriptors.NumHAcceptors(mol)
    return {
        "smiles": smiles,
        "mw": mw,
        "logp": logp,
        "hbd": hbd,
        "hba": hba,
        "tpsa": Descriptors.TPSA(mol),
        "rotatable_bonds": Descriptors.NumRotatableBonds(mol),
        "rings": Descriptors.RingCount(mol),
        "lipinski_violations": sum([
            mw > 500,
            logp > 5,
            hbd > 5,
            hba > 10,
        ]),
    }


def lipinski_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Keep compounds with at most one Rule of Five violation."""
    return df[df["lipinski_violations"] <= 1].copy()


def similarity_search(query_smiles: str, library: list[str],
                      threshold: float = 0.7) -> list[dict]:
    query_mol = Chem.MolFromSmiles(query_smiles)
    if query_mol is None:
        # Fail early instead of passing None to the fingerprint call.
        raise ValueError(f"Invalid query SMILES: {query_smiles}")
    query_fp = AllChem.GetMorganFingerprintAsBitVect(query_mol, radius=2, nBits=2048)
    results = []
    for smi in library:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable library entries
        fp = AllChem.GetMorganF