
Torchdrug
Equip an agent with TorchDrug architecture knowledge so you can prototype graph-based drug-discovery and molecular ML pipelines without re-reading the whole library docs.
Overview
TorchDrug is an agent skill most often used in Build (also Validate, Operate) that explains TorchDrug’s modular graph-ML architecture and Configurable pipeline patterns for scientific ML work.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill torchdrug
What is this skill?
- Explains TorchDrug’s four-module split: representation models, tasks, data handling, and core Configurable utilities
- Documents core.Configurable serialize/load/save patterns for reproducible experiment pipelines
- Covers mixing representation encoders with task definitions across datasets
- Modular design guidance for reusing graph embeddings across learning objectives
Adoption & trust: 522 installs on skills.sh; 27.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to run graph-based drug-discovery or molecular ML in TorchDrug but the library’s module boundaries and configuration lifecycle are easy to mix up.
Who is it for?
Solo builders shipping Python agent or API backends that encode graphs with TorchDrug and want copy-paste-safe patterns for configs and module reuse.
Skip if: you only need generic PyTorch tutorials or non-graph tabular ML, or you plan to run production pharmacovigilance without reading the primary TorchDrug documentation.
When should I use this skill?
When implementing or refactoring TorchDrug graph ML code and you need architecture and Configurable lifecycle guidance in the agent context.
What do I get? / Deliverables
You get a clear mental model for models, tasks, data, and Configurable save/load so your agent can scaffold experiments and config-driven pipelines consistently.
- Experiment layout aligned to TorchDrug modules
- Configurable save/load usage for models and pipelines
- Clear separation of encoder vs task code
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Graph ML model and task code lives in the build phase as backend/scientific compute, even when the business idea is still research-heavy. Backend is the canonical shelf because representation models, tasks, datasets, and Configurable pipelines are server-side training and inference concerns.
Where it fits
Sketch whether a GIN encoder plus a property-prediction task is enough for a feasibility demo on a public benchmark.
Implement datasets, tasks, and Configurable save paths in a training service your agent can regenerate from config dicts.
Give your coding agent stable vocabulary for swapping representation models without breaking task bindings.
Reload a saved pipeline config to tweak hidden dims or task heads after an experiment branch.
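A minimal sketch of that last step, assuming the saved config is a plain dictionary shaped like the GIN example in the SKILL.md reference. It uses only the standard library; `with_hidden_dims` and the file name are hypothetical, and the keys TorchDrug actually emits may differ.

```python
import copy
import json
import os
import tempfile

# Config dict shaped like the GIN example in the SKILL.md reference (assumed keys).
base_config = {"class": "GIN", "input_dim": 10, "hidden_dims": [256, 256]}


def with_hidden_dims(config, hidden_dims):
    """Return a copy of `config` with new hidden dims (hypothetical helper)."""
    branched = copy.deepcopy(config)
    branched["hidden_dims"] = list(hidden_dims)
    return branched


# Branch the experiment without mutating the original config.
branch_config = with_hidden_dims(base_config, [512, 512, 512])

# Round-trip through JSON so the branch can be stored next to its results.
path = os.path.join(tempfile.mkdtemp(), "gin_branch.json")
with open(path, "w") as f:
    json.dump(branch_config, f, indent=2)
with open(path) as f:
    reloaded = json.load(f)

print(reloaded["hidden_dims"])
# With TorchDrug installed, the reference's
# core.Configurable.load_config_dict(reloaded) would be the rebuild step.
```

Copying before editing keeps the baseline config intact, so both branches remain reproducible from their own files.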
How it compares
Reference skill for TorchDrug’s object model—not a generic data-science cheat sheet or an MCP server for wet-lab LIMS.
Common Questions / FAQ
Who is torchdrug for?
Python-focused solo and indie builders doing graph ML, cheminformatics-style prototypes, or agent-assisted scientific coding who use TorchDrug as their core library.
When should I use torchdrug?
During Validate when scoping a graph-ML proof of concept, in Build while wiring representation models and tasks into a backend or agent workflow, and in Operate when revisiting saved Configurable pipelines for iteration.
Is torchdrug safe to install?
Treat it like any third-party skill: review the Security Audits panel on this Prism page and your org’s policy before letting an agent run training jobs or access sensitive compound data.
SKILL.md
# Core Concepts and Technical Details

## Overview

This reference covers TorchDrug's fundamental architecture, design principles, and technical implementation details.

## Architecture Philosophy

### Modular Design

TorchDrug separates concerns into distinct modules:

1. **Representation Models** (models.py): Encode graphs into embeddings
2. **Task Definitions** (tasks.py): Define learning objectives and evaluation
3. **Data Handling** (data.py, datasets.py): Graph structures and datasets
4. **Core Components** (core.py): Base classes and utilities

**Benefits:**
- Reuse representations across tasks
- Mix and match components
- Easy experimentation and prototyping
- Clear separation of concerns

### Configurable System

All components inherit from `core.Configurable`:
- Serialize to configuration dictionaries
- Reconstruct from configurations
- Save and load complete pipelines
- Reproducible experiments

## Core Components

### core.Configurable

Base class for all TorchDrug components.

**Key Methods:**
- `config_dict()`: Serialize to dictionary
- `load_config_dict(config)`: Load from dictionary
- `save(file)`: Save to file
- `load(file)`: Load from file

**Example:**
```python
from torchdrug import core, models

model = models.GIN(input_dim=10, hidden_dims=[256, 256])

# Save configuration
config = model.config_dict()
# {'class': 'GIN', 'input_dim': 10, 'hidden_dims': [256, 256], ...}

# Reconstruct model
model2 = core.Configurable.load_config_dict(config)
```

### core.Registry

Decorator for registering models, tasks, and datasets.
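The registration idea can be pictured with a plain-Python stand-in. This is a sketch of the general pattern only, not TorchDrug's implementation: `_REGISTRY`, `register`, `ConfigurableSketch`, and `ToyEncoder` are all hypothetical names, and the real library handles many more cases.

```python
# Plain-Python stand-in for a string-keyed registry; NOT TorchDrug code.
_REGISTRY = {}  # hypothetical table: registered name -> class


def register(name):
    """Decorator that records a class under a string key."""
    def wrapper(cls):
        _REGISTRY[name] = cls
        cls._registry_key = name
        return cls
    return wrapper


class ConfigurableSketch:
    """Hypothetical base class that serializes constructor kwargs."""

    def __init__(self, **kwargs):
        self._kwargs = dict(kwargs)

    def config_dict(self):
        # The registered name plus the arguments needed to rebuild the object.
        return {"class": self._registry_key, **self._kwargs}

    @staticmethod
    def load_config_dict(config):
        # Look up the class by its string key and call it with the saved kwargs.
        kwargs = {k: v for k, v in config.items() if k != "class"}
        return _REGISTRY[config["class"]](**kwargs)


@register("models.ToyEncoder")
class ToyEncoder(ConfigurableSketch):
    def __init__(self, input_dim, hidden_dims):
        super().__init__(input_dim=input_dim, hidden_dims=hidden_dims)
        self.input_dim = input_dim
        self.hidden_dims = hidden_dims


encoder = ToyEncoder(input_dim=10, hidden_dims=[256, 256])
config = encoder.config_dict()
clone = ConfigurableSketch.load_config_dict(config)
print(config)
```

Because the class is found through its string key, a config dict alone is enough to rebuild the object, which is what makes string-based model specification possible.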
**Usage:**
```python
from torch import nn
from torchdrug import core as core_td

# Register under a string key so the model can be looked up by name
@core_td.Registry.register("models.CustomModel")
class CustomModel(nn.Module, core_td.Configurable):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, hidden_dim)

    def forward(self, graph, input, all_loss=None, metric=None):
        # Model implementation
        pass
```

**Benefits:**
- Models automatically serializable
- String-based model specification
- Easy model lookup and instantiation

## Data Structures

### Graph

Core data structure representing molecular or protein graphs.

**Attributes:**
- `num_node`: Number of nodes
- `num_edge`: Number of edges
- `node_feature`: Node feature tensor [num_node, feature_dim]
- `edge_feature`: Edge feature tensor [num_edge, feature_dim]
- `edge_list`: Edge connectivity [num_edge, 2 or 3]
- `num_relation`: Number of edge types (for multi-relational)

**Methods:**
- `node_mask(mask)`: Select subset of nodes
- `edge_mask(mask)`: Select subset of edges
- `undirected()`: Make graph undirected
- `directed()`: Make graph directed

**Batching:**
- Graphs batched into single disconnected graph
- Automatic batching in DataLoader
- Preserves node/edge indices per graph

### Molecule (extends Graph)

Specialized graph for molecules.

**Additional Attributes:**
- `atom_type`: Atomic numbers
- `bond_type`: Bond types (single, double, triple, aromatic)
- `formal_charge`: Atomic formal charges
- `explicit_hs`: Explicit hydrogen counts

**Methods:**
- `from_smiles(smiles)`: Create from SMILES string
- `from_molecule(mol)`: Create from RDKit molecule
- `to_smiles()`: Convert to SMILES
- `to_molecule()`: Convert to RDKit molecule
- `ion_to_molecule()`: Neutralize charges

**Example:**
```python
from torchdrug import data

# From SMILES
mol = data.Molecule.from_smiles("CCO")

# Atom features
print(mol.atom_type)  # atomic numbers 6, 6, 8 (C, C, O)
print(mol.bond_type)  # bond type index per edge (all single bonds here)
```

### Protein (extends Graph)

Specialized graph for proteins.
**Additional Attributes:**
- `residue_type`: Amino acid types
- `atom_name`: Atom names (CA, CB, etc.)
- `atom_type`: Atomic numbers
- `residue_number`: Residue numbering
- `chain_id`: Chain identifiers

**Methods:**
- `from_pdb(pdb_file)`: Load from PDB file
- `from_sequence(sequence)`: Create from sequence
- `to_pdb(pdb_file)`: Save to PDB file

**Graph Construction:**
- Nodes typically represent residues (not at