
Torch Geometric
Create PyTorch Geometric `Data` objects, custom InMemoryDataset classes, and DataLoaders for graph ML experiments your agent can code end-to-end.
Overview
Torch Geometric is an agent skill for the Build phase that teaches custom graph datasets and DataLoader setup with PyTorch Geometric.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill torch-geometric
What is this skill?
- Quick path: lists of `Data` objects fed directly to `DataLoader` without a dataset class
- Full `InMemoryDataset` template with `raw_file_names`, `processed_file_names`, download, and process hooks
- Guidance to verify checksums or signatures when downloading raw files
- CSV and custom raw-source ingestion into graph `Data` representations
- Batching via `torch_geometric.loader.DataLoader` for in-memory graph lists
- The `InMemoryDataset` pattern overrides four members: the `download` and `process` methods plus the `raw_file_names` and `processed_file_names` properties
- `DataLoader` example with a batch size of 32 over a list of `Data` objects
Adoption & trust: 543 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have graph-shaped raw files but no clear pattern to turn them into batched PyG `Data` objects your training script can consume.
Who is it for?
Indie ML builders prototyping GNNs who need InMemoryDataset structure or a fast list-of-`Data` path without boilerplate hunting.
Skip if: you are building non-graph tabular models, running production feature stores without PyG, or working with datasets too large to fit in memory (unless you extend `InMemoryDataset` usage to on-disk patterns).
When should I use this skill?
Building custom graph datasets, loading CSV or numpy into PyG `Data`, or implementing `InMemoryDataset` for training.
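A recurring step when loading CSV data is remapping arbitrary node IDs to consecutive integers and assembling the edge list in the `[2, num_edges]` layout that PyG expects for `edge_index`. The sketch below shows that step with only the standard library; the CSV contents, the column names `source` and `target`, and the helper name `build_edge_index` are assumptions made up for this example.

```python
import csv
import io

# Hypothetical edge list; in practice this would be read from a file on disk
EDGES_CSV = """source,target
alice,bob
bob,carol
carol,alice
dave,alice
"""


def build_edge_index(csv_text):
    """Map arbitrary node IDs to 0..N-1 and return (mapping, edge_index)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    mapping = {}
    for row in rows:
        for node_id in (row["source"], row["target"]):
            # Assign the next free integer the first time an ID is seen
            mapping.setdefault(node_id, len(mapping))
    sources = [mapping[row["source"]] for row in rows]
    targets = [mapping[row["target"]] for row in rows]
    # Two parallel rows: edge_index[0][k] -> edge_index[1][k]
    return mapping, [sources, targets]


mapping, edge_index = build_edge_index(EDGES_CSV)
print(mapping)
print(edge_index)
```

From here, `torch.tensor(edge_index, dtype=torch.long)` would give a tensor suitable for `Data(edge_index=...)`.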
What do I get? / Deliverables
Working dataset classes or inline `Data` lists plus `DataLoader` batches ready to plug into a GNN training loop.
- Processed `data.pt` or in-memory graph list
- Configured `DataLoader` for GNN training scripts
How it compares
A PyG dataset authoring reference, not a general PyTorch tutorial or an AutoML hyperparameter skill.
Common Questions / FAQ
Who is torch-geometric for?
Developers and researcher-builders using PyTorch Geometric who want agent-guided dataset and loader implementation.
When should I use torch-geometric?
During backend work in the Build phase, when wiring custom graph data from CSV or numpy into training pipelines, or when standing up a reusable `InMemoryDataset` for experiments.
Is torch-geometric safe to install?
The skill describes download URLs and file processing. Review the Security Audits panel on this Prism page, and validate remote data sources before calling `download_url` in production.
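One way to validate a downloaded raw file is to compare its SHA-256 digest against a value published by a source you trust. The sketch below uses only the standard library; the helper names `sha256_of_file` and `verify_checksum` are hypothetical, and the expected digest must come from the data provider, not from the downloaded file itself.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large raw files never sit fully in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_sha256):
    """Raise ValueError if the file does not match the expected digest."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"Checksum mismatch for {path}: got {actual}")
    return Path(path)
```

Inside a dataset's `download()` method, a call such as `verify_checksum(path_to_raw_file, EXPECTED_SHA256)` could run right after the download and before any parsing, where `EXPECTED_SHA256` is a constant you define from the provider's published value.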
SKILL.md
# Custom Datasets: Full Reference

How to create your own graph datasets and load graph data from raw sources (CSV, pandas, numpy, etc.).

## Quick: No Dataset Class Needed

For synthetic data or one-off graphs, skip the dataset machinery: create `Data` objects and pass them to `DataLoader`:

```python
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

data_list = [Data(x=..., edge_index=..., y=...) for _ in range(100)]
loader = DataLoader(data_list, batch_size=32)
```

## InMemoryDataset (fits in RAM)

For reusable datasets that fit in CPU memory. Override four members (two properties and two methods):

```python
from torch_geometric.data import InMemoryDataset, download_url


class MyDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None, pre_filter=None):
        super().__init__(root, transform, pre_transform, pre_filter)
        self.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        # Files in raw_dir that must exist to skip download()
        return ['data.csv']

    @property
    def processed_file_names(self):
        # Files in processed_dir that must exist to skip process()
        return ['data.pt']

    def download(self):
        # Download raw files to self.raw_dir
        # Use trusted sources only; verify checksums or signatures before loading.
        download_url('https://example.com/data.csv', self.raw_dir)

    def process(self):
        # Read raw data and create a list of Data objects
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [d for d in data_list if self.pre_filter(d)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(d) for d in data_list]

        # save() collates list into one big Data + slices dict, then saves
        self.save(data_list, self.processed_paths[0])
```

**Directory structure created automatically:**

```
root/
├── raw/            # raw_dir: downloaded files go here
│   └── data.csv
└── processed/      # processed_dir: processed .pt files go here
    └── data.pt
```

**Key behaviors:**

- `download()` runs only if files in `raw_file_names` are missing from `raw_dir`
- `process()` runs only if files in `processed_file_names` are missing from `processed_dir`
- If you change `pre_transform`, delete the `processed/` directory to reprocess

## Dataset (doesn't fit in RAM)

For very large datasets, save each graph individually:

```python
import os.path as osp

import torch
from torch_geometric.data import Data, Dataset, download_url


class LargeDataset(Dataset):
    def __init__(self, root, transform=None, pre_transform=None, pre_filter=None):
        # pre_filter is passed through because process() consults self.pre_filter
        super().__init__(root, transform, pre_transform, pre_filter)

    @property
    def raw_file_names(self):
        return ['graph_data.csv']

    @property
    def processed_file_names(self):
        return [f'data_{i}.pt' for i in range(1000)]

    def download(self):
        download_url('...', self.raw_dir)

    def process(self):
        for idx in range(1000):
            data = Data(...)  # Build graph from raw data

            # Note: a filtered-out graph leaves a gap at data_{idx}.pt, so
            # len() and processed_file_names need adjusting if pre_filter is used.
            if self.pre_filter is not None and not self.pre_filter(data):
                continue

            if self.pre_transform is not None:
                data = self.pre_transform(data)

            torch.save(data, osp.join(self.processed_dir, f'data_{idx}.pt'))

    def len(self):
        return 1000

    def get(self, idx):
        # Newer PyTorch versions may require weights_only=False to unpickle
        # Data objects; only do that for files you trust.
        return torch.load(osp.join(self.processed_dir, f'data_{idx}.pt'))
```

## Loading Graphs from CSV

A common pattern: load node/edge data from CSV files into a `HeteroData` object.

### Step 1: Load node features

```python
import pandas as pd
import torch


def load_node_csv(path, index_col, encoders=None):
    df = pd.read_csv(path, index_col=index_col)
    # Map original IDs to consecutive 0..N-1 indices
    mapping = {idx: i for i, idx in enumerate(df.index.unique())}

    x = None
    if encoders is not None:
        xs = [encoder(df[col]) for col, encoder in encoders.item