Pyhealth

Name: Pyhealth
Author: k-dense-ai

k-dense-ai/scientific-agent-skills

847 installs
32k repo stars
Updated July 29, 2026

pyhealth is a scientific agent skill that scaffolds complete clinical deep-learning pipelines with the PyHealth library for EHR, signal, and imaging healthcare ML features.

About

pyhealth is a clinical machine-learning skill for building healthcare prediction pipelines with PyHealth inside AI coding agents. It covers loading EHR and clinical datasets such as MIMIC-III, MIMIC-IV, eICU, OMOP, SleepEDF, ChestXray14, and EHRShot, then defining tasks like mortality, readmission, length-of-stay, drug recommendation, sleep staging, ICD coding, and EEG events. The skill instantiates models including Transformer, RETAIN, GAMENet, SafeDrug, MICRON, StageNet, AdaCare, and CNN, RNN, or MLP variants, trains with the PyHealth Trainer, computes clinical metrics, and uses ICD, ATC, NDC, and RxNorm code utilities. Developers reach for pyhealth when implementing regulated healthcare ML features rather than generic tabular models.

Implements the canonical 5-stage pipeline: Dataset → Task → Model → Trainer → Metrics
Loads 10+ major clinical datasets including MIMIC-III/IV, eICU, OMOP, SleepEDF, and ChestXray14
Supports clinical tasks such as mortality prediction, readmission, length-of-stay, drug recommendation, sleep staging, ICD coding, and EEG events
Instantiates specialized models including Transformer, RETAIN, GAMENet, SafeDrug, MICRON, StageNet, AdaCare, plus standard CNN, RNN, and MLP variants
Provides medical code utilities for ICD, ATC, NDC, and RxNorm lookup with cross-mapping

Pyhealth by the numbers

847 all-time installs (skills.sh)
+38 installs in the week ending Jul 29, 2026 (Skillselion tracking)
Ranked #338 of 2,065 Data Science & ML skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 29, 2026 (Skillselion catalog sync)

npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill pyhealth

Security audit	3 / 3 scanners passed

How do you build clinical ML pipelines with PyHealth?

Instantly scaffold complete clinical deep-learning pipelines using the PyHealth library when building healthcare ML features.

Who is it for?

Healthcare ML engineers building EHR, signal, or imaging prediction features with MIMIC, eICU, OMOP, or PyHealth-native datasets.

Skip if: you are building non-clinical tabular or NLP apps that involve no EHR codes or PyHealth datasets.

When should I use this skill?

The user mentions PyHealth, MIMIC, eICU, OMOP, EHR modeling, clinical prediction, drug recommendation, sleep staging, or ICD coding.

What you get

PyHealth dataset loaders, task definitions, model instances, Trainer configurations, clinical metric reports, and medical code mappings.

Clinical ML training pipeline
PyHealth model and task configuration
Clinical metrics and code-mapping utilities

By the numbers

Covers 7+ named clinical datasets including MIMIC-III/IV, eICU, and OMOP
Documents 10+ clinical models including RETAIN, GAMENet, SafeDrug, and StageNet

Files

SKILL.md

PyHealth

PyHealth (https://pyhealth.dev/) is a Python toolkit for clinical deep learning. It provides a unified, modular pipeline across electronic health records (EHR), physiological signals, and medical imaging.

The library is built around a 5-stage pipeline — Dataset → Task → Model → Trainer → Metrics — where each stage is replaceable and the interfaces between stages are stable. Code that follows this pipeline shape composes well; code that bypasses it usually fights the library.

When to use this skill

Use this skill whenever the user is doing clinical/healthcare ML and any of the following are true:

They mention PyHealth, MIMIC-III/IV, eICU, OMOP-CDM, EHRShot, SleepEDF, SHHS, ISRUC, COVID19-CXR, ChestX-ray14, TUEV/TUAB.
They want to predict mortality, readmission, length of stay, drug recommendations, sleep stages, ICD codes, EEG events, or de-identification.
They need to look up or cross-map medical codes (ICD-9-CM, ICD-10-CM, ATC, NDC, RxNorm, CCS).
They have EHR-shaped data and want to train a clinical model without writing the plumbing themselves.

PyHealth is the right tool when the workflow fits its 5 stages. If the user just wants generic PyTorch on tabular data, this skill is not necessary.

Installation (uv)

PyHealth 2.0 requires Python ≥ 3.12, < 3.14. Use uv for environment management: it resolves dependencies quickly, and its lockfile makes environments reproducible.

# Create a project with the right Python
uv init my-pyhealth-project
cd my-pyhealth-project
uv python pin 3.12

# Add PyHealth (this also pulls in PyTorch and friends)
uv add pyhealth

# Run scripts inside the env
uv run python train.py

For a one-off script without a project, use uv run --with pyhealth python script.py. For the legacy 1.x line (Python 3.9+), use uv add pyhealth==1.16. Detailed install notes, MIMIC access, and GPU/CPU device tips are in references/installation.md.

The 5-stage pipeline

A complete pipeline is typically <20 lines. This is the canonical shape — start here and modify pieces:

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
from pyhealth.metrics.binary import binary_metrics_fn

# 1. Dataset — raw patient registry
base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)

# 2. Task — converts patients into supervised samples
samples = base.set_task(MortalityPredictionMIMIC3())

# 3. Split + DataLoaders (split by patient to avoid leakage)
train_ds, val_ds, test_ds = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
val_loader   = get_dataloader(val_ds,   batch_size=32, shuffle=False)
test_loader  = get_dataloader(test_ds,  batch_size=32, shuffle=False)

# 4. Model — must be passed the SampleDataset, not the BaseDataset
model = Transformer(dataset=samples)

# 5. Train + evaluate
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
)

y_true, y_prob, _ = trainer.inference(test_loader)
print(binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc"]))

A copy-pasteable starter is in assets/starter_pipeline.py.

Critical things to get right

These are the mistakes that PyHealth code most commonly trips on. Internalize them before writing pipelines:

1. Models take a `SampleDataset`, not a `BaseDataset`. MIMIC3Dataset(...) returns a BaseDataset (a queryable patient registry). Only after .set_task(task) do you get a SampleDataset, which is what models, splitters, and DataLoaders expect. If you pass base to a model, it will fail or behave incorrectly.

2. Always split by patient (or visit), not by sample. Random sample-level splits leak information across train/test because the same patient can appear in both. Use split_by_patient for patient-level prediction, split_by_visit only when visits are independent.

3. Match the task to the dataset. Tasks are dataset-specific: MortalityPredictionMIMIC3 won't work on MIMIC-IV — use MortalityPredictionMIMIC4 or InHospitalMortalityMIMIC4. The full mapping is in references/tasks.md.

4. Pick `monitor` to match the task type. For binary classification use "pr_auc" or "roc_auc". For multilabel (drug rec) use "pr_auc_samples" or "jaccard_samples". For multiclass use "accuracy", "f1_macro", or "cohen_kappa". With the wrong monitor, checkpoint selection keeps the wrong epoch.

5. MIMIC-IV uses `ehr_root=`, not `root=`. This is the one inconsistency in the dataset constructors.

6. For reproducible work, point `cache_dir=` somewhere persistent. PyHealth caches the parsed dataset; without cache_dir, you re-parse every run.
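Item 4 can be encoded as a small lookup so the monitor is not typed by hand each time. The sketch below is not part of PyHealth: `MONITORS_BY_MODE` and `pick_monitor` are hypothetical names, and the metric strings are the ones this skill recommends for each task type.

```python
from typing import Optional

# Hypothetical helper (not a PyHealth API): task type -> monitors
# recommended in this skill for that task type.
MONITORS_BY_MODE = {
    "binary": ["pr_auc", "roc_auc"],
    "multilabel": ["pr_auc_samples", "jaccard_samples"],
    "multiclass": ["accuracy", "f1_macro", "cohen_kappa"],
}


def pick_monitor(mode: str, preferred: Optional[str] = None) -> str:
    """Return a monitor valid for `mode`; raise if `preferred` does not match."""
    if mode not in MONITORS_BY_MODE:
        raise ValueError(f"unknown task mode: {mode!r}")
    valid = MONITORS_BY_MODE[mode]
    if preferred is None:
        return valid[0]
    if preferred not in valid:
        raise ValueError(f"{preferred!r} is not a {mode} monitor; use one of {valid}")
    return preferred


print(pick_monitor("multilabel"))  # pr_auc_samples
```

The returned string would then be passed as `monitor=` to `trainer.train(...)`; a mismatch such as `pick_monitor("binary", "pr_auc_samples")` raises before any training starts.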

How to use this skill

PyHealth has a large API surface — there's no point loading it all at once. Read the reference file that matches the user's task:

If the user is asking about…	Read
Installing, env setup, MIMIC access, GPU	`references/installation.md`
Which dataset class to use, loading patterns, splitting	`references/datasets.md`
What prediction task to choose (mortality, readmission, drug rec, sleep…)	`references/tasks.md`
Picking a model architecture, model-specific arguments	`references/models.md`
Looking up or cross-mapping ICD/ATC/NDC/RxNorm/CCS codes, tokenizers	`references/medcode.md`
End-to-end recipes for common scenarios	`references/examples.md`

For multi-step tasks (e.g., "build a drug recommendation pipeline on MIMIC-IV"), read tasks.md + models.md + examples.md together — they cross-reference each other.

A note on style

Write minimal, idiomatic PyHealth. The library is opinionated; lean into its abstractions instead of reimplementing them in raw PyTorch. If you find yourself writing a custom training loop, ask whether Trainer would do the job — it almost always will, and it handles checkpointing, logging, and best-model selection for free.
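To make "best-model selection" concrete, here is a minimal sketch of what selecting on a monitored validation score with a patience limit amounts to. It is an illustration only, not Trainer's implementation; `select_best_epoch` is a hypothetical name, and it assumes a higher-is-better metric such as PR-AUC.

```python
def select_best_epoch(val_scores, patience):
    """Return (best_epoch, last_epoch_run) for a higher-is-better monitor.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation score seen so far.
    """
    best_epoch, best_score, stale = 0, float("-inf"), 0
    last_epoch = -1
    for epoch, score in enumerate(val_scores):
        last_epoch = epoch
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stop: no improvement for `patience` epochs
    return best_epoch, last_epoch


scores = [0.60, 0.65, 0.64, 0.66, 0.65, 0.65, 0.64]
print(select_best_epoch(scores, patience=2))  # (3, 5)
```

The checkpoint worth keeping is the one from epoch 3, even though training ran through epoch 5. This is why the monitored metric has to match the task type.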

When the user has private MIMIC access, point them at the local CSV root; for demos and learning, the synthetic MIMIC-III bucket (https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/) is fine and works without credentialing.

"""
PyHealth starter pipeline. Replace the four marked lines for a different
dataset/task/model/monitor. Everything else stays the same.

Run:
    uv run python starter_pipeline.py
"""

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
from pyhealth.metrics.binary import binary_metrics_fn


# ---- 1. Dataset ----------------------------------------------------------
# Swap MIMIC3Dataset for MIMIC4Dataset / eICUDataset / OMOPDataset / etc.
# For MIMIC-IV use ehr_root= instead of root=.
base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    cache_dir="./cache/mimic3",
)

# ---- 2. Task -------------------------------------------------------------
# Match the suffix to the dataset (MIMIC3 / MIMIC4 / EICU / OMOP).
task = MortalityPredictionMIMIC3()
samples = base.set_task(task)

# ---- 3. Split + DataLoaders ---------------------------------------------
# Always split_by_patient for clinical prediction to avoid patient leakage.
train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train, batch_size=32, shuffle=True)
val_loader = get_dataloader(val, batch_size=32, shuffle=False)
test_loader = get_dataloader(test, batch_size=32, shuffle=False)

# ---- 4. Model ------------------------------------------------------------
# Swap for RETAIN / RNN / GAMENet / SafeDrug / StageNet / etc.
# The model MUST receive the SampleDataset (`samples`), not the BaseDataset.
model = Transformer(dataset=samples)

# ---- 5. Trainer ----------------------------------------------------------
# monitor:
#   binary       -> "pr_auc" or "roc_auc"
#   multiclass   -> "accuracy" / "f1_macro" / "cohen_kappa"
#   multilabel   -> "pr_auc_samples" / "jaccard_samples"
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
    patience=5,
)

# ---- 6. Evaluate ---------------------------------------------------------
y_true, y_prob, _ = trainer.inference(test_loader)
print(binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc", "f1"]))

Datasets

PyHealth datasets are queryable patient registries, not PyTorch Datasets. The PyTorch-compatible object is the SampleDataset returned by base.set_task(task). Don't try to index BaseDataset like a list — it won't work.

Two-tier object model

BaseDataset                         SampleDataset
├── parses raw CSVs                 ├── one row per supervised sample
├── one row per patient             ├── indexable, supports len()
├── .set_task(task) → SampleDataset ├── feeds into get_dataloader(...)
├── .get_patient(id) → Patient      └── feeds into Model(dataset=...)
└── .iter_patients() → iterator

Always go BaseDataset → set_task → SampleDataset before doing anything else.
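The two tiers can be pictured with a toy model in plain Python. This is a sketch of the idea only: `ToyPatient`, `ToyRegistry`, and `toy_mortality_task` are hypothetical stand-ins, not PyHealth classes, and the data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPatient:
    patient_id: str
    visits: list  # each visit is a dict: {"codes": [...], "died": bool}


@dataclass
class ToyRegistry:
    """Stand-in for the BaseDataset tier: patients keyed by ID, no samples yet."""
    patients: dict = field(default_factory=dict)

    def set_task(self, task_fn):
        # Apply the task to every patient and flatten into one sample list.
        return [s for p in self.patients.values() for s in task_fn(p)]


def toy_mortality_task(patient):
    """One sample per visit: codes as features, in-visit death as the label."""
    return [
        {"patient_id": patient.patient_id, "codes": v["codes"], "label": int(v["died"])}
        for v in patient.visits
    ]


registry = ToyRegistry({
    "p1": ToyPatient("p1", [{"codes": ["A"], "died": False}, {"codes": ["B"], "died": True}]),
    "p2": ToyPatient("p2", [{"codes": ["C"], "died": False}]),
})
samples = registry.set_task(toy_mortality_task)
print(len(registry.patients), len(samples))  # 2 3
```

Two patients become three samples. The sample list, not the registry, is the object with a meaningful length and index, which mirrors why models and DataLoaders need the output of set_task.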

EHR / clinical datasets

Class	Import	Constructor signature highlights
`MIMIC3Dataset`	`from pyhealth.datasets import MIMIC3Dataset`	`root, tables, cache_dir=None, dev=False, num_workers=...`
`MIMIC4Dataset`	`from pyhealth.datasets import MIMIC4Dataset`	`ehr_root, tables, ...` (note: `ehr_root`, not `root`)
`eICUDataset`	`from pyhealth.datasets import eICUDataset`	`root, tables, ...`
`OMOPDataset`	`from pyhealth.datasets import OMOPDataset`	`root, tables, ...`
`EHRShotDataset`	`from pyhealth.datasets import EHRShotDataset`	few-shot benchmark
`Support2Dataset`	`from pyhealth.datasets import Support2Dataset`	palliative care outcomes
`MIMICExtractDataset`	`from pyhealth.datasets import MIMICExtractDataset`	pre-processed MIMIC

Common MIMIC tables

MIMIC-III (uppercase): DIAGNOSES_ICD, PROCEDURES_ICD, PRESCRIPTIONS, LABEVENTS, NOTEEVENTS
MIMIC-IV (lowercase): diagnoses_icd, procedures_icd, prescriptions, labevents

MIMIC-III example

from pyhealth.datasets import MIMIC3Dataset

base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    cache_dir="./cache/mimic3",
    dev=False,
)

MIMIC-IV example

from pyhealth.datasets import MIMIC4Dataset

base = MIMIC4Dataset(
    ehr_root="/path/to/mimic-iv-2.2/hosp",      # NOT root=
    tables=["diagnoses_icd", "procedures_icd", "prescriptions"],
    cache_dir="./cache/mimic4",
)

Signal / sleep datasets

Class	Use
`SleepEDFDataset`	Sleep-EDF polysomnography → sleep stage classification
`SHHSDataset`	Sleep Heart Health Study EEG
`ISRUCDataset`	ISRUC sleep dataset
`TUABDataset`	Temple University abnormal EEG
`TUEVDataset`	Temple University EEG events
`CardiologyDataset`	ECG / cardiology recordings
`DREAMTDataset`, `BMDHSDataset`	Sleep / respiratory recordings

Imaging datasets

Class	Use
`COVID19CXRDataset`	COVID-19 chest X-ray classification
`ChestXray14Dataset`	NIH ChestX-ray14, multi-label

Genomics datasets

Class	Use
`ClinVarDataset`	Variant pathogenicity classification
`COSMICDataset`	Mutation pathogenicity
`TCGAPRADDataset`	Cancer survival, mutation burden

Text datasets

Class	Use
`PhysioNetDeIDDataset`	De-identified clinical notes
`MedicalTranscriptionsDataset`	Clinical transcription category classification

Splitting and DataLoaders

After set_task, split and wrap in DataLoaders. Always split by patient (not by sample) for clinical prediction — random sample splits leak the same patient into train and test.

from pyhealth.datasets import split_by_patient, split_by_visit, get_dataloader

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])

train_loader = get_dataloader(train, batch_size=32, shuffle=True)
val_loader   = get_dataloader(val,   batch_size=32, shuffle=False)
test_loader  = get_dataloader(test,  batch_size=32, shuffle=False)

Use split_by_visit only when visits are independent (rare — most clinical tasks need patient-level splits). For time-aware evaluation, use split_by_patient with chronological cutoffs from a custom task.
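The leakage argument is easy to check on toy data. The sketch below shows the idea behind a patient-level split in plain Python; it is not PyHealth's `split_by_patient` implementation, and `split_patient_ids` and the toy samples are hypothetical.

```python
import random


def split_patient_ids(samples, ratios, seed=0):
    """Split sample indices so that each patient lands in exactly one subset."""
    patient_ids = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patient_ids)
    n = len(patient_ids)
    cut1 = round(n * ratios[0])
    cut2 = round(n * (ratios[0] + ratios[1]))
    groups = [patient_ids[:cut1], patient_ids[cut1:cut2], patient_ids[cut2:]]
    subsets = []
    for group in groups:
        members = set(group)
        subsets.append([i for i, s in enumerate(samples) if s["patient_id"] in members])
    return subsets


# 10 toy patients with 3 visits each -> 30 samples
samples = [{"patient_id": f"p{i}", "visit": v} for i in range(10) for v in range(3)]
train, val, test = split_patient_ids(samples, [0.8, 0.1, 0.1])

# No patient appears in more than one subset.
pids = [{samples[i]["patient_id"] for i in part} for part in (train, val, test)]
assert not (pids[0] & pids[1]) and not (pids[0] & pids[2]) and not (pids[1] & pids[2])
print(len(train), len(val), len(test))  # 24 3 3
```

Shuffling the 30 sample indices directly would almost certainly place visits from the same patient on both sides of the split, which is the leakage described above.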

Inspecting a dataset

base.stats()                          # summary printout
patient = base.get_patient("p001")    # Patient object
events = patient.get_events()         # all events for that patient

for p in base.iter_patients():        # iterate without loading all into memory
    ...

len(samples)                          # only valid AFTER set_task
samples[0]                            # dict of features + label for one sample

Custom datasets

Subclass BaseDataset if the user has a non-standard EHR source. They must implement parsing of patients/events; set_task then works as usual. This is more involved than picking a built-in dataset — only suggest it when nothing else fits.

End-to-end recipes

These are complete pipelines for the most common scenarios. Copy, then modify the dataset/task/model/monitor lines for the user's situation. All examples assume uv add pyhealth has been run.

1. Mortality prediction on MIMIC-III (binary)

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
from pyhealth.metrics.binary import binary_metrics_fn

base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    cache_dir="./cache/mimic3",
)
samples = base.set_task(MortalityPredictionMIMIC3())

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train, batch_size=32, shuffle=True)
val_loader   = get_dataloader(val,   batch_size=32, shuffle=False)
test_loader  = get_dataloader(test,  batch_size=32, shuffle=False)

model = Transformer(dataset=samples)
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    monitor="pr_auc",
    patience=5,
)

y_true, y_prob, _ = trainer.inference(test_loader)
print(binary_metrics_fn(y_true, y_prob, metrics=["pr_auc", "roc_auc", "f1"]))

2. Readmission prediction on MIMIC-IV with RETAIN (interpretable)

Use RETAIN when the user wants to explain predictions, not just make them.

from pyhealth.datasets import MIMIC4Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import ReadmissionPredictionMIMIC4
from pyhealth.models import RETAIN
from pyhealth.trainer import Trainer

base = MIMIC4Dataset(
    ehr_root="/path/to/mimic-iv/hosp",   # ehr_root, not root
    tables=["diagnoses_icd", "procedures_icd", "prescriptions"],
    cache_dir="./cache/mimic4",
)
samples = base.set_task(ReadmissionPredictionMIMIC4())

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train, batch_size=32, shuffle=True)
val_loader   = get_dataloader(val,   batch_size=32, shuffle=False)
test_loader  = get_dataloader(test,  batch_size=32, shuffle=False)

model = RETAIN(dataset=samples)
trainer = Trainer(model=model, metrics=["roc_auc", "pr_auc", "f1"])
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=30,
    monitor="roc_auc",
)
print(trainer.evaluate(test_loader))

3. Drug recommendation on MIMIC-III with SafeDrug (multilabel)

Drug rec is multilabel — every visit has a set of drugs. Use a _samples monitor.

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import DrugRecommendationMIMIC3
from pyhealth.models import SafeDrug
from pyhealth.trainer import Trainer

base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
)
samples = base.set_task(DrugRecommendationMIMIC3())

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train, batch_size=64, shuffle=True)
val_loader   = get_dataloader(val,   batch_size=64, shuffle=False)
test_loader  = get_dataloader(test,  batch_size=64, shuffle=False)

model = SafeDrug(dataset=samples)
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=30,
    monitor="pr_auc_samples",     # multilabel — note _samples suffix
)
print(trainer.evaluate(test_loader))

4. Length-of-stay (multiclass) baseline

from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import LengthOfStayPredictionMIMIC3
from pyhealth.models import RNN
from pyhealth.trainer import Trainer

base = MIMIC3Dataset(
    root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/",
    tables=["DIAGNOSES_ICD", "PROCEDURES_ICD"],
)
samples = base.set_task(LengthOfStayPredictionMIMIC3())

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
loaders = [get_dataloader(d, batch_size=32, shuffle=s)
           for d, s in [(train, True), (val, False), (test, False)]]

model = RNN(dataset=samples, rnn_type="GRU", hidden_dim=128)
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=loaders[0],
    val_dataloader=loaders[1],
    epochs=30,
    monitor="cohen_kappa",
)
print(trainer.evaluate(loaders[2]))

5. Sleep staging on Sleep-EDF (multiclass on signals)

from pyhealth.datasets import SleepEDFDataset, split_by_patient, get_dataloader
from pyhealth.tasks import SleepStagingSleepEDF
from pyhealth.models import SparcNet
from pyhealth.trainer import Trainer

base = SleepEDFDataset(root="/path/to/sleepedf", cache_dir="./cache/sleepedf")
samples = base.set_task(SleepStagingSleepEDF())

train, val, test = split_by_patient(samples, [0.8, 0.1, 0.1])
train_loader = get_dataloader(train, batch_size=128, shuffle=True)
val_loader   = get_dataloader(val,   batch_size=128, shuffle=False)
test_loader  = get_dataloader(test,  batch_size=128, shuffle=False)

model = SparcNet(dataset=samples)
trainer = Trainer(model=model)
trainer.train(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=20,
    monitor="cohen_kappa",
)
print(trainer.evaluate(test_loader))

6. Code lookup + cross-mapping (no model)

When the user wants help interpreting codes or reducing label cardinality, no training is needed:

from pyhealth.medcode import InnerMap, CrossMap

icd9 = InnerMap.load("ICD9CM")
print(icd9.lookup("428.0"))   # 'Congestive heart failure, unspecified'

# Roll up MIMIC-III ICD-9 diagnoses to CCS for a smaller label space
icd9_to_ccs = CrossMap.load("ICD9CM", "CCSCM")
ccs_codes = icd9_to_ccs.map("428.0")   # ['108']

7. Logistic regression baseline (always run this first)

Before reaching for a Transformer, run a logistic-regression baseline. It's fast, hard to misuse, and tells you whether the task signal exists at all.

from pyhealth.models import LogisticRegression
from pyhealth.trainer import Trainer

model = LogisticRegression(dataset=samples)
trainer = Trainer(model=model)
trainer.train(train_dataloader=train_loader, val_dataloader=val_loader, epochs=10, monitor="pr_auc")

If LR gets a ROC-AUC near 0.5, or a PR-AUC near the positive-class prevalence (the chance level for PR-AUC), deeper models likely won't help; investigate the task or features. If LR is already strong, the headroom for fancy models is small.
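When reading baseline numbers it helps to know what an uninformative model scores on each metric. The sketch below computes both metrics in plain Python for a scorer that gives every sample the same score, on made-up labels with 5% positives. The helper functions are illustrative rather than PyHealth's metric code, and ties are handled in the usual way (half credit for ROC-AUC, one threshold per distinct score for average precision).

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve."""
    n_pos = sum(y_true)
    ap, tp, seen, prev_recall = 0.0, 0, 0, 0.0
    # Walk the distinct score thresholds from highest to lowest.
    for thr in sorted(set(y_score), reverse=True):
        group = [y for y, s in zip(y_true, y_score) if s == thr]
        tp += sum(group)
        seen += len(group)
        precision, recall = tp / seen, tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y_true = [1] * 5 + [0] * 95   # 5% prevalence
y_score = [0.3] * 100         # constant, uninformative scores
print(roc_auc(y_true, y_score), average_precision(y_true, y_score))  # 0.5 0.05
```

On an imbalanced task, a PR-AUC of 0.5 can therefore sit far above chance, while a ROC-AUC of 0.5 is exactly chance.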

8. Loading a checkpoint and predicting

from pyhealth.trainer import Trainer
from pyhealth.models import Transformer

model = Transformer(dataset=samples)
trainer = Trainer(model=model)
trainer.load_ckpt("./output/best.ckpt")

y_true, y_prob, loss = trainer.inference(test_loader)

9. Custom task on MIMIC-III

When no built-in task fits — e.g., the user wants to predict a specific lab value 24h ahead:

from pyhealth.tasks import BaseTask
from pyhealth.datasets import MIMIC3Dataset

class HighCreatininePrediction(BaseTask):
    task_name = "HighCreatininePrediction"
    input_schema = {"diagnoses": "sequence", "procedures": "sequence"}
    output_schema = {"label": "binary"}

    def __call__(self, patient):
        samples = []
        for visit in patient.visits[:-1]:
            next_visit = patient.next_visit(visit)
            label = self._has_high_creatinine(next_visit)
            samples.append({
                "patient_id": patient.patient_id,
                "visit_id": visit.visit_id,
                "diagnoses": visit.get_code_list("DIAGNOSES_ICD"),
                "procedures": visit.get_code_list("PROCEDURES_ICD"),
                "label": int(label),
            })
        return samples

    def _has_high_creatinine(self, visit): ...

base = MIMIC3Dataset(root=..., tables=["DIAGNOSES_ICD", "PROCEDURES_ICD", "LABEVENTS"])
samples = base.set_task(HighCreatininePrediction())

The exact Patient/Visit API varies — read help(patient) interactively if the user is on a custom dataset.
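Independent of the Patient/Visit API, the core of a "predict the next visit" task is pairing each visit's features with a label taken from the visit that follows it. A minimal sketch on plain dictionaries, where `next_visit_samples`, the field names, and the 1.5 threshold are all hypothetical choices made for illustration:

```python
def next_visit_samples(patient_id, visits, threshold):
    """Pair each visit's codes with a label computed from the following visit."""
    samples = []
    for current, following in zip(visits, visits[1:]):
        peak = max(following["creatinine"], default=None)
        if peak is None:
            continue  # no lab values in the next visit, so no label
        samples.append({
            "patient_id": patient_id,
            "visit_id": current["visit_id"],
            "diagnoses": current["diagnoses"],
            "label": int(peak > threshold),
        })
    return samples


visits = [
    {"visit_id": "v1", "diagnoses": ["D1"], "creatinine": [0.9]},
    {"visit_id": "v2", "diagnoses": ["D2"], "creatinine": [1.1, 2.3]},
    {"visit_id": "v3", "diagnoses": ["D3"], "creatinine": []},
]
# threshold=1.5 is an arbitrary illustrative cutoff, not clinical guidance.
result = next_visit_samples("p1", visits, threshold=1.5)
print(result)
```

Only one sample comes out: v1 is labeled from v2, and v2 is dropped because v3 carries no lab values. The last visit never produces a sample because nothing follows it.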

Installation & Environment Setup

Python version

PyHealth 2.0 requires Python 3.12 or 3.13 (>=3.12,<3.14). The 1.x line supports Python 3.9+ if a downgrade is unavoidable.
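A script can check this constraint up front instead of failing during dependency resolution. A small sketch using only the standard library; `supports_pyhealth2` is a hypothetical helper name:

```python
import sys


def supports_pyhealth2(version=None):
    """True if (major, minor) satisfies >=3.12,<3.14."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 12) <= (major, minor) < (3, 14)


print(supports_pyhealth2((3, 12)), supports_pyhealth2((3, 14)))  # True False
```

Called with no argument, it checks the running interpreter.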

Recommended: uv

uv is the right tool here — it resolves and installs an order of magnitude faster than pip, and the lockfile makes runs reproducible across machines.

New project

uv init my-pyhealth-project
cd my-pyhealth-project
uv python pin 3.12          # writes .python-version
uv add pyhealth             # resolves PyTorch + transitive deps, writes uv.lock
uv run python train.py      # runs inside the project venv

Existing project

If a pyproject.toml already exists:

uv add pyhealth

If only requirements.txt exists, either migrate to pyproject.toml (preferred) or:

uv pip install pyhealth

One-off scripts (no project)

uv run --with pyhealth python script.py

This creates an ephemeral environment, runs the script, and then discards the environment. Good for quick experiments.

Legacy 1.x

uv add 'pyhealth==1.16'     # last 1.x release, Python 3.9+

The 1.x and 2.x APIs differ — examples in this skill target 2.x. If a user is on 1.x, mention the version mismatch before debugging.

GPU / CPU

PyHealth uses PyTorch under the hood. uv add pyhealth pulls the default PyTorch wheel from PyPI: on Linux that wheel is built with CUDA support and falls back to CPU when no GPU is present, while the macOS wheel has no CUDA support.

For explicit CUDA control on Linux:

# Replace cu121 with the user's CUDA version
uv add 'torch>=2.1' --index https://download.pytorch.org/whl/cu121
uv add pyhealth

For Apple Silicon, the default wheel works and runs on MPS when Trainer(device="mps") is set. CPU is the safe default if device behavior is unclear.

Dataset access

Synthetic MIMIC-III (no credentials)

PyHealth hosts a synthetic copy on Google Cloud Storage that any pipeline can hit directly:

root="https://storage.googleapis.com/pyhealth/Synthetic_MIMIC-III/"

Use this for demos, tutorials, and any code that needs to run without PhysioNet credentials.

Real MIMIC-III / MIMIC-IV / eICU

These require completed CITI training and a credentialed PhysioNet account. Once downloaded, point root= (or ehr_root= for MIMIC-IV) at the local directory containing the CSV/CSV.gz files:

MIMIC4Dataset(
    ehr_root="/path/to/mimic-iv/2.2/hosp",   # not `root`
    tables=["diagnoses_icd", "procedures_icd", "prescriptions"],
    cache_dir="/path/to/cache",              # cache parsed output
)

OMOP-CDM

Standardized schema; point root= at the directory containing CDM tables (person.csv, condition_occurrence.csv, etc.).

Caching

The first call to set_task() is expensive (parses every CSV, applies the task to every patient). Set cache_dir= on the dataset constructor to persist the parsed result:

MIMIC3Dataset(root=..., tables=..., cache_dir="./cache/mimic3")

Subsequent runs reload from disk in seconds. Without cache_dir, every run re-parses from scratch — fine for a one-off script, painful for iteration.
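The general pattern behind a persistent cache directory can be sketched in a few lines: key the parsed result on the inputs that define it, and reuse the file when it exists. This illustrates the idea only and makes no claim about how PyHealth lays out or keys its cache; `cached_parse` and `slow_parse` are hypothetical.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cached_parse(root, tables, cache_dir, parse_fn):
    """Return parse_fn(root, tables), reusing a pickle in cache_dir when present."""
    # The key depends on the inputs only, so table order does not matter.
    key = hashlib.sha256(json.dumps([root, sorted(tables)]).encode()).hexdigest()[:16]
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    result = parse_fn(root, tables)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(result))
    return result


calls = []


def slow_parse(root, tables):
    calls.append(root)  # record each real parse
    return {"root": root, "tables": sorted(tables)}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_parse("/data/demo", ["A", "B"], tmp, slow_parse)
    second = cached_parse("/data/demo", ["B", "A"], tmp, slow_parse)

print(len(calls))  # 1
```

The second call is served from disk. A temporary directory is used here only so the example cleans up after itself; a real cache directory should be persistent.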

`dev=True`

All dataset constructors accept dev=True, which loads only a small subset of patients. Use this while iterating on pipeline shape; switch to dev=False (the default) once the pipeline runs end-to-end.

Common installation issues

"Could not find a version that satisfies the requirement pyhealth" — Python version is < 3.12. Run uv python pin 3.12 and reinstall.
CUDA OOM during `set_task` — set_task is CPU-only; this is almost always a Trainer issue. Reduce batch_size or move to CPU temporarily to localize the problem.
Slow first run — expected; set cache_dir= and re-run.
`KeyError` on table name — table names are case-sensitive and dataset-specific. MIMIC-III uses uppercase (DIAGNOSES_ICD), MIMIC-IV uses lowercase (diagnoses_icd). Check the user's dataset version.
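The table-name casing rule from the last item is mechanical enough to apply before constructing the dataset. A sketch, assuming only the convention stated above (uppercase for MIMIC-III, lowercase for MIMIC-IV); `normalize_tables` and its dataset keys are hypothetical:

```python
def normalize_tables(dataset, tables):
    """Apply the table-name casing each MIMIC version expects."""
    casing = {"mimic3": str.upper, "mimic4": str.lower}
    if dataset not in casing:
        raise ValueError(f"no casing rule for dataset {dataset!r}")
    return [casing[dataset](t) for t in tables]


print(normalize_tables("mimic4", ["DIAGNOSES_ICD", "Prescriptions"]))
# ['diagnoses_icd', 'prescriptions']
```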

Medical codes & tokenizers

PyHealth ships utilities for working with medical coding systems directly — no external API, just bundled mappings.

InnerMap: lookup within a coding system

InnerMap lets you look up code descriptions and traverse the code hierarchy (parents/ancestors).

from pyhealth.medcode import InnerMap

icd9cm = InnerMap.load("ICD9CM")
icd9cm.lookup("428.0")
# → 'Congestive heart failure, unspecified'

icd9cm.get_ancestors("428.0")
# → ['428', '420-429.99', '390-459.99', '001-999.99']

Supported coding systems:

System	Domain
`ICD9CM`, `ICD10CM`	Diagnoses
`ICD9PROC`, `ICD10PCS`	Procedures
`ATC`	WHO Anatomical Therapeutic Chemical (drugs)
`NDC`	National Drug Code (US)
`RxNorm`	Normalized drug names
`CCSCM`, `CCSPROC`	Clinical Classifications Software (single-level)

atc = InnerMap.load("ATC")
atc.lookup("M01AE51")
# → 'ibuprofen, combinations'

CrossMap: translate between systems

CrossMap converts codes from one system to another. Many mappings are one-to-many — the result is always a list.

from pyhealth.medcode import CrossMap

# Diagnoses: ICD-9-CM → CCS (rolls fine-grained codes up to ~280 categories)
cm = CrossMap.load("ICD9CM", "CCSCM")
cm.map("428.0")
# → ['108']

# Drugs: NDC → RxNorm (normalized drug name)
cm = CrossMap.load("NDC", "RxNorm")
cm.map("50580049698")
# → ['209387']

Common cross-mappings:

ICD9CM ↔ ICD10CM — ICD version conversion
ICD9CM → CCSCM, ICD10CM → CCSCM — dimensionality reduction (~14k → 280 codes)
NDC → RxNorm — drug normalization
NDC → ATC — pharmacology grouping
RxNorm → ATC — drug therapeutic classification

When to use cross-mapping: when the user has codes in one system but wants to predict or feature-engineer in another (e.g., training on ICD-9 from MIMIC-III but evaluating on ICD-10 from MIMIC-IV).
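Because a cross-map returns a list per code, rolling up a whole visit means flattening, de-duplicating, and deciding what to do with codes that have no mapping. The sketch below uses a made-up mapping table with placeholder codes to show that bookkeeping. `TOY_FINE_TO_COARSE` and `roll_up` are hypothetical; in a real pipeline the per-code lists would come from the cross-map.

```python
# Made-up placeholder codes for illustration only.
TOY_FINE_TO_COARSE = {
    "F1.0": ["G1"],
    "F1.1": ["G1"],
    "F2.0": ["G2", "G3"],  # one-to-many
}


def roll_up(codes, mapping):
    """Map fine-grained codes to coarse groups, de-duplicated, order preserved."""
    coarse, unmapped = [], []
    for code in codes:
        targets = mapping.get(code)
        if targets is None:
            unmapped.append(code)  # keep track instead of silently dropping
            continue
        for target in targets:
            if target not in coarse:
                coarse.append(target)
    return coarse, unmapped


print(roll_up(["F1.0", "F1.1", "F2.0", "F9.9"], TOY_FINE_TO_COARSE))
# (['G1', 'G2', 'G3'], ['F9.9'])
```

Returning the unmapped codes separately makes coverage gaps visible, which matters when the source and target systems come from different dataset versions.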

Tokenizer

pyhealth.tokenizer.Tokenizer converts code lists to integer indices and back. Most pipelines don't need to call it directly — set_task and the models handle tokenization internally — but it's exposed when you need batch encoding for custom models.

from pyhealth.tokenizer import Tokenizer

vocab = ['A01A', 'A02A', 'A02B', 'A03C', 'A03D', 'A04A']
tok = Tokenizer(tokens=vocab, special_tokens=["<pad>", "<unk>"])

# 2D = batch of code lists, one per sample
tokens = [['A03C', 'A03D'], ['A04A', 'B035']]   # 'B035' is OOV
indices = tok.batch_encode_2d(tokens)
# → [[5, 6], [7, 1]]    (1 = <unk>)

# 3D = batch of visits, each with code lists
tokens = [[['A03C', 'A03D'], ['A04A']], [['B035']]]
indices = tok.batch_encode_3d(tokens)

# Decode is symmetric
tok.batch_decode_2d(indices)

Reserved indices: 0 = <pad>, 1 = <unk> when both special tokens are passed (in that order).
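The index layout can be reproduced with a toy vocabulary class, which is handy for checking expectations before writing a custom model. This is a plain-Python sketch, not the PyHealth Tokenizer; `ToyTokenizer` and its method names are hypothetical, and it assumes special tokens take the lowest indices and short rows are right-padded.

```python
class ToyTokenizer:
    """Minimal vocabulary: special tokens take the lowest indices."""

    def __init__(self, tokens, special_tokens=("<pad>", "<unk>")):
        self.itos = list(special_tokens) + list(tokens)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}
        self.pad = self.stoi["<pad>"]
        self.unk = self.stoi["<unk>"]

    def encode_2d(self, batch):
        """Encode a batch of code lists, right-padding to the longest list."""
        width = max(len(row) for row in batch)
        return [
            [self.stoi.get(tok, self.unk) for tok in row] + [self.pad] * (width - len(row))
            for row in batch
        ]

    def decode_2d(self, batch):
        """Inverse of encode_2d; padding positions are dropped."""
        return [[self.itos[i] for i in row if i != self.pad] for row in batch]


toy = ToyTokenizer(["A01A", "A02A", "A02B", "A03C", "A03D", "A04A"])
print(toy.encode_2d([["A03C", "A03D"], ["A04A", "B035"]]))  # [[5, 6], [7, 1]]
print(toy.encode_2d([["A01A"], ["A02A", "A02B"]]))          # [[2, 0], [3, 4]]
```

The second call shows padding: the shorter row is filled with index 0 so the batch is rectangular.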

When to surface this to the user

Reduce label cardinality: ICD-9 → CCS turns 14,000 sparse labels into 280 — drug-rec and ICD-coding tasks often benefit.
Cross-version compatibility: training on MIMIC-III (ICD-9) and inferring on MIMIC-IV (ICD-10) requires a cross-map.
Drug normalization: NDC codes are vendor-specific; map to RxNorm or ATC for stable features.
Interpretability: after a prediction, use InnerMap.lookup to render code IDs as human-readable descriptions in the output.

Models

All PyHealth models are PyTorch modules with a unified constructor: they take a SampleDataset (the output of base.set_task(...)) as the first argument, plus model-specific hyperparameters. The model auto-configures input/output dimensions from the dataset's schema — you don't wire layers by hand.

model = Transformer(dataset=samples, hidden_dim=128)

If you pass a BaseDataset instead of a SampleDataset, the model can't introspect schemas and will error or misbehave.

Choosing a model

Pick by data shape and task type, not by recency. The "newest" model is rarely the right answer.

EHR sequential codes (diagnoses, procedures, prescriptions across visits)

Model	When to pick it
`Transformer`	Strong default. Long visit histories, attention over codes.
`RNN` (LSTM/GRU)	Smaller datasets; faster than Transformer; sensible baseline.
`RETAIN`	When interpretability matters — produces visit-level and code-level attention weights.
`Deepr`	CNN-over-codes; readmission-style tasks.
`TCN`	Long-range temporal patterns where causality matters.
`AdaCare`	Adaptive feature extraction across irregular time intervals.
`ConCare`	Contextualized representations across visits.
`StageNet`	Disease-progression staging from irregular vitals.
`EHRMamba`	State-space alternative to Transformer for long sequences.

Drug recommendation (multilabel)

Model	When to pick it
`GAMENet`	Drug-rec baseline with memory networks; pairs with `DrugRecommendation*` tasks.
`SafeDrug`	Models drug-drug interactions / safety constraints via molecular structure.
`MICRON`	Predicts medication change between visits, not the full set.
`MoleRec`	Substructure-aware molecular drug recommendation.

Static / tabular features

Model	When to pick it
`LogisticRegression`	Strong, fast baseline. Always run this first.
`MLP`	Static numeric vectors, no sequence order.

Imaging / signals

Model	When to pick it
`CNN`	Generic convolutional baseline for images and 1D signals.
`ContraWR`	Contrastive learning for biosignals.
`SparcNet`	Sparse signal prediction (seizure, sleep staging).
`BIOT`	Biosignal transformer.

Graph-structured data

Model	When to pick it
`GNN`	Generic graph neural net baseline.
`GraphCare`	EHR codes augmented with external medical knowledge graphs (UMLS/SNOMED).
`GRASP`	Patient-similarity graph representations.

Text

Model	When to pick it
`TransformersModel`	Pretrained HuggingFace transformer (BERT-family) — clinical notes, transcripts.
`TransformerDeID`	De-identification NER head on top of a transformer.
`MedLink`	Medical entity linking.

Generative / representation

Model	When to pick it
`VAE`	Synthetic EHR generation, anomaly detection.
`GAN`	Synthetic data with adversarial training.

Reinforcement learning

Model	When to pick it
`Agent`	Treatment recommendation framed as RL.

Multimodal

Model	When to pick it
`MultimodalRNN`	Mix of sequential codes and static tensors in one sample.

Common arguments

Most clinical models accept:

dataset — the SampleDataset (required, positional)
hidden_dim — embedding/hidden width (default ≈128)
embedding_dim — separate embedding width if exposed
dropout — dropout rate
num_layers — for RNN/Transformer/TCN

Refer to the docstring (help(Transformer)) for model-specific knobs (e.g., rnn_type for RNN, num_filters for CNN, latent_dim for VAE).

Recommended progression

When starting on a new task, work up the model ladder rather than jumping to the most exotic option:

1. `LogisticRegression`: sanity check and performance floor.
2. `MLP` if features are static, `RNN` if sequential.
3. `Transformer`: strong general default.
4. Specialized model (RETAIN, GAMENet, StageNet, etc.): only if the task has a property that motivates it (interpretability, drug structure, irregular time, etc.).

Stop as soon as a model does the job. A working Transformer beats a half-debugged MoleRec.
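The "stop as soon as a model does the job" rule can be written as a short loop. This is a sketch, not part of PyHealth: climb_ladder is a hypothetical helper, evaluate stands in for a full train-and-validate run, and the scores are made-up placeholders rather than measured results.

```python
def climb_ladder(ladder, evaluate, target):
    """Try models in order and stop at the first one that reaches `target`.

    Returns (name, score, history). If no model reaches the target, the
    best-scoring model seen so far is returned instead.
    """
    if not ladder:
        raise ValueError("ladder must contain at least one model name")
    history = []
    for name in ladder:
        score = evaluate(name)
        history.append((name, score))
        if score >= target:
            return name, score, history
    best_name, best_score = max(history, key=lambda item: item[1])
    return best_name, best_score, history


ladder = ["LogisticRegression", "RNN", "Transformer", "RETAIN"]

# Made-up validation scores standing in for real training runs.
placeholder_scores = {
    "LogisticRegression": 0.71,
    "RNN": 0.78,
    "Transformer": 0.80,
    "RETAIN": 0.79,
}

chosen, score, history = climb_ladder(ladder, placeholder_scores.get, target=0.75)
print(chosen, score)                   # RNN 0.78
print([name for name, _ in history])   # ['LogisticRegression', 'RNN']
```

The later rungs are never trained once the target is met, which is the point of ordering the ladder from cheapest to most specialized.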

Custom models

Subclass BaseModel if nothing fits. The dataset object provides feature extractors via dataset.input_processors — use them to keep tokenization consistent with the rest of the pipeline rather than rolling custom encoders.

Tasks

A task turns a BaseDataset (raw patients) into a SampleDataset (supervised samples). Tasks define input_schema (which fields go to the model) and output_schema (the label).

from pyhealth.tasks import MortalityPredictionMIMIC3

samples = base.set_task(MortalityPredictionMIMIC3())

Tasks are dataset-specific. Picking the wrong combo (e.g., MortalityPredictionMIMIC3 on a MIMIC-IV dataset) will fail. Match the suffix.
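A suffix check can catch a bad pairing before any data is loaded. The sketch below is a hypothetical guard, not a PyHealth API: expected_dataset and check_pairing are illustrative names, and the suffix table assumes the plain naming pattern of the built-in tasks (class name ending in the dataset tag).

```python
# Assumed suffix -> dataset label table, following the built-in task names.
SUFFIX_TO_DATASET = {
    "MIMIC3": "MIMIC-III",
    "MIMIC4": "MIMIC-IV",
    "EICU": "eICU",
    "OMOP": "OMOP",
}


def expected_dataset(task_name):
    """Return the dataset implied by the task name's suffix, or None."""
    upper = task_name.upper()  # tolerate mixed casing such as '...eICU'
    for suffix, dataset in SUFFIX_TO_DATASET.items():
        if upper.endswith(suffix):
            return dataset
    return None


def check_pairing(task_name, dataset_name):
    """Raise ValueError if the task name does not match the dataset."""
    expected = expected_dataset(task_name)
    if expected is None:
        raise ValueError(f"cannot infer a dataset from task name {task_name!r}")
    if expected != dataset_name:
        raise ValueError(
            f"{task_name} expects {expected}, but the dataset is {dataset_name}"
        )
    return True


print(expected_dataset("MortalityPredictionMIMIC3"))   # MIMIC-III
print(expected_dataset("LengthOfStayPredictioneICU"))  # eICU

try:
    check_pairing("MortalityPredictionMIMIC3", "MIMIC-IV")
except ValueError as err:
    print("rejected:", err)
```

Names that do not end in a plain dataset tag, such as `MortalityPredictionEICU2` or `MIMIC3ICD9Coding`, are not resolved by this sketch and would need explicit entries.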

Task → Dataset compatibility matrix

Mortality prediction (binary)

Task class	Dataset
`MortalityPredictionMIMIC3`	MIMIC-III
`MortalityPredictionMIMIC4`	MIMIC-IV
`InHospitalMortalityMIMIC4`	MIMIC-IV (in-hospital, narrower than next-visit)
`MortalityPredictionEICU`, `MortalityPredictionEICU2`	eICU
`MortalityPredictionOMOP`	OMOP
`MortalityPredictionStageNetMIMIC4`	MIMIC-IV (paired with StageNet model)

Readmission prediction (binary)

Task class	Dataset
`ReadmissionPredictionMIMIC3`	MIMIC-III
`ReadmissionPredictionMIMIC4`	MIMIC-IV
`ReadmissionPredictionEICU`	eICU
`ReadmissionPredictionOMOP`	OMOP

Length-of-stay prediction (multiclass)

Task class	Dataset
`LengthOfStayPredictionMIMIC3`	MIMIC-III
`LengthOfStayPredictionMIMIC4`	MIMIC-IV
`LengthOfStayPredictioneICU`	eICU
`LengthOfStayPredictionOMOP`	OMOP

LOS is bucketed into discrete classes (e.g., <1 day, 1-2 days, …, >14 days). Treat as multiclass classification.
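Bucketing a continuous stay length into class indices can be sketched with the standard library. The edges below are an assumption chosen to match the shape of the example above (one class under a day, daily classes through the first week, one class for 7 to 14 days, one for 14 days and over); check the task's source for the edges it actually uses. los_bucket is a hypothetical helper.

```python
from bisect import bisect_right

# Assumed bucket edges in days. Illustrative only, not read from PyHealth.
LOS_EDGES_DAYS = [1, 2, 3, 4, 5, 6, 7, 14]


def los_bucket(days, edges=LOS_EDGES_DAYS):
    """Return the class index for a stay of `days` days.

    Buckets are half-open intervals [lo, hi): class 0 is [0, 1),
    class 1 is [1, 2), and the last class is [edges[-1], infinity).
    """
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    return bisect_right(edges, days)


num_classes = len(LOS_EDGES_DAYS) + 1
print(num_classes)       # 9
print(los_bucket(0.5))   # 0
print(los_bucket(1.5))   # 1
print(los_bucket(10))    # 7
print(los_bucket(30))    # 8
```

Because the output is a class index rather than a number of days, the label belongs under a "multiclass" schema, not "regression".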

Drug recommendation (multilabel)

Task class	Dataset
`DrugRecommendationMIMIC3`	MIMIC-III
`DrugRecommendationMIMIC4`	MIMIC-IV
`DrugRecommendationEICU`	eICU

Multilabel = each visit has a set of drugs prescribed; predict the set. Use models with drug-aware structure (GAMENet, SafeDrug, MICRON, MoleRec) or fall back to Transformer / RNN.
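What "predict the set" means at the tensor level can be shown with multi-hot vectors and a per-sample Jaccard score. This is a pure-Python sketch with hypothetical helper names; PyHealth's processors and metrics handle this internally, and the drug codes are reused from the tokenizer example purely as sample values.

```python
def build_vocab(label_sets):
    """Assign each distinct code an index, in sorted order."""
    codes = sorted({code for labels in label_sets for code in labels})
    return {code: i for i, code in enumerate(codes)}


def multi_hot(labels, vocab):
    """Encode a set of codes as a 0/1 vector over the vocabulary."""
    vec = [0] * len(vocab)
    for code in labels:
        vec[vocab[code]] = 1
    return vec


def jaccard(true_vec, pred_vec):
    """Intersection over union of two 0/1 vectors (1.0 if both are empty)."""
    inter = sum(1 for t, p in zip(true_vec, pred_vec) if t and p)
    union = sum(1 for t, p in zip(true_vec, pred_vec) if t or p)
    return inter / union if union else 1.0


visits = [["A02B", "A03C"], ["A04A"], ["A02B", "A04A"]]
vocab = build_vocab(visits)
targets = [multi_hot(v, vocab) for v in visits]
print(vocab)    # {'A02B': 0, 'A03C': 1, 'A04A': 2}
print(targets)  # [[1, 1, 0], [0, 0, 1], [1, 0, 1]]

# A prediction that recovers one of the two drugs in the first visit.
prediction = multi_hot(["A02B"], vocab)
score = jaccard(targets[0], prediction)
print(score)    # 0.5
```

Each position is an independent yes/no decision, which is why multilabel tasks are scored with per-sample set metrics rather than plain accuracy.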

Specialized clinical

Task class	What it predicts
`DKAPredictionMIMIC4`	Diabetic ketoacidosis risk
`MIMIC3ICD9Coding`	ICD-9 codes for a discharge note (multilabel)

Sleep & EEG

Task class	Dataset	Predicts
`SleepStagingSleepEDF`	SleepEDF	Sleep stage (multiclass)
`EEGEventsTUEV`	TUEV	EEG events
`EEGAbnormalTUAB`	TUAB	EEG abnormality (binary)

Imaging

Task class	Dataset	Predicts
`COVID19CXRClassification`	COVID19-CXR	COVID-19 (multiclass)
`ChestXray14BinaryClassification`	ChestX-ray14	Single-disease binary
`ChestXray14MultilabelClassification`	ChestX-ray14	Multi-disease multilabel
`cardiology_isAR_fn`, `cardiology_isBBBFB_fn`, `cardiology_isAD_fn`, `cardiology_isCD_fn`, `cardiology_isWA_fn` (task functions)	Cardiology	Various ECG abnormalities

Text / NLP

Task class	Dataset	Predicts
`MedicalTranscriptionsClassification`	Medical Transcriptions	Specialty/category
`DeIDNERTask`	PhysioNet DeID	De-identification NER

Genomics

Task class	Dataset	Predicts
`VariantClassificationClinVar`	ClinVar	Variant pathogenicity
`MutationPathogenicityPrediction`	COSMIC	Mutation pathogenicity
`CancerSurvivalPrediction`	TCGA-PRAD	Cancer survival
`CancerMutationBurden`	TCGA-PRAD	Tumor mutation burden

Benchmarks

Task class	Use
`BenchmarkEHRShot`	Multi-task EHR few-shot benchmark on EHRShot

Picking the right `monitor` metric

The Trainer.train(monitor=...) argument decides which checkpoint gets saved. Match it to the task type:

Task type	Good `monitor` choices
Binary (mortality, readmission, EEG abnormal)	`"pr_auc"`, `"roc_auc"`, `"f1"`
Multiclass (LOS, sleep staging, COVID CXR)	`"accuracy"`, `"f1_macro"`, `"cohen_kappa"`
Multilabel (drug rec, ICD coding, ChestXray14)	`"pr_auc_samples"`, `"jaccard_samples"`, `"f1_samples"`

Mismatched monitor (e.g., "pr_auc" on a multiclass task) silently saves the wrong epoch.
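How the monitored metric drives checkpoint choice can be seen in a small sketch. best_epoch is a hypothetical helper, not the Trainer's code, and the per-epoch numbers are made-up placeholders. Unlike the silent failure described above, this version raises when the monitored metric was never computed.

```python
def best_epoch(history, monitor, mode="max"):
    """Return the index of the epoch with the best value of `monitor`."""
    if not history:
        raise ValueError("history is empty")
    missing = [i for i, metrics in enumerate(history) if monitor not in metrics]
    if missing:
        raise KeyError(f"{monitor!r} was not computed for epochs {missing}")
    values = [metrics[monitor] for metrics in history]
    pick = max if mode == "max" else min
    return values.index(pick(values))


# Made-up validation metrics for a multiclass task, one dict per epoch.
history = [
    {"accuracy": 0.61, "f1_macro": 0.48, "loss": 0.98},
    {"accuracy": 0.66, "f1_macro": 0.57, "loss": 0.85},
    {"accuracy": 0.67, "f1_macro": 0.55, "loss": 0.88},
]

print(best_epoch(history, "accuracy"))          # 2
print(best_epoch(history, "f1_macro"))          # 1
print(best_epoch(history, "loss", mode="min"))  # 1

try:
    best_epoch(history, "pr_auc")  # a binary metric on a multiclass task
except KeyError as err:
    print("rejected:", err)
```

The example also shows that two valid monitors can disagree: accuracy prefers the last epoch while macro F1 prefers the middle one, so the choice should reflect what matters clinically.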

Custom tasks

When no built-in task fits, subclass BaseTask:

from pyhealth.tasks import BaseTask

class MyTask(BaseTask):
    task_name = "MyTask"
    input_schema = {"diagnoses": "sequence", "procedures": "sequence"}
    output_schema = {"label": "binary"}

    def __call__(self, patient):
        # Iterate the patient's visits, decide which become samples,
        # extract features, compute the label, and return a list of dicts.
        samples = []
        for i, visit in enumerate(patient.visits):
            if i == len(patient.visits) - 1:
                continue  # need at least one future visit for the label
            samples.append({
                "patient_id": patient.patient_id,
                "visit_id": visit.visit_id,
                "diagnoses": visit.get_code_list("DIAGNOSES_ICD"),
                "procedures": visit.get_code_list("PROCEDURES_ICD"),
                "label": int(self._compute_label(patient, visit)),
            })
        return samples

    def _compute_label(self, patient, visit): ...

The __call__ method is invoked once per patient and returns a list of sample dicts. Returning [] for a patient excludes them from the SampleDataset. The schema strings ("sequence", "binary", "multilabel", "multiclass", "regression") tell PyHealth's processors how to handle each field.
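The per-patient contract can be exercised without PyHealth by using plain dataclasses as stand-ins. Everything here is hypothetical: ToyVisit, ToyPatient, make_samples, and the readmitted_within_30d flag used as the label source are illustrative and do not mirror PyHealth's patient objects.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVisit:
    visit_id: str
    diagnoses: list
    readmitted_within_30d: bool = False  # hypothetical label source


@dataclass
class ToyPatient:
    patient_id: str
    visits: list = field(default_factory=list)


def make_samples(patient):
    """One call per patient; returns a list of sample dicts (possibly empty)."""
    samples = []
    # The last visit has no successor, so it cannot be labelled and is skipped.
    for visit in patient.visits[:-1]:
        if not visit.diagnoses:
            continue  # nothing to feed the model for this visit
        samples.append({
            "patient_id": patient.patient_id,
            "visit_id": visit.visit_id,
            "diagnoses": visit.diagnoses,
            "label": int(visit.readmitted_within_30d),
        })
    return samples


patients = [
    ToyPatient("p1", [
        ToyVisit("v1", ["A"], readmitted_within_30d=True),
        ToyVisit("v2", ["B"]),
        ToyVisit("v3", ["C"]),
    ]),
    ToyPatient("p2", [ToyVisit("v4", ["A"])]),  # single visit: no samples
]

all_samples = [s for p in patients for s in make_samples(p)]
print(len(all_samples))                       # 2
print([s["visit_id"] for s in all_samples])   # ['v1', 'v2']
print(make_samples(patients[1]))              # []
```

The single-visit patient contributes nothing, which is the same effect as returning [] from __call__.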

How it compares

Choose pyhealth for EHR-centric clinical pipelines; use scanpy or torchdrug when the domain is single-cell RNA-seq or molecular graphs instead.

FAQ

Which datasets does pyhealth support?

pyhealth covers the PyHealth loaders for clinical datasets including MIMIC-III, MIMIC-IV, eICU, OMOP, SleepEDF, ChestXray14, and EHRShot, spanning EHR, signal, and imaging pipelines.

What clinical tasks can pyhealth scaffold?

pyhealth scaffolds mortality, readmission, length-of-stay, drug recommendation, sleep staging, ICD coding, and EEG event tasks, with models such as RETAIN, GAMENet, SafeDrug, and Transformer variants trained via PyHealth Trainer.

Is Pyhealth safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
