
Imaging Data Commons
Query IDC medical imaging cohorts in BigQuery when idc-index cannot expose full DICOM tags, segmentations, or structured report measurements.
Overview
Imaging Data Commons is an agent skill for the Build phase that teaches advanced Google BigQuery access patterns for IDC medical imaging when idc-index metadata is insufficient.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill imaging-data-commons
What is this skill?
- Documents when to prefer BigQuery over idc-index for IDC v23 workloads
- Covers full DICOM metadata (4000+ tags), nested sequences, and private elements
- Queries the segmentations table for per-segment anatomy codes, beyond the series-level metadata in seg_index
- Surfaces quantitative and qualitative DICOM SR measurements without downloading SR files
- Prerequisites: Google Cloud billing, Application Default Credentials (ADC), and google-cloud-bigquery or BigQuery console access
- 4000+ DICOM tags via BigQuery vs ~50 in idc-index mini-index
- IDC data version v23 referenced as tested baseline
- First 1 TB/month of BigQuery query processing is free under GCP billing
Adoption & trust: 515 installs on skills.sh; 27.6k GitHub stars; 1/3 security scanners passed (skills.sh audits).
What problem does it solve?
You can list IDC series with idc-index but cannot filter by segment anatomy codes, nested DICOM fields, or SR measurements without heavy downloads.
Who is it for?
Indie imaging researchers or ML builders wiring GCP BigQuery into reproducible cohort queries on IDC v23.
Skip if: you only need quick IDC lookups or downloads; stick to idc-index and skip the GCP billing setup.
When should I use this skill?
You need full IDC DICOM metadata, segmentation segment-level detail, DICOM SR measurements, or complex clinical joins that idc-index cannot answer.
What do I get? / Deliverables
After following the guide you can run targeted BigQuery SQL against IDC public datasets for full metadata joins and measurement fields while keeping idc-index for routine indexing.
- BigQuery SQL patterns for IDC datasets
- Auth and dataset navigation checklist for IDC on GCP
Journey fit
Advanced IDC access is a build-time integration step for imaging ML pipelines and cohort assembly, not a launch or growth activity. BigQuery joins and DICOM SR/segmentation tables are external data integrations layered onto research or product backends.
How it compares
Complements idc-index as the advanced SQL layer, not a replacement CLI downloader.
Common Questions / FAQ
Who is imaging-data-commons for?
Solo builders and small teams doing medical imaging research or ML on IDC who need SQL-level access to DICOM and clinical tables.
When should I use imaging-data-commons?
Use it during Build integrations when you need full DICOM tags, segmentation anatomy codes, DICOM SR quantitative or qualitative measurements, or joins idc-index cannot express.
Is imaging-data-commons safe to install?
Treat it as documentation for querying public IDC BigQuery data; review the Security Audits panel on this page before granting agents GCP credentials or billing access.
SKILL.md - Imaging Data Commons
# BigQuery Guide for IDC

**Tested with:** IDC data version v23

For most queries and downloads, use `idc-index` (see main SKILL.md). This guide covers BigQuery for advanced use cases requiring full DICOM metadata or complex joins.

## Prerequisites

**Requirements:**

1. Google account
2. Google Cloud project with billing enabled (the first 1 TB of query processing per month is free)
3. `google-cloud-bigquery` Python package or BigQuery console access

**Authentication setup:**

```bash
# Install the Google Cloud SDK, then set up Application Default Credentials:
gcloud auth application-default login
```

## When to Use BigQuery

Use BigQuery instead of `idc-index` when you need:

- Full DICOM metadata (all 4000+ tags, not just the ~50 in idc-index)
- Complex joins across clinical data tables
- DICOM sequence attributes (nested structures)
- Queries on fields not in the idc-index mini-index
- Private DICOM elements (vendor-specific tags in the `OtherElements` column)
- **Per-segment detail from DICOM Segmentation objects:** the `idc-index` `seg_index` gives series-level metadata but not individual segment anatomy codes; use the `segmentations` BigQuery table to query by structure name
- **Quantitative measurements from DICOM SR:** radiomics features (volume, diameter, shape descriptors) without downloading and parsing SR files; no idc-index equivalent
- **Qualitative measurements from DICOM SR:** coded evaluations (malignancy rating, texture, margin) without parsing SR files; no idc-index equivalent

## Accessing IDC in BigQuery

### Dataset Structure

All IDC tables are in the `bigquery-public-data` BigQuery project.

**Current version (recommended for exploration):**

- `bigquery-public-data.idc_current.*`
- `bigquery-public-data.idc_current_clinical.*`

**Versioned datasets (recommended for reproducibility):**

- `bigquery-public-data.idc_v{IDC version}.*`
- `bigquery-public-data.idc_v{IDC version}_clinical.*`

Always use versioned datasets for reproducible research!
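The dataset naming scheme above can be wrapped in a small helper, so that moving from `idc_current` during exploration to a pinned `idc_v{N}` release is a one-argument change. This is a minimal sketch, not part of IDC or `idc-index`: `idc_table`, `run_query`, `billing_project`, and `max_gb` are illustrative names. It assumes Application Default Credentials were configured with the `gcloud` command in the Prerequisites section and that `google-cloud-bigquery` is installed. Because queries on `bigquery-public-data` are billed to your own project, the sketch first does a dry run to report how many bytes the query would scan.

```python
from typing import List, Optional

PUBLIC_PROJECT = "bigquery-public-data"


def idc_table(table: str, version: Optional[int] = None, clinical: bool = False) -> str:
    """Return a backtick-quoted, fully qualified IDC table reference.

    version=None selects idc_current (exploration); an integer such as 23
    selects the pinned idc_v23 dataset (reproducibility).
    """
    dataset = "idc_current" if version is None else f"idc_v{version}"
    if clinical:
        dataset += "_clinical"
    return f"`{PUBLIC_PROJECT}.{dataset}.{table}`"


def run_query(sql: str, billing_project: str, max_gb: float = 10.0) -> List[dict]:
    """Dry-run the query to check its scan size, then execute it.

    Hypothetical helper; max_gb is an arbitrary safety limit.
    """
    # Imported here so idc_table() can be used without the client library installed.
    from google.cloud import bigquery

    # Authenticates with Application Default Credentials; queries on the
    # public IDC datasets are billed to billing_project.
    client = bigquery.Client(project=billing_project)

    # A dry run validates the SQL and reports bytes scanned without executing it.
    dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    scanned_gb = client.query(sql, job_config=dry_cfg).total_bytes_processed / 1e9
    if scanned_gb > max_gb:
        raise RuntimeError(f"Query would scan {scanned_gb:.1f} GB (limit: {max_gb} GB)")

    # Run the real query and return each result row as a plain dict.
    return [dict(row.items()) for row in client.query(sql).result()]
```

For example, `idc_table("dicom_all", version=23)` returns `` `bigquery-public-data.idc_v23.dicom_all` ``, which can stand in for the hard-coded `idc_current` table name in the `dicom_all` query below. Pass your own GCP project ID as `billing_project` (for instance a placeholder such as `my-gcp-project`); the 10 GB default for `max_gb` is an arbitrary guard, not an IDC recommendation.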
## Key Tables

### dicom_all

Primary table joining complete DICOM metadata with IDC-specific columns (`collection_id`, `gcs_url`, license). Contains all DICOM tags from `dicom_metadata` plus collection and administrative metadata. See [dicom_all.sql](https://github.com/ImagingDataCommons/etl_flow/blob/master/bq/generate_tables_and_views/derived_tables/BQ_Table_Building/derived_data_views/sql/dicom_all.sql) for the exact derivation.

```sql
-- dicom_all has one row per DICOM instance; list a few chest CT instances
SELECT
  collection_id,
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  Modality,
  BodyPartExamined,
  SeriesDescription,
  gcs_url,
  license_short_name
FROM `bigquery-public-data.idc_current.dicom_all`
WHERE Modality = 'CT'
  AND BodyPartExamined = 'CHEST'
LIMIT 10
```

### Derived Tables

These tables are derived from DICOM objects (Segmentation and Structured Report) and have **no equivalent in idc-index**. Use them to query per-segment anatomy, radiomics features, and qualitative assessments without downloading DICOM files.

**segmentations**: one row per segment within a DICOM SEG object. Lets you search by anatomical structure name or DICOM coded concept. The `idc-index` `seg_index` gives series-level metadata; this table gives per-segment detail.

**measurement_groups**: one row per SR TID1500 measurement group. The parent grouping for quantitative and qualitative measurements; links measurements to segmentations and source images.

**quantitative_measurements**: one row per numeric measurement within an SR TID1500 group. Contains radiomics features (volume, diameter, shape descriptors, texture) extracted from DICOM SR without downloading or parsing SR files.

**qualitative_measurements**: one row per coded evaluation within an SR TID1500 group. Contains assessed findings (malignancy likelihood, texture, margin type) using coded concept values.

See the [Derived Tables: Detailed Documentation](#derived-tables-detailed-documentation) section below for schemas, column descriptions, and query examples.
### Collection Metadata

**original_collections_metadata** - Collection-level descriptions `