
Deeptools
Run standard deepTools BAM→bigWig, QC, and heatmap workflows for ChIP-seq and coverage analysis with correct normalization flags.
Overview
deepTools is an agent skill most often used in Build (also Validate and Operate) that supplies copy-paste deepTools CLI recipes for BAM QC, normalized coverage, differential tracks, and heatmaps.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill deeptools
What is this skill?
- Four-step workflow: QC (plotFingerprint, plotCorrelation), coverage (bamCoverage), comparison (bamCompare), visualization
- bamCoverage RPGC normalization with hg38 effective genome size 2913022398
- bamCompare log2 ratio with readCount scale factors for treatment vs control
- TSS-centered heatmaps with ±3000 bp reference-point matrices
- Effective genome size table for human hg38, mouse mm10, and fly dm6
- 4-step typical workflow (QC → coverage → comparison → visualization)
- 3 reference effective genome sizes in table (hg38, mm10, dm6)
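The genome-size lookup behind the RPGC recipe can be sketched as a small shell helper. This is a minimal illustration, not part of the skill itself: the function names `effective_genome_size` and `bamcoverage_cmd` are hypothetical, and the only values used are the three assemblies listed above. The second function prints a command rather than running it, so it can be reviewed first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an assembly name to the effective genome size
# listed in this skill's table (hg38, mm10, dm6 only).
effective_genome_size() {
  case "$1" in
    hg38) echo 2913022398 ;;
    mm10) echo 2652783500 ;;
    dm6)  echo 142573017 ;;
    *)    echo "unknown assembly: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (do not run) an RPGC bamCoverage command.
# The output name is derived from the BAM name by swapping the extension.
bamcoverage_cmd() {
  local bam="$1" assembly="$2" size
  size="$(effective_genome_size "$assembly")" || return 1
  echo "bamCoverage --bam $bam --outFileName ${bam%.bam}.bw --normalizeUsing RPGC --effectiveGenomeSize $size --binSize 10 --numberOfProcessors 8"
}
```

Calling `bamcoverage_cmd chip.bam mm10` prints the command with the mm10 size filled in. An unlisted assembly fails loudly instead of silently falling back to a human value.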
Adoption & trust: 520 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have BAM files and BED references but keep mis-setting normalization, genome size, or matrix parameters when generating bigWig and heatmaps.
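One way to sanity-check an `--effectiveGenomeSize` value against your own reference is to count the non-N bases in the FASTA, which is the first calculation method described in the skill's reference notes. The sketch below assumes an uncompressed FASTA. The function name `count_non_n_bases` is hypothetical. The result depends on which contigs the file contains, so it may not reproduce a published value exactly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count bases other than N/n in a FASTA file.
# Header lines (starting with '>') are skipped; newlines are not counted.
count_non_n_bases() {
  local fasta="$1"
  grep -v '^>' "$fasta" | tr -d 'Nn\n\r' | wc -c | tr -d '[:space:]'
}
```

For example, `count_non_n_bases genome.fa` prints a single integer that can be compared with the size passed to bamCoverage.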
Who is it for?
Bioinformatics-aware indie builders or researchers automating standard deepTools steps with an AI agent in a Linux or HPC environment.
Skip if: you have no BAM inputs or indexed references, or you need downstream Python/R plotting beyond the listed CLI tools.
When should I use this skill?
You need deepTools shell commands for BAM QC, normalized coverage export, treatment/control comparison, or TSS-centered heatmaps with standard normalization methods.
What do I get? / Deliverables
Reproducible shell commands produce QC plots, normalized bigWig files, ratio tracks, and TSS heatmaps aligned to the skill’s four-stage workflow.
- bigWig or ratio tracks
- correlation and fingerprint PNGs
- matrix.gz and heatmap PNG from computeMatrix/plotHeatmap
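The deliverables above map onto one command per output. The sketch below chains the four stages in a single function. It is an illustration under assumptions: the names `run` and `deeptools_workflow` and all file names are placeholders, and only a subset of flags is shown. With `DRY_RUN=1` (the default here) each command is printed instead of executed. Note that plotCorrelation selects the plot type with `--whatToPlot`.

```bash
#!/usr/bin/env bash
# Sketch of the QC -> coverage -> comparison -> visualization chain.
# DRY_RUN=1 prints commands; DRY_RUN=0 executes them (requires deepTools).
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # dry-run output loses quoting; fine for simple file names
  else
    "$@"
  fi
}

# Arguments: ChIP BAM, input/control BAM, gene BED, effective genome size.
deeptools_workflow() {
  local chip="$1" input="$2" genes="$3" gsize="$4"

  # 1. QC: fingerprint and sample correlation
  run plotFingerprint -b "$input" "$chip" -o fingerprint.png
  run multiBamSummary bins --bamfiles "$input" "$chip" -o counts.npz
  run plotCorrelation -in counts.npz --corMethod pearson --whatToPlot heatmap -o correlation.png

  # 2. Coverage: RPGC-normalized bigWig
  run bamCoverage --bam "$chip" --outFileName chip.bw --normalizeUsing RPGC --effectiveGenomeSize "$gsize"

  # 3. Comparison: log2 ratio of treatment over control
  run bamCompare -b1 "$chip" -b2 "$input" -o ratio.bw --operation log2 --scaleFactorsMethod readCount

  # 4. Visualization: TSS-centered matrix and heatmap
  run computeMatrix reference-point -S chip.bw -R "$genes" -b 3000 -a 3000 --referencePoint TSS -o matrix.gz
  run plotHeatmap -m matrix.gz -o heatmap.png
}
```

Running `deeptools_workflow chip.bam input.bam genes.bed 2913022398` in dry-run mode prints seven commands, which makes it easy to review flags before running the jobs.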
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
First appears in Build when you implement genomics pipelines, but the same commands support validation QC and operational re-runs on new samples. Integrations subphase covers CLI binding to deepTools (bamCoverage, bamCompare, computeMatrix) inside reproducible analysis scripts.
Where it fits
Run plotCorrelation and plotFingerprint on a few pilot BAMs before committing to a full production pipeline.
Generate RPGC-normalized bigWig and log2 bamCompare tracks for treatment versus control for downstream IGV or web genome browsers.
Re-execute the documented QC→coverage→heatmap chain when new sequencing batches arrive with the same reference genome.
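For the re-run case, a small pre-flight filter can pick out which BAMs in a new batch still need processing. The sketch below is one hedged way to do that, and `pending_bams` is a hypothetical name. It assumes the index sits next to the BAM as either `sample.bam.bai` or `sample.bai`, and that an existing `sample.bw` means the sample was already converted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list BAMs in a directory that have an index
# alongside them and no bigWig output yet.
pending_bams() {
  local dir="$1" bam
  for bam in "$dir"/*.bam; do
    # Glob matched nothing: skip the literal pattern.
    if [ ! -e "$bam" ]; then continue; fi
    # No index found next to the BAM: skip it.
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then continue; fi
    # bigWig already exists: treat as processed.
    if [ -e "${bam%.bam}.bw" ]; then continue; fi
    echo "$bam"
  done
}
```

The output can feed a loop such as `pending_bams new_batch | while read -r bam; do ...; done` that runs the coverage step on each remaining file.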
How it compares
Focused CLI cookbook for deepTools—not a full Snakemake/Nextflow orchestration or peak-calling replacement.
Common Questions / FAQ
Who is deeptools for?
Developers and computational biologists who want an agent to emit correct deepTools commands for ChIP-seq and coverage analysis without searching scattered forum posts.
When should I use deeptools?
In Validate when sanity-checking fingerprints and correlations on pilot BAMs; in Build when generating normalized tracks and heatmaps for a study; in Operate when re-running the same QC→coverage pipeline on new sequencing drops.
Is deeptools safe to install?
The skill is procedural documentation; running suggested commands executes local shell jobs on your BAMs—review the Security Audits panel on this page and only run on data you are allowed to process.
SKILL.md
# deepTools Quick Reference

## Most Common Commands

### BAM to bigWig (normalized)

```bash
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8
```

### Compare two BAM files

```bash
bamCompare -b1 treatment.bam -b2 control.bam -o ratio.bw \
    --operation log2 --scaleFactorsMethod readCount
```

### Correlation heatmap

```bash
multiBamSummary bins --bamfiles *.bam -o counts.npz
plotCorrelation -in counts.npz --corMethod pearson \
    --whatToPlot heatmap -o correlation.png
```

### Heatmap around TSS

```bash
computeMatrix reference-point -S signal.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz
plotHeatmap -m matrix.gz -o heatmap.png
```

### ChIP enrichment check

```bash
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
    --extendReads 200 --ignoreDuplicates
```

## Effective Genome Sizes

| Organism | Assembly | Size |
|----------|----------|------|
| Human | hg38 | 2913022398 |
| Mouse | mm10 | 2652783500 |
| Fly | dm6 | 142573017 |

## Common Normalization Methods

- **RPGC**: 1× genome coverage (requires `--effectiveGenomeSize`)
- **CPM**: Counts per million (for fixed bins)
- **RPKM**: Reads per kb per million (for genes)

## Typical Workflow

1. **QC**: plotFingerprint, plotCorrelation
2. **Coverage**: bamCoverage with normalization
3. **Comparison**: bamCompare for treatment vs control
4. **Visualization**: computeMatrix → plotHeatmap/plotProfile

# Effective Genome Sizes

## Definition

Effective genome size is the length of the "mappable" genome: the regions that sequencing reads can map to uniquely. This value is crucial for proper normalization in many deepTools commands.

## Why It Matters

- Required for RPGC normalization (`--normalizeUsing RPGC`)
- Affects accuracy of coverage calculations
- Must match your data processing approach (filtered vs unfiltered reads)

## Calculation Methods

1. **Non-N bases**: Count of non-N nucleotides in the genome sequence
2. **Unique mappability**: Regions of a specific size that can be uniquely mapped (may consider edit distance)

## Common Organism Values

### Using Non-N Bases Method

| Organism | Assembly | Effective Size | Full Command |
|----------|----------|----------------|--------------|
| Human | GRCh38/hg38 | 2,913,022,398 | `--effectiveGenomeSize 2913022398` |
| Human | GRCh37/hg19 | 2,864,785,220 | `--effectiveGenomeSize 2864785220` |
| Mouse | GRCm39/mm39 | 2,654,621,837 | `--effectiveGenomeSize 2654621837` |
| Mouse | GRCm38/mm10 | 2,652,783,500 | `--effectiveGenomeSize 2652783500` |
| Zebrafish | GRCz11 | 1,368,780,147 | `--effectiveGenomeSize 1368780147` |
| *Drosophila* | dm6 | 142,573,017 | `--effectiveGenomeSize 142573017` |
| *C. elegans* | WBcel235/ce11 | 100,286,401 | `--effectiveGenomeSize 100286401` |
| *C. elegans* | ce10 | 100,258,171 | `--effectiveGenomeSize 100258171` |

### Human (GRCh38) by Read Length

For quality-filtered reads, values vary by read length:

| Read Length | Effective Size |
|-------------|----------------|
| 50 bp | ~2.7 billion |
| 75 bp | ~2.8 billion |
| 100 bp | ~2.8 billion |
| 150 bp | ~2.9 billion |
| 250 bp | ~2.9 billion |

### Mouse (GRCm38) by Read Length

| Read Length | Effective Size |
|-------------|----------------|
| 50 bp | ~2.3 billion |
| 75 bp | ~2.5 billion |
| 100 bp | ~2.6 billion |

## Usage in deepTools

The effective genome size is most commonly used with:

### bamCoverage with RPGC normalization

```bash
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC \
    --effectiveGenomeSize 2913022398
```

### bamCompare with RPGC normalization

In bamCompare, RPGC is selected with `--normalizeUsing`, which only takes effect when `--scaleFactorsMethod` is `None`.

```bash
bamCompare -b1 treatment.bam -b2 control.bam \
    --outFileName comparison.bw \
    --scaleFactorsMethod None \
    --normalizeUsing RPGC \
    --effectiveGenomeSize 2913022398
```

### computeGCBias / correctGCBias

```bash
computeGCBias --bamfile input.bam \
    --effectiveGenomeSize 2913022398 \
    --genome genome.2bit \