
Pysam
Give your agent correct pysam patterns for SAM/BAM/CRAM, VCF/BCF, and FASTA/FASTQ in Python NGS pipelines.
Overview
Pysam is an agent skill for the Build phase that teaches Python patterns for reading and writing SAM/BAM/CRAM, VCF/BCF, and FASTA/FASTQ in NGS pipelines.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill pysam
What is this skill?
- Read/write SAM, BAM, and CRAM alignments with region fetch and pileup-friendly access patterns
- Iterate and manipulate VCF/BCF variant records for calling and annotation workflows
- Handle FASTA/FASTQ reference and raw read sequences in pipeline glue code
- Query tabix-indexed files and calculate coverage or read depth from alignments
- Install with uv pip install pysam and use htslib through a Pythonic interface for QC and NGS processing
Adoption & trust: 514 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are writing a genomics pipeline in Python and need reliable pysam examples for alignments, variants, indexed regions, and coverage without misusing htslib APIs.
Who is it for?
Solo builders implementing NGS data processing, variant analysis, or sequencing QC scripts that stay in Python against standard bioinformatics file types.
Skip if: Pure statistical modeling with no genomic files, or teams that only need one-off samtools CLI commands without maintained Python code.
When should I use this skill?
Use it when working with sequencing alignment files (BAM/CRAM), genetic variants (VCF/BCF), reference sequences, FASTQ processing, coverage calculations, or bioinformatics pipelines.
What do I get? / Deliverables
Your agent produces pipeline-ready Python that opens the right genomic formats, queries regions, and supports QC, variant, and coverage workflows with pysam.
- Python modules or scripts that read/write alignments and variants with pysam
- Region fetch, pileup, or tabix query logic for pipeline steps
- QC or coverage helper code wired into NGS workflows
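As a rough illustration of the coverage helper code listed above, the sketch below sums pysam's per-base counts into a depth track and summarizes it. The function names, the `min_depth=10` threshold, and the summary keys are hypothetical choices made for this example, not something the skill prescribes. `count_coverage` only counts A, C, G, and T bases at or above the quality threshold, so deletions and N calls do not contribute to the depth.

```python
def depth_from_base_counts(base_counts):
    """Sum per-base count arrays (A, C, G, T) into one depth value per position."""
    return [sum(column) for column in zip(*base_counts)]


def summarize_depth(depths, min_depth=10):
    """Return mean depth and the fraction of positions at or above min_depth."""
    if not depths:
        return {"mean": 0.0, "fraction_covered": 0.0}
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean": sum(depths) / len(depths),
        "fraction_covered": covered / len(depths),
    }


def region_depth(bam_path, contig, start, end, min_base_quality=15):
    """Per-position depth for a 0-based, half-open region of an indexed BAM."""
    # Imported here (an assumption of this sketch) so the pure helpers above
    # can be tested on machines without htslib installed.
    import pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        counts = bam.count_coverage(
            contig, start, end, quality_threshold=min_base_quality
        )
    return depth_from_base_counts(counts)
```

A call such as `summarize_depth(region_depth("example.bam", "chr1", 1000, 2000))` would then give one QC record per region; the file name and coordinates are placeholders.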
Journey fit
Build is the primary phase where sequencing pipelines, variant workflows, and coverage scripts are implemented against htslib via pysam. Integrations reflects binding to genomic file formats, tabix indexes, and samtools/bcftools-style operations inside Python jobs.
How it compares
Skill package for Pythonic htslib access—not a hosted MCP server or a variant caller replacement.
Common Questions / FAQ
Who is pysam for?
Developers and computational biologists building Python pipelines over BAM/CRAM alignments, VCF variants, and FASTA/FASTQ inputs.
When should I use pysam?
Use it during Build/Integrations when implementing alignment readers, variant iterators, tabix region queries, coverage pileups, or FASTQ processing in code.
Is pysam safe to install?
The skill documents library usage and install commands; review the Security Audits panel on this page and pin pysam versions in your environment like any native dependency.
SKILL.md
# Pysam

## Overview

Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.

## When to Use This Skill

This skill should be used when:

- Working with sequencing alignment files (BAM/CRAM)
- Analyzing genetic variants (VCF/BCF)
- Extracting reference sequences or gene regions
- Processing raw sequencing data (FASTQ)
- Calculating coverage or read depth
- Implementing bioinformatics analysis pipelines
- Quality control of sequencing data
- Variant calling and annotation workflows

## Quick Start

### Installation

```bash
uv pip install pysam
```

### Basic Examples

**Read alignment file:**

```python
import pysam

# Open BAM file and fetch reads in region (fetch() needs a BAM index;
# coordinates are 0-based, half-open)
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
```

**Read variant file:**

```python
import pysam

# Open VCF file and iterate variants (variant.pos is 1-based, as in the VCF)
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
```

**Query reference sequence:**

```python
import pysam

# Open FASTA and extract sequence (0-based, half-open coordinates)
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
```

## Core Capabilities

### 1. Alignment File Operations (SAM/BAM/CRAM)

Use the `AlignmentFile` class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.
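
To make the `AlignmentFile` workflow more concrete, here is a minimal sketch of filtering reads by flags and mapping quality and writing the survivors to a new BAM. The names `keep_read` and `filter_bam`, the `min_mapq=30` default, and the choice of which flags to exclude are assumptions made for illustration; adjust them to your pipeline's QC policy.

```python
def keep_read(read, min_mapq=30):
    """Return True for mapped, primary, non-duplicate reads at or above min_mapq."""
    return not (
        read.is_unmapped
        or read.is_secondary
        or read.is_supplementary
        or read.is_duplicate
        or read.mapping_quality < min_mapq
    )


def filter_bam(in_path, out_path, contig, start, end, min_mapq=30):
    """Copy reads passing keep_read() from one region into a new BAM; return the count."""
    # Imported here (an assumption of this sketch) so keep_read() can be
    # unit-tested with stand-in objects on machines without htslib.
    import pysam

    kept = 0
    with pysam.AlignmentFile(in_path, "rb") as src:
        # template=src copies the header from the input file
        with pysam.AlignmentFile(out_path, "wb", template=src) as dst:
            # fetch() requires an index; coordinates are 0-based, half-open
            for read in src.fetch(contig, start, end):
                if keep_read(read, min_mapq):
                    dst.write(read)
                    kept += 1
    return kept
```

Keeping the predicate separate from the file handling means the same `keep_read` can be reused when counting reads or building pileups.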
**Common operations:**

- Open and read BAM/SAM/CRAM files
- Fetch reads from specific genomic regions
- Filter reads by mapping quality, flags, or other criteria
- Write filtered or modified alignments
- Calculate coverage statistics
- Perform pileup analysis (base-by-base coverage)
- Access read sequences, quality scores, and alignment information

**Reference:** See `references/alignment_files.md` for detailed documentation on:

- Opening and reading alignment files
- AlignedSegment attributes and methods
- Region-based fetching with `fetch()`
- Pileup analysis for coverage
- Writing and creating BAM files
- Coordinate systems and indexing
- Performance optimization tips

### 2. Variant File Operations (VCF/BCF)

Use the `VariantFile` class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.

**Common operations:**

- Read and write VCF/BCF files
- Query variants in specific regions
- Access variant information (position, alleles, quality)
- Extract genotype data for samples
- Filter variants by quality, allele frequency, or other criteria
- Annotate variants with additional information
- Subset samples or regions

**Reference:** See `references/variant_files.md` for detailed documentation on:

- Opening and reading variant files
- VariantRecord attributes and methods
- Accessing INFO and FORMAT fields
- Working with genotypes and samples
- Creating and writing VCF files
- Filtering and subsetting variants
- Multi-sample VCF operations

### 3. Sequence File Operations (FASTA/FASTQ)

Use `FastaFile` for random access to reference sequences and `FastxFile` for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.

**Common operations:**

- Query reference sequences by genomic coordinates
- Extract sequences for genes or regions of interest
- Read FASTQ