
Biopython
Implement sequence motifs, PWM/PSSM search, and Biopython file parsing in bioinformatics or computational biology build tasks.
Overview
Biopython is an agent skill for the Build phase that teaches advanced motif, PWM/PSSM, and file-parsing patterns with the Biopython library.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill biopython
What is this skill?
- Bio.motifs workflows: create motifs from instances, consensus, and degenerate IUPAC consensus
- Position Weight Matrix normalization with pseudocounts and information content
- PSSM log-odds search with score thresholds over test sequences
- Read and parse motifs from JASPAR and similar file formats
- Copy-paste advanced Biopython patterns for motif discovery pipelines
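The pseudocount normalization, information content, and log-odds search listed above can be sketched in plain Python to show the arithmetic involved. This is an illustrative sketch, not Biopython's implementation: the function names (`count_matrix`, `normalize`, `log_odds`, `information_content`, `search`) are hypothetical, a uniform background of 0.25 per base is assumed, and only the forward strand is scanned.

```python
import math

ALPHABET = "ACGT"

def count_matrix(instances):
    """Count each base at each position across equal-length instances."""
    length = len(instances[0])
    counts = {letter: [0] * length for letter in ALPHABET}
    for instance in instances:
        for i, letter in enumerate(instance):
            counts[letter][i] += 1
    return counts

def normalize(counts, pseudocount=0.5):
    """Turn counts into per-position probabilities, adding a pseudocount."""
    length = len(counts["A"])
    pwm = {letter: [0.0] * length for letter in ALPHABET}
    for i in range(length):
        total = sum(counts[letter][i] + pseudocount for letter in ALPHABET)
        for letter in ALPHABET:
            pwm[letter][i] = (counts[letter][i] + pseudocount) / total
    return pwm

def log_odds(pwm, background=0.25):
    """Convert probabilities to log2 odds against a uniform background (assumed)."""
    return {
        letter: [math.log2(p / background) for p in row]
        for letter, row in pwm.items()
    }

def information_content(pwm, background=0.25):
    """Relative entropy in bits, summed over all positions."""
    return sum(
        p * math.log2(p / background)
        for row in pwm.values()
        for p in row
    )

def search(pssm, sequence, threshold):
    """Yield (position, score) for forward-strand windows scoring >= threshold."""
    length = len(pssm["A"])
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        score = sum(pssm[letter][i] for i, letter in enumerate(window))
        if score >= threshold:
            yield start, score

# Same instances as the SKILL.md example.
INSTANCES = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]

pwm = normalize(count_matrix(INSTANCES), pseudocount=0.5)
pssm = log_odds(pwm)
print(f"Information content: {information_content(pwm):.2f} bits")
for position, score in search(pssm, "GGTACGCGG", threshold=5.0):
    print(f"Position {position}: score = {score:.2f}")
```

The pseudocount keeps every probability above zero, so unseen bases get a large negative score instead of an undefined logarithm.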
Adoption & trust: 539 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need to search sequences for motifs or load JASPAR data but do not have reliable Biopython examples for PWMs, PSSMs, and motif IO.
Who is it for?
Indie developers and researchers building sequence analysis scripts, pipelines, or small bioinformatics services with Biopython.
Skip if: you only need high-level BLAST tutorials with no motif or matrix implementation detail.
When should I use this skill?
Implementing motif creation, PWM/PSSM search, or JASPAR motif IO with Biopython in a build task.
What do I get? / Deliverables
Your agent produces working Python that creates motifs, normalizes weight matrices, scores sequences, and reads standard motif files.
- Python modules or snippets for motif matrices and sequence search
- Motif parsing code for standard alignment/motif file formats
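As a rough illustration of what motif parsing involves, the sketch below reads JASPAR-style count matrices in plain Python. It is not Biopython's parser: `parse_jaspar` and the `DEMO` record (including its ID and name) are made up for illustration, and the assumed layout is a `>` header line followed by one bracketed row of counts per base. The counts in `DEMO` are those of the seven example instances used in SKILL.md.

```python
import re

# Made-up record in a JASPAR-like layout: header line, then one row per base.
DEMO = """\
>EX0001.1 demo
A [ 3 7 0 2 1 ]
C [ 0 0 5 2 6 ]
G [ 0 0 0 3 0 ]
T [ 4 0 2 0 0 ]
"""

def parse_jaspar(text):
    """Return a list of dicts with 'id', 'name', and per-base 'counts' lists."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: matrix ID, then an optional name after whitespace.
            parts = line[1:].split(None, 1)
            current = {
                "id": parts[0] if parts else "",
                "name": parts[1] if len(parts) > 1 else "",
                "counts": {},
            }
            records.append(current)
        elif current is not None and line[0].upper() in "ACGT":
            # Row: base letter followed by bracketed numeric counts.
            numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
            current["counts"][line[0].upper()] = [float(n) for n in numbers]
    return records

for record in parse_jaspar(DEMO):
    print(record["id"], record["name"], len(record["counts"]["A"]))
```

In real work, `motifs.read` or `motifs.parse` with the `"jaspar"` format, as shown in SKILL.md, is the more robust choice; the sketch only shows the shape of the data.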
Journey fit
The canonical shelf is Build backend because the skill teaches procedural Biopython APIs for motifs, matrices, and sequence search: implementation knowledge for pipelines and analysis code. The backend shelf fits Python modules, data processing, and motif/sequence algorithms rather than UI or launch work.
How it compares
Skill package for Biopython implementation recipes—not an MCP server or wet-lab protocol assistant.
Common Questions / FAQ
Who is biopython for?
Solo builders and scientists using coding agents to implement motif discovery, PWM/PSSM search, and motif file parsing in Python.
When should I use biopython?
Use it during Build when writing backend analysis code, CLI tools, or data pipelines that depend on Bio.motifs and sequence scoring.
Is biopython safe to install?
Review the Security Audits panel on this Prism page and treat the skill as documentation-level guidance; validate generated code and dependencies in your own environment.
SKILL.md
# Advanced Biopython Features

## Sequence Motifs with Bio.motifs

### Creating Motifs

```python
from Bio import motifs
from Bio.Seq import Seq

# Create motif from instances
instances = [
    Seq("TACAA"),
    Seq("TACGC"),
    Seq("TACAC"),
    Seq("TACCC"),
    Seq("AACCC"),
    Seq("AATGC"),
    Seq("AATGC"),
]
motif = motifs.create(instances)
```

### Motif Consensus and Degenerate Sequences

```python
# Get consensus sequence
print(motif.counts.consensus)

# Get degenerate consensus (IUPAC ambiguity codes)
print(motif.counts.degenerate_consensus)

# Access counts matrix
print(motif.counts)
```

### Position Weight Matrix (PWM)

```python
# Create position weight matrix
pwm = motif.counts.normalize(pseudocounts=0.5)
print(pwm)

# Calculate information content: the mean PSSM score is the relative
# entropy of the motif against the (default uniform) background
ic = pwm.log_odds().mean()
print(f"Information content: {ic:.2f} bits")
```

### Searching for Motifs

```python
from Bio.Seq import Seq

# Search sequence for motif
test_seq = Seq("ATACAGGACAGACATACGCATACAACATTACAC")

# Get Position Specific Scoring Matrix (PSSM)
pssm = pwm.log_odds()

# Search sequence; negative positions are hits on the reverse strand
for position, score in pssm.search(test_seq, threshold=5.0):
    print(f"Position {position}: score = {score:.2f}")
```

### Reading Motifs from Files

```python
# Read motif from JASPAR format
with open("motif.jaspar") as handle:
    motif = motifs.read(handle, "jaspar")

# Read multiple motifs
with open("motifs.jaspar") as handle:
    for m in motifs.parse(handle, "jaspar"):
        print(m.name)

# Supported formats: jaspar, meme, transfac, pfm
```

### Writing Motifs

```python
# Write motif in JASPAR format
with open("output.jaspar", "w") as handle:
    handle.write(motif.format("jaspar"))
```

## Population Genetics with Bio.PopGen

### Working with GenePop Files

```python
from Bio.PopGen import GenePop

# Read GenePop file
with open("data.gen") as handle:
    record = GenePop.read(handle)

# Access populations
print(f"Number of populations: {len(record.populations)}")
print(f"Loci: {record.loci_list}")

# Iterate through populations
for pop_idx, pop in enumerate(record.populations):
    print(f"\nPopulation {pop_idx + 1}:")
    for individual in pop:
        print(f"  {individual[0]}: {individual[1]}")
```

### Calculating Population Statistics

```python
from Bio.PopGen.GenePop.Controller import GenePopController

# Create controller (wraps the external Genepop program, which must be installed)
ctrl = GenePopController()

# Calculate basic statistics
result = ctrl.calc_allele_genotype_freqs("data.gen")

# Calculate Fst
fst_result = ctrl.calc_fst_all("data.gen")
print(f"Fst: {fst_result}")

# Test Hardy-Weinberg equilibrium
hw_result = ctrl.test_hw_pop("data.gen", "probability")
```

## Sequence Utilities with Bio.SeqUtils

### GC Content

```python
from Bio.SeqUtils import gc_fraction
from Bio.Seq import Seq

seq = Seq("ATCGATCGATCG")
gc = gc_fraction(seq)
print(f"GC content: {gc:.2%}")
```

### Molecular Weight

```python
from Bio.Seq import Seq
from Bio.SeqUtils import molecular_weight

# DNA molecular weight
dna_seq = Seq("ATCG")
mw = molecular_weight(dna_seq, seq_type="DNA")
print(f"DNA MW: {mw:.2f} g/mol")

# Protein molecular weight
protein_seq = Seq("ACDEFGHIKLMNPQRSTVWY")
mw = molecular_weight(protein_seq, seq_type="protein")
print(f"Protein MW: {mw:.2f} Da")
```

### Melting Temperature

```python
from Bio.Seq import Seq
from Bio.SeqUtils import MeltingTemp as mt

# Calculate Tm using nearest-neighbor method
seq = Seq("ATCGATCGATCG")
tm = mt.Tm_NN(seq)
print(f"Tm: {tm:.1f}°C")

# Use different salt concentration
tm = mt.Tm_NN(seq, Na=50, Mg=1.5)  # 50 mM Na+, 1.5 mM Mg2+

# Wallace rule (for primers)
tm_wallace = mt.Tm_Wallace(seq)
```

### GC Skew

```python
from Bio.Seq import Seq
from Bio.SeqUtils import GC_skew

# Calculate GC skew (returns one value per window)
seq = Seq("ATCGATCGGGCCCAAATTT")
skew = GC_skew(seq, window=100)
print(f"GC skew: {skew}")
```

### ProtParam - Protein Analysis

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis

protein_seq = "ACDEFGHIKLMNPQRSTVWY"
analyzed_seq = ProteinAnalysis(protein_seq)

# Molecular weight
print(f"MW: {analyzed_seq.molecular_weight(