
Bioinformatics
Spin up reproducible computational-biology pipelines—QC, clustering, differential expression, and pathway enrichment—for solo founders building biotech, health, or research tooling.
Overview
bioinformatics is an agent skill for the Build phase that produces reproducible computational-biology pipelines for sequence, single-cell, pathway, and network analyses using BioPython, Scanpy, and standard tools.
Install
npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill bioinformatics
What is this skill?
- Single-cell RNA-seq workflows with Scanpy (QC, normalization, clustering, DE)
- Sequence analysis and protein structure prediction pipelines
- Gene regulatory network inference and pathway enrichment analysis
- Reproducible, parameterized pipelines following community best practices
- Extends to genomics variant annotation and proteomics domain prediction
- Single-cell experiments can profile tens of thousands of cells
- Thousands of genes measured per cell in scRNA-seq contexts
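In its simplest form, pathway enrichment is an over-representation test. It asks whether a gene list, such as a cluster's marker genes, contains more members of a pathway or GO term than chance would predict. The sketch below computes that one-sided hypergeometric tail probability using only the standard library. The function and argument names are illustrative. They are not part of the skill or of g:Profiler's API. Production tools such as g:Profiler also correct for testing thousands of terms at once, and this sketch leaves that step out.

```python
from math import comb


def enrichment_p_value(n_background: int, n_term: int, n_query: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background universe (e.g. all measured genes)
    n_term:       background genes annotated to the pathway/GO term
    n_query:      genes in the query list (e.g. cluster marker genes)
    n_overlap:    query genes that are annotated to the term
    """
    if not (0 <= n_term <= n_background and 0 <= n_query <= n_background):
        raise ValueError("term and query sizes must lie within the background")
    if n_overlap < 0:
        raise ValueError("n_overlap must be non-negative")

    # Number of ways to draw n_query genes from the background
    total = comb(n_background, n_query)

    # Count the draws containing n_overlap or more term genes
    # (math.comb returns 0 when asked for more items than exist)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(n_overlap, min(n_term, n_query) + 1)
    )

    # Exact integer arithmetic until this final division keeps tiny p-values accurate
    return tail / total
```

For example, `enrichment_p_value(20_000, 200, 100, 10)` measures how surprising it is to find 10 term genes in a 100-gene list when only about 1 is expected by chance (100 × 200 / 20,000).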
Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You have sequencing or single-cell data but lack the time to wire up QC-to-enrichment pipelines that follow field conventions and stay reproducible.
Who is it for?
Solo builders and tiny teams shipping bioinformatics features, internal lab tooling, or research agents who already use Python scientific stacks.
Skip if: Generic indie SaaS with no biological datasets, or founders who need clinical diagnostic validation rather than computational pipeline drafts.
When should I use this skill?
You need computational biology workflows for sequence analysis, protein structure, scRNA-seq, GRN inference, or pathway enrichment.
What do I get? / Deliverables
You get documented, parameterized analysis pipelines for transcriptomics, genomics, and related omics workflows ready to run or adapt in your product backend.
- Parameterized analysis pipeline steps
- Documented bioinformatics workflow for the chosen assay type
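One way to make "parameterized and reproducible" concrete is to treat every pipeline setting as a frozen record and save it next to the outputs. The sketch below uses only the standard library. The class name, field names, and the short SHA-256 fingerprint are hypothetical conventions. The skill is not documented to produce them. Most defaults mirror values hard-coded in the SKILL.md `scrna_pipeline` example. The `random_state` field is an added assumption, based on the fact that Scanpy's PCA, neighbors, UMAP, and Leiden functions each accept one.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScrnaRunParams:
    """Hypothetical parameter record for one scRNA-seq pipeline run."""

    min_genes: int = 200
    min_cells: int = 3
    max_pct_mt: float = 20.0
    target_sum: float = 1e4
    n_top_genes: int = 2000
    n_pcs: int = 30
    leiden_resolution: float = 0.5
    random_state: int = 0  # assumed: passed to every stochastic Scanpy step

    def to_json(self) -> str:
        # sort_keys keeps the serialization stable even if fields are reordered
        return json.dumps(asdict(self), sort_keys=True)

    def fingerprint(self) -> str:
        # Short content hash: identical parameters always yield the same ID
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()[:12]

    @classmethod
    def from_json(cls, text: str) -> "ScrnaRunParams":
        # Rebuild a record from a previously saved JSON file
        return cls(**json.loads(text))
```

You could write `params.to_json()` to a file beside each run's `.h5ad` output and tag the output with `params.fingerprint()`. That makes it easy to tell whether two result sets came from identical settings.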
Journey fit
Bioinformatics targets the Build phase because it generates analysis pipelines and computational workflows rather than market discovery or launch distribution. Backend is the canonical shelf: outputs are parameterized pipelines (Scanpy, BioPython) that process sequencing and omics data like server-side scientific jobs.
How it compares
Use it instead of generic “analyze my CSV” chat prompts when you need Scanpy scRNA-seq and pathway-enrichment workflows structured according to standard bioinformatics practice.
Common Questions / FAQ
Who is bioinformatics for?
bioinformatics is for developers and scientist-founders building agents or services around omics data who want standard pipelines for RNA-seq, variants, proteins, and pathway enrichment—not general web app CRUD.
When should I use bioinformatics?
Use it in Build when implementing analysis backends, CLI research tools, or agent workflows that must run QC, clustering, differential expression, or enrichment on real biological datasets.
Is bioinformatics safe to install?
The skill may drive shell execution and large data processing; review the Security Audits panel on this page and sandbox runs when pipelines touch sensitive patient or proprietary sequence data.
SKILL.md
# Bioinformatics

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Bioinformatics provides computational biology workflows for sequence analysis, protein structure prediction, single-cell RNA-seq with Scanpy, gene regulatory network inference, and pathway enrichment analysis. The agent generates reproducible analysis pipelines using BioPython, Scanpy, and standard bioinformatics tools, following community best practices for each analysis type.

Modern biology generates data faster than biologists can analyze it. A single-cell RNA-seq experiment produces expression profiles for tens of thousands of cells, each with thousands of genes measured. This skill encodes the standard analysis pipelines that transform raw sequencing data into biological insights: quality control, normalization, dimensionality reduction, clustering, differential expression, and pathway enrichment.

The skill extends beyond transcriptomics to genomics (variant calling, annotation), proteomics (sequence analysis, domain prediction), and systems biology (gene regulatory networks, protein-protein interactions). Each pipeline is parameterized, documented, and reproducible: given the same inputs, parameters, and random seeds, it produces the same outputs.
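The first two steps named above, quality control and normalization, reduce to simple array arithmetic. The NumPy sketch below mirrors what the Scanpy QC and normalization calls in the `scrna_pipeline` Implementation do, using a toy matrix. It computes each cell's percentage of counts from mitochondrial (`MT-`) genes and drops cells at or above 20%. It then rescales each remaining cell to 10,000 total counts and applies log(1 + x). The gene names and counts are made up for illustration. On real data, use Scanpy's `calculate_qc_metrics`, `normalize_total`, and `log1p` rather than this hand-rolled version.

```python
import numpy as np

# Toy data (made up): 3 cells x 4 genes of raw UMI counts
gene_names = ["MT-CO1", "GAPDH", "ACTB", "CD3E"]
counts = np.array(
    [
        [10, 40, 30, 20],  # 10% mitochondrial
        [60, 20, 10, 10],  # 60% mitochondrial, often a sign of a damaged cell
        [5, 100, 50, 45],  # 2.5% mitochondrial
    ],
    dtype=float,
)

# QC: share of each cell's counts that come from MT- genes
is_mt = np.array([name.startswith("MT-") for name in gene_names])
total_counts = counts.sum(axis=1)
pct_counts_mt = 100 * counts[:, is_mt].sum(axis=1) / total_counts

# Keep cells below the 20% threshold, as in the Scanpy pipeline
keep = pct_counts_mt < 20
filtered = counts[keep]

# Normalization: rescale each cell so its counts sum to target_sum
target_sum = 1e4
normalized = filtered / filtered.sum(axis=1, keepdims=True) * target_sum

# Log transform, log(1 + x), which compresses very large counts
log_normalized = np.log1p(normalized)
```

Scaling to a fixed total removes differences in sequencing depth between cells. The log step keeps a few highly expressed genes from dominating the downstream PCA.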
## Use When

- Analyzing single-cell RNA-seq data with Scanpy
- Performing sequence alignment or homology searches
- Building gene regulatory network models
- Running pathway enrichment analysis (GO, KEGG)
- Processing FASTA/FASTQ files with BioPython
- Predicting protein structure or function from sequence

## How It Works

```mermaid
graph TD
    A[Raw Sequencing Data] --> B[Quality Control]
    B --> C[Alignment / Quantification]
    C --> D{Analysis Type}
    D -->|Single-Cell| E[Scanpy Pipeline]
    D -->|Bulk RNA-seq| F[DESeq2 / edgeR]
    D -->|Genomics| G[Variant Calling]
    E --> H[Normalize → PCA → UMAP → Cluster]
    H --> I[Differential Expression]
    I --> J[Pathway Enrichment]
    F --> I
    G --> K[Annotation + Impact Prediction]
    J --> L[Biological Interpretation]
    K --> L
```

The pipeline branches on data type. Single-cell data follows the standard Scanpy workflow, bulk RNA-seq uses DESeq2 or edgeR, and genomic data goes through variant calling and annotation. All paths converge on biological interpretation.

## Implementation

```python
import pandas as pd
import scanpy as sc


def scrna_pipeline(adata_path: str, min_genes: int = 200, min_cells: int = 3) -> sc.AnnData:
    adata = sc.read_h5ad(adata_path)

    # Basic QC: drop near-empty cells and rarely detected genes
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Flag mitochondrial genes (human "MT-" prefix); drop cells with >= 20% mito counts
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
    adata = adata[adata.obs.pct_counts_mt < 20, :].copy()

    # Keep raw counts: flavor="seurat_v3" expects counts, not log-normalized data
    adata.layers["counts"] = adata.X.copy()

    # Scale each cell to 10,000 total counts, then apply log(1 + x)
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    # Select highly variable genes from the raw-count layer (requires scikit-misc)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, flavor="seurat_v3", layer="counts")
    adata.raw = adata  # preserve all log-normalized genes for differential expression
    adata = adata[:, adata.var.highly_variable].copy()

    # Scale, reduce dimensionality, build the neighbor graph, embed, and cluster
    sc.pp.scale(adata, max_value=10)
    sc.tl.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_pcs=30)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=0.5)

    # Marker genes per cluster (Wilcoxon rank-sum test)
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")
    return adata


def pathway_enrichment(gene_list: list[str], organism: str = "hsapiens") -> pd.DataFrame:
    from gprofiler import GProfiler  # provided by the gprofiler-official package

    gp = GProfiler(return_dataframe=True)
    results = gp.profile(
        organism=organism,
        query=gene_list,
        sources=["GO:BP", "GO:MF", "KEGG", "REAC"],
    )
    # Keep significant terms, most significant first
    return results[results["significant"]].sort_values("p_value")
```

```python
from Bio import Seq