Bioinformatics

Name: Bioinformatics
Author: itallstartedwithaidea

itallstartedwithaidea/agent-skills

Spin up reproducible computational-biology pipelines—QC, clustering, differential expression, and pathway enrichment—for solo founders building biotech, health, or research tooling.

Overview

bioinformatics is an agent skill for the Build phase that produces reproducible computational-biology pipelines for sequence, single-cell, pathway, and network analyses using BioPython, Scanpy, and standard tools.

Install

npx skills add https://github.com/itallstartedwithaidea/agent-skills --skill bioinformatics

What is this skill?

Single-cell RNA-seq workflows with Scanpy (QC, normalization, clustering, DE)
Sequence analysis and protein structure prediction pipelines
Gene regulatory network inference and pathway enrichment analysis
Reproducible, parameterized pipelines following community best practices
Extends to genomics variant annotation and proteomics domain prediction
Single-cell experiments can profile tens of thousands of cells
Thousands of genes measured per cell in scRNA-seq contexts

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 1 install on skills.sh; 18 GitHub stars; 2/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You have sequencing or single-cell data but lack time to wire QC-to-enrichment pipelines that follow field norms and stay reproducible.

Who is it for?

Solo builders and tiny teams shipping bioinformatics features, internal lab tooling, or research agents who already use Python scientific stacks.

Skip if: Generic indie SaaS with no biological datasets, or founders who need clinical diagnostic validation rather than computational pipeline drafts.

When should I use this skill?

You need computational biology workflows for sequence analysis, protein structure, scRNA-seq, GRN inference, or pathway enrichment.

What do I get? / Deliverables

You get documented, parameterized analysis pipelines for transcriptomics, genomics, and related omics workflows ready to run or adapt in your product backend.

Parameterized analysis pipeline steps
Documented bioinformatics workflow for the chosen assay type

Journey fit

Primary fit

Build: Backend, data & payments

Bioinformatics targets the Build phase because it generates analysis pipelines and computational workflows rather than market discovery or launch distribution. Backend is the canonical shelf: outputs are parameterized pipelines (Scanpy, BioPython) that process sequencing and omics data like server-side scientific jobs.

Also useful

Idea: Opportunity & market research

How it compares

Use it instead of generic “analyze my CSV” chat prompts when you need Scanpy scRNA-seq workflows and pathway-enrichment steps structured the way bioinformatics practice expects.

Common Questions / FAQ

Who is bioinformatics for?

bioinformatics is for developers and scientist-founders building agents or services around omics data who want standard pipelines for RNA-seq, variants, proteins, and pathway enrichment—not general web app CRUD.

When should I use bioinformatics?

Use it in Build when implementing analysis backends, CLI research tools, or agent workflows that must run QC, clustering, differential expression, or enrichment on real biological datasets.

Is bioinformatics safe to install?

The skill may drive shell execution and large data processing; review the Security Audits panel on this page and sandbox runs when pipelines touch sensitive patient or proprietary sequence data.

SKILL.md

# Bioinformatics

Part of [Agent Skills™](https://github.com/itallstartedwithaidea/agent-skills) by [googleadsagent.ai™](https://googleadsagent.ai)

## Description

Bioinformatics provides computational biology workflows for sequence analysis, protein structure prediction, single-cell RNA-seq with Scanpy, gene regulatory network inference, and pathway enrichment analysis. The agent generates reproducible analysis pipelines using BioPython, Scanpy, and standard bioinformatics tools, following community best practices for each analysis type.

Modern biology generates data faster than biologists can analyze it. A single-cell RNA-seq experiment produces expression profiles for tens of thousands of cells, each with thousands of genes measured. This skill encodes the standard analysis pipelines that transform raw sequencing data into biological insights: quality control, normalization, dimensionality reduction, clustering, differential expression, and pathway enrichment.
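
As a rough illustration of the first two stages named above, the sketch below applies a mitochondrial-fraction filter and depth normalization to a toy count matrix using only NumPy. The matrix, the gene names, and the `qc_and_normalize` helper are invented for illustration; the 20% threshold and `target_sum=1e4` mirror the values used in the Implementation section. On real data, Scanpy's own functions are the better choice.

```python
import numpy as np


def qc_and_normalize(counts, gene_names, max_pct_mt=20.0, target_sum=1e4):
    """Toy QC + normalization on a cells x genes count matrix (hypothetical helper).

    Assumes every cell has at least one count; empty cells would divide by zero.
    """
    counts = np.asarray(counts, dtype=float)
    is_mt = np.array([name.startswith("MT-") for name in gene_names])

    # Percentage of each cell's counts that come from mitochondrial genes.
    total_per_cell = counts.sum(axis=1)
    pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / total_per_cell

    # Keep cells below the mitochondrial threshold.
    keep = pct_mt < max_pct_mt
    kept = counts[keep]

    # Scale each remaining cell to the same total, then log-transform.
    scaled = kept / kept.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(scaled), keep


genes = ["MT-CO1", "ACTB", "GAPDH"]
toy_counts = [
    [5, 60, 35],   # 5% mitochondrial
    [50, 30, 20],  # 50% mitochondrial, filtered out
    [10, 45, 45],  # 10% mitochondrial
]
normalized, keep = qc_and_normalize(toy_counts, genes)
print(keep)
print(normalized.shape)
```

The helper returns the boolean mask alongside the matrix so that the caller can see which cells were dropped, which is the kind of detail a QC report usually needs.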

The skill extends beyond transcriptomics to genomics (variant calling, annotation), proteomics (sequence analysis, domain prediction), and systems biology (gene regulatory networks, protein-protein interactions). Each pipeline is parameterized, documented, and reproducible—the same inputs always produce the same outputs.
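
One way to make "parameterized and reproducible" concrete is to keep every tunable value in a single record and derive a fingerprint from it. The sketch below is an assumption about how that could look, not the skill's actual mechanism: `ScrnaParams` and `params_fingerprint` are hypothetical names, and the defaults simply copy the values used in the Implementation section.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScrnaParams:
    """Hypothetical parameter record; defaults mirror the Implementation section."""
    min_genes: int = 200
    min_cells: int = 3
    max_pct_mt: float = 20.0
    n_top_genes: int = 2000
    n_pcs: int = 30
    leiden_resolution: float = 0.5
    random_state: int = 0


def params_fingerprint(params: ScrnaParams) -> str:
    """Short, stable hash of the parameters, e.g. for naming an output folder."""
    payload = json.dumps(asdict(params), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


default_run = ScrnaParams()
stricter_run = ScrnaParams(min_genes=500)
print(params_fingerprint(default_run))
print(params_fingerprint(default_run) == params_fingerprint(ScrnaParams()))  # True
print(params_fingerprint(default_run) == params_fingerprint(stricter_run))   # False
```

Identical parameters are necessary but may not be sufficient: library versions and hardware can still shift results from stochastic steps such as UMAP, so recording the environment alongside the fingerprint is a sensible companion.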

## Use When

- Analyzing single-cell RNA-seq data with Scanpy
- Performing sequence alignment or homology searches
- Building gene regulatory network models
- Running pathway enrichment analysis (GO, KEGG)
- Processing FASTA/FASTQ files with BioPython
- Predicting protein structure or function from sequence
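
For the FASTA case in the list above, the following dependency-free sketch shows the record structure involved. In practice BioPython's `Bio.SeqIO.parse(handle, "fasta")` does this job and handles more edge cases; `read_fasta` and `fraction_gc` are hypothetical helpers and the sample records are made up.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA text (hypothetical helper)."""
    record_id = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after ">".
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(chunks)


def fraction_gc(sequence: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


sample = io.StringIO(">seq1 toy record\nATGC\nGGCC\n>seq2\nATAT\n")
for record_id, sequence in read_fasta(sample):
    print(record_id, len(sequence), fraction_gc(sequence))
```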

## How It Works

```mermaid
graph TD
    A[Raw Sequencing Data] --> B[Quality Control]
    B --> C[Alignment / Quantification]
    C --> D{Analysis Type}
    D -->|Single-Cell| E[Scanpy Pipeline]
    D -->|Bulk RNA-seq| F[DESeq2 / edgeR]
    D -->|Genomics| G[Variant Calling]
    E --> H[Normalize → PCA → UMAP → Cluster]
    H --> I[Differential Expression]
    I --> J[Pathway Enrichment]
    F --> I
    G --> K[Annotation + Impact Prediction]
    J --> L[Biological Interpretation]
    K --> L
```

The pipeline branches based on data type. Single-cell data follows the Scanpy standard workflow; bulk RNA-seq uses DESeq2 or edgeR; genomic data goes through variant calling and annotation. All paths converge on biological interpretation.
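
The branching in the diagram can be sketched as a lookup from analysis type to an ordered list of stages. This is only an illustration of the control flow: the keys, stage names, and `plan_steps` function are hypothetical, chosen to mirror the diagram's nodes.

```python
# Stage names are illustrative labels for the nodes in the diagram above.
PIPELINE_STEPS = {
    "single_cell": [
        "quality_control",
        "alignment_quantification",
        "normalize_pca_umap_cluster",
        "differential_expression",
        "pathway_enrichment",
        "biological_interpretation",
    ],
    "bulk_rnaseq": [
        "quality_control",
        "alignment_quantification",
        "deseq2_or_edger",
        "differential_expression",
        "pathway_enrichment",
        "biological_interpretation",
    ],
    "genomics": [
        "quality_control",
        "alignment_quantification",
        "variant_calling",
        "annotation_impact_prediction",
        "biological_interpretation",
    ],
}


def plan_steps(analysis_type: str) -> list[str]:
    """Return the ordered stages for one analysis type (hypothetical helper)."""
    try:
        return list(PIPELINE_STEPS[analysis_type])
    except KeyError:
        valid = ", ".join(sorted(PIPELINE_STEPS))
        raise ValueError(
            f"unknown analysis type {analysis_type!r}; expected one of: {valid}"
        ) from None


for step in plan_steps("genomics"):
    print(step)
```

Keeping the plan as data rather than nested conditionals makes it easy to log the exact stage list next to the results of a run.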

## Implementation

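```python
import pandas as pd
import scanpy as sc


def scrna_pipeline(adata_path: str, min_genes: int = 200, min_cells: int = 3) -> sc.AnnData:
    """Run a standard Scanpy workflow: QC, normalization, clustering, marker genes."""
    adata = sc.read_h5ad(adata_path)

    # Drop near-empty cells and genes detected in very few cells.
    sc.pp.filter_cells(adata, min_genes=min_genes)
    sc.pp.filter_genes(adata, min_cells=min_cells)

    # Flag mitochondrial genes ("MT-" is the human prefix; mouse uses "mt-").
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)
    adata = adata[adata.obs.pct_counts_mt < 20, :].copy()

    # Keep raw counts: the "seurat_v3" flavor expects counts, not log-normalized values.
    adata.layers["counts"] = adata.X.copy()

    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)

    sc.pp.highly_variable_genes(
        adata, n_top_genes=2000, flavor="seurat_v3", layer="counts"
    )
    adata.raw = adata  # snapshot of log-normalized values for all genes
    adata = adata[:, adata.var.highly_variable].copy()

    sc.pp.scale(adata, max_value=10)
    sc.tl.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_pcs=30)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=0.5)

    # Marker genes per cluster; adata.raw is used by default when it is set.
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")

    return adata


def pathway_enrichment(gene_list: list[str], organism: str = "hsapiens") -> pd.DataFrame:
    """Query g:Profiler and return significant terms, most significant first."""
    from gprofiler import GProfiler  # optional dependency (gprofiler-official), imported lazily

    gp = GProfiler(return_dataframe=True)
    results = gp.profile(
        organism=organism,
        query=gene_list,
        sources=["GO:BP", "GO:MF", "KEGG", "REAC"],
    )
    return results[results["significant"]].sort_values("p_value")
```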
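
`pathway_enrichment` hands the statistics to g:Profiler. For intuition about what an over-representation result means, such analyses commonly rest on a hypergeometric (one-sided Fisher) test. The sketch below computes that p-value with the standard library only. The function name and the example numbers are invented, and the multiple-testing correction that real tools apply is omitted.

```python
from math import comb


def hypergeom_enrichment_p(background: int, pathway: int, query: int, overlap: int) -> float:
    """P(X >= overlap) when `query` genes are drawn from `background` genes,
    of which `pathway` belong to the gene set (hypothetical helper)."""
    if not (0 <= pathway <= background and 0 <= query <= background):
        raise ValueError("pathway and query sizes must fit within the background")

    max_overlap = min(pathway, query)
    if overlap > max_overlap:
        return 0.0

    start = max(overlap, 0)
    total = comb(background, query)
    # Sum the upper tail: every way to hit at least `overlap` pathway genes.
    tail = sum(
        comb(pathway, k) * comb(background - pathway, query - k)
        for k in range(start, max_overlap + 1)
    )
    return tail / total


# Invented numbers: 200 query genes, 12 of them in a 150-gene pathway,
# against a 20,000-gene background.
print(hypergeom_enrichment_p(background=20000, pathway=150, query=200, overlap=12))
```

A small p-value here says the overlap is larger than random sampling from the background would usually give. It says nothing about direction or effect size, which is why the differential expression results still matter for interpretation.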

```python
from Bio import Seq
```
