
Gget
Let your agent query genomics and protein databases (Ensembl, UniProt, UCSC BLAT) through the gget Python toolkit with correct release and API context.
Overview
gget is an agent skill for the Build phase that documents how the gget Python toolkit integrates with Ensembl, UCSC, and UniProt for genomics and protein queries.
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill gget
What is this skill?
- Maps gget modules to Ensembl, UCSC Genome Browser, and UniProt sources
- Documents release pinning for Ensembl reproducibility and species shortcuts
- Covers BLAT via UCSC with assembly notes (e.g., hg38, mm39)
- Warns that upstream DB schemas change and recommends `pip install --upgrade gget`
- Notes biweekly automated tests against live database structures
- Ensembl releases approximately every 3 months
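The module-to-source mapping described above can be sketched as a small lookup table. This is a minimal illustration, assuming only the "Used by" fields listed in the skill's SKILL.md; the names `GGET_MODULE_SOURCES` and `sources_for` are hypothetical helpers and are not part of gget.

```python
# Hypothetical lookup table transcribed from the "Used by" fields in SKILL.md.
# It is not part of gget and only covers the five modules named on this page.
GGET_MODULE_SOURCES = {
    "ref": ["Ensembl"],
    "search": ["Ensembl"],
    "info": ["Ensembl", "UniProt", "NCBI"],
    "seq": ["Ensembl", "UniProt"],
    "blat": ["UCSC Genome Browser"],
}


def sources_for(module: str) -> list[str]:
    """Return the upstream databases recorded for a gget module name.

    Accepts either the bare module name ("blat") or the CLI form ("gget blat").
    """
    name = module.strip().lower().removeprefix("gget ").strip()
    if name not in GGET_MODULE_SOURCES:
        raise KeyError(f"no source mapping recorded for gget module {name!r}")
    return GGET_MODULE_SOURCES[name]


if __name__ == "__main__":
    for module in ("gget ref", "seq", "blat"):
        print(module, "->", ", ".join(sources_for(module)))
```

An agent could consult a table like this before issuing a call, so it knows whether a release number (Ensembl) or an assembly name (UCSC) is the relevant reproducibility parameter.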
Adoption & trust: 528 installs on skills.sh; 27.6k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your agent writes gget calls without knowing which database, release, or assembly each module uses, and those calls break when upstream schemas change.
Who is it for?
Indie bioinformatics or health-tech builders wiring agents to public genomics and protein databases via gget.
Skip if: you build general web SaaS with no DNA or protein data needs, or your team forbids pip-installed packages that call live database APIs.
When should I use this skill?
Implementing or debugging gget database modules (ref, search, info, seq, blat) in a Python bioinformatics workflow.
What do I get? / Deliverables
Your agent selects the right gget module, release, and upstream source, with reproducible parameters for ref, search, info, seq, and blat workflows.
- Correct gget module and release parameters for a query
- Reproducible fetch of reference, sequence, or alignment data
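As a rough sketch of what "release parameters" might look like in practice, the snippet below builds an explicit, release-pinned argument set for `gget.ref`. It assumes `gget.ref` accepts `species`, `which`, and `release` keyword arguments; the release number 110 and the helper names `pinned_ref_kwargs` and `fetch_reference` are illustrative assumptions, not values or functions taken from the skill.

```python
def pinned_ref_kwargs(species: str, release: int, which: str = "gtf") -> dict:
    """Build keyword arguments for a release-pinned reference lookup.

    Keeping the arguments in a plain dict makes them easy to log next to the
    results, so the same Ensembl release can be requested again later.
    """
    if release <= 0:
        raise ValueError("Ensembl release must be a positive integer")
    return {"species": species, "which": which, "release": release}


def fetch_reference(species: str, release: int, which: str = "gtf"):
    """Call gget.ref with pinned arguments (requires gget and network access)."""
    import gget  # imported lazily so the helper above works without gget installed

    # Assumption: gget.ref takes species/which/release as keyword arguments.
    return gget.ref(**pinned_ref_kwargs(species, release, which))


if __name__ == "__main__":
    # Illustrative values only; pick the release your project actually targets.
    print(pinned_ref_kwargs("homo_sapiens", release=110))
```

Separating argument construction from the network call is a design choice for this sketch: the pinned parameters can be validated and recorded even in environments where the live lookup cannot run.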
Journey fit
Canonical shelf is Build → integrations because gget is a database-facing toolkit your agent invokes during scientific feature work, not a go-to-market or ops monitor. The skill documents external DB modules (Ensembl, UCSC, UniProt)—classic third-party integration knowledge for bioinformatics builds.
How it compares
Reference integration skill for the gget CLI/API layer—not a generic RAG or NotebookLM research pack.
Common Questions / FAQ
Who is gget for?
Solo developers and small lab-adjacent teams building Python pipelines or agent tools that query Ensembl, UniProt, or UCSC through gget.
When should I use gget?
During build integrations while implementing or debugging gget ref, search, info, seq, or blat calls against live scientific databases.
Is gget safe to install?
gget pulls from public scientific APIs; review the Security Audits panel on this Prism page and pin releases in your own environment before production use.
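One way to act on the pinning advice is to check the installed package version against the version you have vetted. The sketch below uses only the standard library's `importlib.metadata`; the helper names and the idea of comparing against a single expected version string are assumptions made for illustration, and the version string in the usage example is a placeholder rather than a recommendation.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "gget") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """Return True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected


if __name__ == "__main__":
    expected = "0.0.0"  # placeholder; replace with the version you have reviewed
    found = installed_version("gget")
    if found is None:
        print("gget is not installed")
    elif not matches_pin("gget", expected):
        print(f"gget {found} is installed, but {expected} is pinned")
    else:
        print(f"gget {found} matches the pin")
```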
SKILL.md
# gget Database Information

Overview of databases queried by gget modules, including update frequencies and important considerations.

## Important Note

The databases queried by gget are continuously being updated, which sometimes changes their structure. gget modules are tested automatically on a biweekly basis and updated to match new database structures when necessary. Always keep gget updated:

```bash
pip install --upgrade gget
```

## Database Directory

### Genomic Reference Databases

#### Ensembl

- **Used by:** gget ref, gget search, gget info, gget seq
- **Description:** Comprehensive genome database with annotations for vertebrate and invertebrate species
- **Update frequency:** Regular releases (numbered); new releases approximately every 3 months
- **Access:** FTP downloads, REST API
- **Website:** https://www.ensembl.org/
- **Notes:**
  - Supports both vertebrate and invertebrate genomes
  - Can specify release number for reproducibility
  - Shortcuts available for common species ('human', 'mouse')

#### UCSC Genome Browser

- **Used by:** gget blat
- **Description:** Genome browser database with BLAT alignment tool
- **Update frequency:** Regular updates with new assemblies
- **Access:** Web service API
- **Website:** https://genome.ucsc.edu/
- **Notes:**
  - Multiple genome assemblies available (hg38, mm39, etc.)
  - BLAT optimized for vertebrate genomes

### Protein & Structure Databases

#### UniProt

- **Used by:** gget info, gget seq (amino acid sequences), gget elm
- **Description:** Universal Protein Resource, comprehensive protein sequence and functional information
- **Update frequency:** Regular releases (weekly for Swiss-Prot, monthly for TrEMBL)
- **Access:** REST API
- **Website:** https://www.uniprot.org/
- **Notes:**
  - Swiss-Prot: manually annotated and reviewed
  - TrEMBL: automatically annotated

#### NCBI (National Center for Biotechnology Information)

- **Used by:** gget info, gget bgee (for non-Ensembl species)
- **Description:** Gene and protein databases with extensive cross-references
- **Update frequency:** Continuous updates
- **Access:** E-utilities API
- **Website:** https://www.ncbi.nlm.nih.gov/
- **Databases:** Gene, Protein, RefSeq

#### RCSB PDB (Protein Data Bank)

- **Used by:** gget pdb
- **Description:** Repository of 3D structural data for proteins and nucleic acids
- **Update frequency:** Weekly updates
- **Access:** REST API
- **Website:** https://www.rcsb.org/
- **Notes:**
  - Experimentally determined structures (X-ray, NMR, cryo-EM)
  - Includes metadata about experiments and publications

#### ELM (Eukaryotic Linear Motif)

- **Used by:** gget elm
- **Description:** Database of functional sites in eukaryotic proteins
- **Update frequency:** Periodic updates
- **Access:** Downloaded database (via gget setup elm)
- **Website:** http://elm.eu.org/
- **Notes:**
  - Requires local download before first use
  - Contains validated motifs and patterns

### Sequence Similarity Databases

#### BLAST Databases (NCBI)

- **Used by:** gget blast
- **Description:** Pre-formatted databases for BLAST searches
- **Update frequency:** Regular updates
- **Access:** NCBI BLAST API
- **Databases:**
  - **Nucleotide:** nt (all GenBank), refseq_rna, pdbnt
  - **Protein:** nr (non-redundant), swissprot, pdbaa, refseq_protein
- **Notes:**
  - nt and nr are very large databases
  - Consider specialized databases for faster, more focused searches

### Expression & Correlation Databases

#### ARCHS4

- **Used by:** gget archs4
- **Description:** Massive mining of publicly available RNA-seq data
- **Update frequency:** Periodic updates with new samples
- **Access:** HTTP API
- **Website:** https://maayanlab.cloud/archs4/
- **Data:**
  - Human and mouse RNA-seq data
  - Correlation matrices
  - Tissue expression atlases
- **Citation:** Lachmann et al., Nature Communications, 2018

#### CZ CELLxGENE Discover

- **Used by:** gget cellxgene
- **Description:** Single-cell RNA-seq data from multiple studies