
Tooluniverse Sequence Retrieval
Produce complete, citation-ready gene and sequence profiles via ToolUniverse with correct NCBI versus ENA tool routing and curation tiers.
Install
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-sequence-retrieval
What is this skill?
- Organism, gene, sequence-type, and strain disambiguation before tool selection
- RefSeq versus GenBank accession prefixes with ENA compatibility rules
- Per-sequence required fields including curation level tiers (●●●● through ○○○○)
- FASTA and GenBank download commands plus BioProject/BioSample cross-links
- Error handling for empty results and ENA 404 on incompatible accessions
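The accession routing rule can be sketched in plain Python. This is a minimal illustration, not part of the skill or the ToolUniverse API: `REFSEQ_PREFIXES`, `classify_accession`, and `allowed_sources` are hypothetical names, the prefix tuple covers only the four RefSeq prefixes named in the SKILL.md checklist (NC_, NM_, NP_, XM_), and `U00096.3` is used only as an illustrative GenBank-style accession.

```python
# RefSeq prefixes named in the skill's checklist. The full RefSeq
# namespace has more prefixes, so extend this tuple as needed (assumption).
REFSEQ_PREFIXES = ("NC_", "NM_", "NP_", "XM_")


def classify_accession(accession):
    """Return "refseq" or "genbank" based on the accession prefix."""
    acc = accession.strip().upper()
    return "refseq" if acc.startswith(REFSEQ_PREFIXES) else "genbank"


def allowed_sources(accession):
    """List the databases the checklist permits for this accession."""
    if classify_accession(accession) == "refseq":
        # ENA tools are not used with RefSeq accessions.
        return ["NCBI"]
    # GenBank accessions can go to NCBI or ENA.
    return ["NCBI", "ENA"]


for acc in ["NC_000913.3", "NC_045512.2", "U00096.3"]:
    print(acc, classify_accession(acc), allowed_sources(acc))
```

Running the check before any tool call keeps a RefSeq accession from reaching an ENA tool, which is the case the skill lists under ENA 404 error handling.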
Adoption & trust: 1.5k installs on skills.sh; 1.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
Journey fit
Sequence retrieval is upstream discovery work: it confirms organism, accession type, and database source before any bioinformatics build or analysis pipeline. The skill fits the Research stage because it offers checklist-driven disambiguation, cross-database accession rules, and structured report quality for scientific lookup tasks.
Common Questions / FAQ
Is Tooluniverse Sequence Retrieval safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Sequence Retrieval Checklist

Use this checklist to ensure complete sequence profiles.

## Disambiguation

- [ ] Organism confirmed (scientific name)
- [ ] Gene symbol/name identified
- [ ] Sequence type determined (genomic/mRNA/protein)
- [ ] Strain specified (if relevant)
- [ ] Accession prefix identified → tool selection

## Accession Type Handling

- [ ] RefSeq (NC_, NM_, NP_, XM_) → NCBI tools only
- [ ] GenBank (U*, M*, CP*, etc.) → NCBI or ENA
- [ ] ENA tools NOT used with RefSeq accessions

## Per Sequence (Required)

- [ ] Accession number
- [ ] Organism (scientific name)
- [ ] Sequence type (DNA/RNA/protein)
- [ ] Length
- [ ] Curation level (●●●●/●●●○/●●○○/●○○○/○○○○)
- [ ] Database source

## Sequence Details

- [ ] Definition/title
- [ ] Molecule type (DNA/mRNA/protein)
- [ ] Topology (linear/circular)
- [ ] Sequence preview (first 100-200 bp)

## Annotations (If GenBank Format)

- [ ] CDS count and examples
- [ ] Gene count
- [ ] Other features noted

## Cross-References

- [ ] RefSeq accession (if exists)
- [ ] GenBank accession
- [ ] ENA compatibility noted
- [ ] BioProject/BioSample links

## Download Options

- [ ] FASTA format command shown
- [ ] GenBank format command shown
- [ ] Direct database links provided

## Report Quality

- [ ] Search process NOT shown in output
- [ ] Curation level tiers applied
- [ ] Alternative sequences listed
- [ ] Retrieval date included

## Error Handling

- [ ] No results → broader search suggested
- [ ] ENA 404 → recognized as RefSeq, NCBI used
- [ ] Large sequences → download link instead of preview

# Sequence Retrieval Examples

## Example 1: Find E. coli K-12 Genome

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Search
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Escherichia coli",
    strain="K-12",
    seq_type="complete_genome",
    limit=3
)

# Get accessions
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Get sequence (RefSeq reference)
sequence = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession="NC_000913.3",
    format="fasta"
)
print(f"Genome size: {len(sequence['data'])} characters")
```

## Example 2: Get Human BRCA1 Gene

```python
# Search for BRCA1
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Homo sapiens",
    gene="BRCA1",
    limit=5
)
print(f"Found {result['data']['count']} BRCA1 sequences")

# Get top accessions
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Get mRNA sequence with annotations
genbank = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession=accessions["data"][0],
    format="genbank"
)
```

## Example 3: SARS-CoV-2 Reference Genome

```python
# Search for reference genome
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="SARS-CoV-2",
    keywords="reference genome Wuhan",
    limit=1
)

# Get accession (NC_045512)
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Download complete genome
genome = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession="NC_045512.2",
    format="fasta"
)
print(genome["data"][:200])  # Preview
```

## Example 4: Compare RefSeq vs GenBank

```python
# Search returns both types
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Escherichia coli",
    strain="K-12",
    limit=5
)
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Categorize
refseq = [a for a in accessions["data"] if a.startswith("NC_")]
genbank = [a for a in accessions["data"] if not a.startswith("NC_")]
print(f"RefSeq (NCBI only): {refseq}")
print(f"GenBank (ENA compatible): {genbank}")
```

## Example 5: Multi-Format Retrieval

```python
accessio