Sequence retrieval is upstream discovery work: confirm the organism, accession type, and database source before starting any bioinformatics build or analysis pipeline. The skill fits research-stage scientific lookup tasks that need checklist-driven disambiguation, cross-database accession rules, and structured, quality-checked reports.

Common Questions / FAQ

Is Tooluniverse Sequence Retrieval safe to install?

skills.sh reports that 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

SKILL.md - Tooluniverse Sequence Retrieval

# Sequence Retrieval Checklist

Use this checklist to ensure complete sequence profiles.

## Disambiguation

- [ ] Organism confirmed (scientific name)
- [ ] Gene symbol/name identified
- [ ] Sequence type determined (genomic/mRNA/protein)
- [ ] Strain specified (if relevant)
- [ ] Accession prefix identified → tool selection

## Accession Type Handling

- [ ] RefSeq (NC_, NM_, NP_, XM_) → NCBI tools only
- [ ] GenBank (U*, M*, CP*, etc.) → NCBI or ENA
- [ ] ENA tools NOT used with RefSeq accessions
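
The routing rules above can be sketched as a small helper. This is a minimal sketch, not part of ToolUniverse: the function name `route_accession` and the returned labels are hypothetical, and the prefix tuple holds only the four RefSeq prefixes named in the checklist.

```python
# Hypothetical helper: names and return values are illustrative only.
REFSEQ_PREFIXES = ("NC_", "NM_", "NP_", "XM_")  # prefixes listed in the checklist


def route_accession(accession: str) -> dict:
    """Classify an accession and list the databases the checklist allows."""
    acc = accession.strip().upper()
    if acc.startswith(REFSEQ_PREFIXES):
        # RefSeq accessions go to NCBI tools only
        return {"accession": acc, "kind": "RefSeq", "sources": ["NCBI"]}
    # Everything else is treated as GenBank-style (U*, M*, CP*, etc.)
    return {"accession": acc, "kind": "GenBank", "sources": ["NCBI", "ENA"]}


print(route_accession("NC_000913.3"))
print(route_accession("U00096.3"))
```

Normalizing case and whitespace first keeps the prefix test from silently misrouting an accession pasted as `nc_000913.3`.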

## Per Sequence (Required)

- [ ] Accession number
- [ ] Organism (scientific name)
- [ ] Sequence type (DNA/RNA/protein)
- [ ] Length
- [ ] Curation level (●●●●/●●●○/●●○○/●○○○/○○○○)
- [ ] Database source
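
One way to keep the six required fields together is a small record type. The sketch below is an assumption, not the skill's own data model: the class `SequenceRecord`, its field names, and the 0 to 4 integer scale behind the tier symbols are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SequenceRecord:
    """Hypothetical container for the six required per-sequence fields."""
    accession: str
    organism: str
    sequence_type: str   # DNA, RNA, or protein
    length: int
    curation_level: int  # assumed scale: 0 (○○○○) to 4 (●●●●)
    database: str

    def curation_badge(self) -> str:
        """Render the curation level as filled and empty circles."""
        if not 0 <= self.curation_level <= 4:
            raise ValueError("curation_level must be between 0 and 4")
        return "●" * self.curation_level + "○" * (4 - self.curation_level)

    def missing_fields(self) -> list:
        """Names of required fields left blank, plus 'length' if non-positive."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                missing.append(f.name)
        if self.length <= 0:
            missing.append("length")
        return missing


record = SequenceRecord(
    accession="NC_045512.2",
    organism="Severe acute respiratory syndrome coronavirus 2",
    sequence_type="RNA",
    length=29903,
    curation_level=4,  # illustrative value; the checklist does not define the tiers
    database="NCBI RefSeq",
)
print(record.curation_badge(), record.missing_fields())
```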

## Sequence Details

- [ ] Definition/title
- [ ] Molecule type (DNA/mRNA/protein)
- [ ] Topology (linear/circular)
- [ ] Sequence preview (first 100-200 bp)
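
A preview is easy to get wrong if the FASTA header or line breaks are counted as sequence. The following is a minimal sketch, assuming the retrieved text is a plain single-record FASTA string; `preview_fasta` is a hypothetical helper and the input record is synthetic.

```python
def preview_fasta(fasta_text: str, n: int = 200) -> dict:
    """Split a FASTA record into title, true length, and an n-character preview."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected FASTA text starting with '>'")
    title = lines[0][1:].strip()
    parts = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record, if any
            break
        parts.append(line.strip())
    sequence = "".join(parts)
    return {"title": title, "length": len(sequence), "preview": sequence[:n]}


# Synthetic record, for illustration only
example = ">demo_1 synthetic example record\nACGTACGTAC\nGGTTAACC\n"
info = preview_fasta(example, n=12)
print(info)
```

The reported length excludes the header line and newlines, so it matches the sequence length rather than the size of the downloaded text.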

## Annotations (If GenBank Format)

- [ ] CDS count and examples
- [ ] Gene count
- [ ] Other features noted
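
The CDS and gene counts can be read from the feature table of a GenBank flat file, where feature keys start after five spaces and qualifier lines are indented further. This is a rough sketch using only the standard library; `count_features` is a hypothetical helper, the sample text is synthetic, and a full parser such as Biopython would be more robust for real records.

```python
import re
from collections import Counter

# A feature key follows exactly five leading spaces; qualifier lines are indented deeper.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+\S")


def count_features(genbank_text: str) -> Counter:
    """Count feature keys (gene, CDS, ...) in the FEATURES section."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line[:1] not in (" ", ""):  # next top-level section, e.g. ORIGIN
            break
        match = FEATURE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


qualifier_indent = " " * 21
sample = "\n".join([
    "FEATURES             Location/Qualifiers",
    "     source          1..60",
    qualifier_indent + '/organism="synthetic construct"',
    "     gene            1..30",
    qualifier_indent + '/gene="demoA"',
    "     CDS             1..30",
    qualifier_indent + '/gene="demoA"',
    "     gene            31..60",
    "     CDS             31..60",
    "ORIGIN",
    "//",
])
counts = count_features(sample)
print(f"CDS: {counts['CDS']}, genes: {counts['gene']}")
```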

## Cross-References

- [ ] RefSeq accession (if exists)
- [ ] GenBank accession
- [ ] ENA compatibility noted
- [ ] BioProject/BioSample links

## Download Options

- [ ] FASTA format command shown
- [ ] GenBank format command shown
- [ ] Direct database links provided
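
The download items can be generated from the accession alone. The sketch below builds NCBI E-utilities `efetch` URLs and record links without making any network calls. The helper names are hypothetical, the `curl` commands are one possible choice of download tool, and the `nuccore` database is assumed, so protein accessions such as `NP_` are not covered.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
REFSEQ_PREFIXES = ("NC_", "NM_", "NP_", "XM_")  # from the checklist


def efetch_url(accession: str, rettype: str) -> str:
    """Build an efetch URL for a nucleotide record ('fasta' or 'gb')."""
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def download_options(accession: str) -> dict:
    """Return FASTA/GenBank download commands and direct database links."""
    fasta_url = efetch_url(accession, "fasta")
    genbank_url = efetch_url(accession, "gb")
    options = {
        "fasta_command": f'curl -o {accession}.fasta "{fasta_url}"',
        "genbank_command": f'curl -o {accession}.gb "{genbank_url}"',
        "links": [f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}"],
    }
    # ENA links are only offered for non-RefSeq accessions
    if not accession.startswith(REFSEQ_PREFIXES):
        options["links"].append(f"https://www.ebi.ac.uk/ena/browser/view/{accession}")
    return options


for key, value in download_options("U00096.3").items():
    print(key, value)
```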

## Report Quality

- [ ] Search process NOT shown in output
- [ ] Curation level tiers applied
- [ ] Alternative sequences listed
- [ ] Retrieval date included

## Error Handling

- [ ] No results → broader search suggested
- [ ] ENA 404 → recognized as RefSeq, NCBI used
- [ ] Large sequences → download link instead of preview
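
The three rules above can be sketched with injected fetch functions, so the logic can be exercised without network access. Everything here is an assumption for illustration: the `NotFound` exception, the `retrieve` and `suggest_broader` helpers, the order in which filters are dropped, and the `preview_limit` threshold are not defined by the skill.

```python
from typing import Optional


class NotFound(Exception):
    """Hypothetical error a fetcher raises on an HTTP 404."""


def suggest_broader(filters: dict) -> Optional[dict]:
    """Drop the most specific filter present; None when nothing is left to drop."""
    for key in ("strain", "seq_type", "gene", "keywords"):
        if key in filters:
            return {k: v for k, v in filters.items() if k != key}
    return None


def retrieve(accession, fetch_ena, fetch_ncbi, preview_limit=100_000):
    """Try ENA first, fall back to NCBI on 404, and link out for large sequences."""
    notes = []
    try:
        sequence, source = fetch_ena(accession), "ENA"
    except NotFound:
        notes.append("ENA 404: treated as RefSeq, retrieved from NCBI")
        sequence, source = fetch_ncbi(accession), "NCBI"
    if len(sequence) > preview_limit:
        shown = f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}"
    else:
        shown = sequence[:200]
    return {"source": source, "shown": shown, "notes": notes}


def ena_stub(accession):
    raise NotFound(accession)  # stands in for an ENA 404 response


def ncbi_stub(accession):
    return "ACGT" * 10  # stands in for a retrieved sequence


result = retrieve("NC_000913.3", ena_stub, ncbi_stub)
print(result["source"], result["notes"])
print(suggest_broader({"organism": "Escherichia coli", "strain": "K-12"}))
```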


# Sequence Retrieval Examples

## Example 1: Find E. coli K-12 Genome

```python
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()

# Search
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Escherichia coli",
    strain="K-12",
    seq_type="complete_genome",
    limit=3
)

# Get accessions
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Get sequence (RefSeq reference)
sequence = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession="NC_000913.3",
    format="fasta"
)

# The FASTA text includes the header line and newlines, so this is not the exact base count
print(f"FASTA size: {len(sequence['data'])} characters")
```

## Example 2: Get Human BRCA1 Gene

```python
# Search for BRCA1
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Homo sapiens",
    gene="BRCA1",
    limit=5
)

print(f"Found {result['data']['count']} BRCA1 sequences")

# Get top accessions
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Get mRNA sequence with annotations
genbank = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession=accessions["data"][0],
    format="genbank"
)
```

## Example 3: SARS-CoV-2 Reference Genome

```python
# Search for reference genome
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="SARS-CoV-2",
    keywords="reference genome Wuhan",
    limit=1
)

# Get accession (NC_045512)
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

# Download complete genome
genome = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence",
    accession="NC_045512.2",
    format="fasta"
)

print(genome["data"][:200])  # Preview
```

## Example 4: Compare RefSeq vs GenBank

```python
# Search returns both types
result = tu.tools.NCBI_search_nucleotide(
    operation="search",
    organism="Escherichia coli",
    strain="K-12",
    limit=5
)

accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession",
    uids=result["data"]["uids"]
)

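# Categorize by prefix (RefSeq prefixes from the checklist, not only NC_)
refseq_prefixes = ("NC_", "NM_", "NP_", "XM_")
refseq = [a for a in accessions["data"] if a.startswith(refseq_prefixes)]
genbank = [a for a in accessions["data"] if not a.startswith(refseq_prefixes)]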

print(f"RefSeq (NCBI only): {refseq}")
print(f"GenBank (ENA compatible): {genbank}")
```

## Example 5: Multi-Format Retrieval

```python
accessio
```

What is this skill?

Organism, gene, sequence-type, and strain disambiguation before tool selection

RefSeq versus GenBank accession prefixes with ENA compatibility rules

Per-sequence required fields including curation level tiers (●●●● through ○○○○)

FASTA and GenBank download commands plus BioProject/BioSample cross-links

Error handling for empty results and ENA 404 on incompatible accessions

Adoption & trust: 1.5k installs on skills.sh; 1.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

Idea: Opportunity & market research
