Install

$ npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-sequence-retrieval



Source file: SKILL.md (148 lines, Markdown)


---
name: tooluniverse-sequence-retrieval
description: Retrieves biological sequences (DNA, RNA, protein) from NCBI and ENA with gene disambiguation, accession type handling, and comprehensive sequence profiles. Creates detailed reports with sequence metadata, cross-database references, and download options. Use when users need nucleotide sequences, protein sequences, or genome data, or mention GenBank, RefSeq, or EMBL accessions.
---

# Biological Sequence Retrieval

Retrieve DNA, RNA, and protein sequences with proper disambiguation and cross-database handling.

**IMPORTANT**: Always use English terms in tool calls. Only try original-language terms as a fallback. Respond in the user's language.

**LOOK UP, DON'T GUESS**: Never assume accession numbers or sequence versions. Always retrieve and verify them from NCBI or ENA.

## Domain Reasoning

Sequence quality hierarchy: RefSeq (NM_/NP_ = curated) > RefSeq predicted (XM_/XP_) > GenBank (submitted). Prefer the MANE Select transcript for human canonical isoforms.
Check version numbers -- annotations improve across versions.

## Workflow

```
Phase 0: Clarify (if needed) → Phase 1: Disambiguate Gene/Organism → Phase 2: Search & Retrieve → Phase 3: Report
```

---

## Phase 0: Clarification (When Needed)

- Ask ONLY if: the gene exists in multiple organisms, the sequence type is unclear, or the strain matters.
- Skip for: specific accessions, clear organism + gene combinations, and complete-genome requests that name the organism.

---

## Phase 1: Gene/Organism Disambiguation

### Accession Type Decision Tree

| Prefix | Type | Use With |
|--------|------|----------|
| NC_/NM_/NR_/NP_/NZ_/XM_/XP_/XR_ | RefSeq | NCBI only |
| U*/M*/K*/X*/CP* | GenBank | NCBI or ENA |
| EMBL format | EMBL | ENA preferred |

**CRITICAL**: Never try ENA tools with RefSeq accessions -- they return 404.

### Identity Checklist

- Organism confirmed (scientific name)
- Gene symbol/name identified
- Sequence type determined (genomic/mRNA/protein)
- Accession prefix identified for tool selection

---

## Phase 2: Data Retrieval (Internal)

Retrieve silently.
Do NOT narrate the search process.

```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Search NCBI Nucleotide (example query: human BRCA1 mRNA;
# strain= and keywords= are optional filters)
result = tu.tools.NCBI_search_nucleotide(
    operation="search", organism="Homo sapiens", gene="BRCA1",
    seq_type="mrna", limit=10,
)

# Get accessions from UIDs
accessions = tu.tools.NCBI_fetch_accessions(
    operation="fetch_accession", uids=result["data"]["uids"]
)
# Assumption: accession strings come back as a list under "data";
# inspect the response and pick the best-curated record.
accession = accessions["data"][0]

# Retrieve sequence (format="fasta" or format="genbank")
sequence = tu.tools.NCBI_get_sequence(
    operation="fetch_sequence", accession=accession, format="fasta"
)

# ENA alternative: GenBank/EMBL accessions only (ENA returns 404 for RefSeq)
REFSEQ_PREFIXES = ("NC_", "NM_", "NR_", "NP_", "NZ_", "XM_", "XP_", "XR_")
if not accession.startswith(REFSEQ_PREFIXES):
    entry = tu.tools.ena_get_entry(accession=accession)
    fasta = tu.tools.ena_get_sequence_fasta(accession=accession)
```

### Fallback Chains

| Primary | Fallback | Notes |
|---------|----------|-------|
| NCBI_get_sequence | ENA tools (GenBank/EMBL accessions only) | When NCBI is unavailable |
| ena_get_entry | NCBI_get_sequence | ENA does not hold RefSeq records |
| NCBI_search_nucleotide | Broader keywords | When the search returns no results |

---

## Phase 3: Report Sequence Profile

Present as a **Sequence Profile Report**. Hide the search process. Include:

1. **Search Summary**: query, database, result count
2. **Primary Sequence**: accession, type (RefSeq/GenBank), organism, strain, length, molecule, topology, curation level
3. **Sequence Preview**: first lines of the FASTA (truncated)
4. **Annotations Summary**: CDS/tRNA/rRNA/regulatory feature counts (from GenBank format)
5. **Alternative Sequences**: ranked by relevance and curation, with ENA compatibility
6. **Cross-Database References**: RefSeq, GenBank, ENA/EMBL, BioProject, BioSample
7. **Download Options**: FASTA (for BLAST/alignment), GenBank (for annotation)

### Curation Level Tiers

| Tier | Prefix | Description |
|------|--------|-------------|
| RefSeq Reference (best) | NC_, NM_, NR_, NP_ | NCBI-curated, gold standard |
| RefSeq Predicted | XM_, XP_, XR_ | Computationally predicted |
| GenBank Validated | Various | Submitted, some curation |
| GenBank Direct | Various | Direct submission |
| Third Party | TPA_ | Third-party annotation |

---

## Reasoning Framework

**Sequence quality**: Prefer RefSeq over GenBank. Check version numbers. Sequences with "PREDICTED" in the definition line are not experimentally validated.

**Accession guidance**: RefSeq = NCBI-only. GenBank = mirrored in ENA/EMBL. Default to RefSeq mRNA (NM_) for human and model organisms, and to the most complete genome assembly for microbial queries.

**Cross-database reconciliation**: The same sequence may carry different accessions (e.g., GenBank U00096 = RefSeq NC_000913 for E. coli K-12). Always report both when available. Discrepancies between GenBank and RefSeq typically indicate that RefSeq curation corrected submission errors.

### Synthesis Questions

1. What is the highest-quality accession available?
2. Are there alternative accessions in other databases?
3. How complete is the annotation?
4. Is the sequence from the expected organism/strain?
5. What download format suits the user's downstream analysis?

---

## Error Handling

| Error | Response |
|-------|----------|
| "No search criteria provided" | Add organism, gene, or keywords |
| "ENA 404 error" | Likely RefSeq -- use NCBI only |
| "No results found" | Broaden the search, check spelling, try synonyms |
| "Sequence too large" | Note the size and provide a download link instead |

---

## Tool Reference

- **NCBI Tools**: `NCBI_search_nucleotide` (search), `NCBI_fetch_accessions` (UID→accession), `NCBI_get_sequence` (retrieve)
- **ENA Tools (GenBank/EMBL only)**: `ena_get_entry` (metadata), `ena_get_sequence_fasta` (FASTA), `ena_get_entry_summary` (summary)

---

## Search Parameters Reference

**NCBI_search_nucleotide**: `operation`="search", `organism` (scientific name), `gene` (symbol), `strain`, `keywords`, `seq_type` (complete_genome/mrna/refseq), `limit`

**NCBI_get_sequence**: `operation`="fetch_sequence", `accession`, `format` (fasta/genbank)
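The Accession Type Decision Tree and Curation Level Tiers amount to prefix-based routing, which can be sketched in plain Python without any tool calls. This is a minimal sketch under stated assumptions: `AccessionInfo`, `parse_accession`, and `rank_by_curation` are hypothetical helpers. Only the RefSeq prefixes named in this skill are recognized, plus `NZ_` (also a RefSeq prefix, used for prokaryotic genome records). Every other accession is treated as GenBank/EMBL and therefore ENA-compatible, and ranking `NZ_` between predicted RefSeq and GenBank is an arbitrary choice. The sketch cannot separate the GenBank Validated, GenBank Direct, and Third Party tiers, which require the record itself.

```python
from typing import List, NamedTuple, Optional

# RefSeq prefixes named in this skill's tables, plus NZ_. Everything else is
# treated as GenBank/EMBL (an assumption: other RefSeq prefixes exist).
REFSEQ_REFERENCE = ("NC_", "NM_", "NR_", "NP_")
REFSEQ_PREDICTED = ("XM_", "XP_", "XR_")
REFSEQ_OTHER = ("NZ_",)

# Lower rank = better curation (placement of "RefSeq (other)" is hypothetical).
TIER_RANK = {"RefSeq Reference": 0, "RefSeq Predicted": 1,
             "RefSeq (other)": 2, "GenBank/EMBL": 3}


class AccessionInfo(NamedTuple):
    base: str               # accession without its version suffix
    version: Optional[int]  # e.g. 3 for "NC_000913.3"; None if unversioned
    database: str           # "RefSeq" or "GenBank/EMBL"
    tier: str               # key into TIER_RANK
    ena_compatible: bool    # False for RefSeq (ENA returns 404)


def parse_accession(accession: str) -> AccessionInfo:
    """Split off the version suffix and route the accession by its prefix."""
    acc = accession.strip().upper()
    base, dot, ver = acc.rpartition(".")
    if not dot or not ver.isdigit():
        base, ver = acc, ""  # no usable version suffix
    version = int(ver) if ver else None

    for prefixes, tier in ((REFSEQ_REFERENCE, "RefSeq Reference"),
                           (REFSEQ_PREDICTED, "RefSeq Predicted"),
                           (REFSEQ_OTHER, "RefSeq (other)")):
        if base.startswith(prefixes):
            return AccessionInfo(base, version, "RefSeq", tier, False)
    return AccessionInfo(base, version, "GenBank/EMBL", "GenBank/EMBL", True)


def rank_by_curation(accessions: List[str]) -> List[str]:
    """Order candidates best-curated first; ties keep their input order."""
    return sorted(accessions, key=lambda a: TIER_RANK[parse_accession(a).tier])


if __name__ == "__main__":
    for acc in rank_by_curation(["U00096.3", "NC_000913.3"]):
        print(acc, parse_accession(acc))
```

Keeping the version as a separate field makes it easy to report "NC_000913, version 3" and to notice when two candidates are the same record at different versions.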
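The Fallback Chains table and the "ENA 404 error" row imply a fixed retrieval order: NCBI first, ENA second, and ENA never for RefSeq. A minimal sketch of that order, assuming you wrap the two retrieval tools in your own functions that take an accession, return FASTA text, and raise an exception on failure. `retrieve_fasta` and this wrapper convention are hypothetical, because the tools' exact response and error shapes are not documented in this skill.

```python
from typing import Callable, List, Tuple

# RefSeq prefixes from this skill's tables (plus NZ_); ENA holds no RefSeq records.
REFSEQ_PREFIXES = ("NC_", "NM_", "NR_", "NP_", "NZ_", "XM_", "XP_", "XR_")

# A fetcher takes an accession and returns FASTA text, raising on failure.
Fetcher = Callable[[str], str]


def retrieve_fasta(accession: str, ncbi_fetch: Fetcher,
                   ena_fetch: Fetcher) -> Tuple[str, str]:
    """Return (source, fasta_text), trying NCBI first and ENA second."""
    errors: List[str] = []
    try:
        return "NCBI", ncbi_fetch(accession)
    except Exception as exc:  # e.g. NCBI unavailable
        errors.append(f"NCBI: {exc}")

    if accession.strip().upper().startswith(REFSEQ_PREFIXES):
        # Skip ENA: it would only return 404 for a RefSeq accession.
        raise LookupError(f"{accession} is RefSeq and NCBI failed: {errors}")

    try:
        return "ENA", ena_fetch(accession)
    except Exception as exc:
        errors.append(f"ENA: {exc}")
    raise LookupError(f"No source returned {accession}: {errors}")
```

In practice `ncbi_fetch` might call `tu.tools.NCBI_get_sequence(operation="fetch_sequence", accession=acc, format="fasta")` and `ena_fetch` might call `tu.tools.ena_get_sequence_fasta(accession=acc)`, each extracting the FASTA text from the response. Injecting the fetchers keeps the ordering logic testable without network access.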
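For the Sequence Preview item of the report, only the first few FASTA lines plus a residue count are needed. A minimal sketch, assuming the FASTA text has already been retrieved as a string. `fasta_preview` is a hypothetical helper that summarizes only the first record if the text contains several. The locally computed length is meant as a cross-check against the length reported in the record metadata, not a replacement for it.

```python
from typing import List


def fasta_preview(fasta_text: str, max_lines: int = 3) -> dict:
    """Summarize the first record of a FASTA string for a report."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: first non-empty line must start with '>'")

    seq_lines: List[str] = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record, if any
            break
        seq_lines.append(ln)

    return {
        "header": lines[0][1:],                      # definition line without '>'
        "length": sum(len(ln) for ln in seq_lines),  # residues in the first record
        "preview": seq_lines[:max_lines],            # truncated sequence lines
        "truncated": len(seq_lines) > max_lines,
    }
```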
