
Protein Sequence MSA
Submit multi-sequence FASTA-style input to EBI Clustal Omega via the bundled script and retrieve a multiple sequence alignment from your agent.
Overview
Protein Sequence MSA is an agent skill for the Build phase that runs EBI Clustal Omega to produce a multiple sequence alignment from submitted protein sequences.
Install
npx skills add https://github.com/google-deepmind/science-skills --skill protein-sequence-msa
What is this skill?
- Submits sequences to EBI Clustal Omega REST API with encoded email, title, and sequence payload
- Shared `HttpClient` with QPS=1 against the Clustal Omega services base URL
- 15-minute polling timeout (`_POLLING_TIMEOUT_SECS`) for long-running alignment jobs
- Python ≥3.10 script with `scienceskillscommon` and `python-dotenv` dependencies
- Command-line entry via argparse for agent-driven MSA runs
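As a rough illustration of the submission step listed above, the request body can be built with the standard library's `urllib.parse.urlencode`. The sketch mirrors the `_prepare_payload` helper in the bundled script; the email address and sequences are placeholder values, not real data.

```python
import urllib.parse


def prepare_payload(email: str, title: str, sequences: str) -> bytes:
    """Form-encodes the three fields the script sends when it submits a job."""
    params = {"email": email, "title": title, "sequence": sequences}
    return urllib.parse.urlencode(params).encode("utf-8")


# Placeholder inputs for illustration only.
example_fasta = ">seq1\nMKTAYIAK\n>seq2\nMKTAHIAK"
payload = prepare_payload("user@example.org", "MSA", example_fasta)
print(payload)
```

Characters such as `@`, `>`, and newlines are percent-encoded, so the FASTA text survives the trip as a single form field.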
Adoption & trust: 546 installs on skills.sh; 1.7k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have several protein sequences and need a reliable MSA from Clustal Omega, but manual EBI REST submission and job polling are error-prone in agent workflows.
Who is it for?
Indie bioinformatics or health-AI builders wiring agent-driven pipelines that need standard Clustal Omega alignments via EBI.
Skip if: you already run large in-house alignment clusters, or your workflow requires local HMMER/MMseqs2 rather than public EBI job queues.
When should I use this skill?
You need a multiple sequence alignment from Clustal Omega for a file of protein sequences inside an automated or agent-driven workflow.
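The bundled script rejects inputs with fewer than 2 or more than 4000 sequences, counting records by `>` characters, and caps the input file at 4 MB. A pre-flight check along the same lines might look like the sketch below. The function name `check_fasta_text` and its messages are hypothetical, and it measures the in-memory text rather than the file on disk as the script does.

```python
from typing import Optional

MAX_SEQUENCES = 4000
MAX_SIZE_BYTES = 4 * 1024 * 1024  # 4 MB, the same cap the script applies


def check_fasta_text(text: str) -> Optional[str]:
    """Returns an error message, or None if the input looks submittable."""
    stripped = text.strip()
    if not stripped:
        return "empty input"
    if len(stripped.encode("utf-8")) > MAX_SIZE_BYTES:
        return "input larger than 4 MB"
    # Like the script, count '>' characters instead of fully parsing FASTA.
    num_sequences = stripped.count(">")
    if num_sequences < 2:
        return f"at least 2 sequences required, found {num_sequences}"
    if num_sequences > MAX_SEQUENCES:
        return f"at most {MAX_SEQUENCES} sequences supported, found {num_sequences}"
    return None


print(check_fasta_text(">a\nMK\n>b\nMR"))
```

Running a check like this before handing the file to the agent avoids a round trip that the script would reject anyway.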
What do I get? / Deliverables
The agent executes the bundled alignment script, polls the EBI service within a 15-minute window, and returns the completed MSA output for downstream modeling or analysis.
- Clustal Omega alignment result from completed EBI job
- Job submission and poll trace suitable for pipeline logs
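The poll-within-a-window behaviour described above can be sketched independently of the network. This is a minimal illustration, not the script's code: `poll_until_finished` and its parameters are hypothetical names, it uses `time.monotonic` where the script uses `time.time`, and it raises exceptions where the script prints and exits. The terminal statuses are the ones the script checks for; "RUNNING" is only a placeholder for any non-terminal status.

```python
import time
from typing import Callable


def poll_until_finished(
    get_status: Callable[[], str],
    timeout_secs: float = 15 * 60,
    interval_secs: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Polls get_status until FINISHED, a failure status, or the deadline."""
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        status = get_status().strip()
        if status == "FINISHED":
            return status
        if status in ("ERROR", "FAILURE", "NOT_FOUND"):
            raise RuntimeError(f"job failed with status: {status}")
        sleep(interval_secs)
    raise TimeoutError(f"job did not finish within {timeout_secs} seconds")


# A fake status source stands in for the EBI status endpoint.
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
result = poll_until_finished(lambda: next(fake_statuses), sleep=lambda _: None)
print(result)
```

Injecting `get_status` and `sleep` keeps the loop testable without waiting on a real job or a real clock.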
Journey fit
The skill sits in the Build phase because it implements an external bioinformatics API integration rather than early ideation. The Integrations subphase reflects its REST job submission, polling, and HTTP client usage against EBI Clustal Omega.
How it compares
Skill-packaged EBI Clustal Omega client, not a local MSA binary or generic sequence-editing template.
Common Questions / FAQ
Who is protein-sequence-msa for?
Solo developers and small teams building science agents who need programmatic Clustal Omega alignments through Google DeepMind science-skills patterns.
When should I use protein-sequence-msa?
During Build when integrating sequence alignment into a pipeline, preparing features for structure prediction, or automating homology steps from a multi-sequence input file.
Is protein-sequence-msa safe to install?
It runs network jobs against EBI and requires an email address, which the script reads from the `USER_EMAIL` environment variable. Review Security Audits on this page, and treat sequence data as sensitive before sending it to external services.
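The script's `dry_run` option prints the encoded payload instead of submitting it. One way to review what would leave your machine is to decode such a payload back into its fields, as in this sketch. All values here are placeholders, and the payload is rebuilt locally rather than captured from the script.

```python
import urllib.parse

# Rebuild a payload of the same shape the script produces (placeholder values).
payload = urllib.parse.urlencode(
    {"email": "user@example.org", "title": "MSA", "sequence": ">a\nMK\n>b\nMR"}
).encode("utf-8")

# Decode it to see exactly which fields, and how much data, would be sent.
fields = urllib.parse.parse_qs(payload.decode("utf-8"))
for name, values in fields.items():
    print(f"{name}: {len(values[0])} characters")
```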
SKILL.md - Protein Sequence MSA
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "scienceskillscommon",
#   "python-dotenv",
# ]
# [tool.uv.sources]
# scienceskillscommon = { path = "../../scienceskillscommon" }
# ///
"""Runs EBI Clustal Omega for MSA computation.

Takes a file with multiple sequences and provides the alignment.
"""

import argparse
import os
import sys
import time
import urllib.parse

import dotenv
from science_skills.scienceskillscommon import http_client

_POLLING_TIMEOUT_SECS = 15 * 60  # 15 minutes.

_CLIENT = http_client.HttpClient(
    "https://www.ebi.ac.uk/Tools/services/rest/clustalo/", qps=1
)


def _prepare_payload(email: str, title: str, sequences: str) -> bytes:
  """Prepares the payload for the EBI Clustal Omega API."""
  params = {
      "email": email,
      "title": title,
      "sequence": sequences,
  }
  return urllib.parse.urlencode(params).encode("utf-8")


def _align_sequences(
    *, input_file: str, output_file: str, dry_run: bool = False
) -> None:
  """Runs EBI Clustal Omega alignment for sequences in a FASTA file.

  This function takes a FASTA formatted file, submits the sequences to the EBI
  Clustal Omega web service, polls for the alignment completion, and saves the
  resulting alignment in FASTA format to the specified output file.

  Args:
    input_file: Path to the input file containing sequences in FASTA format.
    output_file: Path where the resulting MSA in FASTA format will be saved.
    dry_run: If True, print the payload and exit without submitting the job.
  """
  if not os.path.exists(input_file):
    print(f"[!] Error: Input file not found: {input_file}")
    sys.exit(1)

  max_size_bytes = 4 * 1024 * 1024  # 4 MB
  file_size = os.path.getsize(input_file)
  if file_size > max_size_bytes:
    print(
        "[!] Error: At most 4 MB file size supported. Found"
        f" {file_size / (1024 * 1024):.2f} MB."
    )
    sys.exit(1)

  with open(input_file, "r") as f:
    sequences = f.read().strip()

  if not sequences:
    print("[!] Error: Empty input file.")
    sys.exit(1)

  num_sequences = sequences.count(">")
  if num_sequences < 2:
    print(f"[!] Error: At least 2 sequences required. Found {num_sequences}.")
    sys.exit(1)
  if num_sequences > 4000:
    print(
        f"[!] Error: At most 4000 sequences supported. Found {num_sequences}."
    )
    sys.exit(1)

  print("[*] Submitting sequences to EBI Clustal Omega API...")

  # 1. Submit Job
  user_email = os.environ.get("USER_EMAIL")
  if not user_email:
    print("[!] Error: USER_EMAIL environment variable is required.")
    sys.exit(1)

  data = _prepare_payload(user_email, "MSA", sequences)
  if dry_run:
    print(data)
    sys.exit(0)

  job_id = _CLIENT.fetch_text(
      "run", method="POST", data=data, headers={"Accept": "text/plain"}
  ).strip()
  print(f"[*] Job ID generated: {job_id}")

  # 2. Poll the server
  print("[*] Polling server for completion...")
  start_time = time.time()
  while time.time() - start_time < _POLLING_TIMEOUT_SECS:
    status = _CLIENT.fetch_text(
        f"status/{job_id}", headers={"Accept": "text/plain"}, timeout=20
    ).strip()
    sys.stdout.write(".")
    sys.stdout.flush()

    if status == "FINISHED":
      print("\n[*] Job marked as FINISHED.")
      break
    elif status in ["ERROR", "FAILURE", "NOT_FOUND"]:
      print(f"\n[!] Job failed with status: {status}")
      sys.exit(1)
    time.sleep(10)
  else:
    pr